FPGA-Based System Design, Wayne Wolf

This gives two cases: rise time, with the pullup on, and fall time, with the pullup off. The model ignores the saturation region and mischaracterizes the linear region, but its results are acceptable. Wire height is fixed, so wire width determines the current limit. Higher current density increases metal migration (electromigration), which can eventually destroy the wire. Different branches of a tree can be set to different lengths to optimize delay.
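The two cases can be sketched with a first-order RC model: the conducting transistor is treated as an effective resistance that charges the load (rise, pullup on) or discharges it (fall, pulldown on), so the output reaches its 50% point after ln 2 · RC. The resistance and capacitance values below are hypothetical placeholders, not figures from the book.

```python
import math


def rc_delay(r_eff: float, c_load: float, fraction: float = 0.5) -> float:
    """Time for a first-order RC node to cover `fraction` of its full swing.

    The switching transistor is modeled as a fixed effective resistance r_eff
    (ohms) driving a load capacitance c_load (farads), so the output follows
    1 - exp(-t / RC).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -r_eff * c_load * math.log(1.0 - fraction)


# Hypothetical values, chosen only for illustration.
R_PULLUP = 12e3    # ohms: rise time, pullup on
R_PULLDOWN = 6e3   # ohms: fall time, pullup off (pulldown on)
C_LOAD = 20e-15    # farads

t_rise = rc_delay(R_PULLUP, C_LOAD)
t_fall = rc_delay(R_PULLDOWN, C_LOAD)
print(f"50% rise: {t_rise * 1e12:.1f} ps, 50% fall: {t_fall * 1e12:.1f} ps")
```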

A spanning tree has segments that go directly between sources and sinks. A Steiner point is an intermediate point at which new branches are created. Each buffer is of size h.

[Figure labels: sense amp; lookup table and configuration bits; wiring channel; configuration memory; metal 2 and antifuse; from LE; testbench response.]

All variables used in a combinational block must be initialized, that is, assigned a value on every path through the block. Uninitialized variables cause latches to be introduced, which is bad.
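The difference between a spanning tree and a tree with a Steiner point can be seen on a three-pin net. The sketch below, assuming rectilinear (Manhattan) wiring, computes a minimum spanning tree with Prim's algorithm and compares it with a star routed through one hand-picked Steiner point; the pin coordinates are made up.

```python
Point = tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rectilinear_mst_length(points: list[Point]) -> int:
    """Length of a minimum spanning tree whose segments join pins directly (Prim)."""
    if not points:
        return 0
    # best[i]: shortest distance from pin i to any pin already in the tree
    best = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for i in best:
            best[i] = min(best[i], manhattan(points[nxt], points[i]))
    return total


def star_length(points: list[Point], steiner: Point) -> int:
    """Length when every pin connects to one intermediate (Steiner) point."""
    return sum(manhattan(p, steiner) for p in points)


pins = [(0, 0), (2, 0), (1, 1)]
print(rectilinear_mst_length(pins))  # 4
print(star_length(pins, (1, 0)))     # 3
```

Here the Steiner point at (1, 0) shortens the net from 4 to 3 units. Finding optimal Steiner points in general is much harder (the rectilinear Steiner tree problem is NP-hard), which is why routers rely on heuristics.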

The design must take into account the rules for driving large loads. To speed up the circuit, all equivalent paths must be sped up. Accepts 2n data inputs and n control signals, producing n data outputs. Must use two vectors. Testing has two parts: controlling the inputs of possibly interior gates, and observing the outputs of possibly interior gates.
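One common rule for driving a large load is a chain of buffers in which each stage is h times larger than the one before it. The sketch below assumes the simple textbook model in which each stage's delay is proportional to h (parasitic delay ignored), so the total delay is proportional to n·h with h**n = C_load/C_in; it searches for the number of stages n that minimizes that product. Under this model the best h comes out close to e.

```python
def buffer_chain(c_in: float, c_load: float, max_stages: int = 64) -> tuple[int, float]:
    """Choose the number of buffer stages n for driving c_load from c_in.

    Each stage is h times larger than the previous one, with h ** n == c_load / c_in.
    Delay is modeled as n * h (stage delay proportional to stage fanout, no
    parasitic delay). Returns the (n, h) pair that minimizes that cost.
    """
    ratio = c_load / c_in
    best_n = min(range(1, max_stages + 1), key=lambda n: n * ratio ** (1.0 / n))
    return best_n, ratio ** (1.0 / best_n)


n_stages, h = buffer_chain(c_in=1.0, c_load=1000.0)
print(n_stages, round(h, 2))  # 7 2.68
```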

Redundancies can introduce delay faults and other problems. The transitive fanin of a node determines a cone of logic. Less expensive than Boolean division. Estimate wire length without actually performing routing. The order in which nets are routed determines the quality of the result. Net ordering is a heuristic. Route in two passes, the first of which estimates congestion. (Netlist fragment: ADR2 c12[7], ADR3 row12[8], O row13[7].)
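One common way to estimate wire length without routing, offered here as an illustration rather than as the method the text has in mind, is the half-perimeter wirelength (HPWL): half the perimeter of each net's bounding box, summed over all nets. The placement below is hypothetical.

```python
Pin = tuple[float, float]


def hpwl(pins: list[Pin]) -> float:
    """Half-perimeter wirelength: width plus height of the net's bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_estimate(nets: dict[str, list[Pin]]) -> float:
    """Estimated total wire length of a placement, summed over its nets."""
    return sum(hpwl(pins) for pins in nets.values())


nets = {  # hypothetical pin positions after placement
    "n1": [(0, 0), (3, 1)],
    "n2": [(1, 1), (4, 5), (2, 3)],
}
print(total_estimate(nets))  # 11
```

For two- and three-pin nets HPWL equals the length of a minimum rectilinear Steiner tree; for larger nets it is a lower bound, since any connecting tree must span the bounding box in both directions.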

[Figure residue: timing entries Tiopi and Tilo (values truncated); chip floorplan of LEs; Moore machine with internal and external connections and combinational logic.]

The Q output is determined by the D input at the clocking event. Generic global wiring is implemented using segmented wiring channels. Routing tracks run across the entire chip both horizontally and vertically. Although most of the wires are segmented, with segments of several different lengths, a few wires run the length of the chip.

The chip provides three types of global signals. Four routed clocks can drive the clock, clear, preset, or enable pin of an R-cell or any input of a C-cell. Example: The ProASIC K provides local wires that allow the output of each tile to be directly connected to the eight adjacent tiles. The chip also provides very long lines that run the length of the chip.

The voltage is applied through the wires connected by the antifuse. The FPGA is architected so that all the antifuses are in the interconnect channels; this allows the wiring system to be used to address the antifuses for programming. The gates of the pass transistors are controlled by programming signals that select the appropriate row and column for the desired antifuse, as shown in the figure. The programming voltage is applied across the row and column such that only the desired antifuse receives the voltage and is programmed [ElG98].
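The row and column selection can be sketched as a grid of antifuses, each sitting between one row wire and one column wire. The text does not give the biasing scheme; the sketch assumes a common half-select arrangement in which the selected row is driven to the programming voltage Vpp, the selected column to 0, and every other row and column to Vpp/2, so only the target antifuse sees the full Vpp. The voltage and threshold values are hypothetical.

```python
def antifuse_voltages(rows: int, cols: int, sel_row: int, sel_col: int,
                      vpp: float) -> list[list[float]]:
    """Voltage across each antifuse under an assumed half-select bias."""
    v_row = [vpp if r == sel_row else vpp / 2 for r in range(rows)]
    v_col = [0.0 if c == sel_col else vpp / 2 for c in range(cols)]
    return [[abs(v_row[r] - v_col[c]) for c in range(cols)] for r in range(rows)]


def programmed(voltages: list[list[float]], threshold: float) -> list[tuple[int, int]]:
    """Antifuses whose applied voltage reaches the (hypothetical) programming threshold."""
    return [(r, c) for r, row in enumerate(voltages)
            for c, v in enumerate(row) if v >= threshold]


V = antifuse_voltages(3, 3, sel_row=1, sel_col=2, vpp=10.0)
print(programmed(V, threshold=8.0))  # [(1, 2)]
```

Half-selected antifuses (same row or same column as the target) see only Vpp/2, and the rest see 0, so the threshold only has to sit between Vpp/2 and Vpp.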

Because the antifuses are permanently programmed, an antifuse-based FPGA does not need to be configured when it is powered up. No pins need to be dedicated to configuration and no time is required to load the configuration. The pins on an FPGA must be programmable to accommodate the requirements of the configured logic.

A standard FPGA pin can be configured as an input, output, or three-state pin. Pins may also provide other features. Registers are typically provided at the pads so that input or output values may be held. The slew rate of outputs may be programmable to reduce electromagnetic interference (EMI); lower slew rates on output signals generate less energetic high-frequency harmonics, which are what show up as EMI. Example: The Spartan-II 2. The pins on the chip are divided into eight banks, with each bank sharing the reference voltage pins.

Pins within a bank must use standards that have the same VCCO. The IOB has three registers, one each for input, output, and three-state operation. These registers in the IOB can function either as flip-flops or latches. The programmable delay element on the input path is used to eliminate variations in hold times from pin to pin.
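The bank rule can be checked mechanically before a design is committed to pins. In the sketch below the I/O standard names and their VCCO values are illustrative assumptions (the device data sheet holds the real table), the pin assignments are made up, and the shared reference-voltage constraint is not checked.

```python
from collections import defaultdict

# Illustrative I/O standards and their output supply voltage VCCO in volts.
# These entries are assumptions for the sketch; consult the device data sheet.
VCCO_FOR_STANDARD = {"LVTTL": 3.3, "LVCMOS2": 2.5, "SSTL2_I": 2.5, "HSTL_I": 1.5}


def check_banks(pins: list[tuple[str, int, str]]) -> dict[int, set[float]]:
    """Return the banks whose pins require more than one VCCO.

    pins: (pin_name, bank_number, io_standard) tuples.
    """
    vcco_by_bank: dict[int, set[float]] = defaultdict(set)
    for _name, bank, standard in pins:
        vcco_by_bank[bank].add(VCCO_FOR_STANDARD[standard])
    return {bank: vccos for bank, vccos in vcco_by_bank.items() if len(vccos) > 1}


assignments = [  # hypothetical pin assignments
    ("P1", 0, "LVTTL"),
    ("P2", 0, "LVCMOS2"),  # conflicts with P1: different VCCO in bank 0
    ("P3", 1, "LVCMOS2"),
    ("P4", 1, "SSTL2_I"),  # same 2.5 V VCCO as P3, so bank 1 is fine
]
print(check_banks(assignments))
```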

Propagation delays within the FPGA cause the IOB control signals to arrive at different times, causing the hold time for the pins to vary. The programmable delay element is matched to the internal clock propagation delay and, when enabled, eliminates skew-induced hold time variations.

The weak keeper circuit monitors the output value and weakly drives it to the desired high or low value. The weak keeper is useful for pins that are connected to multiple drivers; it keeps the signal at its last valid state after all the drivers have disconnected. The size of a logic element determines how many can be put on a chip; the delay through a wire helps to determine the interconnection architecture of the fabric.
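The keeper's behavior on a shared wire can be sketched as a resolution function: any enabled driver overrides the weak keeper, and when every driver has disconnected the wire holds its last valid value. Treating disagreeing drivers as an error is an assumption of this sketch.

```python
from typing import Optional


def resolve_bus(drivers: list[Optional[int]], last_value: int) -> int:
    """Value on a shared wire that has a weak keeper.

    Each entry in `drivers` is 0 or 1 for an enabled three-state driver, or None
    for a driver in its high-impedance state.
    """
    active = {d for d in drivers if d is not None}
    if len(active) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    if active:
        return active.pop()  # an enabled driver overpowers the weak keeper
    return last_value  # no driver enabled: the keeper holds the last valid value


wire = 0
wire = resolve_bus([1, None], wire)     # driver 0 drives a 1
wire = resolve_bus([None, None], wire)  # all drivers off: the keeper holds the 1
print(wire)  # 1
```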

We will rely heavily on the results of Chapter 2 throughout this section. A CMOS gate needs to implement only one chosen logic function. The logic element of an FPGA, in contrast, must be able to implement a number of different functions.
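A lookup table meets this requirement by storing a truth table. The sketch below models a k-input LUT as 2**k configuration bits addressed by the inputs; the bit ordering (input 0 as the least significant address bit) is an arbitrary choice for illustration.

```python
from typing import Callable


def lut_config(func: Callable[..., int], k: int) -> list[int]:
    """Return the 2**k configuration bits that make a k-input LUT implement func.

    Bit i stores func evaluated on the input pattern whose binary encoding is i,
    with input 0 as the least significant address bit.
    """
    return [func(*[(i >> j) & 1 for j in range(k)]) & 1 for i in range(2 ** k)]


def lut_eval(bits: list[int], *inputs: int) -> int:
    """Read the LUT: the inputs form the address of the stored bit."""
    address = sum(bit << j for j, bit in enumerate(inputs))
    return bits[address]


maj3 = lut_config(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
xor4 = lut_config(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
print(maj3)                        # [0, 0, 0, 1, 0, 1, 1, 1]
print(lut_eval(xor4, 1, 0, 1, 1))  # 1
```

The same evaluation hardware serves every function; only the stored bits change, which is why the delay of a lookup table does not depend on the function it implements.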

Antifuse-based FPGAs program their logic elements by connecting various signals, either constants or variables, to the inputs of the logic elements. The logic element itself is not configured as an SRAM-based logic element would be. As a result, the logic element for an antifuse-based FPGA can be fairly small. The figure shows the schematic for a multiplexer-based logic element used in early antifuse-based FPGAs.
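Programming a multiplexer-based logic element by wiring its inputs to constants or signals can be illustrated with a single two-input multiplexer. The logic element in the text is a larger multiplexer structure, and the particular connections below are illustrative choices, not the book's table.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """Two-input multiplexer: passes in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


# Each function is "programmed" only by what is tied to the mux inputs.
def and_gate(x: int, y: int) -> int:
    return mux2(x, 0, y)  # x = 0 selects constant 0; x = 1 passes y


def or_gate(x: int, y: int) -> int:
    return mux2(x, y, 1)  # x = 0 passes y; x = 1 selects constant 1


def not_gate(x: int) -> int:
    return mux2(x, 1, 0)  # constants 1 and 0 on the data inputs


for x in (0, 1):
    for y in (0, 1):
        print(x, y, and_gate(x, y), or_gate(x, y))
```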

The table shows how to program some functions into the logic element by connecting its inputs to constants or signal variables. The logic element can also be programmed as a dynamic latch. The following example compares lookup tables and static gates in some detail.

The number of transistors in a static CMOS gate depends on both the number of inputs to the gate and the function to be implemented. In contrast, the SRAM cell in the lookup table requires eight transistors, including the configuration logic.

In addition, we need decoding circuitry for each bit in the lookup table. A straightforward decoder for the four-bit lookup table would be a multiplexer with 96 transistors, though smaller designs are possible. The delay of a static gate depends not only on the number of inputs and the function to be implemented, but also on the sizes of transistors used.

By changing the sizes of transistors, we can change the delay through the gate. The slowest gate uses the smallest transistors. The delay of a lookup table is independent of the function implemented and is dominated by the delay through the SRAM addressing logic. Ignoring leakage, the power consumption of a static CMOS gate depends on the capacitance connected to its output. The CMOS gate consumes no energy while its inputs are stable (once again ignoring leakage).
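The dependence on output capacitance is usually written as the switching-power formula P = α·C·V²·f, where α is the fraction of clock cycles in which the output switches from 0 to 1, C the load capacitance, V the supply voltage, and f the clock frequency. The numbers below are hypothetical; with α = 0 the gate draws no dynamic power, matching the observation that a static gate with stable inputs consumes no energy (leakage ignored).

```python
def dynamic_power(c_load: float, vdd: float, freq: float, activity: float) -> float:
    """Average switching power P = activity * C * Vdd**2 * f, leakage ignored.

    Each 0->1 output transition draws C * Vdd**2 from the supply: half is
    dissipated in the pullup, and the half stored on the load is dissipated
    in the pulldown on the next 1->0 transition.
    """
    return activity * c_load * vdd ** 2 * freq


p = dynamic_power(c_load=10e-15, vdd=1.2, freq=100e6, activity=0.1)
print(f"{p * 1e6:.3f} uW")
```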

The SRAM, in contrast, consumes power even when its inputs do not change. The stored charge in the SRAM cell dissipates slowly in a mechanism independent of transistor leakage; that charge must be replaced by the cross-coupled inverters in the SRAM cell.

As we can see, the lookup table logic element is considerably more expensive than a static CMOS gate. Because the logic element is so complex, its design requires careful attention to circuit characteristics. The lookup table for an SRAM-based logic element incorporates both the memory and the configuration circuit for that memory.

There are two possible organizations for the lookup table, as shown in the figure: a demultiplexer that causes one bit to drive the output, or a multiplexer that selects the proper bit. These organizations are logically equivalent but have different implications for circuitry. The demultiplexer selects a row to be addressed, and the shared bit lines are used to read or write the memory cells in that row.

The shared bit line is very efficient in large memories but less so in small memories like those used in logic elements. Most FPGA logic elements use a multiplexer to select the desired bit. Should that multiplexer be made of static gates or pass transistors? The design alternatives for the case of a two-input multiplexer are shown in the figure. A pass-transistor multiplexer is smaller than one built from static gates, but as the number of series pass transistors grows, the delay from the data input to the data output grows considerably. The delay through a series of pass transistors, in fact, grows as the square of the number of pass transistors in the chain, for the reasons captured by the Elmore delay model.
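The quadratic growth follows from the Elmore delay of an RC chain. Modeling each pass transistor as a resistance r with a capacitance c at its output node (a simplification), the resistance from the input to node k is k·r, so the Elmore delay to the end of an n-switch chain is the sum of k·r·c over k = 1..n, or r·c·n(n+1)/2.

```python
def elmore_chain_delay(n: int, r: float, c: float) -> float:
    """Elmore delay to the far end of n identical series switches.

    Node k (1-based) has capacitance c and sees k * r of resistance back to the
    source, so delay = sum over k of (k * r) * c = r * c * n * (n + 1) / 2.
    """
    return sum(k * r * c for k in range(1, n + 1))


for n in (1, 2, 4, 8):
    print(n, elmore_chain_delay(n, r=1.0, c=1.0))  # delays 1.0, 3.0, 10.0, 36.0
```

Doubling the chain length roughly quadruples the delay, which is why long series chains of pass transistors are avoided.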

The choice between static gates and pass transistors therefore depends on the size of the lookup table. The next example compares the delay through static-gate and pass-transistor multiplexers. Example: We want to build a b-input multiplexer that selects one of the b possible input bits.

We will call the data input bits i0, etc. In our drawings we will show four-input multiplexers; these are smaller than the multiplexers we want to use for lookup tables, but they are large enough to show the form of the multiplexer circuits.

Circuit Design of FPGA Fabrics

Here is a four-input mux built from NAND gates:

[Figure: four-input NAND-NAND multiplexer. The first-level NAND gates take (i0, s0', s1'), (i1, s0, s1'), (i2, s0', s1), and (i3, s0, s1); inverters generate s0' and s1' from s0 and s1.]

This multiplexer uses two levels of logic plus some inverters that form a third level of logic.

Each of the NAND gates in the first level of logic has as its inputs one of the data bits and the true or complement forms of all the select bits. The inverters generate the complement forms of the select bits. Each NAND gate in the first level determines whether to propagate the data bit for which it is responsible; the second-level NAND sums the partial results to create the final output.

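We can analyze the delay through a b-bit multiplexer as a function of b using logical effort [Sut99]. The delay through the inverters on the select bits is proportional to 1. Each first-level NAND gate has lg b + 1 inputs and the second-level NAND gate has b inputs; since the logical effort of an n-input NAND gate is (n + 2)/3, the product of the logical efforts along the data path is (lg b + 3)(b + 2)/9. This means that the total delay through the b-bit static gate multiplexer grows as b lg b. A multiplexer can instead be built from a single level of pass transistors, one per data input. The gates of those pass transistors must be driven by decoded address signals generated from the select bits. This circuit is not good for large multiplexers because it combines the worst aspects of static gate and pass transistor circuits.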
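The logical-effort argument can be checked numerically, assuming the standard result that an n-input static NAND gate has logical effort (n + 2)/3. The sketch computes only the product of logical efforts along one data path (branching and electrical effort are ignored), which is enough to see the growth trend.

```python
import math


def nand_logical_effort(n_inputs: int) -> float:
    """Logical effort of an n-input static CMOS NAND gate: (n + 2) / 3."""
    return (n_inputs + 2) / 3


def static_mux_path_effort(b: int) -> float:
    """Product of logical efforts along one data path of a b-input NAND-NAND mux.

    First level: NAND of one data bit and lg b select bits (lg b + 1 inputs).
    Second level: NAND of the b first-level outputs.
    """
    if b < 2 or b & (b - 1):
        raise ValueError("b must be a power of two, at least 2")
    lg_b = b.bit_length() - 1
    return nand_logical_effort(lg_b + 1) * nand_logical_effort(b)


for b in (4, 16, 64, 256):
    print(b, round(static_mux_path_effort(b), 2), b * math.log2(b))
```

For large b the product approaches (b lg b)/9, so it tracks the b lg b growth stated above.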

A better form of circuit built from pass transistors is a tree:

[Figure: four-input pass-transistor tree. In the first level, transistors gated by s0' and s0 select between i0 and i1 and between i2 and i3; in the second level, transistors gated by s1' and s1 select between the two results.]

The gates of these pass transistors are driven directly by the select bits or by the complements of the select bits (generated by inverters), eliminating the decoder NAND gates. However, because the pass transistors can be roughly modeled as resistors, the delay through a chain of pass transistors is proportional to the square of the number of switches on the path.
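The tree structure can be sketched functionally: at each level one select bit (s0 nearest the data inputs, as in the tree just described) chooses between pairs of signals, so every data signal passes through exactly lg b switches on its way to the output. Combined with the square-law behavior of series switches, that path length is what sets the tree's delay. The sketch models logic values only, not transistor-level signal degradation.

```python
def tree_mux(inputs: list[int], selects: list[int]) -> int:
    """Functional model of a pass-transistor tree multiplexer.

    Level j uses select bit s_j: the switch gated by s_j' passes the even member
    of each pair and the switch gated by s_j passes the odd member. A b-input
    tree has lg b levels, i.e. len(selects) switches on every path.
    """
    level = list(inputs)
    for s in selects:
        if len(level) % 2:
            raise ValueError("number of inputs must be 2 ** len(selects)")
        level = [level[i + 1] if s else level[i] for i in range(0, len(level), 2)]
    if len(level) != 1:
        raise ValueError("number of inputs must be 2 ** len(selects)")
    return level[0]


data = [0, 1, 1, 0]  # i0..i3
print(tree_mux(data, [1, 0]))  # s0 = 1, s1 = 0 selects i1
```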

We analyzed delay through RC chains in Section 2. The tree for a b-input multiplexer has lg b levels of logic, so the delay through the tree is proportional to (lg b)^2. One question that can be asked is whether transmission gates built from parallel n-type and p-type transistors are superior to pass transistors (that is, single n-type transistors).

While transmission gates are more egalitarian in the way they propagate logic 0 and 1 signals (a single n-type pass transistor passes a strong 0 but a degraded 1), their layouts are also significantly larger. Chow et al.

It is possible to build a mux from a combination of pass transistors and static gates, using switches for some of the select stages and static gates for the remaining stages.

Once the characteristics of the interconnect are known, sizing the output transistors is straightforward. However, because the LE may need to drive a long wire that also includes logic used to make programmable connections, the output buffer must be powerful and large. A typical FPGA has short wires, general-purpose wires, global interconnect, and specialized clock distribution networks. FPGAs need different types of wires because wires can introduce a lot of delay, and wiring networks of different lengths and connectivity need different circuit designs.

We saw some uses for different types of interconnect when we studied existing FPGAs, but the rationale for building several types of interconnect becomes much clearer when we study the circuit design of programmable interconnect. We saw that a relatively short wire (one much shorter than the size of the chip) has a delay equal to the delay through a logic gate. Since many connections on FPGAs are long, thanks to the relatively large size of a logic element, we must take care to design circuits that minimize wire delay.

A long wire that goes from one point to another needs a chain of buffers to minimize the delay through the wire. Now we must apply that general knowledge to the interconnect circuits in FPGAs. The figure shows the general form of a path between two logic elements. A signal leaves a logic element, goes through a buffer, enters the routing channel through a programmable interconnect block, passes through several more programmable interconnect blocks, and then passes through a final programmable interconnect block to enter the destination LE.
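A simple model shows why a chain of buffers helps. Assume a wire of length L with resistance r and capacitance c per unit length, whose unbuffered distributed-RC delay is about 0.5·r·c·L². Splitting it into k segments, each driven by a buffer with a fixed intrinsic delay t_buf, gives k·t_buf + 0.5·r·c·L²/k, which grows only linearly in L once k is chosen well. The model ignores buffer drive resistance and input capacitance, and the parameter values are normalized placeholders.

```python
def buffered_wire_delay(length: float, r: float, c: float, t_buf: float, k: int) -> float:
    """Delay of a wire split into k equal segments, each driven by one buffer.

    Each segment contributes t_buf plus its distributed RC delay 0.5 * r * c * seg**2.
    """
    seg = length / k
    return k * (t_buf + 0.5 * r * c * seg ** 2)


def best_buffer_count(length: float, r: float, c: float, t_buf: float,
                      k_max: int = 1000) -> int:
    """Number of segments (buffers) that minimizes buffered_wire_delay."""
    return min(range(1, k_max + 1),
               key=lambda k: buffered_wire_delay(length, r, c, t_buf, k))


k = best_buffer_count(length=20.0, r=1.0, c=1.0, t_buf=2.0)
print(k, buffered_wire_delay(20.0, 1.0, 1.0, 2.0, 1), buffered_wire_delay(20.0, 1.0, 1.0, 2.0, k))
# 10 202.0 40.0
```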

We have studied the circuit design of logic elements; now we need to consider the circuits for programmable interconnect blocks. Brown et al. However, the design of a programmable interconnection point for an SRAM-based or flash-based FPGA requires more care, because the circuitry can introduce significant delay as well as cost a significant amount of area.

The circuit design of a pass-transistor-based programmable interconnection point is shown in the figure. If we use pass transistors at the programmable interconnection points, we have two parameters we can use to minimize the delay through the wire segment: the width of the pass transistor and the width of the wire.

As we saw in Section 2, increasing the width of a transistor increases the current it can deliver. The increased current through the transistor reduces its effective resistance, but at the cost of a larger transistor. Similarly, we can increase the width of a wire to reduce its resistance, but at the cost of both increased capacitance and a larger wire.

Rather than uniformly changing the width of the wire, we can also taper the wire as described in Section 2. We can also ask ourselves whether we should use three-state buffers rather than pass transistors at the programmable interconnection points. The use of three-state buffers in programmable interconnect is illustrated in the figure. The three-state buffer is larger than a pass transistor, but it provides amplification that the pass transistor does not.

They use the product of area and wire delay as a metric for the cost-effectiveness of a given circuit design. The figure compares the product of switch area and wire delay as a function of the width of the pass transistor at a programmable interconnection point. The plot shows curves for wires of different lengths, with the length of a wire measured in multiples of the size of a logic element. Another figure shows the area-delay curve for a three-state buffer. Uniformly increasing the wire width has little effect because the wire capacitance is much larger, swamping the effects of reduced resistance.

The figure shows how the delay through the wire varies as the sizes of the driving buffer and routing switch change; these curves were generated for a 0. Each curve shows the delay for a given size of driver.

The curves are U-shaped: as the routing switch increases in size, delay first decreases and then increases. The initial drop in delay is due to decreasing resistance in the switch; the ultimate increase in delay happens when the increase in capacitance overwhelms the improvement obtained from lower resistance.

Architecture of FPGA Fabrics

The figure shows that there is a best size for the pass-transistor routing switch for any given choice of driver size. A clock network is particularly difficult because it must go to many different places.
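The U-shaped switch-sizing curves can be reproduced with a lumped model, assuming a fixed driver resistance R_d, a routing switch whose resistance falls as R_s/w and whose capacitance grows as C_s·w with width w, and a wire capacitance C_w. The delay (R_d + R_s/w)(C_w + C_s·w) is then minimized at w = sqrt(R_s·C_w / (R_d·C_s)). The normalized values are hypothetical, not the data behind the book's curves.

```python
import math


def switch_path_delay(w: float, r_drv: float, r_sw: float,
                      c_wire: float, c_sw: float) -> float:
    """Lumped RC delay through a driver and a routing switch of width w."""
    return (r_drv + r_sw / w) * (c_wire + c_sw * w)


def best_switch_width(r_drv: float, r_sw: float, c_wire: float, c_sw: float) -> float:
    """Width at which d(delay)/dw = 0 for the model above."""
    return math.sqrt(r_sw * c_wire / (r_drv * c_sw))


params = dict(r_drv=1.0, r_sw=4.0, c_wire=9.0, c_sw=1.0)
widths = [0.5 * i for i in range(1, 41)]
delays = [switch_path_delay(w, **params) for w in widths]
w_grid = widths[delays.index(min(delays))]
print(w_grid, best_switch_width(**params))  # 6.0 6.0
```

Below the optimum the switch resistance dominates; above it the added switch capacitance does, which gives the two arms of the U.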

As illustrated in Figure 3-22, clock signals are often distributed by trees of drivers, with larger transistors on the drivers near the clock source and smaller transistors on the drivers close to the flip-flops and latches. This structure presents a much larger capacitive load than does a point-to-point wire. Buffers must be distributed throughout the clock tree in order to minimize delay. Logic elements and interconnect compete with each other to some extent, since we have only a limited amount of area on chip.

Wires exist on several levels, but the transistors for the interconnection points and amplifiers take up area that could be devoted to logic elements.

How many inputs should a logic element have? Should it provide dedicated logic for addition or other functions? Do we need global interconnect, local interconnect, and other types? How much of each type do we need? Longer segments provide shorter delay but less routing flexibility.

Interconnect may be distributed uniformly or in various patterns. An FPGA fabric is different from a custom chip in that it is intended to be used for many different logic designs. As a result, the standard by which the fabric should be judged is the quality of implementation of a typical set of logic designs. Companies that design FPGAs usually do not use them to build systems, so they may collect benchmarks from customers or from public sources.

Book description: Digital designs once built in custom silicon are increasingly implemented in field programmable gate arrays (FPGAs). By Wayne Wolf, Princeton University. Series: Prentice Hall Modern Semiconductor Design Series. Wolf introduces powerful new IP-based design techniques at all three levels: gates, subsystems, and architecture. Electronic design and test engineers of today have to deal with these complex and …


