CS-534: Packet Switch Architecture

Exercise Set 6: Queueing Architectures and their Cost

Assigned: 2000-03-28 (week 8) -- Due: 2000-04-04 (week 9)

6.1 Single-Chip Switch Buffer Memory Cost

Consider a 64x64x640 Mbps single-chip switch in 0.35-micron CMOS technology that uses on-chip SRAM blocks like those of section 3.1 to implement its buffer memory(ies). The 64 input and 64 output switch ports operate at 640 Mbps each, and, inside the chip, they have the form of 8-bit paths each, operating at 80 MHz; consider that the entire switch operates with this 80 MHz clock. All buffer memory capacities will be powers of two, and multiples of a basic 64-Byte segment. You are to study the buffer memory cost (area and power consumption) of the various queueing architectures for this switch chip. Make a comparison table similar to the one of section 4.5.
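As a quick sanity check on the figures above, the following minimal sketch recomputes the per-port rate, the aggregate input rate, and the number of clock cycles one 64-Byte segment occupies on an 8-bit path. The constant and variable names are illustrative assumptions, not part of the exercise.

```python
# Parameters taken from the problem statement.
PORTS = 64               # input ports (and, equally, output ports)
PATH_BITS = 8            # width of each on-chip port datapath
CLOCK_HZ = 80_000_000    # 80 MHz chip clock
SEGMENT_BYTES = 64       # basic buffer segment

# Per-port rate: 8 bits per clock at 80 MHz.
port_rate_bps = PATH_BITS * CLOCK_HZ

# Aggregate rate over all input ports (the output side is the same).
aggregate_bps = PORTS * port_rate_bps

# Clock cycles needed to move one segment over one 8-bit port path.
cycles_per_segment = SEGMENT_BYTES * 8 // PATH_BITS

print(port_rate_bps, aggregate_bps, cycles_per_segment)
```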

The rows of the table will refer to the following queueing architectures: (a) crosspoint queueing; (b) 8x8-block-crosspoint queueing; (c) output queueing; (e) single shared buffer; (f) input queueing (plain or advanced); (g) internal speed-up s=3: (g1) input side, (g2) output side, (g3) total for (g).

The columns of the table will refer to the following cost metrics: (1) number of buffer memories; (2) throughput of each buffer memory; (3) total throughput of all buffer memories; (4) capacity of each buffer memory, in 64-Byte segments; (5) total capacity of all buffer memories, in segments; (6) number of SRAM blocks per buffer memory; (7) organization of each SRAM block, in (words)x(bits); (8) area of each SRAM block, in mm^2; (9) total area of all buffer memories, in mm^2; (10) power consumption (worst case) of each SRAM block; (11) total power consumption of all buffer memories.

The capacity of the buffer memories will be as follows. Start with a basic capacity of four (4) 64-Byte segments per buffer memory in the crosspoint queueing architecture. Every time 8 buffer memories are placed together, we get a savings of 50% due to their total capacity being shared. Thus, when 8 of the basic buffers of crosspoint queueing are placed together, their total capacity will become 16 segments (instead of 8x4 = 32); when 64 of the basic buffers of crosspoint queueing are placed together, their total capacity will be 64 segments (instead of 8x16 = 128); and so on. For the internal speed-up architecture, assume that the full-capacity buffer must be on the output side, while on the input side only 1/4 of that is needed.
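The sharing rule above can be sketched as a small function. This is only an illustration of the stated rule, assuming the number of merged basic buffers is a power of 8; the function name and its parameters are hypothetical.

```python
def shared_capacity(num_basic, basic_segments=4, group=8):
    """Capacity, in segments, of one buffer that merges `num_basic`
    basic crosspoint buffers.

    Assumption: num_basic is a power of `group`. Each time `group`
    buffers are placed together, the summed capacity is halved
    (the 50% savings stated in the text).
    """
    total = num_basic * basic_segments  # capacity with no sharing
    n = num_basic
    while n > 1:
        if n % group != 0:
            raise ValueError("num_basic must be a power of the group size")
        n //= group      # one more level of grouping by 8
        total //= 2      # 50% savings at this level
    return total


# The two cases worked out in the text:
print(shared_capacity(8))    # 16 segments
print(shared_capacity(64))   # 64 segments
```

Which value of `num_basic` applies to each row of the table depends on how many crosspoint buffers that architecture merges into one memory, which is left to the reader.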

6.2 Off-Chip Buffer Memory Cost

We wish to make three different switches with 2.5 Gbps ports: (a) a 16x16 switch, (b) a 32x32 switch, (c) a 64x64 switch. For each of them, we are considering: (1) shared buffer architecture; (2) internal speed-up of some value between 30% and 80% (s= 1.3 to 1.8). All buffer memories for these switches will be made using off-chip RAMs like those seen in section 3.2; two options are under consideration: (.1) ZBT SRAM chips, organized "x32", at 166 MHz; (.2) SDRAM DIMM modules, organized "x64", at 143 MHz.

For each combination of switch and queueing architecture, calculate how many SRAM chips or how many SDRAM DIMMs are needed in order to achieve the desired throughput. How much of the SDRAM throughput can you utilize? What will be the burst size? Assume that the SDRAM bus turn-around overhead is 3 clocks; how much throughput does that cost you in each architecture? For the shared buffer architecture, what buffer width is needed? Is that practical? Why or why not? (Consider in particular the SDRAM case, in connection with the burst size: what will be the "segment" size for packet segmentation?) For the internal speed-up architecture, adjust the speed-up so that each buffer memory consists of an integer number of chips or DIMMs; what speed-up value results?
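The throughput arithmetic asked for above can be organized along the following lines. This is a rough sketch under stated assumptions, not a solution: it assumes that peak throughput is simply (data width) x (clock rate), that a shared buffer must sustain one write and one read stream per port (2 x N x port rate), and that one 3-clock turn-around is paid per burst. All function names are hypothetical.

```python
import math


def peak_gbps(width_bits, clock_mhz):
    """Peak throughput of one chip or DIMM, assuming one transfer per clock."""
    return width_bits * clock_mhz / 1000.0


def burst_efficiency(burst_clocks, turnaround_clocks=3):
    """Usable fraction of the bus, assuming one turn-around per burst."""
    return burst_clocks / (burst_clocks + turnaround_clocks)


def parts_needed(required_gbps, width_bits, clock_mhz, efficiency=1.0):
    """Smallest number of chips or DIMMs whose usable throughput suffices."""
    usable = peak_gbps(width_bits, clock_mhz) * efficiency
    return math.ceil(required_gbps / usable)


# Example: 16x16 switch, 2.5 Gbps ports, shared buffer (assumed 2 x N x rate).
required = 2 * 16 * 2.5                  # Gbps
print(peak_gbps(32, 166))                # one x32 ZBT SRAM at 166 MHz
print(peak_gbps(64, 143))                # one x64 SDRAM DIMM at 143 MHz
print(parts_needed(required, 32, 166))   # ZBT SRAM chips, no overhead assumed
```

For the SDRAM option, passing `efficiency=burst_efficiency(b)` for a candidate burst length `b` shows how the turn-around overhead changes the DIMM count.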

Make a rough calculation of the number of pins that the switch chips will need in order to connect to the SRAM chips or to the DRAM DIMMs. For simplicity, assume 24 pins for address and control in all cases (SRAM or DRAM); however, consider carefully which memories receive common address and control signals, and which ones receive different ones. Assume that at most 16 chips or DIMMs can have their address and control pins driven from the same switch pins; heavier loads require separate switch pins.
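One way to structure the pin estimate is sketched below. It assumes that every memory part needs its own data pins on the switch side, that parts are split evenly into groups which each receive a common address, and that each set of 24 address/control pins drives at most 16 parts. The function and its `address_groups` parameter are hypothetical; choosing the right number of groups for each architecture is the point of the exercise.

```python
import math

ADDR_CTRL_PINS = 24   # address + control pins per driven set (from the text)
MAX_FANOUT = 16       # max parts driven from one set of switch pins (from the text)


def switch_pins(parts, data_bits_per_part, address_groups=1):
    """Rough switch-side pin count for `parts` memory chips or DIMMs.

    Assumption: parts are divided evenly into `address_groups`; all parts
    within a group receive the same address and control values.
    """
    data_pins = parts * data_bits_per_part
    parts_per_group = math.ceil(parts / address_groups)
    # Extra copies of the address/control pins when a group exceeds the fan-out limit.
    copies_per_group = math.ceil(parts_per_group / MAX_FANOUT)
    addr_pins = address_groups * copies_per_group * ADDR_CTRL_PINS
    return data_pins + addr_pins


# 16 x32 chips sharing one address: 16*32 data pins + 24 address/control pins.
print(switch_pins(16, 32))                      # 536
# 17 chips exceed the fan-out of 16, so a second address/control set is needed.
print(switch_pins(17, 32))                      # 592
```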