CS-534: Packet Switch Architecture
Department of Computer Science
© copyright: University of Crete, Greece
Consider the situation where a packet destined to a gigabit Ethernet output arrives through an OC-12 input. The maximum-sized Ethernet packet is 1518 bytes long, and, when arriving through an ATM port, it arrives segmented into 32 ATM cells (1518 bytes / 48 payload-bytes/cell = 31.63 cells). The first 1518 bytes in the (48-byte) payloads of the 32 cells are precisely the 1518 bytes that constitute the Ethernet packet. Assume that we are guaranteed that these 32 cells will always arrive back-to-back, without any idle cells in-between them (when ATM traffic is carried over SONET links, cells are always transmitted back-to-back, without any "spacing" between them other than the SONET overhead bytes; however, in general, not all cells need to be valid --idle cells can be freely injected into the SONET payload in the general case, but not in our case between same-packet cells).
How soon, in nanoseconds, after the arrival of the first payload byte of the first ATM cell of the above 32-cell train can we transmit the first byte of the 8-byte gigabit Ethernet preamble? Right after the 8-byte preamble is transmitted, the 1518 bytes of the Ethernet packet must be transmitted back-to-back at the gigabit Ethernet rate, without any "hiccups". Our switch circuits can transmit any given byte no earlier than 100 ns after that byte is received at an input port.
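The segmentation arithmetic behind the 32-cell train can be checked with a few lines of Python. This is only a sketch of the cell count and padding under the simplification stated above (the packet occupies the first 1518 payload bytes, 48 payload bytes per cell); the function names are illustrative, and it does not attempt the timing question itself.

```python
ATM_PAYLOAD_BYTES = 48     # payload bytes per ATM cell, as stated in the exercise
MAX_ETHERNET_BYTES = 1518  # maximum-sized Ethernet packet, as stated in the exercise


def cells_needed(packet_bytes, payload_bytes=ATM_PAYLOAD_BYTES):
    """Number of cells needed to carry packet_bytes (integer ceiling division)."""
    return (packet_bytes + payload_bytes - 1) // payload_bytes


def padding_bytes(packet_bytes, payload_bytes=ATM_PAYLOAD_BYTES):
    """Unused payload bytes left over in the last cell of the train."""
    return cells_needed(packet_bytes, payload_bytes) * payload_bytes - packet_bytes


if __name__ == "__main__":
    print("cells:", cells_needed(MAX_ETHERNET_BYTES))
    print("padding bytes in last cell:", padding_bytes(MAX_ETHERNET_BYTES))
```

The last cell is only partly filled with packet bytes, which matters when reasoning about when the final Ethernet byte becomes available.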
Thus, for simple, unconstrained memory operation, besides total memory throughput, the other interesting performance metric is the peak possible rate of random accesses to arbitrary, independent locations (i.e. not necessarily sequential or in the same row). This is, in other words, the peak address rate for random, independent accesses.
(a) What is this number, for the various SRAM technologies seen in class, in millions of accesses per second (Maccesses/s)? Consider the single-port and dual-port on-chip SRAM, the QDR (burst-of-2, burst-of-4), and the DDR (only burst-of-4 is available) off-chip SRAM examples seen in class (sections 3.1 and 3.2).
(b) Consider that we build a 64-Byte (512-bit) wide buffer memory out of each of the above technologies in (a). For technologies that provide only burst accesses, the memory "width" is the total size of the entire burst that is accessed at a time. This 64-Byte width is a customary segment size in modern networking equipment, because 64 bytes is a "round" number just above the ATM cell size or the minimum IP packet size. (In the case of 18-bit or 36-bit wide parts, the total memory width will be 64x9 = 576 bits, where the extra 64 bits per segment are usually used for parity or ECC, and/or other off-band overhead information, e.g. end-of-packet and other such mark bits).
How many blocks (on-chip) or parts (off-chip) are needed, in each case of (a), for this 64-Byte wide buffer memory to be made? What is their aggregate peak throughput in Gbits/s? What is their total power consumption at peak rate, and their consumption per Gbps of offered throughput?
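As a rough aid for the bookkeeping in (b), the sketch below counts parts from the part width and burst length, following the rule above that a burst-only part contributes its whole burst to the memory "width". The part widths, burst lengths, and access rate in the example are hypothetical placeholders, not the figures from sections 3.1 and 3.2, and the throughput helper assumes every access moves the full 512 data bits.

```python
def parts_needed(total_width_bits, part_width_bits, burst_length=1):
    """Parts (or blocks) needed so that one access delivers total_width_bits.

    A burst-only part contributes part_width_bits * burst_length bits per access.
    """
    bits_per_access = part_width_bits * burst_length
    return -(-total_width_bits // bits_per_access)  # integer ceiling division


def aggregate_throughput_gbps(data_width_bits, maccesses_per_s):
    """Peak throughput in Gbit/s, given the total access rate in Maccesses/s.

    Assumption: each access transfers data_width_bits data bits
    (parity or ECC bits are not counted).
    """
    return data_width_bits * maccesses_per_s / 1000.0


if __name__ == "__main__":
    # Hypothetical parts, for illustration only.
    print(parts_needed(576, 36, burst_length=2))  # 36-bit part, burst of 2
    print(parts_needed(576, 18, burst_length=4))  # 18-bit part, burst of 4
    print(aggregate_throughput_gbps(512, 250))    # hypothetical 250 Maccesses/s
```

Power per Gbps then follows by dividing the total power of all parts by the aggregate throughput.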
(c) --Optional Question--
Look on the web to find newer, "available now" SRAM parts that offer a higher number of Maccesses/s than the best of (a). Look, for example, at companies like Micron, IDT, Cypress, IBM, Fujitsu, Hitachi, Samsung, etc. Do not spend more than 1 hour on this investigation.
(a) First, find the peak access rate for truly random accesses, i.e. accesses that may fall in the same bank but in a different row relative to the previous access. Hint: this is directly linked to the (same-bank) cycle time.
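The hint amounts to a unit conversion: if consecutive accesses may hit the same bank, at most one access can start per same-bank cycle time. A minimal sketch follows, where the cycle-time values in the example are hypothetical and should be replaced by the figure from the chip's data sheet.

```python
def peak_random_access_rate_maccesses(cycle_time_ns):
    """Peak rate in Maccesses/s, assuming one access per same-bank cycle time."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    # 1 access per cycle_time_ns nanoseconds = 1000 / cycle_time_ns million accesses/s
    return 1000.0 / cycle_time_ns


if __name__ == "__main__":
    for t_ns in (50.0, 60.0, 70.0):  # hypothetical cycle times, in ns
        print(t_ns, "ns ->", peak_random_access_rate_maccesses(t_ns), "Maccesses/s")
```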
What is the chip's peak data throughput (Gb/s) in this case? Assume all accesses are in the same direction (all reads, or all writes). Also, assume that you may set the burst size to "full page", i.e. "very long", and that the burst goes on continuously until interrupted by the next READ or WRITE command, at which time the new burst starts right away, without any idle time on the data bus (I am not sure whether this is in fact possible, or whether the next ACTIVE command to the same bank implicitly terminates the previous burst, but let's assume for this question that it is possible).
How does this throughput change for alternating read-write accesses? Assume that we perform a read access of a certain burst size, followed by a write access of some appropriate burst size, followed by another read, etc., where the burst sizes are adjusted so as to not decrease the peak address rate that we started with.
(b) --Optional Question-- (for expert hardware designers):
Now, allow for interleaved bank accesses, i.e. not truly random accesses any more. What is now the peak address rate, provided that accesses are successfully scheduled so that bank conflicts never occur? Show a timing diagram of how to interleave ACTIVE and READ/WRITE commands to the various banks. How many banks do you need, at a minimum, to achieve the peak address rate? What should the burst size be to fully utilize the data bus?
A PCI bus can be 32- or 64-bit wide, and can use a 33 or 66 or 100 MHz clock. Assume that each data transfer over PCI takes a fixed 4-clock-cycle overhead (arbitration, turn-around, framing, addressing, etc), plus 1 additional clock cycle for each (32- or 64-bit) data word being transferred; thus, a 1-word transfer takes 5 clock cycles, while an 8-word-burst transfer takes 12 clock cycles.
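The transfer-cost model above (4 overhead cycles plus 1 cycle per word) can be written down directly. The sketch below is one way to turn it into a throughput figure; the function names are illustrative, and the width, clock, and burst combinations in the example are arbitrary choices, not necessarily the cases asked about in part (a).

```python
PCI_OVERHEAD_CYCLES = 4  # fixed per-transfer overhead assumed in the exercise


def transfer_cycles(words):
    """Clock cycles for one transfer of `words` data words."""
    return PCI_OVERHEAD_CYCLES + words


def throughput_mbps(bus_width_bits, clock_mhz, words):
    """Sustained throughput in Mbit/s for back-to-back transfers of `words` words.

    bits per transfer / (cycles per transfer / clock in MHz) gives bits per
    microsecond, which is the same as Mbit/s.
    """
    return words * bus_width_bits * clock_mhz / transfer_cycles(words)


if __name__ == "__main__":
    # Illustrative combinations only.
    for width, clock, words in [(32, 33, 1), (32, 33, 8), (64, 66, 8)]:
        print(width, "bit,", clock, "MHz,", words, "words:",
              throughput_mbps(width, clock, words), "Mbps")
```

Longer bursts amortize the fixed overhead, so throughput approaches the raw width-times-clock figure as the burst length grows.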
(a) What is the PCI throughput, in Mbps, in the following 4 cases?
(b) If a first generation switch without DMA is built around a PCI bus, and the system bottleneck is the PCI bus, what would be the maximum switch throughput (= aggregate incoming throughput = aggregate outgoing throughput), in Mbps, in the four cases of part (a)?
(c) Same question as (b), but for a first generation switch with DMA.
(d) Same question as (b), but for a second generation switch.
University of Crete, Greece.
Last updated: 4 May 2004, by M. Katevenis.