CS-534: Packet Switch Architecture

# Exercise Set 4: Switch Generations, Elastic Buffers

Normally assigned 2000-03-02 (week 4) and due 2000-03-09 (week 5)
Actually Assigned: 2000-03-07 (week 5) -- Due: 2000-03-14 (week 6)

### 4.1 First Generation Router/Switch Performance

Read Example 8.6 from the book (Keshav), pp. 180-182, which was written in 1996 and concerns the performance of a first-generation router assuming that the bottleneck in the system is the processor.

(a) Today, the processor in this example would run at 400 MHz instead of 133 MHz. How would the answers in the example change as a result? Assume for the moment that the interrupt latency stays fixed, as does the time to access packet data from memory or I/O (these accesses are not sped up much by the faster clock, because many of them refer to I/O or miss in the caches). Although the processor has become 3 times faster, the router throughput has increased by a much lower percentage. By how much? Why is this so? Which are the dominant delays?

(b) Further to the faster clock in (a), assume that the interrupt latency is now reduced to 2 microseconds. How does this change your answers in (a)? Give each of your two answers both in Mbits/s and Mpackets/s.

(c) Further to (b), assume now that the mean packet size becomes 64 bytes instead of 500 bytes. How does this change your answers in (b)? Do the Mpackets/s increase or decrease? Do the Mbits/s increase or decrease? Why?

(d) Further to (c), assume that we now use this system to switch ATM cells. Ignoring the (very heavy, unfortunately) call set-up overhead, assume that the only changes are: (i) the packet size is 53 bytes, instead of 64 or 500; (ii) the packet forwarding code takes 50 machine cycles, instead of 200; and (iii) the packet header is 4 bytes, instead of 20. What are the two throughput values now, in Mbits/s and Mpackets/s?
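When working through parts (a) to (d), it may help to keep the per-packet time budget in a small script so that each change can be applied in isolation. The following is only a sketch of one possible model, assuming that the per-packet time is the sum of a fixed interrupt latency, a clock-dependent code execution time, and a clock-independent data access time. The function names and all numeric values in the usage lines are hypothetical placeholders, not the figures from Example 8.6; substitute the book's values yourself.

```python
def per_packet_time_s(clock_hz, interrupt_latency_s, forwarding_cycles,
                      data_accesses, access_time_s):
    """Time to handle one packet under the assumed three-term model."""
    code_time_s = forwarding_cycles / clock_hz      # shrinks with a faster clock
    data_time_s = data_accesses * access_time_s     # assumed independent of the clock
    return interrupt_latency_s + code_time_s + data_time_s


def throughput(packet_bytes, packet_time_s):
    """Return (Mpackets/s, Mbits/s) for back-to-back packets."""
    packets_per_s = 1.0 / packet_time_s
    return packets_per_s / 1e6, packets_per_s * packet_bytes * 8 / 1e6


if __name__ == "__main__":
    # Placeholder numbers only: 100 MHz clock, 10 us interrupt latency,
    # 200 code cycles, 100 data accesses of 50 ns each, 1000-byte packets.
    t = per_packet_time_s(100e6, 10e-6, 200, 100, 50e-9)
    mpps, mbps = throughput(1000, t)
    print(f"{t * 1e6:.2f} us/packet, {mpps:.4f} Mpackets/s, {mbps:.1f} Mbits/s")
```

Printing the three terms separately, rather than only their sum, makes it easy to see which delay dominates after each change.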

### 4.2 PCI Bus in a First/Second Generation Router

A PCI bus can be 32 or 64 bits wide, and can use a 33 or 66 MHz clock. Assume that each data transfer over PCI takes a fixed 4-clock-cycle overhead (arbitration, turn-around, framing, addressing, etc.), plus 1 additional clock cycle for each (32- or 64-bit) data word being transferred; thus, a 1-word transfer takes 5 clock cycles, while an 8-word-burst transfer takes 12 clock cycles.
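The timing rule above can be written down as a short helper, which may be convenient for checking hand calculations. This is a minimal sketch under the stated assumptions (4 overhead cycles plus 1 cycle per word, nominal 33 or 66 MHz clocks); the function names are illustrative, and the usage lines deliberately use a 4-word burst, which is not one of the cases asked about below.

```python
OVERHEAD_CYCLES = 4  # arbitration, turn-around, framing, addressing, etc.


def transfer_cycles(words):
    """Clock cycles for one transfer of `words` data words."""
    return OVERHEAD_CYCLES + words


def throughput_mbps(bus_bits, clock_mhz, words):
    """Sustained throughput when every transfer carries `words` words.

    bits per transfer * cycles per microsecond / cycles per transfer
    gives bits per microsecond, i.e. Mbps.
    """
    return bus_bits * words * clock_mhz / transfer_cycles(words)


if __name__ == "__main__":
    print(transfer_cycles(1), transfer_cycles(8))   # 5 12, as stated in the text
    print(throughput_mbps(32, 33, 4))               # 528.0 (4-word bursts, 32-bit, 33 MHz)
```

Note that a 64-byte burst is a different number of words on a 32-bit bus than on a 64-bit bus, so `words` has to be recomputed for each bus width.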

(a) What is the PCI throughput, in Mbps, in the following 4 cases? (i) 32-bit 33 MHz PCI, transferring individual words all the time; (ii) 64-bit 66 MHz PCI, transferring individual words all the time; (iii) 32-bit 33 MHz PCI, transferring 64-byte bursts all the time; (iv) 64-bit 66 MHz PCI, transferring 64-byte bursts all the time.

(b) If a first generation switch without DMA is built around a PCI bus, and the system bottleneck is the PCI bus, what would be the maximum switch throughput (= aggregate incoming throughput = aggregate outgoing throughput), in Mbps, in the four cases of part (a)?

(c) Same question as (b), but for a first generation switch with DMA.

(d) Same question as (b), but for a second generation switch.

### 4.3 Elastic Buffers

In an elastic buffer made using 2-port SRAM, the Empty and Full flags are asynchronous with respect to either of the two clocks.

(a) Actually, each flag is set in synchrony with one clock and is reset in synchrony with the other clock. Say specifically which flag transition occurs in synchrony with which clock and why.

(b) Could we exploit the above fact in order to avoid having to wait for a synchronization delay when looking at the Empty flag in order to flow-control the dequeue (read) operations, and when looking at the Full flag in order to flow-control the enqueue (write) operations? Why not? Construct a precise scenario (timing diagram) of word enqueueings and dequeueings that demonstrates that the Empty flag may be changing at precisely the time when the read FSM tries to look at it, or (another scenario where) the Full flag may be changing at precisely the time when the write FSM tries to look at it.
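Before drawing the timing diagram, it can be useful to experiment with a purely behavioural model of the buffer that records which operation caused each flag transition. The sketch below is an assumption-laden toy, not a description of any particular hardware: it treats Empty and Full as functions of the occupancy count, attributes each enqueue to the write clock and each dequeue to the read clock, and ignores all real timing and metastability effects, which are exactly what part (b) asks you to reason about. The class and method names are hypothetical.

```python
class ElasticBufferModel:
    """Occupancy-only model of a FIFO with Empty and Full flags."""

    def __init__(self, depth):
        self.depth = depth
        self.count = 0
        self.log = []  # (flag name, "set" or "reset", clock that caused it)

    def flags(self):
        return {"Empty": self.count == 0, "Full": self.count == self.depth}

    def _apply(self, delta, clock):
        before = self.flags()
        self.count += delta
        after = self.flags()
        for name in ("Empty", "Full"):
            if before[name] != after[name]:
                self.log.append((name, "set" if after[name] else "reset", clock))

    def enqueue(self):
        if self.count == self.depth:
            raise RuntimeError("enqueue while Full")
        self._apply(+1, "write clock")

    def dequeue(self):
        if self.count == 0:
            raise RuntimeError("dequeue while Empty")
        self._apply(-1, "read clock")


if __name__ == "__main__":
    buf = ElasticBufferModel(depth=2)
    for op in (buf.enqueue, buf.enqueue, buf.dequeue, buf.dequeue):
        op()
    for entry in buf.log:
        print(entry)
```

The log only tells you which side triggered each transition; explaining why, and what that implies for the side that samples the flag, is left to the exercise.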