CS-534: Packet Switch Architecture
Fall 2001
Department of Computer Science
© copyright: University of Crete, Greece

Exercise Set 6:
Linked-List Queue Management

Assigned: 2001-11-01 (week 6) -- Due: 2001-11-09 (week 7)

6.1 Pointer-Access Cost of Enqueue and Dequeue Operations

In section 3.5, 4th transparency, we saw a table of enqueue and dequeue latencies and throughput, for various degrees of memory access parallelism, i.e. different numbers of separate memory blocks, on-chip or off-chip. This table assumed that all memory accesses shown on the previous two transparencies --both normal ("basic") and exceptional-- have to be performed during each and every enqueue or dequeue operation. This is not necessarily true, especially for the more "serial" implementations (fewer memories available, more cycles per operation): the first access will often show whether we are in the normal or in the exceptional case, and the next accesses can then be tailored accordingly.

Consider the case where all pointers are stored in a single, 1-port, off-chip memory (first line of the table). More precisely, the width of this memory is either the width of one pointer (one segment address), or that width plus 1 bit; in the latter case, we can store one head or tail pointer plus one Empty Flag in one word of this available memory. (Note: an alternative scheme to storing both an empty flag and a pointer in a word is to only store the pointer while reserving one specific pointer value (e.g. "NULL") to denote an empty queue; for the purposes of this exercise, the two schemes are identical). This single memory contains all next-pointers (NxtPtr), all head pointers (Hd) of all queues, all tail pointers (Tl) of all queues, and all empty flags (in case such flags are used) of all queues. Each access to this single memory is performed to a single address (it is not possible to access in parallel the pointer in word A and the empty flag in another word B), and the access either reads the whole addressed word, or writes the whole addressed word (pointer plus flag, if a flag exists).

Count the number of accesses to this memory that are necessary in order to perform each of the following operations in the cases listed:

Determine the above 4 counts of memory accesses separately in each of the following 3 queue implementation schemes: The counts of memory accesses are important because they provide an upper bound for the achievable rate of queue operations (memory access rate divided by number of accesses per queue operation). Assume that the operations performed on our queues consist of alternating enqueues and dequeues (1 enq, then 1 deq, then 1 enq, then 1 deq, etc).

6.2 Descriptor Memory Space needed for Multicast QoS Support

In section 3.6, we saw that the provision of quality of service for multicast traffic requires that it be possible for a segment to reside simultaneously in multiple (per-output) queues. Two schemes were considered for implementing this: dedicated per-output and per-segment next-pointers (nxtPtr), or queue-member descriptors that are decoupled from the segment addresses.

Consider a shared-buffer switch with 16 output ports. The buffer memory is made of 16 synchronous DRAM chips of size 2 M x 32 bits, each, like the MT46V2M32 DDR SDRAM chip example that we saw in section 3.2. The segment size is 128 bytes. What is the size of the buffer memory, in MBytes, and what is the number of segments in it? How wide (how many bits) is a segment pointer? How many reference counts and how wide each do we need?

(a) According to the first scheme --dedicated per-output and per-segment next-pointers-- how many nxtPtr's will the system need to store? What is the size, in Mbits, of the memory required to store these pointers plus the required reference counts? Assume that this memory is built out of 9 Mbit DDR static RAM chips like the MT57V256H36 chip example seen in section 3.2, of size 256 K x 36 bits each, that support burst-of-4 accesses only. Further assume that the width of this memory should be such that each access touches all nxtPtr's and the reference count associated with one segment (remember that each access yields a burst of 4 words from each chip). How wide (how many chips) does this memory need to be? Note that the memory width must be an integer number of chips, but this number does not need to be a power of two. How deep does the memory need to be? How many SRAM chips do we need in total?

(b) Now consider the second scheme --queue-member descriptors that are decoupled from segment addresses. We choose to have a number of descriptors that is twice the number of buffer memory segments. How many descriptors will we have? What is the width (number of bits) of a descriptor pointer? How large (number of bits) is a descriptor? What is the size, in Mbits, of the memory required to store these descriptors plus the required reference counts? Assuming that this memory is also built out of 9 Mbit SRAM chips, and that a "nice" packing of descriptors and reference counts into memory words can be found, what is the (minimum) required number of chips for this memory?

(c) Assuming that the average fan-out of multicast segments is 4, what is the maximum percentage of multicast segments in the buffer memory, so that we do not run out of descriptors before we run out of buffer space?

(Hint for optional thought on question (b), for those interested: a "nice" packing of descriptors and reference counts into memory words could be a placement such that the number of descriptors per burst is not a power of 2. In that case, to avoid the need for a divider in the addressing circuit, we could make descriptor pointers to be true burst-number and burst-offset pointers, and agree that not all integer numbers are valid descriptor pointers; as long as the free list of descriptors is correctly initialized to only contain the valid descriptors, no one else should ever care...)

Up to the Home Page of CS-534
© copyright University of Crete, Greece.
Last updated: 1 Nov. 2001, by M. Katevenis.