CS-534: Packet Switch Architecture
Fall 2001
Department of Computer Science
© copyright: University of Crete, Greece

3. Buffer Memory Technologies and Architectures



3.1 On-Chip Memories


On-Chip SRAM


(a) Detailed Examples in an Older Technology:
The plots below show examples of cost (area, power) and performance (cycle time) for on-chip static RAM blocks as functions of capacity, number of ports, and port width. The examples are inspired by, and representative of, various 0.35-micron CMOS technologies of about 1996 to 1998. Today (2001) they are outdated, but the detailed figures in these plots remain useful for studying how memory size and width affect cost and performance.

On-chip SRAM area

On-chip SRAM cycle time and power consumption

Examples of Cost-Performance of On-Chip SRAM Systems

Notice that the cycle time and power figures given are for the worst case; power consumption for 2-port memories is for both ports.


(b) Some Examples in two Current Technologies:
The cost and performance figures below are for a couple of SRAM configuration examples in two modern (2001) technologies -- 0.18-micron CMOS (usual modern technology) and 0.13-micron CMOS (aggressive and expensive modern technology). Fabrication is by Virtual Silicon Technology, Inc. and can be provided to the Univ. of Crete by the Europractice IC Service (members of the Univ. of Crete, please do not contact either of the above directly, but instead go through the local Europractice representatives by sending e-mail to europractice@csd.uoc.gr).

On-Chip SRAM / Register-File Examples - Year 2001

Notice that the cycle time and power figures given are for the typical case; the worst-case numbers are usually considerably worse, possibly by as much as a factor of two. Also, power consumption for 2-port memories is per port.


3.2 Off-Chip Memory Technologies


SRAM with address and data registers (pipelined, clocked, synchronous interface)

DDR (Double Data Rate) Timing

Source-Synchronous Data Clocking


Separate D(in) and Q(out) versus Shared DQ Data Bus

QDR (Quad Data Rate) SRAM

Reference: the "QDR Partnership" Web Site: http://www.qdrsram.com/.

Example QDR SRAM: Micron MT54V512H18

Micron Technology Inc.

Reference 1: Micron "MT54V 512H 18" 512 K x 18 bit (9 Mbit) QDR SRAM: PDF data sheet available on-line. For product availability information see the Component Selector Guide.

Reference 2 (Jan. 2002): Alpine Microsystems plans to introduce (Q1-Q2 of 2002) the "PacketRAM Family" of pipelined QDR SRAMS, offering 96 Gbps at 333 MHz, and higher speeds later on.
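
As a rough check on such bandwidth figures, the sketch below computes the peak transfer rate of a QDR part, which has separate read (Q) and write (D) ports, each moving data on both clock edges. The function name and the 72-bit data width are assumptions made for illustration; the width is not stated in the reference above, it is simply a value for which the arithmetic comes out close to the quoted 96 Gbps at 333 MHz.

```python
def peak_bandwidth_bps(clock_hz, width_bits, transfers_per_clock=2, ports=2):
    """Peak rate in bits/s, summed over all data ports.

    QDR: 2 ports (separate D and Q), each with 2 transfers per clock.
    """
    return clock_hz * width_bits * transfers_per_clock * ports


# Assumed 72-bit-wide ports at 333 MHz (width is an assumption, see above).
qdr_72 = peak_bandwidth_bps(333e6, 72)
print(f"{qdr_72 / 1e9:.1f} Gbps")

# Same clock with 18-bit ports, the width of the Micron MT54V512H18;
# the 333 MHz clock is used here only for comparison, not taken from
# that part's data sheet.
qdr_18 = peak_bandwidth_bps(333e6, 18)
print(f"{qdr_18 / 1e9:.1f} Gbps")
```

Setting ports=1 gives the corresponding figure for a DDR part with a shared DQ bus, before subtracting any bus turn-around cycles.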


ZBT (Zero Bus Turn-around) Timing

Example DDR SRAM: Micron MT57V256H36

Micron Technology Inc.

Reference: Micron "MT57V 256H 36" 256 K x 36 bit (9 Mbit) DDR SRAM: PDF data sheet available on-line. For product availability information see the Component Selector Guide.


DRAM Basics: Row Address, Column Address, Precharge

Example DDR SDRAM: Micron MT46V2M32

Single-Bank Read Access

Single-Bank Write Access

Multiple Accesses to Different Columns in the same Row of a Bank

Multi-Bank Operation: Memory Interleaving
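
The benefit of interleaving can be illustrated with a small timing model. This is a minimal sketch under stated assumptions: four banks, a bank that stays busy for four clock cycles after each access starts, low-order address bits selecting the bank, and at most one access issued per cycle, in order. All names and parameter values are hypothetical and are not taken from any data sheet.

```python
NUM_BANKS = 4         # assumption: number of banks, a power of two
BANK_BUSY_CYCLES = 4  # assumption: cycles a bank is occupied per access


def bank_of(block_addr):
    """Low-order interleaving: consecutive blocks fall in consecutive banks."""
    return block_addr % NUM_BANKS


def schedule(addresses):
    """Return the start cycle of each access, issued in order,
    at most one per cycle, waiting whenever the target bank is busy."""
    bank_free = [0] * NUM_BANKS  # first cycle at which each bank is idle
    cycle = 0
    starts = []
    for addr in addresses:
        bank = bank_of(addr)
        start = max(cycle, bank_free[bank])
        starts.append(start)
        bank_free[bank] = start + BANK_BUSY_CYCLES
        cycle = start + 1
    return starts


print(schedule([0, 1, 2, 3, 4, 5, 6, 7]))  # accesses rotate over the banks
print(schedule([0, 4, 8, 12]))             # all accesses hit bank 0
```

In this model, accesses that rotate over all banks start one per cycle, while accesses that keep hitting the same bank are spaced a full bank cycle apart.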

Micron Technology Inc.

Reference: Micron "MT46V 2M 32" 2 M x 32 bit (64 Mbit) DDR SDRAM: PDF data sheet available on-line. For product availability information see the Component Selector Guide.


3.3 Communicating across Clock Domains and Elastic Buffers


The need for cross-clock-domain communication

Metastability, synchronization delay
Reference: W. Dally, J. Poulton: "Digital Systems Engineering", Cambridge University Press, 1998, ISBN 0-521-59292-5 (section 10.2: Synchronization Fundamentals).

Was the signal sampled before or after its change?

Asynchronous sampling of multibit signals (almost impossible)

Elastic Buffer (2-asynchronous-port SRAM)

Reminder: Circular Array Implementation of FIFO Queue
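
As a reminder of the idea, here is a minimal software sketch of a FIFO queue in a circular array. The class and field names are illustrative, and keeping an explicit occupancy count to tell empty from full is one common choice among several, not necessarily the one used in the slides.

```python
class CircularFifo:
    """FIFO queue stored in a fixed-size array used circularly."""

    def __init__(self, capacity):
        self.mem = [None] * capacity
        self.head = 0   # index of the oldest element (next to dequeue)
        self.tail = 0   # index of the next free slot (next to enqueue)
        self.count = 0  # occupancy; distinguishes empty from full

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == len(self.mem)

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("FIFO full")
        self.mem[self.tail] = item
        self.tail = (self.tail + 1) % len(self.mem)  # wrap around
        self.count += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("FIFO empty")
        item = self.mem[self.head]
        self.head = (self.head + 1) % len(self.mem)  # wrap around
        self.count -= 1
        return item
```

Note that head == tail holds both when the queue is empty and when it is full, which is why some extra state (here the count) is needed.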

One-Hot Pointer Encoding

Empty/Full FIFO Detection using One-Hot Pointer Encoding
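
The following sketch models one-hot pointers as integers with exactly one bit set, and shows one possible way to detect empty and full. It assumes the convention that one slot is always left unused, so that "full" means "advancing the tail would make it meet the head"; the slides may use a different convention. The names N, rotl, is_empty, and is_full are illustrative.

```python
N = 8  # number of FIFO slots (hypothetical value)
MASK = (1 << N) - 1


def rotl(ptr):
    """Advance a one-hot pointer by one slot, wrapping around."""
    return ((ptr << 1) | (ptr >> (N - 1))) & MASK


def is_empty(head, tail):
    # head marks the next slot to read, tail the next slot to write;
    # they coincide only when nothing is stored (one slot kept unused).
    return (head & tail) != 0


def is_full(head, tail):
    # Full when one more write would make tail coincide with head.
    return (head & rotl(tail)) != 0
```

With this encoding each comparison is a bitwise AND followed by an OR over all bit positions, with no pointer decoding or subtraction. Starting from head = tail = 1, the queue reports full after N - 1 writes.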

Empty and Full flags: asynchronous to either clock

Synchronized Empty/Full Generation for High-Throughput Operation

Synchronized Empty/Full Generation: explanations
Reference: W. Dally, J. Poulton: "Digital Systems Engineering", Cambridge University Press, 1998, ISBN 0-521-59292-5 (section 10.3: Synchronizer Design, especially section 10.3.4.2).


3.4 FIFO Buffer Memories


Single FIFO Queue in a Memory Block: Circular Buffer

Multiple FIFO Queues with Statically Partitioned Space for each
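
A minimal sketch of static partitioning, assuming equal-size regions and illustrative names: each queue is a circular buffer confined to its own fixed region of one memory array.

```python
class PartitionedFifos:
    """Several circular FIFOs, each in its own fixed region of one memory."""

    def __init__(self, num_queues, slots_per_queue):
        self.slots = slots_per_queue
        self.mem = [None] * (num_queues * slots_per_queue)
        self.head = [0] * num_queues   # per-queue offsets within the region
        self.tail = [0] * num_queues
        self.count = [0] * num_queues

    def enqueue(self, q, item):
        if self.count[q] == self.slots:
            return False  # region of queue q is full, even if others are empty
        self.mem[q * self.slots + self.tail[q]] = item
        self.tail[q] = (self.tail[q] + 1) % self.slots
        self.count[q] += 1
        return True

    def dequeue(self, q):
        if self.count[q] == 0:
            return None
        item = self.mem[q * self.slots + self.head[q]]
        self.head[q] = (self.head[q] + 1) % self.slots
        self.count[q] -= 1
        return item
```

The sketch makes the drawback visible: a busy queue overflows as soon as its own region fills up, regardless of how much space the other queues leave unused.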

Multiple FIFO Queues with Shared Space 1: shifting (impractical)

Multiple FIFO Queues with Shared Space 2: linked lists of blocks

See Exercises 4 and Exercises 5 for a discussion of how large the segment (block) size should be.


3.5 Multi-Queue Data Structures


Multi-queue buffer memory using linked lists of memory blocks

Enqueue/Dequeue: basic cases

Enqueue/Dequeue: exceptional cases
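
The operations can be sketched in software as follows. This is an illustrative model, not the implementation from the slides: it assumes a separate next-pointer array, a free list threaded through the same pointers, and at least one memory block. Which cases the slides treat as exceptional is presumed here to be the empty-queue transitions and running out of free blocks. All names are hypothetical.

```python
NIL = -1


class LinkedListQueues:
    """Multiple FIFO queues sharing one pool of fixed-size blocks."""

    def __init__(self, num_queues, num_blocks):
        self.data = [None] * num_blocks
        # Initially all blocks are chained into the free list.
        self.nxt = list(range(1, num_blocks)) + [NIL]
        self.free_head = 0
        self.head = [NIL] * num_queues
        self.tail = [NIL] * num_queues

    def enqueue(self, q, payload):
        b = self.free_head
        if b == NIL:
            return False              # no free block: caller must drop or stall
        self.free_head = self.nxt[b]  # pop block b from the free list
        self.data[b] = payload
        self.nxt[b] = NIL
        if self.head[q] == NIL:       # queue was empty: set head as well
            self.head[q] = b
        else:                         # basic case: link after the old tail
            self.nxt[self.tail[q]] = b
        self.tail[q] = b
        return True

    def dequeue(self, q):
        b = self.head[q]
        if b == NIL:
            return None               # queue is empty
        payload = self.data[b]
        self.head[q] = self.nxt[b]
        if self.head[q] == NIL:       # queue became empty: clear tail too
            self.tail[q] = NIL
        self.nxt[b] = self.free_head  # push block b onto the free list
        self.free_head = b
        return payload
```

In this model each enqueue or dequeue touches a queue's head or tail, one or two next pointers, and the free list head, which gives a feel for the number of memory accesses per operation.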

Cost-performance tradeoffs

For an example of a highly-pipelined implementation of managing multiple linked-list queues, see the Nikologiannis and Katevenis reference below.


NxtPtr inside data memory: free block preallocation

Reference: A. Nikologiannis, M. Katevenis: "Efficient Per-Flow Queueing in DRAM at OC-192 Line Rate using Out-of-Order Execution Techniques", Proc. IEEE Int. Conf. on Communications (ICC'2001), Helsinki, Finland, June 2001, pp. 2048-2052; http://archvlsi.ics.forth.gr/muqpro/queueMgt.html

Packet size, block size, line rate, queue Op rate

See also Exercises 4 and Exercises 5 for discussions on segment (block) size and access rate.
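
The relation between line rate, block size, and queue operation rate can be worked out with a few lines of arithmetic. The sketch below assumes back-to-back segments, one enqueue plus one dequeue per segment time, and a line rate rounded to 10 Gbps as a stand-in for OC-192; the function names and the 64-byte block size are illustrative choices, not figures from the slides.

```python
def segment_time_ns(line_rate_bps, block_bytes):
    """Time to receive one block-sized segment at the given line rate."""
    return block_bytes * 8 / line_rate_bps * 1e9


def queue_ops_per_second(line_rate_bps, block_bytes, ops_per_segment=2):
    """Queue operations per second when every segment time needs
    ops_per_segment operations (assumed: one enqueue, one dequeue)."""
    return ops_per_segment * line_rate_bps / (block_bytes * 8)


LINE_RATE = 10e9  # assumption: about OC-192, rounded
BLOCK = 64        # assumption: block (segment) size in bytes

print(segment_time_ns(LINE_RATE, BLOCK), "ns per segment")
print(queue_ops_per_second(LINE_RATE, BLOCK) / 1e6, "M queue ops/s")
```

Under these assumptions a new segment arrives every 51.2 ns, so the queue manager must sustain roughly 39 million operations per second; halving the block size doubles the required rate.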

Queue Op rate, free list rate, free list bypass


Dropping a Segment

Dropping a Packet


3.6 Queueing for Multicast Traffic


Multicast Traffic: Same or Different Queues with Unicast Traffic?

case 2: each segment is allowed to belong to multiple queues

Data Structures for a segment to belong to up to N Queues

case 2: Decouple Linked List Nodes from Data Buffer Addresses


Last updated: 30 Jan. 2002, by M. Katevenis.