CS-534: Packet Switch Architecture
Fall 2001
Department of Computer Science
© copyright: University of Crete, Greece

4. Switch Queueing Architectures and Crossbar Scheduling


Subsections in the current document:


Introduction


Switch Queueing Architectures


4.1 Output Queueing Family


Output Queueing and Variations

Output Queueing: The Reference Arcitecture

Shared Buffer: Top Performance at Low Cost for small N

Buffer Space Requirements: When the incoming traffic consists of fixed-size packets from independent, identically distributed (i.i.d.) Bernoulli processes, with uniformly-distributed destination (output) ports, analysis and simulation have yielded the results plotted below --primarily from the Hluchyj and Karol JSAC paper of December 1988 referenced below (© copyright IEEE).
Attention: results derived for i.i.d. Bernoulli (non-bursty) arrivals, with uniformly-distributed destinations (no overloaded hot-spots), are only useful for gaining a rough, first insight into the behavior of systems, but are often not representative of the real behavior of systems under real traffic!...

Output queueing buffer size for very large N

Output queueing buffer size for various N and load 0.8 and 0.9 Shared buffering buffer size for various N and load 0.8 and 0.9

Buffer space requirements comparison for shared buffering versus output queueing

© copyright IEEE --primarily from the Hluchyj and Karol JSAC paper of December 1988 referenced below.


Crosspoint (Distributed) Queueing

Block-Crosspoint Queueing: Distributed Shared Buffers

Knock-Out Switch


4.2 Input Queueing Family


Input Queueing Family

Input Queueing is NOT the Dual of Output Queueing

Throughput and Delay under Input Queueing with Head-of-Line (HOL) Blocking: When the incoming traffic consists of fixed-size packets from independent, identically distributed (i.i.d.) Bernoulli processes, with uniformly-distributed destination (output) ports, analysis and simulation have yielded the results plotted below --primarily from the Hluchyj and Karol JSAC paper of December 1988 referenced below (© copyright IEEE).
Attention: results derived for i.i.d. Bernoulli (non-bursty) arrivals, with uniformly-distributed destinations (no overloaded hot-spots), are only useful for gaining a rough, first insight into the behavior of systems, but are often not representative of the real behavior of systems under real traffic!...

Input queueing saturation throughput

Input queueing delay Output queueing delay

Delay comparison for output versus input queueing

References:


Advanced Input Queueing (Buffering) - Virtual Output Queues


4.3 Combinations and Variations


Internal Speed-Up: Combination of Input and Output Queueing

Sorting Networks with Distributed Control

Switching Fabrics with Internal Buffering and Backpressure


4.4 Queueing Architecture Summary and Comparisons


Summary of MxN Switch Architectures


4.5 Shared Buffer Implementation: Wide Memory, Pipelined Memory


There is a Crossbar inside every Switch

High-Throughput Buffer: Interleaved Memory

Multiplexing the Switch Ports onto a single Memory Port

Time-Multiplexing the Switch Ports on a Wide Memory Port

Wide Memory


Pipelined Memory

Animated Illustration of the Pipelined Memory Operation [click here]

Control of the Pipelined Memory

VLSI Implementation the Pipelined Memory

Reference:
M. Katevenis, P. Vatsolaki, A. Efthymiou: "Pipelined Memory Shared Buffer for VLSI Switches", Proceedings of the ACM SIGCOMM '95 Conference, Cambridge, MA USA, 30 August - 1 Sep. 1995, pp. 39-48; http://archvlsi.ics.forth.gr/sw_arch/pipeMem.html . FORTH, Crete, Greece (1994): USA patent number 5,774,653 of 30 June 1998.


4.6 Crossbar Scheduling with (Per-Input) Virtual-Output Queues


Maximal versus maximum matching

Maximum matching is complex and may be unfair


The Two-Dimensional Round-Robin (2DRR) scheduler

Fairness in the 2DRR scheduler


Parallel Iterative Matching (PIM)

Performance of PIM, and fairness issue


The iSLIP algorithm

iSLIP discussion

Bibliographic references: click here


4.7 Speedup Required to Emulate Output Queueing


Can CIOQ with internal speedup emulate OQ?

Recent results on OQ emulation

Counter example to prove that speedup of 2-(1/N) is necessary

Bibliographic references: click here


Up to the Home Page of CS-534
 
© 2001 copyright: University of Crete, Greece.
Last updated: 29 Nov. 2001, by M. Katevenis.