Dept. of Computer Science, University of Crete.
CS-534: Packet Switch Architecture

Wormhole IP over (Connectionless) ATM

Sections in the current document:

1. Introduction: IP over ATM
2. Wormhole IP over (Connectionless) ATM

1. Introduction: IP over ATM

IP and ATM both have important positive elements; on the other hand, there are concerns about some aspects of either. Therefore, one would like to have the positive parts of both of these technologies in the (eventual) "global network" (the integration of telephony and internet?), while avoiding their disadvantages.

ATM (Connection-Oriented):

Why ATM Hardware
- fixed-size cells ==> easy [and fast (?)] hardware
- small-size cells ==> lower latency for high-priority traffic
Concerns about ATM Software (Network Management):
- lengthy standardization process
- complex ==> costly, slow
- in particular: connection set-up is slow, (partly?) due to the coupling between connection set-up, admission control, and reservation of resources for QoS guarantees.
How about pre-established connections: "Permanent VP/VC" (PVP/PVC)
- OK for ATM subnetworks of small to medium size (up to on the order of a thousand nodes)
- impractical / infeasible for larger networks
- what QoS to provide to a PVP/PVC when its contents change with time???

IP (Connectionless):

Why IP
- because it is here, (probably?) to stay
- because it is fast for short-lived communication (no set-up delay)
Concerns about IP:
- not policed
- no QoS guarantees
- how to add policing and QoS guarantees without a notion of "connection" or "flow identification"??

IP over ATM (or "ATM under IP"): Combine the best of both

Use ATM Hardware
Speed-up Connection Establishment (do not use ATM-standard connection set-up software):
- others: still do it in software, but faster
- us, here: do it in hardware
- start with "Connectionless ATM", by R. Barnett (Electronics & Communications Engineering Journal, IEE, Great Britain, October 1997, pp. 221-230).
- show similarity with wormhole routing
- evolution from wormhole-IP-over-ATM switches to wormhole-IP-over-ATM "routing filters": easier to make; work with pre-existing ATM subnetworks.
Provide QoS guarantees to IP:
- automatic flow classification and identification (in hardware?)
- per-connection queueing and weighted-round-robin scheduling
- continuous adjustment (at run time) of the scheduling weight of permanent VP's?

2. Wormhole IP over (Connectionless) ATM

The key ideas in routing IP packets at hardware speed over ATM are:

IP Routing Table Lookup in hardware, at high speed (less than 1 cell-time).
Open a new VC (connection) for every IP packet that is being routed, and then tear down that connection as soon as forwarding of this packet is completed. Such hardware-created, short-lived connections are almost like "no connections", hence the name "connectionless ATM".
Upstream connection management: manage the connections on a link (open new connections, etc.) at the upstream node (link source) rather than at the downstream node. Thus, the new connection (VC) for every IP packet can be established right away at the source of every link; cells of the new packet can be forwarded on the link without first waiting a round-trip delay for the downstream neighbor to allocate a new VC ID.
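As a rough illustration of the first key idea, the sketch below models a two-level routing table that stores a single forwarding bit per IPv4 destination address, which is the organization assumed later in this section. The class name `BitRoutingTable`, the 16/16 split of the address bits, and the use of a dictionary for the first level are assumptions made for this example only; a hardware table in SRAM would use direct indexing instead.

```python
import ipaddress


def ip_to_int(text):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


class BitRoutingTable:
    """Two-level table holding one forwarding bit per IPv4 destination."""

    LOW_BITS = 16  # assumed split: high 16 bits pick a block, low 16 bits pick a bit

    def __init__(self):
        # Level 1: high address bits -> level-2 bitmap (allocated on first use).
        self.level1 = {}

    def _split(self, addr):
        return addr >> self.LOW_BITS, addr & ((1 << self.LOW_BITS) - 1)

    def set_forward(self, addr, forward=True):
        """Set (or clear) the forwarding bit of one destination address."""
        hi, lo = self._split(addr)
        bitmap = self.level1.setdefault(hi, bytearray((1 << self.LOW_BITS) // 8))
        if forward:
            bitmap[lo >> 3] |= 1 << (lo & 7)
        else:
            bitmap[lo >> 3] &= ~(1 << (lo & 7)) & 0xFF

    def lookup(self, addr):
        """Return 1 if packets for addr go out on this link, else 0."""
        hi, lo = self._split(addr)
        bitmap = self.level1.get(hi)
        if bitmap is None:
            return 0
        return (bitmap[lo >> 3] >> (lo & 7)) & 1
```

A lookup costs two dependent memory accesses regardless of table contents, which is the property that makes this kind of organization attractive for a fixed per-cell time budget.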

To the best of our knowledge, these ideas appeared for the first time in the literature in:

R. Barnett (rbarn@westell.com): "Connectionless ATM", Electronics & Communications Engineering Journal, IEE, Great Britain, October 1997, pp. 221-230.

We propose the name Wormhole IP over ATM, which, we believe, describes the above ideas better than the name "connectionless ATM". The reason is that the above method of routing IP packets resembles very closely "Wormhole Routing", the most popular interconnection network architecture in traditional multiprocessors (W. Dally and C. Seitz: "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks", IEEE Trans. on Computers, May 1987, pp. 547-553). The similarities between the two techniques are as follows.

In wormhole routing, packets are routed by being segmented into fixed-size flits. In wormhole IP, packets are routed by being segmented into fixed-size ATM cells.
In wormhole routing, a number of virtual channels (lanes) exist on each physical link. In wormhole IP, we use a number of virtual circuits (VC's) which exist on each ATM link or in each ATM virtual path (VP).
In wormhole routing, when a new packet must be forwarded over a physical link, the sender must first allocate an (unused) virtual channel to this packet; if such a free virtual channel does not exist, the packet has to wait. A virtual channel remains assigned to a single packet until the entire packet is forwarded over the link; the virtual channel is then freed and can be assigned to another packet. Wormhole IP works in the same way: a new connection is opened for each IP packet, meaning that an unused VC is allocated to this IP packet until the entire packet is forwarded over the link or the VP; subsequently, the connection is torn down (the VC is freed). When all VC's are in use, newly arriving packets have to wait.

A difference between the two techniques is that wormhole routing uses backpressure, while wormhole IP is run over any kind of ATM; most ATM networks do not use backpressure, although backpressure (credit-based flow control) can certainly be added to ATM. We illustrate wormhole IP over ATM using the following figure.

[Figure: Wormhole IP over ATM Switch]

We start with a conventional ATM switch, and augment it with (i) an IP Routing Table with a hardware lookup mechanism, (ii) free-lists of connection identifiers with a fast management mechanism, and (iii) two additional bits per connection in the conventional ATM connection (VC translation) table.
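The free-list of connection identifiers mentioned in (ii) can be pictured as a simple FIFO of unused VC IDs per outgoing link. The following is a minimal sketch; the class name `FreeIdList` and its method names are hypothetical, and the behavior on an empty list (returning `None`) is an assumption of this example.

```python
from collections import deque


class FreeIdList:
    """FIFO of unused connection IDs for one outgoing link."""

    def __init__(self, ids):
        # The set of IDs is chosen when the link is configured.
        self._free = deque(ids)

    def allocate(self):
        """Take the next unused ID, or return None if none is free."""
        return self._free.popleft() if self._free else None

    def release(self, vc_id):
        """Return an ID to the list once its packet has left the switch."""
        self._free.append(vc_id)

    def __len__(self):
        return len(self._free)
```

Both operations take constant time, which is what a per-cell hardware budget would require.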

Two kinds of traffic pass through this switch: (i) conventional ATM connections (black color in the above figure), and (ii) wormhole IP over ATM traffic (blue lines, above). For each incoming VC ID, the connection table specifies the kind of traffic and the connection state. There are 5 possible combinations of these parameters, requiring 3 bits of status information per connection versus 1 corresponding bit in conventional ATM:

inactive ATM: this VC ID is intended for conventional ATM traffic; no connection has yet been set up that uses this VC identifier.
active ATM: this is an established conventional ATM connection. This and the previous are the only possible states in a conventional ATM switch.
inactive IP: this VC ID is intended for wormhole IP over ATM traffic; no IP packet is currently using this VC identifier.
active IP: an IP packet is currently using this VC ID.
drop IP: an IP packet is currently arriving on this VC ID, but it cannot be forwarded (it has to be dropped) due to some error condition.
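The five states can be written down as a small enumeration, using the 3-bit codes that appear in the draft text that follows. The draft groups 000 and 001 together as "normal-ATM" without saying which is which, so the assignment of 000 to inactive and 001 to active is an assumption here, as is the reading of the most significant bit as the IP indicator.

```python
from enum import IntEnum


class ConnState(IntEnum):
    """Per-connection status codes (3 bits per connection-table entry)."""

    INACTIVE_ATM = 0b000  # assumption: 000 = inactive, 001 = active
    ACTIVE_ATM = 0b001
    INACTIVE_IP = 0b100
    DROP_IP = 0b110
    ACTIVE_IP = 0b111


def is_ip(state):
    """True for the three wormhole-IP states (assumed: top bit set)."""
    return bool(state & 0b100)
```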

[DRAFT TEXT FOLLOWS !!!]

(000), (001): For each incoming cell, MuqPro II looks up its connection ID in the connection table. If the (incoming) connection state is 000 or 001 (normal-ATM), processing is as in MuqPro I, i.e. as in traditional ATM; such entries in the connection table are presumably set up by the microprocessor, which executes some ATM signaling protocol.

(100): When the incoming connection state is 100 (inactive-IP), the switch controller interprets the incoming cell as the first cell of an IP packet that has been segmented according to AAL-5. Consequently, that cell contains a destination IP address. [Note: assume reliable links, e.g. optical fibers; with such low bit error rates, it is not worth waiting till the end of the packet is received, in order to check the IP CRC; in the rare cases where the address is wrong, the packet will simply be mis-routed, and the eventual receiver will check the CRC and discard it. Cell loss due to buffer overflows in the switches (if the network does not run credit flow control) is another issue: it is treated by the "drop-IP-packet" connection state -- see below].

The switch controller performs the following for every cell arriving on an inactive-IP connection (state 100):

* look up the destination IP address contained in this "head cell" in the IP routing table; assume that this is a multi-level table in SRAM chips that has been set up and is being updated by the microprocessor, and that it contains 1 bit per IP destination address, specifying whether or not packets going to that destination should be forwarded on this outgoing link;
* IF the IP routing table contains a 0, or IF the cell buffer of the switch is full and there is no space in it for this cell, drop the cell and change the connection state to 110 (drop-IP-packet) [explanation: if the head cell of an IP packet has to be dropped, the entire packet will be dropped, so that subsequent switches do not interpret any remaining non-head cell of this packet as containing an IP destination address; as a side-effect, cells that are useless for their recipient are not allowed to consume network resources, as several researchers have pointed out before];
* ELSE (i.e. if the IP routing table contains a 1), proceed as follows:
  * obtain an inactive-IP connection ID on the outgoing link, by dequeueing the next item from the corresponding free-ID list (the address space of connection ID's on a link is managed by the switch that drives the link);
  * install the outgoing connection ID that was just obtained into the connection table;
  * change the connection state to 111 (active-IP);
  * choose a service class for the new connection; this can be set to a default IP service class (low weight, low cost class?), unless an indication to the contrary can be found in the IP packet header;
  * enqueue the incoming cell into the queue that corresponds to its (new) outgoing connection ID;
  * enqueue the (new) outgoing connection ID into the proper scheduler queue.
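The head-cell steps above can be sketched in software as follows. This is only an illustration: the function name `handle_head_cell`, the dictionary layout of a connection-table entry, and `DEFAULT_CLASS` are hypothetical. The sketch also treats an empty free-ID list as a reason to drop the packet, which is one of the options considered later in this document, and it does not treat packets that fit in a single cell, which the text does not discuss.

```python
from collections import deque

INACTIVE_IP, DROP_IP, ACTIVE_IP = 0b100, 0b110, 0b111
DEFAULT_CLASS = 0  # hypothetical default IP service class


def handle_head_cell(entry, cell, route_bit, buffer_full,
                     free_ids, cell_queues, sched_queue):
    """Process a cell arriving on an inactive-IP connection (state 100).

    entry       -- dict standing in for the incoming connection-table entry
    route_bit   -- result of the routing-table lookup for the cell's address
    free_ids    -- deque of unused connection IDs on the outgoing link
    cell_queues -- dict: outgoing connection ID -> deque of queued cells
    sched_queue -- deque of connection IDs awaiting service
    Returns the outgoing connection ID, or None if the packet is dropped.
    """
    if route_bit == 0 or buffer_full or not free_ids:
        # Drop the head cell; the rest of the packet is dropped as well.
        entry["state"] = DROP_IP
        return None
    out_id = free_ids.popleft()             # obtain a free outgoing ID
    entry["out_id"] = out_id                # install it in the table
    entry["state"] = ACTIVE_IP              # 100 -> 111
    entry["service_class"] = DEFAULT_CLASS  # default unless header says otherwise
    cell_queues.setdefault(out_id, deque()).append(cell)
    sched_queue.append(out_id)
    return out_id
```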

[Note on rate and latency of opening new IP-packet connections: the hope is that all the above operations for an IP packet's head cell arriving on an inactive-IP connection can be performed at the rate of one such operation per 16 clock cycles (of the 30 MHz clock of MuqPro I / II), which is precisely the rate at which MuqPro I processes incoming cells of any type (including normal ATM cells); regarding latency, I do not see why this would have to be more than about 30 clock cycles, i.e. roughly 1 microsecond (excluding the SONET / datapath latency, though, which is higher)].
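The figures in this note can be checked with a few lines of arithmetic. The clock rate, cycle counts, and the 155 Mb/s link rate are taken from this document; the comparison against the cell time of a 53-byte cell ignores SONET framing overhead, which is a simplification made for this example.

```python
CLOCK_HZ = 30e6            # MuqPro I / II clock
CYCLES_PER_CELL_OP = 16    # cycles per processed cell
LATENCY_CYCLES = 30        # estimated latency for opening a connection
CELL_BITS = 53 * 8         # one ATM cell
LINK_BPS = 155e6           # link rate, framing overhead ignored

op_time_ns = CYCLES_PER_CELL_OP / CLOCK_HZ * 1e9   # time per cell operation
latency_us = LATENCY_CYCLES / CLOCK_HZ * 1e6       # connection-open latency
cell_time_us = CELL_BITS / LINK_BPS * 1e6          # duration of one cell on a link
ops_per_cell_time = cell_time_us * 1e3 / op_time_ns

print(f"cell operation: {op_time_ns:.0f} ns")
print(f"open latency:   {latency_us:.2f} us")
print(f"cell time:      {cell_time_us:.2f} us")
print(f"operations per cell time: {ops_per_cell_time:.2f}")
```

Under these assumptions one cell operation takes about 533 ns, and roughly five such operations fit into one cell time on a 155 Mb/s link.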

The end of an IP packet that was segmented according to AAL-5 is recognized by a special value in the PTI header field of the ATM cell that carries this end of packet. This end-of-IP-packet indicator is monitored for cells arriving on drop-IP or active-IP state connections, as described below.

(110): When the incoming connection state is 110 (drop-IP-packet), the incoming cell is dropped, after looking at its PTI header field. If the dropped cell was an end-of-IP-packet cell, the connection is placed in state 100 (inactive-IP). Thus, once it has been decided to drop (the rest of) an IP packet, the entire (rest of the) packet is dropped, and the connection is then reset to state 100, so as to interpret the next arriving cell as the head of a new IP packet.

(111): When the incoming connection state is 111 (active-IP), the switch controller processes the cell as for normal-ATM connections: look up the service class, rate/credit information, and the outgoing connection ID, enqueue the cell in the corresponding queue, and possibly enqueue that connection ID in a scheduler queue. In addition, do the following:

* if this is an end-of-IP-packet cell, the connection is placed in state 100 (inactive-IP), so as to interpret the next arriving cell as the head of a new IP packet;
* if this is not an end-of-IP-packet, but the cell buffer of the switch is (almost) full, truncate this IP packet, i.e. set the end-of-IP-packet indication in the PTI header field, and set the connection state to 110 (drop-IP-packet).
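The handling of cells that arrive in the drop-IP (110) and active-IP (111) states can be sketched as one function. The name `handle_body_cell` and the representation of a cell as a dictionary with an `eop` flag (standing in for the end-of-IP-packet indication in the PTI field) are assumptions of this example; scheduler bookkeeping and rate/credit information are left out.

```python
INACTIVE_IP, DROP_IP, ACTIVE_IP = 0b100, 0b110, 0b111


def handle_body_cell(entry, cell, buffer_almost_full, cell_queues):
    """Process a cell arriving in state 110 or 111.

    entry       -- dict with "state" and, when active, "out_id"
    cell        -- dict with boolean "eop" (end-of-IP-packet indication)
    cell_queues -- dict: outgoing connection ID -> list of queued cells
    Returns the cell as enqueued, or None if it was dropped.
    """
    if entry["state"] == DROP_IP:
        # Drop everything up to and including the end-of-packet cell.
        if cell["eop"]:
            entry["state"] = INACTIVE_IP
        return None
    if entry["state"] != ACTIVE_IP:
        raise ValueError("expected state 110 or 111")
    if cell["eop"]:
        # Last cell: the next arrival is the head of a new packet.
        entry["state"] = INACTIVE_IP
    elif buffer_almost_full:
        # Truncate: mark this cell as the end and drop the remainder.
        cell = dict(cell, eop=True)
        entry["state"] = DROP_IP
    cell_queues[entry["out_id"]].append(cell)
    return cell
```

Note that a truncated packet still ends with an end-of-packet cell on the outgoing link, so downstream switches return to the inactive-IP state at the right moment.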

(OUTPUT) As in MuqPro I, the output scheduler performs the following: Choose one of the (per-service-class) queues of connections (according to a weighted round-robin discipline), dequeue the next connection from that queue, and service the next cell of it. If after receiving such service, the connection is still eligible for further service (e.g. it still has cells and credits), enqueue this connection again at the back of the queue where it was dequeued from. In addition to the above, the output scheduler of MuqPro II looks into the outgoing connection table, to find the state (I bit) of the connection that is receiving service. If this I bit indicates that this is an IP-over-ATM connection, and if the cell that is being serviced is an end-of-IP-packet cell, then this connection ID is returned to the free-ID list.
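A simplified software model of this output scheduler is sketched below. It is not the MuqPro implementation: the class `OutputScheduler`, its method names, and the particular weighted round-robin order (each class appears in the service order as many times as its weight) are assumptions, and credits are ignored, so a connection is considered eligible whenever it still has queued cells.

```python
from collections import deque


class OutputScheduler:
    """Weighted round-robin over per-service-class queues of connections."""

    def __init__(self, weights, free_ids):
        self.class_queues = {c: deque() for c in weights}  # queues of conn IDs
        self.cell_queues = {}    # connection ID -> deque of cells
        self.is_ip = {}          # connection ID -> I bit
        self.free_ids = free_ids  # deque of free outgoing IDs
        # Service order: class c appears weights[c] times per round.
        self._order = [c for c, w in weights.items() for _ in range(w)]
        self._pos = 0

    def enqueue_cell(self, conn_id, service_class, cell, is_ip):
        q = self.cell_queues.setdefault(conn_id, deque())
        if not q:
            # Connection becomes eligible: put it on its class queue.
            self.class_queues[service_class].append(conn_id)
        q.append(cell)
        self.is_ip[conn_id] = is_ip

    def serve_one(self):
        """Serve one cell; return (conn_id, cell) or None if all queues are empty."""
        for _ in range(len(self._order)):
            cls = self._order[self._pos]
            self._pos = (self._pos + 1) % len(self._order)
            if not self.class_queues[cls]:
                continue
            conn_id = self.class_queues[cls].popleft()
            cell = self.cell_queues[conn_id].popleft()
            if self.cell_queues[conn_id]:
                # Still has cells: back of the same queue.
                self.class_queues[cls].append(conn_id)
            if self.is_ip[conn_id] and cell.get("eop", False):
                # End of an IP packet: recycle the connection ID.
                self.free_ids.append(conn_id)
            return conn_id, cell
        return None
```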

Optimization for Multi-Packet TCP Connections:

The above proposal opens a new ATM connection for each IP packet. Switches "recycle" connections after all cells of the IP packet on a given connection have left the switch. The output scheduler allocates transmission (throughput) resources according to connections, hence according to IP packets. If an end-to-end session consists of a single IP packet, then such allocation of network resources is (or at least looks) fair. On the other hand, if a session injects multiple IP packets into the network, subsequent packets may open up new ATM connections in a switch before old connections by previous packets of the same session have been closed. This results in such a session receiving unfairly large service, and in congested sessions negatively affecting uncongested ones, in just the same way as the wormhole backpressure protocol performs worse than the ATLAS I credit flow control protocol in [KaSS96]. To remedy this situation, we can consider two methods: one that is simpler but not transparent to today's standards, and one that is complex and costly but transparent.

The easy but non-transparent solution is to extend AAL-5 with a new value of the PTI cell header field, which signifies "end of packet *but not* end of session". This would be in addition to the existing AAL-5 PTI value that signifies "end of packet", which would then mean "end of packet and end of session". When this extension is put into effect, wormhole-IP-over-ATM switches like the above MuqPro II will keep a (single) ATM connection open until they see the normal end-of-packet indication, i.e. until an end-of-packet-and-end-of-session occurs. End stations, on the other hand, will delimit IP packets using the new end-of-packet-but-not-end-of-session indicator.
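The effect of the proposed extension can be illustrated by counting how many connections a switch opens for a given stream of cells. The markers below are symbolic: `END_PACKET_ONLY` stands for the proposed new indication, for which no concrete PTI encoding is specified here, and the names `Marker` and `count_connections` are hypothetical.

```python
from enum import Enum


class Marker(Enum):
    MIDDLE = 0                  # cell inside a packet
    END_PACKET_ONLY = 1         # proposed: end of packet, session continues
    END_PACKET_AND_SESSION = 2  # existing AAL-5 end-of-packet indication


def count_connections(markers):
    """Count the connections a switch opens for one incoming cell stream."""
    opened = 0
    is_open = False
    for m in markers:
        if not is_open:
            # First cell after a close is treated as a head cell.
            opened += 1
            is_open = True
        if m is Marker.END_PACKET_AND_SESSION:
            is_open = False
    return opened
```

With the extension, a session of three two-cell packets marked `END_PACKET_ONLY`, `END_PACKET_ONLY`, `END_PACKET_AND_SESSION` holds one connection throughout; without it, the same session opens three.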

The costly but transparent solution is to augment the routing table in each wormhole-IP-over-ATM switch with a connection ID field, and to also introduce an IP-address field in the outgoing connection table. The former entry is used to mark all IP destination addresses for which an ATM connection is already open; new IP packets destined to such IP addresses should not be allowed to open new ATM connections. The IP-address field in the outgoing connection table will be used to unmark the routing table, when such a connection is closed.

[KaSS96] M. Katevenis, D. Serpanos, E. Spyridakis: "Credit Flow Controlled ATM versus Wormhole Routing", Technical Report TR-171, ICS, FORTH, Heraklio, Crete, Greece, July 1996; URL: file://ftp.ics.forth.gr/tech-reports/1996/1996.TR171.ATM_vs_Wormhole.ps.gz
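The marking and unmarking in the costly but transparent solution can be sketched with two small tables. The class `MarkedRouting` and its methods are hypothetical names; what a switch does with a packet that is refused a new connection (for example, making it wait) is not specified in the text, so the sketch only reports the refusal.

```python
class MarkedRouting:
    """Routing-table marks plus the reverse field used to clear them."""

    def __init__(self, free_ids):
        self.open_conn = {}   # routing table field: destination -> open conn ID
        self.conn_dest = {}   # outgoing table field: conn ID -> destination
        self.free_ids = list(free_ids)

    def try_open(self, dest):
        """Open a connection for dest, or return None if one is already open."""
        if dest in self.open_conn or not self.free_ids:
            return None
        out_id = self.free_ids.pop(0)
        self.open_conn[dest] = out_id   # mark the destination
        self.conn_dest[out_id] = dest   # remember whom to unmark later
        return out_id

    def close(self, out_id):
        """Close a connection and unmark its destination."""
        dest = self.conn_dest.pop(out_id)
        del self.open_conn[dest]
        self.free_ids.append(out_id)
```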

Interoperation with existing IP, ATM, and IP/Tag Switching:

The proposed wormhole IP-over-ATM switch interoperates well with pre-existing networking equipment:

* To ATM equipment, it looks like an ATM switch. Normal-ATM connections (states 000 and 001) are just that: normal ATM connections, which share the switch resources with the IP connections according to the scheduling policies decided by the network manager, and thus enjoy whatever QoS guarantees these scheduling policies are designed for.

* To IP equipment, it looks like an IP router. On the incoming side, IP packets have to be segmented into ATM cells. Each IP packet has to be transmitted as consecutive cells over a single connection ID (VP/VC). Different IP packets can be sent sequentially over the same connection ID, or sequentially or in parallel over different connection ID's. The connection ID's used must belong to a set that is known to the wormhole IP-over-ATM switch, i.e. whose state was set to 100 by the network manager. Thus, the wormhole switch can receive IP packets from a normal IP link, provided a segmentation device is used; a single, default VP/VC connection suffices. The wormhole switch can also receive IP packets from an IP Switch (Ipsilon) or from a Tag Switch (Cisco); I believe that the only difference here is that different IP packets can arrive in parallel over different connection ID's.

On the outgoing side, IP packets have to be reassembled from ATM cells. Each IP packet is reassembled from consecutive cells over a single connection ID (VP/VC). The number of different connection ID's over which IP packets can appear (in parallel) depends on the network manager: it is the set of ID's with which the "free-ID list" was initialized on the outgoing link of the wormhole IP-over-ATM switch. One extreme choice is to provide a single such ID, in which case the wormhole switch will effectively serialize the IP packets over that single connection; needless to say, this will often result in poor performance and head-of-line blocking, but if the IP receiver is that dumb, then it gets what it deserves.... [Note: this brings to the surface one issue that the wormhole switch must solve: what happens when a new IP packet arrives over an inactive-IP connection and the "free-ID list" is empty? If the outgoing link provides a large number of IP-over-ATM connection ID's (e.g. thousands or tens of thousands), then this empty free-ID list may be a rare phenomenon; in such a case, we may opt for dropping the IP packet. Otherwise (including the above extreme case of a single outgoing IP-over-ATM connection ID), the wormhole switch must also implement a queue of IP packets (i.e. a queue of queues of cells) which are awaiting a free outgoing ID. A separate "IP router process" must also be implemented, which chooses one of these waiting IP packets whenever a free outgoing ID becomes available; to avoid head-of-line blocking when free ID's are scarce, this router process may want to consider only IP packets that have been received in full (not just partly)].
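The "queue of queues of cells" from the note above, together with the choice made by the router process, might look as follows in software. The class `PendingPackets` and its methods are hypothetical, and picking the oldest eligible packet is an assumption; the text leaves the selection policy open.

```python
from collections import deque


class PendingPackets:
    """IP packets (each a queue of cells) waiting for a free outgoing ID."""

    def __init__(self):
        # Each item: {"cells": deque of cells, "complete": bool}
        self.waiting = deque()

    def add_packet(self):
        """Start a new waiting packet and return its record."""
        pkt = {"cells": deque(), "complete": False}
        self.waiting.append(pkt)
        return pkt

    def add_cell(self, pkt, cell, eop):
        """Append a cell; eop marks the packet as received in full."""
        pkt["cells"].append(cell)
        pkt["complete"] = pkt["complete"] or eop

    def pick(self, only_complete):
        """Remove and return the oldest eligible packet, or None."""
        idx = next((i for i, p in enumerate(self.waiting)
                    if p["complete"] or not only_complete), None)
        if idx is None:
            return None
        pkt = self.waiting[idx]
        del self.waiting[idx]
        return pkt
```

Calling `pick(only_complete=True)` when a free ID becomes available corresponds to the policy suggested for scarce ID's: a partly received packet never occupies an outgoing ID while a fully received one is waiting behind it.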

Speed and Scalability:

I believe that we should be able to make the above MuqPro II prototype operate as a 4x1 "switch" at 155 Mb/s/link, using FPGA's and a 30 MHz clock. A next step, of realistic complexity, could be a 4x1 "switch" at 622 Mb/s/link, using an ASIC implementation and a 120 MHz clock; of course, the major problem to solve there is that the external memory speed does not scale with the internal clock scaling. These two implementations should be enough for "products" in the next few years. For higher fan-in/fan-out's and for higher link speeds, other, more advanced architectures, plus pipelining, will probably be necessary. We are not afraid of these, when the "market" becomes ready for such high speeds....
