IP over ATM (or "ATM under IP"): Combine the best of both
R. Barnett (email@example.com): "Connectionless ATM", Electronics & Communications Engineering Journal, IEE, Great Britain, October 1997, pp. 221-230.

We propose the name Wormhole IP over ATM, which, we believe, describes the above ideas better than the name "connectionless ATM". The reason is that the above method of routing IP packets closely resembles "Wormhole Routing", the most popular switching technique in the interconnection networks of traditional multiprocessors (W. Dally and C. Seitz: "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks", IEEE Trans. on Computers, May 1987, pp. 547-553). The similarities between the two techniques are as follows.
We start with a conventional ATM switch, and augment it with (i) an IP Routing Table with a hardware lookup mechanism, (ii) free-lists of connection identifiers with a fast management mechanism, and (iii) two additional bits per connection in the conventional ATM connection (VC translation) table.
Two kinds of traffic pass through this switch: (i) conventional ATM connections (black in the above figure), and (ii) wormhole IP-over-ATM traffic (blue lines, above). For each incoming VC ID, the connection table specifies the kind of traffic and the connection state. There are 5 possible combinations of these parameters, requiring 3 bits of status information per connection versus 1 corresponding bit in conventional ATM:
(000), (001): For each incoming cell, MuqPro II looks up its connection ID in the connection table. If the (incoming) connection state is 000 or 001 (normal-ATM), processing is as in MuqPro I, i.e. as in traditional ATM; such entries in the connection table are presumably set up by the microprocessor, which executes some ATM signaling protocol.
(100): When the incoming connection state is 100 (inactive-IP), the switch controller interprets the incoming cell as the first cell of an IP packet that has been segmented according to AAL-5. Consequently, that cell contains a destination IP address. [Note: assume reliable links, e.g. optical fibers; with such low bit error rates, it is not worth waiting until the end of the packet is received in order to check the AAL-5 CRC; in the rare cases where the address is wrong, the packet will simply be misrouted, and the eventual receiver will check the CRC and discard the packet. Cell loss due to buffer overflows in the switches (if the network does not run credit flow control) is another issue: it is handled by the "drop-IP-packet" connection state; see below.]
The switch controller performs the following for every cell arriving on an inactive-IP connection (state 100):
* look up the destination IP address contained in this "head cell" in the IP routing table; assume that this is a multi-level table in SRAM chips that has been set up and is being updated by the microprocessor, and that it contains 1 bit per IP destination address, specifying whether or not packets going to that destination should be forwarded on this outgoing link;
* IF the IP routing table contains a 0, or IF the cell buffer of the switch is full and there is no space in it for this cell, drop the cell and change the connection state to 110 (drop-IP-packet) [explanation: if the head cell of an IP packet has to be dropped, the entire packet will be dropped, so that subsequent switches do not interpret any remaining non-head cell of this packet as containing an IP destination address; as a side effect, cells that are useless to their recipient are not allowed to consume network resources, as several researchers have pointed out before];
* ELSE (i.e. if the IP routing table contains a 1), proceed as follows:
  * obtain an inactive-IP connection ID on the outgoing link by dequeueing the next item from the corresponding free-ID list (the address space of connection ID's on a link is managed by the switch that drives the link);
  * install the outgoing connection ID just obtained into the connection table;
  * change the connection state to 111 (active-IP);
  * choose a service class for the new connection; this can be set to a default IP service class (a low-weight, low-cost class?), unless an indication to the contrary can be found in the IP packet header;
  * enqueue the incoming cell into the queue that corresponds to its (new) outgoing connection ID;
  * enqueue the (new) outgoing connection ID into the proper scheduler queue.
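The head-cell steps above can be modeled in software as follows. This is a minimal sketch, not the MuqPro II hardware: the routing table is a hypothetical dictionary from destination IP address to the forwarding bit, the cell buffer is reduced to a counter, and the service class name `"ip-default"` is invented for illustration. Two behaviors are assumptions not stated at this point in the text: an empty free-ID list is treated as a reason to drop (one of the options discussed later under interoperation), and a head cell that is also an end-of-packet cell leaves the connection in state 100. Such single-cell packets do occur: a 40-byte TCP acknowledgment plus the 8-byte AAL-5 trailer fills exactly one cell.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional

# 3-bit connection states from the text
INACTIVE_IP, DROP_IP_PACKET, ACTIVE_IP = 0b100, 0b110, 0b111


@dataclass
class ConnEntry:
    state: int = INACTIVE_IP
    out_id: Optional[int] = None
    service_class: Optional[str] = None


@dataclass
class HeadCellStage:
    routing_bit: Dict[str, int]        # hypothetical: dest IP -> 1 if forwarded on this link
    free_ids: Deque[int]               # free-ID list of the outgoing link
    buffer_capacity: int               # cells the shared buffer can hold
    conn_table: Dict[int, ConnEntry] = field(default_factory=dict)
    cell_queues: Dict[int, Deque[bytes]] = field(default_factory=dict)  # out ID -> cells
    sched_queues: Dict[str, Deque[int]] = field(default_factory=dict)   # class -> out IDs
    buffered: int = 0

    def on_head_cell(self, in_id: int, dest_ip: str, cell: bytes,
                     eop: bool = False, service_class: str = "ip-default") -> bool:
        """Process the first cell of a packet; return True if forwarded, False if dropped."""
        entry = self.conn_table.setdefault(in_id, ConnEntry())
        if entry.state != INACTIVE_IP:
            raise ValueError("head cells arrive only in state 100 (inactive-IP)")

        # Drop if the routing bit is 0, the buffer is full, or (assumption) no ID is free.
        if (self.routing_bit.get(dest_ip, 0) == 0
                or self.buffered >= self.buffer_capacity
                or not self.free_ids):
            # Assumption: a dropped single-cell packet leaves nothing more to discard.
            entry.state = INACTIVE_IP if eop else DROP_IP_PACKET
            return False

        out_id = self.free_ids.popleft()       # dequeue from the free-ID list
        entry.out_id = out_id                  # install in the connection table
        entry.service_class = service_class    # default IP class unless the header says otherwise
        # Assumption: a single-cell packet returns the connection to inactive-IP at once.
        entry.state = INACTIVE_IP if eop else ACTIVE_IP

        self.cell_queues.setdefault(out_id, deque()).append(cell)
        self.buffered += 1
        self.sched_queues.setdefault(service_class, deque()).append(out_id)
        return True
```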
[Note on rate and latency of opening new IP-packet connections: the hope is that all the above operations for an IP packet's head cell arriving on an inactive-IP connection can be performed at the rate of one such operation per 16 clock cycles (of the 30 MHz clock of MuqPro I / II), which is precisely the rate at which MuqPro I processes incoming cells of any type (including normal ATM cells); regarding latency, I do not see why this would have to be more than about 30 clock cycles, i.e. roughly 1 microsecond (excluding the SONET / datapath latency, though, which is higher)].
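The rate claim in the note can be checked with a few lines of arithmetic. The sketch assumes a nominal line rate of 155.52 Mb/s (SONET OC-3) and ignores SONET framing overhead, which only lengthens the real cell time and so makes the check conservative; the four inputs correspond to the 4x1 configuration proposed at the end of this note. The function names are illustrative.

```python
CELL_BITS = 53 * 8  # one ATM cell: 5-byte header + 48-byte payload


def cell_time_us(link_mbps: float) -> float:
    """Time for one cell to arrive on a link, in microseconds (bits / (Mb/s) = us)."""
    return CELL_BITS / link_mbps


def budget_ok(link_mbps: float, n_inputs: int, clock_mhz: float,
              cycles_per_cell: int) -> bool:
    """True if one cell per `cycles_per_cell` clocks keeps up with n_inputs
    links all delivering back-to-back cells."""
    arrival_interval_us = cell_time_us(link_mbps) / n_inputs
    processing_us = cycles_per_cell / clock_mhz   # cycles / MHz = us
    return processing_us <= arrival_interval_us


if __name__ == "__main__":
    for mbps, mhz in [(155.52, 30), (622.08, 120)]:
        print(mbps, mhz, round(cell_time_us(mbps) / 4, 3), round(16 / mhz, 3),
              budget_ok(mbps, 4, mhz, 16))
```

Under these assumptions, each input delivers a cell every 2.73 us, so four inputs deliver one every 0.68 us, while 16 cycles at 30 MHz take 0.53 us. The same 16-cycle budget also fits a 4x1 switch at 622.08 Mb/s with a 120 MHz clock (0.17 us between arrivals versus 0.13 us of processing), and the 30-cycle latency estimate is 30 / 30 MHz = 1 us.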
The end of an IP packet that was segmented according to AAL-5 is recognized by a special value in the PTI header field of the ATM cell that carries this end of packet. This end-of-IP-packet indicator is monitored for cells arriving on drop-IP or active-IP state connections, as described below.
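For reference, in the standard 5-byte ATM UNI cell header the PTI field occupies three bits of the fourth byte; for user-data cells (PTI most significant bit 0), the least significant PTI bit, the ATM-user-to-ATM-user indication, marks the last cell of an AAL-5 frame. The sketch below, whose function names are illustrative, packs and parses a UNI header and tests for end of packet. It does not compute the HEC byte, which is left as zero.

```python
def build_uni_header(vpi: int, vci: int, pti: int, clp: int = 0, gfc: int = 0) -> bytes:
    """Pack a 5-byte UNI header: GFC(4) VPI(8) VCI(16) PTI(3) CLP(1) HEC(8).
    The HEC byte is not computed here and is left as 0."""
    return bytes([
        (gfc << 4) | (vpi >> 4),
        ((vpi & 0x0F) << 4) | (vci >> 12),
        (vci >> 4) & 0xFF,
        ((vci & 0x0F) << 4) | (pti << 1) | clp,
        0,
    ])


def parse_uni_header(hdr: bytes) -> dict:
    """Extract the fields of a 5-byte ATM UNI header."""
    if len(hdr) != 5:
        raise ValueError("an ATM cell header is 5 bytes")
    return {
        "gfc": hdr[0] >> 4,
        "vpi": ((hdr[0] & 0x0F) << 4) | (hdr[1] >> 4),
        "vci": ((hdr[1] & 0x0F) << 12) | (hdr[2] << 4) | (hdr[3] >> 4),
        "pti": (hdr[3] >> 1) & 0x07,
        "clp": hdr[3] & 0x01,
    }


def is_end_of_aal5_packet(pti: int) -> bool:
    """User-data cell (PTI MSB = 0) whose AUU bit (PTI LSB) is set."""
    return (pti & 0b100) == 0 and (pti & 0b001) == 1
```

A switch controller in state 110 or 111 only needs the last of these checks on each arriving cell.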
(110): When the incoming connection state is 110 (drop-IP-packet), the incoming cell is dropped after its PTI header field has been examined. If the dropped cell was an end-of-IP-packet cell, the connection is placed in state 100 (inactive-IP). Thus, once it has been decided to drop (the rest of) an IP packet, the entire (rest of the) packet is dropped, and the connection is then reset to state 100, so that the next arriving cell is interpreted as the head of a new IP packet.
(111): When the incoming connection state is 111 (active-IP), the switch controller processes the cell as for normal-ATM connections: it looks up the service class, rate/credit information, and the outgoing connection ID, enqueues the cell in the corresponding queue, and possibly enqueues that connection ID in a scheduler queue. In addition, it does the following:
* if this is an end-of-IP-packet cell, the connection is placed in state 100 (inactive-IP), so that the next arriving cell is interpreted as the head of a new IP packet;
* if this is not an end-of-IP-packet cell, but the cell buffer of the switch is (almost) full, truncate this IP packet, i.e. set the end-of-IP-packet indication in the PTI header field of this cell, and set the connection state to 110 (drop-IP-packet).
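The per-cell behavior of states 110 and 111 can be modeled as below. This sketch makes simplifying assumptions: the "(almost) full" condition is a hypothetical fixed margin below the buffer capacity (a margin of at least one cell leaves room for the truncated cell), rate/credit bookkeeping and scheduler enqueueing are omitted, and each queued cell carries its end-of-packet flag explicitly instead of inside a real PTI field.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

INACTIVE_IP, DROP_IP_PACKET, ACTIVE_IP = 0b100, 0b110, 0b111


@dataclass
class Conn:
    state: int
    out_id: int = -1              # outgoing connection ID installed by the head cell


@dataclass
class BodyCellStage:
    buffer_capacity: int
    almost_full_margin: int = 1   # hypothetical: "almost full" = within this many cells
    buffered: int = 0
    conn_table: Dict[int, Conn] = field(default_factory=dict)
    cell_queues: Dict[int, Deque[Tuple[bytes, bool]]] = field(default_factory=dict)

    def almost_full(self) -> bool:
        return self.buffered >= self.buffer_capacity - self.almost_full_margin

    def on_cell(self, in_id: int, cell: bytes, eop: bool) -> Tuple[bool, bool]:
        """Handle a non-head cell; return (forwarded, end-of-packet flag as sent)."""
        conn = self.conn_table[in_id]
        if conn.state == DROP_IP_PACKET:
            if eop:                        # last cell of the dropped packet
                conn.state = INACTIVE_IP   # the next cell is a new head cell
            return False, eop
        if conn.state != ACTIVE_IP:
            raise ValueError("head cells and normal-ATM cells are handled elsewhere")
        if eop:
            conn.state = INACTIVE_IP
        elif self.almost_full():
            eop = True                     # truncate: mark this cell as end of packet
            conn.state = DROP_IP_PACKET    # discard the remaining cells of the packet
        self.cell_queues.setdefault(conn.out_id, deque()).append((cell, eop))
        self.buffered += 1
        return True, eop
```

The truncated packet still reaches the downstream switches as a well-delimited AAL-5 frame, so their state machines stay synchronized; only the receiver's CRC check rejects it.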
(OUTPUT) As in MuqPro I, the output scheduler performs the following: it chooses one of the (per-service-class) queues of connections according to a weighted round-robin discipline, dequeues the next connection from that queue, and services the next cell of that connection. If, after this service, the connection is still eligible for further service (e.g. it still has cells and credits), it is enqueued again at the back of the queue it was dequeued from. In addition, the output scheduler of MuqPro II looks into the outgoing connection table to find the state (I bit) of the connection being serviced. If the I bit indicates an IP-over-ATM connection, and the cell being serviced is an end-of-IP-packet cell, then this connection ID is returned to the free-ID list.
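One simple variant of this weighted round-robin scheduler, including the MuqPro II step that recycles IP-over-ATM connection ID's, is sketched below. Assumptions: the weights are hypothetical per-class cell counts per round, credits are not modeled (a connection is eligible whenever it has queued cells), and the I bit is represented by membership in the set `ip_conn_ids`. A hardware scheduler would select one queue per cell time rather than serve a whole round in a loop.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class OutputScheduler:
    """Weighted round-robin over per-service-class queues of connection IDs."""

    def __init__(self, weights: Dict[str, int], ip_conn_ids: Set[int],
                 free_ids: Deque[int]):
        self.weights = weights                      # class -> cells served per round
        self.class_queues: Dict[str, Deque[int]] = {c: deque() for c in weights}
        self.cells: Dict[int, Deque[Tuple[bytes, bool]]] = {}  # conn ID -> (cell, eop)
        self.ip_conn_ids = ip_conn_ids              # stands in for the I bit
        self.free_ids = free_ids                    # free-ID list of this output link

    def enqueue_cell(self, conn_id: int, service_class: str,
                     cell: bytes, eop: bool) -> None:
        queue = self.cells.setdefault(conn_id, deque())
        if not queue:
            # The connection just became eligible: add it to the back of its class queue.
            self.class_queues[service_class].append(conn_id)
        queue.append((cell, eop))

    def run_round(self) -> List[Tuple[int, bytes]]:
        """Serve up to `weight` cells from each class queue, in class order."""
        sent = []
        for cls, weight in self.weights.items():
            queue = self.class_queues[cls]
            for _ in range(weight):
                if not queue:
                    break
                conn_id = queue.popleft()
                cell, eop = self.cells[conn_id].popleft()
                sent.append((conn_id, cell))
                if self.cells[conn_id]:
                    queue.append(conn_id)           # still eligible: back of the same queue
                if eop and conn_id in self.ip_conn_ids:
                    self.free_ids.append(conn_id)   # MuqPro II: recycle the IP connection ID
        return sent
```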
Optimization for Multi-Packet TCP Connections:
The above proposal opens a new ATM connection for each IP packet. Switches "recycle" connections after all cells of the IP packet on a given connection have left the switch. The output scheduler allocates transmission (throughput) resources per connection, hence per IP packet. If an end-to-end session consists of a single IP packet, then such an allocation of network resources is (or at least looks) fair. On the other hand, if a session injects multiple IP packets into the network, subsequent packets may open new ATM connections in a switch before the connections of previous packets of the same session have been closed. Such a session then receives an unfairly large share of service, and congested sessions negatively affect uncongested ones, in just the same way as the wormhole backpressure protocol performs worse than the ATLAS I credit flow control protocol in [KaSS96]. To remedy this situation, we can consider two methods: one that is simpler but not transparent to today's standards, and one that is complex and costly but transparent.
The easy but non-transparent solution is to extend AAL-5 with a new value of the PTI cell header field, meaning "end of packet *but not* end of session". This would be in addition to the existing AAL-5 PTI value that signifies "end of packet", which would then mean "end of packet and end of session". Once this extension is in effect, wormhole-IP-over-ATM switches like the above MuqPro II will keep a (single) ATM connection open until they see the normal end-of-packet indication, i.e. until an end-of-packet-and-end-of-session occurs. End stations, on the other hand, will delimit IP packets using the new end-of-packet-but-not-end-of-session indicator in addition to the existing one.
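The intended semantics of this extension can be modeled abstractly. The text does not specify which PTI code point the new indication would use, so the sketch below uses a hypothetical enumeration instead of real PTI values, and it ignores the AAL-5 trailer. It shows the two views of the same cell stream: the switch closes the ATM connection only at end of session, while the end station splits the stream into packets at either end mark.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class CellMark(Enum):
    """Hypothetical labels; the PTI code point for the new mark is unspecified."""
    CONTINUE = "continue"                       # not the last cell of a packet
    END_PACKET_NOT_SESSION = "end-of-packet"    # proposed new indication
    END_PACKET_AND_SESSION = "end-of-session"   # today's AAL-5 end-of-packet value


def switch_closes_connection(mark: CellMark) -> bool:
    """Wormhole switch view: keep one ATM connection open for the whole session."""
    return mark is CellMark.END_PACKET_AND_SESSION


def split_packets(cells: Iterable[Tuple[bytes, CellMark]]) -> List[bytes]:
    """End-station view: every end mark, old or new, delimits an IP packet."""
    packets, current = [], []
    for payload, mark in cells:
        current.append(payload)
        if mark is not CellMark.CONTINUE:
            packets.append(b"".join(current))
            current = []
    return packets
```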
The costly but transparent solution is to augment the routing table in each wormhole-IP-over-ATM switch with a connection ID field, and to also introduce an IP-address field in the outgoing connection table. The former field is used to mark all IP destination addresses for which an ATM connection is already open; new IP packets destined to such IP addresses should not be allowed to open new ATM connections. The IP-address field in the outgoing connection table is used to unmark the routing table when such a connection is closed.

[KaSS96] M. Katevenis, D. Serpanos, E. Spyridakis: "Credit Flow Controlled ATM versus Wormhole Routing", Technical Report TR-171, ICS, FORTH, Heraklio, Crete, Greece, July 1996; URL: file://ftp.ics.forth.gr/tech-reports/1996/1996.TR171.ATM_vs_Wormhole.ps.gz
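The bookkeeping of the costly but transparent solution can be sketched with two maps: the connection ID field added to the routing table and the IP-address field added to the outgoing connection table. The class and method names are hypothetical, and the text does not say what happens to a packet that is refused a new connection, so the sketch simply returns `None` and leaves that decision to the caller.

```python
from typing import Dict, Iterable, List, Optional


class MarkedRoutingTable:
    """Routing entries remember the open connection toward each destination;
    the outgoing table remembers each connection's destination for unmarking."""

    def __init__(self, free_ids: Iterable[int]):
        self.free_ids: List[int] = list(free_ids)
        self.open_conn: Dict[str, Optional[int]] = {}  # routing table: connection ID field
        self.out_ip: Dict[int, str] = {}               # outgoing table: IP-address field

    def try_open(self, dest_ip: str) -> Optional[int]:
        """Return a new outgoing ID, or None if a connection to dest_ip is
        already open (or no ID is free)."""
        if self.open_conn.get(dest_ip) is not None or not self.free_ids:
            return None
        out_id = self.free_ids.pop(0)
        self.open_conn[dest_ip] = out_id   # mark the destination
        self.out_ip[out_id] = dest_ip
        return out_id

    def close(self, out_id: int) -> None:
        """Called when the last cell on out_id has left: unmark and recycle."""
        dest_ip = self.out_ip.pop(out_id)
        self.open_conn[dest_ip] = None
        self.free_ids.append(out_id)
```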
Interoperation with existing IP, ATM, and IP/Tag Switching:
The proposed wormhole IP-over-ATM switch interoperates well with pre-existing networking equipment:
* To ATM equipment, it looks like an ATM switch. Normal-ATM connections (states 000 and 001) are just that: normal ATM connections, which share the switch resources with the IP connections according to the scheduling policies decided by the network manager, and thus enjoy whatever QoS guarantees these scheduling policies are designed for.
* To IP equipment, it looks like an IP router. On the incoming side, IP packets have to be segmented into ATM cells. Each IP packet has to be transmitted as consecutive cells over a single connection ID (VP/VC). Different IP packets can be sent sequentially over the same connection ID, or sequentially or in parallel over different connection ID's. The connection ID's used must belong to a set that is known to the wormhole IP-over-ATM switch, i.e. whose state was set to 100 by the network manager. Thus, the wormhole switch can receive IP packets from a normal IP link, provided a segmentation device is used; a single, default VP/VC connection suffices. The wormhole switch can also receive IP packets from an IP Switch (Ipsilon) or from a Tag Switch (CISCO); I believe that the only difference here is that different IP packets can arrive in parallel over different connection ID's.
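The segmentation required on the incoming side can be sketched for AAL-5 as follows: the packet is padded so that packet, pad, and the 8-byte trailer (CPCS-UU, CPI, 16-bit length, CRC-32) fill a whole number of 48-byte cell payloads, and the last payload is flagged so that the sender can set the end-of-packet indication in its PTI. The CRC is the MSB-first CRC-32 with generator polynomial 0x04C11DB7, with initial value and final XOR all ones, computed over everything except the CRC field itself. Function names are illustrative, and cell headers are not built here.

```python
import struct
from typing import List, Tuple

CELL_PAYLOAD = 48
AAL5_TRAILER = 8   # CPCS-UU (1) + CPI (1) + Length (2) + CRC-32 (4)


def crc32_aal5(data: bytes) -> int:
    """MSB-first CRC-32, polynomial 0x04C11DB7, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def segment_aal5(packet: bytes, uu: int = 0, cpi: int = 0) -> List[Tuple[bytes, bool]]:
    """Build an AAL-5 CPCS-PDU and cut it into (48-byte payload, end_of_packet) pairs."""
    if not packet or len(packet) > 0xFFFF:
        raise ValueError("payload must be 1..65535 bytes (16-bit length field)")
    pad = (-(len(packet) + AAL5_TRAILER)) % CELL_PAYLOAD
    body = packet + bytes(pad) + struct.pack(">BBH", uu, cpi, len(packet))
    pdu = body + struct.pack(">I", crc32_aal5(body))
    n = len(pdu) // CELL_PAYLOAD
    return [(pdu[i * CELL_PAYLOAD:(i + 1) * CELL_PAYLOAD], i == n - 1) for i in range(n)]
```

All cells of one packet must then be sent back-to-back on the same VP/VC, as the text requires; cells of different packets may be interleaved only across different connection ID's.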
On the outgoing side, IP packets have to be reassembled from ATM cells. Each IP packet is reassembled from consecutive cells over a single connection ID (VP/VC). The number of different connection ID's over which IP packets can appear (in parallel) depends on the network manager: it is the set of ID's with which the "free-ID list" was initialized on the outgoing link of the wormhole IP-over-ATM switch. One extreme choice is to provide a single such ID, in which case the wormhole switch will effectively serialize the IP packets over that single connection; needless to say, this will often result in poor performance and head-of-line blocking, but if the IP receiver is that dumb, then it gets what it deserves.... [Note: this brings to the surface one issue that the wormhole switch must solve: what happens when a new IP packet arrives over an inactive-IP connection and the "free-ID list" is empty? If the outgoing link provides a large number of IP-over-ATM connection ID's (e.g. thousands or tens of thousands), then an empty free-ID list may be a rare phenomenon; in such a case, we may opt for dropping the IP packet. Otherwise (including the above extreme case of a single outgoing IP-over-ATM connection ID), the wormhole switch must also implement a queue of IP packets (i.e. a queue of queues of cells) that are awaiting a free outgoing ID. A separate "IP router process" must also be implemented, which chooses one of these waiting IP packets whenever a free outgoing ID becomes available; to avoid head-of-line blocking, this router process may want to consider only IP packets that have been received in full (not just partly), in case free ID's are scarce.]
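The queue of waiting IP packets and the separate "IP router process" suggested in the note can be sketched as below. The names are hypothetical, a packet is modeled as a list of cells plus a flag that is set when its end-of-packet cell arrives, and the `only_complete` option implements the suggestion of considering only fully received packets when free ID's are scarce.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so deque.remove() finds this exact packet
class WaitingPacket:
    in_id: int                                       # incoming connection ID
    cells: List[bytes] = field(default_factory=list)
    complete: bool = False                           # end-of-packet cell received


class FreeIdWaitQueue:
    """Hypothetical router process pairing waiting packets with free outgoing IDs."""

    def __init__(self, free_ids: Deque[int], only_complete: bool = True):
        self.free_ids = free_ids
        self.waiting: Deque[WaitingPacket] = deque()
        self.only_complete = only_complete

    def park(self, pkt: WaitingPacket) -> None:
        """A head cell found the free-ID list empty: its packet waits here."""
        self.waiting.append(pkt)

    def dispatch(self) -> Optional[Tuple[WaitingPacket, int]]:
        """Give the oldest eligible waiting packet a free outgoing ID, if possible."""
        if not self.free_ids:
            return None
        chosen = next((p for p in self.waiting
                       if p.complete or not self.only_complete), None)
        if chosen is None:
            return None
        self.waiting.remove(chosen)
        return chosen, self.free_ids.popleft()

    def add_cell(self, pkt: WaitingPacket, cell: bytes, eop: bool):
        """Buffer a further cell of a waiting packet; completion may unblock it."""
        pkt.cells.append(cell)
        if eop:
            pkt.complete = True
            return self.dispatch()
        return None

    def release_id(self, out_id: int):
        """The output scheduler returned an ID to the free list."""
        self.free_ids.append(out_id)
        return self.dispatch()
```

With `only_complete=True`, a partly received packet never holds a scarce outgoing ID while its sender stalls, which is the head-of-line blocking the note warns about.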
Speed and Scalability:
I believe that we should be able to make the above MuqPro II prototype operate as a 4x1 "switch" at 155 Mb/s/link, using FPGA's and a 30 MHz clock. A next step, of realistic complexity, could be a 4x1 "switch" at 622 Mb/s/link, using an ASIC implementation and a 120 MHz clock; of course, the major problem to solve there is that external memory speed does not scale with the internal clock. These two implementations should be enough for "products" in the next few years. For higher fan-in/fan-out's and for higher link speeds, other, more advanced architectures, plus pipelining, will probably be necessary. We are not afraid of these, when the "market" becomes ready for such high speeds....