AccessMyLibrary provides FREE access to millions of articles from top publications available through your library.

Wormhole Routing Techniques for Directly Connected Multicomputer Systems.

ACM Computing Surveys

| September 01, 1998 | MOHAPATRA, PRASANT | COPYRIGHT 1996 Association for Computing Machinery, Inc. (Hide copyright information)Copyright

Wormhole routing has emerged as the most widely used switching technique in massively parallel computers. We present a detailed survey of various techniques for enhancing the performance and reliability of wormhole-routing schemes in directly connected networks. We start with an overview of the direct network topologies and a comparison of various switching techniques. Next, the characteristics of the wormhole routing mechanism are described in detail along with the theory behind deadlock-free routing. The performance of routing algorithms depends on the selection of the path between the source and the destination, the network traffic, and the router design. The routing algorithms are implemented in the router chips. We outline the router characteristics and describe the functionality of various elements of the router. Depending on the usage of paths between the source and the destination, routing algorithms are classified as deterministic, fully adaptive, and partially adaptive. We discuss several representative algorithms for all these categories. The algorithms within each category vary in terms of resource requirements and performance under various traffic conditions. The main difference among various adaptive routing schemes is the technique used to avoid deadlocks. We also discuss a few algorithms based on deadlock recovery techniques. Along with performance, fault tolerance is essential for message routing in multicomputers, and we thus discuss several fault-tolerant wormhole routing algorithms along with their fault-handling capabilities. These routing schemes enable a message to reach its destination even in the presence of faults in the network. The implementation details of wormhole routing algorithms in contemporary commercial systems are also discussed. We conclude by itemizing several future directions and open issues.

Categories and Subject Descriptors: B.4.3 [Input/Output and Data Communications]: Interconnections (Subsystems)--topology; C.1.4 [Computer Systems Organization]: Parallel Architectures; C.2.1 [Computer-Communication Networks]: Network Architecture and Design; C.2.2 [Computer-Communication Networks]: Network Protocols--routing protocols; C.2.6 [Computer-Communication Networks]: Internetworking--routers

General Terms: Algorithms, Design, Performance, Reliability

Additional Key Words and Phrases: Deadlock avoidance and recovery, directly connected multicomputers, fault-tolerance, network topology, router design, switching techniques, virtual channels, wormhole routing algorithms

1. INTRODUCTION

Large-scale parallel computers are potential candidates for providing very high computational power. These systems are usually organized as an ensemble of nodes, each with its own processor, local memory, and other supporting devices. The nodes are interconnected using a variety of topologies that can be classified into two broad categories: direct and indirect. In direct networks, each node has a point-to-point or direct connection to some of the other nodes, called neighboring nodes; examples of direct network topologies include hypercube, mesh, and tree. In indirect networks, the nodes are connected to other nodes or a shared memory through one or more switching elements. Examples of indirect networks include crossbar, bus, and multistage interconnection networks.

Direct networks have emerged as a popular architecture for massively parallel computers because of their scalability. The total communication bandwidth, memory bandwidth, and processing capability of the system increase with the number of nodes. Examples of experimental and commercial systems based on the direct interconnection network include Intel's iPSC, Touchstone Delta [Intel 1990] and Paragon [Intel 1991], Ncube-2/3 [NCUBE Company 1990], Cray T3D [Kessler and Schwarzmeier 1993; Scott and Thorson 1994], MIT J-Machine [Noakes et al. 1993], and Stanford DASH [Lenoski et al. 1992]. The nodes of a direct-network-based multicomputer communicate by passing messages through an interconnection network. Neighboring nodes send messages to one another directly, whereas nodes that are not connected directly communicate with each other by passing messages through intermediate nodes. Support hardware is essential to handle the transmission of messages between nodes. In most systems, a router is associated with each node to handle communication-related tasks. Dedicated routers are also used to allow overlapping of computation and communication within each node.

Figure 1 shows the architecture of a generic node consisting of a processor, a local memory, a router, interconnects, and other functional units such as I/O devices. The router has internal channels that connect it to the processor, local memory, or other functional units. The input internal channels are used to absorb messages destined for the host processor. The output internal channels are used by the host processor to send outgoing messages to remote nodes. Some systems use multiple internal channels to avoid communication bottle-necks between the local processor or memory and the router. The multiple internal channels can have either all-port or k-port architecture. In the all-port architecture, every external channel has a corresponding internal channel, thus allowing the node to send and receive on all external channels simultaneously. A k-port architecture has k internal channels, where k is less than the total number of external channels. The internal channels in a k-port router are multiplexed by the external channels, which are used for messages in transit. Usually a crossbar switch is used in the router to connect the input external channels to the output external channels. The control unit is responsible for flow control of the messages traversing the router.

[Figure 1 ILLUSTRATION OMITTED]

In direct-network-based multicomputers, a task is allocated to a group of nodes that communicate for successful execution of the task. The speed of execution depends on the processor as well as on the communication performance. The latency incurred by a message traversing from a source node to a destination node affects the overall performance of the multicomputer system. Because of the interprocessor interactions, the communication latency also affects the granularity of parallelism that can be exploited in the system. Thus it is essential to devise techniques that reduce the communication latency incurred in direct networks.

The communication latency is the most important performance metric in direct networks. It comprises start-up latency, network latency, and the blocking time [Ni and McKinley 1993]. The start-up latency is the time required for the system to handle the message at the source and destination nodes and depends primarily on the design of the interface between the local processors and routers. Network latency, defined as the time spent by a message in the network, is computed as the time between the instant when the message head is injected into the network by the source and the instant when the message tail is absorbed by the destination node. Both start-up and network latencies are fixed for a given network. The blocking time of a message is the time spent waiting for a channel currently being used by another message. Thus the blocking time depends on the resource contentions that a message encounters in its path. Blocking time cannot be determined statically, as it depends on the network traffic distribution and the path taken by a message.

The communication latency of direct networks depends on several factors including switching, routing, flow control, and topology. Several switching techniques have been proposed for direct networks. Wormhole switching has emerged as a popular technique and has been used in both commercial and experimental systems. Wormhole switching can be employed in both direct and indirect networks. It is widely used in contemporary multicomputers because of its low latency and requirement of small buffers at the nodes. Theories have been developed for designing cost-effective, efficient, deadlock-free, and livelock-free wormhole routing algorithms. Based on these theories, several deterministic and adaptive routing algorithms have been proposed in the literature. In this article, we survey different techniques of wormhole routing along with the theory behind the design of deadlock-free algorithms. Complementary techniques for deadlock recovery are also described. In addition, we review fault-tolerant wormhole routing schemes that can route messages in the presence of faults. Details of wormhole routing schemes implemented in several commercial systems are also included.

A preliminary survey on wormhole routing was given by Ni and McKinley [1993]. Since then, several advances have been made in wormhole-routed networks. Furthermore, the present report discusses several issues not covered by the earlier survey report, such as fault-tolerant routing, deadlock recovery techniques, router designs, and implementation in commercial systems. Topics not discussed in this survey include collective communication and routing in indirect networks. Collective communication could itself be the subject of a survey; indeed, one such work is by McKinley et al. [1995]. We focus primarily on direct network topologies such as meshes and k-ary n-cubes because of their widespread adoption in commercial systems. However, we also discuss the implementation of wormhole routing in some recent commercial systems (CM-5 and IBM SP1/SP2) that are based on indirect networks.

The rest of the survey is organized as follows. The properties of direct network topologies are outlined in Section 2. Section 3 discusses various switching techniques along with wormhole switching, which forms the basis of wormhole routing. Virtual channels, flow control mechanisms, and router characteristics are also described in this section. The classification of wormhole routing algorithms and deadlock-free routing theory are presented in Section 4. Section 5 discusses various deterministic wormhole routing algorithms, followed by a discussion of adaptive routing algorithms in Section 6 and of fault-tolerant routing algorithms in Section 7. In Section 8, we discuss the implementation of wormhole routing algorithms in commercial parallel computers, and give concluding remarks and a discussion of open issues in Section 9.

2. DIRECT NETWORK TOPOLOGIES

The topology of a network defines how the nodes are interconnected and is generally modeled as a graph in which the vertices represent the nodes and the edges denote the channels. Multidimensional meshes and k-ary n-cubes, the basic topologies used in most parallel computers, are defined as follows [Ni and McKinley 1993].

Definition 1. An n-dimensional mesh is an interconnection structure that has [k.sub.0] x [k.sub.1] x ... x [k.sub.n - 1] nodes, where [k.sub.i] denotes the number of nodes in the ith dimension. Each node in the mesh is identified by an n-coordinate vector ([x.sub.0], [x.sub.1], ..., [x.sub.n-1]), where 0 [is less than or equal to] [x.sub.i] [is less than or equal to] [k.sub.i] - 1. Two nodes ([x.sub.0], [x.sub.1], ..., [x.sub.n] - 1) and ([y.sub.0], [y.sub.1], ..., [y.sub.n - 1]) are connected if and only if there exists an i such that [x.sub.i] = [y.sub.i] [+ or -] 1, and [x.sub.j] = [y.sub.j] for all j [not equal] i. Thus the number of neighbors of a node ranges from n to 2n, depending on its location in the mesh.

Definition 2. A k-ary n-cube is defined as an interconnection structure of n dimensions having k nodes in each dimension. Each node in the k-ary n-cube is identified by an n-coordinate vector ([x.sub.0], [x.sub.1], ..., [x.sub.n - 1]), where 0 [is less than or equal to] [x.sub.i] [is less than or equal to] k - 1. Two nodes ([x.sub.0], [x.sub.1], ..., [x.sub.n - 1]) and ([y.sub.0], [y.sub.1], ..., [y.sub.n - 1]) are connected if and only if there exists an i such that [x.sub.i] = ([y.sub.i] [+ or -] 1) mod k, and [x.sub.j] = [y.sub.j] for all j [not equal to] i. There are wraparound channels in the k-ary n-cubes (specified by the use of modulus in the definition), which are not present in n-dimensional meshes. If k = 2, then every node has n neighbors, one in each dimension. If k [is greater than] 2, then every node has 2n neighbors, two in each dimension.

The hypercube and torus are two other popular topologies for direct networks. Hypercubes are special cases of an n-dimensional mesh in which [k.sub.i] = 2, for all i, 0 [is less than or equal to] i [is less than or equal to] n - 1; they can be termed 2-ary n-cubes. A k-ary n-cube is called a torus when n = 2. Figure 2 shows a three-dimensional (3-D) hypercube and a two-dimensional (2-D) mesh. A torus can be constructed by connecting the corresponding end nodes of the 2-D mesh with wraparound connections.

[Figure 2 ILLUSTRATION OMITTED]

Several issues are associated wit h the mesh, torus, and hypercube topologies. The mesh is an asymmetrical topology in which the node degree depends on its location. Interprocessor communication performance depends on the location of source and destination. The channels near the center of the mesh experience higher traffic density than those on the periphery. The torus and hypercube are symmetrical topologies in which the degree of a node is the same irrespective of its location in the network. Thus, unlike the mesh, all the nodes in tori and hypercubes are identical in connectivity. The network diameter of a mesh is greater than that of the torus, which in turn has a greater diameter than the hypercube for the same number of nodes.

The bisection width of a network is defined as the number of channels that must be removed to partition the network into two equal subnetworks. The bisection width has a significant effect on the interprocessor communication performance [Dally 1990]. The bisection width (BW) of a [2.sup.n] x [2.sup.n] 2-D mesh, [2.sup.n] x 2n 2-D torus, and a 2n-cube hypercube are [BW.sub.mesh] = [2.sup.n], [BW.sub.torus] = [2.sup.n+1], [BW.sub.hypercube] = [2.sup.2n-1], respectively. The bisection density, which is the product of the bisection width and the channel width, can be used as a measure of the network cost [Ni and McKinley 1993]. For the same cost, the 2-D mesh can support wider channels than the 2-D torus, which in turn can support wider channels than the hypercube [Ni and McKinley 1993]. Thus the channel bandwidth of the three topologies can be expressed as: mesh [is greater than] torus [is greater than] hypercube.

In general, low-dimensional meshes are preferred because they have low fixed-node degrees and fixed-length channel wires, which make them more scalable than high-dimensional meshes and k-ary n-cubes. Low-dimensional meshes also have higher channel bandwidth per bisection density and have lower contention and blocking latencies, which results in lower communication latencies and higher hotspot throughput [Dally 1990]. Furthermore, two or three topological dimensions are easier to implement in the three physical dimensions. On the other hand, high-dimensional meshes and k-ary n-cubes have lower diameters, which shortens the path lengths. High-dimensional topologies also have more paths between pairs of nodes, which permits more adaptivity and fault tolerance.

A class of shuffle networks known as de Bruijn (dB) graphs have become popular recently. They are suitable for large networks and can be defined for any number of nodes, including prime numbers [Samatham and Pradhan 1989]. For a specific node degree, dB networks, in most cases, have the smallest diameter compared to the contemporary network topologies. Formally, a unidirectional dB network can be defined as follows [Samatham and Pradhan 1989].

Definition 3. An r-radix unidirectional de Bruijn digraph dBD(r, [r.sup.m]) has the total number of nodes N = [r.sup.m] and the address of a node X is represented as([x.sub.m - 1], [x.sub.m - 2],..,[x.sub.0]) where [x.sub.i] [element of] {0, 1,..., (r - 1)} for 0 [is less than or equal to] i [is less than or equal to] m - 1. Its neighboring nodes are ([x.sub.m-2], [x.sub.m-3], ..., [Alpha]), where [Alpha] = 0, 1,..., r - 1.

Several other topologies based on Caley graphs have been also proposed [Akers and Krishnamurthy 1989]. However, here we focus primarily on k-ary n-cubes and multidimensional meshes. Wormhole routing techniques for dB networks and other topologies based on the Caley graphs are reported in Boppana and Chalasani [1995] and Park and Agrawal [1995].

3. WORMHOLE SWITCHING

Nodes in a direct network communicate by passing messages from one node to another. A message may be divided into one or more equal or variable-size packets. A packet is the smallest unit of information that contains routing and sequencing information. In this section, we discuss various switching techniques used in or proposed for multicomputer systems.

3.1 Switching Techniques

In most multicomputer systems, a message enters the network from a source node and is switched or routed towards its destination through a series of intermediate nodes. Four types of switching techniques are usually used for this purpose: circuit switching, packet switching, virtual cut-through switching, and wormhole switching.

In circuit switching, a dedicated path is established between source and destination before the data transfer initiates. …

Related articles from newspapers, magazines, journals, and more
©2013 Gale, a part of Cengage Learning. All rights reserved. Contact us | Privacy policy | Terms and conditions

The AccessMyLibrary advertising network includes: womensforum.com GlamFamily