Publisher
Florida Atlantic University
Description
Mesh-connected multicomputers are one of the simplest and least expensive structures to build a system using hundreds and even thousands of processors. The nodes communicate with each other by sending and receiving messages. As the system gets larger and larger, it not only requires the routing algorithms be efficient but also fault-tolerant. The fault model we use in 2-D meshes is a faulty block while in 3-D meshes, the fault model is a faculty cube. In order to route messages through feasible minimum paths, the extended safety level is used to determine the existence of a minimal path and faulty block (cube) information is used to guide the routing. This dissertation presents an in-depth study of fault-tolerant minimal routing in 2-D tori, 3-D meshes, and tree-based fault-tolerant multicasting in 2-D and 3-D meshes using extended safety levels. Also path-based fault-tolerant deadlock-free multicasting in 2-D and 3-D meshes is studied. In fault-tolerant minimal routing in 2-D meshes, if no faulty block is encountered, any adaptive minimal routing can be used until the message encounters a faulty block. The next step is guided by the faulty block information until the message gets away from the faulty block. After that, any minimal adaptive routing can be used again. The minimal routing in 2-D tori is similar to that in 2-D meshes if at the beginning of the routing a conversion is made from a 2-D torus to a 2-D mesh. The fault-tolerant minimal routing in 3-D meshes can be done in a similar way. In the tree-based multicasting in 2-D and 3-D meshes, a time-step optimal and traffic-step suboptimal algorithm is proposed. Several heuristic strategies are presented to resolve a conflict, which are compared by simulations. A path-based fault-tolerant deadlock-free multicast algorithm in 2-D meshes with inter-block distance of at least three is presented to solve the deadlock problem in tree-based multicast algorithms. The approach is then extended to 3-D meshes and to inter-block distance of at least two in 2-D meshes. The path is Hamiltonian that is only updated locally in the neighborhood of a faulty block when a faulty block is encountered. Two virtual channels are used to prevent deadlock in 2-D and 3-D meshes with inter-block (inter-cube) distance of at least three and two more virtual channels are added if the inter-block distance is at least two.