Parallel computers

Model
Digital Document
Publisher
Florida Atlantic University
Description
The development of a parallel data structure and an associated elemental decomposition algorithm for explicit finite element analysis on a massively parallel SIMD computer, the DECmpp 12000 (MasPar MP-1), is presented and then extended to an implementation on the MIMD computer, the Cray-T3D. The new parallel data structure and elemental decomposition algorithm are discussed in detail and are used to parallelize a sequential Fortran code that applies isoparametric elements to the nonlinear dynamic analysis of shells of revolution. The parallel algorithm required the development of a new procedure, called an 'exchange', in which nodal forces are exchanged at each time step to replace the standard gather-assembly operations of the sequential code. In addition, the data were reconfigured so that all nodal variables associated with an element are stored in the same processor as the other element data. The architectural and Fortran programming language features of the MasPar MP-1 and Cray-T3D computers that are pertinent to finite element computations are also summarized, and sample code segments are provided to illustrate programming in a data-parallel environment. The governing equations, the finite element discretization, and a comparison of their implementation on von Neumann and SIMD/MIMD parallel computers are discussed to demonstrate their applicability and the important differences in the new algorithm. Various large-scale transient problems are solved using the parallel data structure and elemental decomposition algorithm, and measured performance is presented and analyzed in detail. Results show that the Cray-T3D is a very promising parallel computer for finite element computation: 32 processors give an overall speedup of 27-28, i.e. an efficiency of 85% or more, and 128 processors give a speedup of 70-77, i.e. an efficiency of 55% or more. The Cray-T3D results demonstrate that this machine can outperform the Cray-YMP by a factor of about 10 for finite element problems with 4K elements; the method of developing the parallel data structure and its associated elemental decomposition algorithm is therefore recommended for implementation in other finite element codes on this machine. However, the results from the MasPar MP-1 show that this new algorithm for explicit finite element computations does not produce very efficient parallel code on that computer, and the new data structure is therefore not recommended for further use on the MasPar machine.
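To make the element-wise data layout and the 'exchange' step more concrete, the following is a minimal serial sketch in Fortran 90. It is not code from the dissertation; the element count, connectivity, and variable names are illustrative assumptions. Each column of the element arrays stands for the data one processor would hold, and the exchange sums the force contributions at shared nodes and copies the totals back to every element touching those nodes, in place of a global gather-assembly. On the actual machines this summation would be carried out with interprocessor communication rather than the shared arrays used here.

    program exchange_sketch
      implicit none
      integer, parameter :: nelem = 4, nodes_per_elem = 2, nnode = 5
      ! Element-wise storage: every nodal quantity is duplicated per element,
      ! mirroring a layout in which each processor holds one element.
      integer :: conn(nodes_per_elem, nelem)    ! global node ids per element
      real    :: f_elem(nodes_per_elem, nelem)  ! element-level nodal forces
      real    :: f_total(nodes_per_elem, nelem) ! forces after the exchange
      real    :: f_global(nnode)
      integer :: e, a

      ! A simple chain of 2-node elements: element e joins nodes e and e+1.
      do e = 1, nelem
         conn(1, e) = e
         conn(2, e) = e + 1
      end do

      ! Step 1: each "processor" computes its own element force vector
      ! (placeholder values here).
      f_elem = 1.0

      ! Step 2: the exchange. Contributions at every shared node are summed
      ! and the total is copied back to each element that touches the node,
      ! replacing the usual global gather-assembly of the sequential code.
      f_global = 0.0
      do e = 1, nelem
         do a = 1, nodes_per_elem
            f_global(conn(a, e)) = f_global(conn(a, e)) + f_elem(a, e)
         end do
      end do
      do e = 1, nelem
         do a = 1, nodes_per_elem
            f_total(a, e) = f_global(conn(a, e))
         end do
      end do

      print *, 'exchanged nodal forces per element:', f_total
    end program exchange_sketch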
Model
Digital Document
Publisher
Florida Atlantic University
Description
In this dissertation, we propose and analyze a cluster-based hypercube architecture in which each node of the hypercube is furnished with a cluster of n processors connected through a small crossbar switch to n memory modules. Topological analysis shows that the cluster-based hypercube reduces the complexity of the basic hypercube architecture by reducing the diameter, the node degree, and the number of links. The proposed architecture uses the higher processing power furnished by the cluster of execution processors in each node to address the needs of computation-intensive parallel application programs. It provides a smaller-dimension hypercube with the same number of execution processors as a higher-dimension conventional hypercube. The scheme can be extended to meshes and other architectures. Mathematical analysis of the parallel simplex and parallel Gaussian elimination algorithms executing on the cluster-based hypercube shows that the complexity of solving an n x n matrix problem is O(n^2) for the parallel simplex algorithm and O(n^3) for the parallel Gaussian elimination algorithm. The timing analysis derived from these results indicates that, for the same total number of processors as a conventional hypercube, the computation-to-communication ratio of the cluster-based hypercube executing a matrix problem with the parallel simplex algorithm increases as the number of nodes of the cluster-based hypercube is decreased. Self-driven simulations were developed to run the parallel simplex and parallel Gaussian elimination algorithms on the proposed cluster-based hypercube architecture and on the Intel Personal Supercomputer (iPSC/860), a conventional hypercube. The simulation results show a response-time improvement of up to 30% in favor of the cluster-based hypercube. We also observe that, as link delays increase, the performance gap widens significantly in favor of the cluster-based hypercube when both machines execute the same parallel simplex and Gaussian elimination algorithms.
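As a rough illustration of the topological claim, the short Fortran 90 sketch below compares a conventional hypercube with a cluster-based hypercube holding the same total number of processors. The dimension and cluster size are arbitrary example values, not figures from the dissertation, and the link count uses the standard hypercube formula d*2^(d-1). With n processors per node, the cluster-based design needs a hypercube of dimension only d - log2(n), which is what reduces the diameter, node degree, and link count.

    program cluster_hypercube_sketch
      implicit none
      integer :: d, n, dc, p
      ! Conventional hypercube: dimension d, 2**d single-processor nodes.
      ! Cluster-based hypercube (assumed model): each node holds a cluster of
      ! n processors, so the same processor count needs dimension d - log2(n).
      d = 7          ! conventional dimension (128 processors); example value
      n = 8          ! processors per cluster; example value
      p = 2**d
      dc = d - nint(log(real(n)) / log(2.0))

      print *, 'total processors            :', p
      print *, 'conventional diameter/degree:', d, d
      print *, 'conventional links          :', d * 2**(d - 1)
      print *, 'cluster-based diameter/deg  :', dc, dc
      print *, 'cluster-based links         :', dc * 2**(dc - 1)
    end program cluster_hypercube_sketch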