Parallel processing (Electronic computers)

Model
Digital Document
Publisher
Florida Atlantic University
Description
Parallel/distributed systems offer a tremendous processing capacity. However, in order to take full advantage of it, good load distributions are needed. We study the task graph partition problem for a given parallel/distributed application which is modeled using data parallelism and is implemented in a transputer system in a mesh construction. Our approach uses domain partition to separate a given domain into a set of subdomains of equal size and each of which has at most four neighbors. We devise three methods to partition a given domain, these methods are compared based on several criteria. The impact of the number of processors used in implementation is also investigated based on several parameters, including processor speed, communication speed, and amount of computation and communication per data point. We discuss implementation of our approach in the application based on the existing features of the transputer system, and compare different versions of application through running a simulation system.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Multi-agent control is a very promising area of robotics. In applications for which it is difficult or impossible for humans to intervene, the utilization of multi-agent, autonomous robot groups is indispensable. This thesis presents a novel approach to reactive multi-agent control that is practical and elegant in its simplicity. The basic idea upon which this approach is based is that a group of robots can cooperate to determine the shortest path through a previously unmapped environment by virtue of redundant sharing of simple data between multiple agents. The idea was implemented with two robots. In simulation, it was tested with over sixty agents. The results clearly show that the shortest path through various environments emerges as a result of redundant sharing of information between agents. In addition, this approach exhibits safeguarding techniques that reduce the risk to robot agents working in unknown and possibly hazardous environments. Further, the simplicity of this approach makes implementation very practical and easily expandable to reliably control a group comprised of many agents.
Model
Digital Document
Publisher
Florida Atlantic University
Description
We propose a new minimum total communication distance (TCD) algorithm and an optimal TCD algorithm for broadcast in a 2-dimensional mesh (2-D mesh). The former generates a minimum TCD from a given source node, and the latter guarantees a minimum TCD among all the possible source nodes. These algorithms are based on a divide-and-conquer approach where a 2-D mesh is partitioned into four submeshes of equal size. The source node sends the broadcast message to a special node called an eye in each submesh. The above procedure is then recursively applied in each submesh. These algorithms are extended to a 3-dimensional mesh (3-D mesh), and are generalized to a d-dimensional mesh or torus. In addition, the proposed approach can potentially be used to solve optimization problems in other collective communication operations.
Model
Digital Document
Publisher
Florida Atlantic University
Description
To improve the performance of parallel/distributed systems, we propose four parallel load balance algorithms. The new partition algorithm achieves load balance among processors via domain partition. If we assume the problem domain is evenly load distributed, this algorithm will divide the whole domain into a required number of subdomains with the same area. If a problem domain has a dynamic load distribution, although the new partition algorithm is still suitable for the initial mapping, we propose three dynamic load balance algorithms. These dynamic algorithms achieve load balance among processors by transferring load among processors. We applied the new partition algorithm to a specific domain and compared the method to some existing partition algorithms. We also simulated three dynamic load balance algorithms. Results of comparisons and simulations show that all the four algorithms have satisfactory performance.
Model
Digital Document
Publisher
Florida Atlantic University
Description
We study the embedding of binomial trees with variable roots in faulty hypercubes. Based on novel embedding strategies, we propose three embedding algorithms with variable nodes as the root. The first algorithm can tolerate up to n - 1 faulty links, but the execution can be done within log2(n - 1) subcube splits. The second one can tolerate up to [(3(n - 1))\2] faulty links. The last one can tolerate up to [(3(n - 4))\2] faulty nodes.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Many image processing applications can exploit high degree of data locality and parallelism to improve performance. Customized massively parallel computing hardware provides expensive but the most efficient solution as far as performance is concerned. Now-a-days high bandwidth network with high performance workstations are becoming economical and are widely available. Hence, better performance-to-cost ratio can be achieved by implementing message passing paradigm of parallel processing. Parallel edge detector was developed using message passing interfaces on cluster of workstations. This thesis discussed performance evaluation in terms of execution time, speedup, and efficiency for tasks with different computational intensities. Effects of sequential I/O component and load balancing are also discussed.
Model
Digital Document
Publisher
Florida Atlantic University
Description
The objective of the work is to verify the feasibility of converting a large FEA code into a massively parallel FEA code in terms of computational speed and cost. Sequential subroutines in the Research EPIC hydro code, a Lagrangian finite element analysis code for high velocity elastic-plastic impact problems, are individually converted into parallel code using Cray Adaptive Fortran (CRAFT). The performance of massively parallel subroutines running on 32 PEs on Cray-T3D is faster than their sequential counterparts on Cray-YMP. At next stage of the research, Parallel Virtual Machine (PVM) directives is used to develop a PVM version of the EPIC hydro code by connecting the converted parallel subroutines running on multiple PEs of T3D to the sequential part of the code running on single PE. With an incremental increase in the massively parallel subroutines into the PVM EPIC hydro code, the performance with respect to speedup of the code increased accordingly. The results indicate that significant speedup can be achieved in the EPIC hydro code when most or all of the subroutines are massively parallelized.
Model
Digital Document
Publisher
Florida Atlantic University
Description
In many scientific and signal processing applications, there are increasing demands for large volume and high speed computations, which call for not only high-speed low power computing hardware, but also for novel approaches in developing new algorithms and architectures. This thesis is concerned with the development of such architectures and algorithms suitable for the VLSI implementation of recursive and nonrecursive 1-dimension digital filters using multiple slower processing elements. As the background for the development, vectorization techniques such as state-space modeling, block processing, and look ahead computation are introduced. Concurrent architectures such as systolic arrays, wavefront arrays and appropriate parallel filter realizations such as lattice, all-pass, and wave filters are reviewed. A fully hardware efficient systolic array architecture termed as Multiplexed Block-State Filter is proposed for the high speed implementation of lattice and direct realizations of digital filters. The thesis also proposes a new simplified algorithm, Alternate Pole Pairing Algorithm, for realizing an odd order recursive filter as the sum of two all-pass filters. Performance of the proposed schemes are verified through numerical examples and simulation results.
Model
Digital Document
Publisher
Florida Atlantic University
Description
The rotorcraft trim solution involves a search for control inputs for
required flight conditions as well as for corresponding initial conditions for
periodic response or orbit. The control inputs are specified indirectly to
satisfy flight conditions of prescribed thrust levels, rolling and pitching
moments etc. In addition to the nonlinearity of the equations of motion and
control inputs, the control inputs appear not only in damping and stiffness
matrices but also in the forcing-function or input matrix; they must be found
concomitantly with the periodic response from external constraints on the
flight conditions. The Floquet Transition Matrix (FTM) is generated for
perturbations about that periodic response; usually, a byproduct of the trim
analysis. The damping levels or stability margins are computed from an
eigenanalysis of the FTM. The Floquet analysis comprises the trim analysis
and eigenanalysis and is routinely used for small order systems (order N <
100). However, it is practical for neither design applications nor
comprehensive analysis models that lead to large systems (N > 100); the execution time on a sequential computer is prohibitive. The trim analysis
takes the bulk of this execution time.
Accordingly, this thesis develops concepts and methods of parallelism
toward Floquet analysis of large systems with computational reliability
comparable to that of sequential computations. A parallel shooting scheme
with damped Newton iteration is developed for the trim analysis. The scheme
uses parallel algorithms of Runge-Kutta integration and linear equations
solution. A parallel QR algorithm is used for the eigenanalysis of the FTM.
Additional parallelism in each iteration cycle is achieved by concurrent
operations such as perturbations of initial conditions and control inputs,
follow-up integrations and formations of the columns of the Jacobian matrix.
These parallel shooting and eigenanalysis schemes are applied to the
nonlinear flap-lag stability with a three-dimensional dynamic wake (N ~
150). The stability also is investigated by widely used sequential schemes of
shooting with damped Newton iteration and QR eigenanalysis. The
computational reliability is quantified by the maximum condition number of
the Jacobian matrices in the Newton iteration, the eigenvalue condition
numbers and the residual errors of the eigenpairs. The saving in computer
time is quantified by the speedup, which is the ratio of the execution times of
Floquet analysis by sequential and parallel schemes. The work is carried out
on massively parallel MasPar MP-1, a distributed-memory, single-instruction
multiple-data or SIMD computer. A major finding is that with increasing
system order, while the parallel execution time remains nearly constant, the
sequential execution time increases nearly cubically with N. Thus,
parallelism promises to make large-scale Floquet analysis practical.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Fault simulators can be used for various purposes, such as the determination of the Fault coverage, the Automatic test pattern generation and the preparation of the Fault dictionaries. As the size of the digital circuits increases, the number of gates present increases and the time taken for fault simulation also increases. In order to reduce the fault simulation time, massively parallel computers are being used. We have developed a fault simulator on MASPAR, a massively parallel Single Instruction Multiple Data machine, based on the principles of parallel pattern parallel fault simulation. In order to eliminate the limitation of limited memory on MASPAR, we have designed an algorithm which reduces the amount of memory required for storing the circuit. We have implemented these algorithms in two different ways. These algorithms were tested on ISCAS85 benchmark circuits. The results have shown an improvement over other parallel algorithms.