Computer architecture

Model
Digital Document
Publisher
Florida Atlantic University
Description
Reduced Instruction Set Computers (RISC) have recently been
proposed as a new trend in computer architecture design.
This thesis reviews the design approach behind RISC and
discusses the controversy between proponents of the RISC
approach and those of the traditional Complex Instruction Set
Computer (CISC) approach. The Ridge 32 is selected as a case study
of the RISCs. Architectural parameters for evaluating computer
performance are used to analyze the performance of the Ridge
32. A simulator for the Ridge 32 was implemented in PASCAL as a
way of measuring those parameters. Measurement results on
several selected benchmark programs are given and analyzed to
evaluate the characteristics of the Ridge 32.
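As a rough illustration of what such parameter measurement might look like (the thesis's simulator was written in PASCAL; this sketch is in Python, and all names and the trace format are hypothetical), a trace-driven tally of instruction mix and memory references per instruction:

```python
# Hypothetical sketch (not from the thesis): tallying architectural
# parameters from an instruction trace, as a Ridge 32-style simulator might.
from collections import Counter

def measure(trace):
    """trace: iterable of (opcode, memory_refs) tuples."""
    mix = Counter()
    mem_refs = 0
    total = 0
    for opcode, refs in trace:
        mix[opcode] += 1
        mem_refs += refs
        total += 1
    return {
        "instruction_mix": {op: n / total for op, n in mix.items()},
        "memory_refs_per_instruction": mem_refs / total,
    }

# Example benchmark trace: three register ops, one load, one store.
trace = [("ADD", 0), ("SUB", 0), ("MOV", 0), ("LOAD", 1), ("STORE", 1)]
print(measure(trace))
```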
Model
Digital Document
Publisher
Florida Atlantic University
Description
The concept of a Reduced Instruction Set Computer
(RISC) has evolved out of a desire to enhance the performance
of a computer. We present here a detailed design of a
Testable Reduced Instruction Set Computer (TRISC) that utilizes
a Multiple Register Set. Level Sensitive Scan Design
(LSSD) is used to incorporate testability into our design.
We first developed a functional description of the design
using Digital Design Language (DDL), a hardware programming
language. We then entered the schematic of the design into
Daisy's Logician V, a CAD/CAE workstation, using the NCR CMOSII
Digital Standard Cell Library. We then performed a unit-delay
simulation on the hierarchical design database to
ascertain the logical functioning of the system.
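A minimal sketch of the scan-path idea behind LSSD, not drawn from the thesis design: in test mode the storage latches form one long shift register, so internal state can be set and observed serially. All names and the toy next-state function are illustrative:

```python
# Hypothetical sketch (not from the thesis): the scan-path idea behind LSSD.

class ScanChain:
    def __init__(self, length):
        self.latches = [0] * length

    def shift_in(self, bits):
        # Serially shift a test pattern into the latches (test mode).
        for b in bits:
            self.latches = [b] + self.latches[:-1]

    def capture(self, logic):
        # One system-clock cycle: combinational logic computes the next state.
        self.latches = logic(self.latches)

    def shift_out(self):
        # Serially shift the captured response out for comparison.
        return list(self.latches)

chain = ScanChain(4)
chain.shift_in([1, 0, 1, 1])
chain.capture(lambda s: [s[-1]] + s[:-1])   # toy next-state function
print(chain.shift_out())
```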
Model
Digital Document
Publisher
Florida Atlantic University
Description
Current multicore processors attempt to optimize the consumer experience by partitioning tasks and executing the resulting (sub)tasks concurrently on the cores. Conversion of sequential code to parallel and concurrent code is neither easy nor feasible with current methodologies. We have developed a mapping process that synergistically uses top-down and bottom-up methodologies and is amenable to automation. We use bottom-up analysis to determine decomposability and to estimate computation and communication metrics; the outcome is a set of proposals for software decomposition. We then build abstract concurrent models that map these decomposed (abstract) software modules onto candidate multicore architectures, which resolves concurrency issues. We then perform a system-level simulation to estimate concurrency gain and/or cost, together with QoS (Quality-of-Service) metrics. Different architectural combinations yield different QoS metrics; the requisite system architecture may then be chosen. We applied this 'middle-out' methodology to optimally map a digital camera application onto a processor with four cores.
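A minimal sketch, under assumed module costs, of the kind of mapping and concurrency-gain estimate such a flow produces; the greedy placement policy and the digital-camera module names are illustrative, not taken from the dissertation:

```python
# Hypothetical sketch (not from the dissertation): estimating concurrency
# gain when decomposed software modules are mapped onto cores.

def map_modules(comp, comm, n_cores):
    """comp: {module: computation cost}; comm: {(a, b): communication cost}."""
    load = [0.0] * n_cores
    placement = {}
    for mod in sorted(comp, key=comp.get, reverse=True):
        core = min(range(n_cores), key=lambda c: load[c])  # least-loaded core
        placement[mod] = core
        load[core] += comp[mod]
    # Communication between co-located modules is assumed free.
    xfer = sum(c for (a, b), c in comm.items() if placement[a] != placement[b])
    seq_time = sum(comp.values())
    par_time = max(load) + xfer
    return placement, seq_time / par_time  # estimated concurrency gain

comp = {"capture": 4, "demosaic": 6, "compress": 5, "store": 2}
comm = {("capture", "demosaic"): 1, ("demosaic", "compress"): 1,
        ("compress", "store"): 0.5}
print(map_modules(comp, comm, 4))
```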
Model
Digital Document
Publisher
Florida Atlantic University
Description
One of the major components of any pervasive system is its proactive behavior. Various models have been developed to provide system-wide changes that would enable proactive behavior. A major drawback of these approaches is that they do not address the need to make use of existing applications whose design cannot be changed. To overcome this drawback, a middleware architecture called "Concord" is proposed. Concord is based on a simple model consisting of a Lookup Server and a Database. The rewards of this simple model are many. First, Concord uses the existing computing infrastructure. Second, Concord standardizes the interfaces for all services and platforms. Third, new services can be added dynamically without any need for reconfiguration. Finally, Concord's Database can maintain and publish the active set of available resources. Thus, Concord provides a solid system for integrating the various entities to provide seamless connectivity and enable proactive behavior.
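A minimal sketch of the Lookup Server/Database model; the method names and endpoint strings are illustrative assumptions, since the abstract does not specify Concord's actual interfaces:

```python
# Hypothetical sketch (not from the thesis): a Lookup Server backed by a
# Database of available resources, in the spirit of the Concord model.

class LookupServer:
    def __init__(self):
        self.db = {}  # the Database: active set of available resources

    def register(self, name, endpoint):
        # New services can be added at runtime, with no reconfiguration.
        self.db[name] = endpoint

    def unregister(self, name):
        self.db.pop(name, None)

    def lookup(self, name):
        # Existing applications query through one standardized interface.
        return self.db.get(name)

    def publish(self):
        # Publish the active set of available resources.
        return dict(self.db)

concord = LookupServer()
concord.register("printer", "tcp://10.0.0.5:9100")
print(concord.lookup("printer"))
print(concord.publish())
```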
Model
Digital Document
Publisher
Florida Atlantic University
Description
In this dissertation, we propose and analyze a cluster-based hypercube architecture in which each node of the hypercube is furnished with a cluster of n processors connected through a small crossbar switch to n memory modules. Topological analysis shows that the cluster-based hypercube architecture reduces the complexity of the basic hypercube architecture by reducing the diameter, the degree of a node, and the number of links in the hypercube. The proposed architecture uses the higher processing power furnished by the cluster of execution processors in each node to address the needs of computation-intensive parallel application programs. It provides a smaller-dimension hypercube with the same number of execution processors as a higher-dimension conventional hypercube architecture. This scheme can be extended to meshes and other architectures. Mathematical analysis of the parallel simplex and parallel Gaussian elimination algorithms executing on the cluster-based hypercube shows the order of complexity of executing an n x n matrix problem to be O(n^2) for the parallel simplex algorithm and O(n^3) for the parallel Gaussian elimination algorithm. The timing analysis derived from the mathematical results indicates that, for the same number of processors in the cluster-based hypercube system as in the conventional hypercube system, the computation-to-communication ratio of the cluster-based hypercube executing a matrix problem by the parallel simplex algorithm increases as the number of nodes of the cluster-based hypercube is decreased. Self-driven simulations were developed to run the parallel simplex and parallel Gaussian elimination algorithms on the proposed cluster-based hypercube architecture and on the Intel Personal Supercomputer (iPSC/860), a conventional hypercube. The simulation results show a response-time performance improvement of up to 30% in favor of the cluster-based hypercube. We also observe that, for increased link delays, the performance gap widens significantly in favor of the cluster-based hypercube architecture when both machines execute the same parallel simplex and Gaussian elimination algorithms.
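A small sketch, using the standard hypercube formulas, of how clustering trades hypercube dimension for per-node processing power; the processor counts and cluster size are illustrative, not the dissertation's configurations:

```python
# Hypothetical sketch (not from the dissertation): comparing topological
# parameters of a conventional hypercube against a cluster-based one with
# n processors per node, using standard hypercube formulas.
from math import log2

def hypercube_metrics(dim):
    nodes = 2 ** dim
    return {"nodes": nodes, "diameter": dim, "degree": dim,
            "links": dim * nodes // 2}

def clustered_metrics(total_procs, cluster_size):
    # Same processor count, but each node holds a cluster of processors,
    # so the hypercube dimension shrinks by log2(cluster_size).
    dim = int(log2(total_procs // cluster_size))
    return hypercube_metrics(dim)

print(hypercube_metrics(6))         # conventional: 64 single-processor nodes
print(clustered_metrics(64, 4))     # cluster-based: 16 nodes of 4 processors
```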
Model
Digital Document
Publisher
Florida Atlantic University
Description
The growing demand for high availability of computer systems has led to a wide application range for fault-tolerant systems. In some real-time applications, ultrareliable computer systems are required. Such computer systems should be capable of tolerating failures not only of their hardware components but also of their software components. This dissertation discusses three aspects of designing an ultrareliable system: (a) a hierarchical ultrareliable system structure; (b) a set of unified methods to tolerate both software and hardware faults in combination; and (c) formal specifications in the system structure. The proposed hierarchical structure has four layers: Application, Software Fault Tolerance, Combined Fault Tolerance, and Configuration. The Application Layer defines the structure of the application software in terms of its modular structure using a module interconnection language; the failure semantics of the service provided by the system are also defined at this layer. At the Software Fault Tolerance Layer, each module can use software fault tolerance methods. The implementation of software and hardware fault tolerance is achieved at the Combined Fault Tolerance Layer, which utilizes the combined software/hardware fault tolerance methods. The Configuration Layer performs the actual software and hardware resource management for the fault identification and recovery requests from the Combined Fault Tolerance Layer. A combined software and hardware fault model is used as the system fault model. This model uses the concepts of fault pattern and fault set to abstract the various occurrences of software and hardware faults. We also discuss extended comparison models that consider faulty software as well. The combined software/hardware fault tolerance methods are based on recovery blocks, N-version programming, extended comparison methods, and both forward and backward recovery methods. Formal specifications and verifications are used in the system design process and the system structure to show that the design and implementation of a fault-tolerant system satisfy the functional and non-functional requirements. Brief discussions and examples of using formal specifications in the hierarchical structure are given.
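A minimal sketch of one of the named methods, the recovery block, with an illustrative acceptance test and toy primary/alternate versions; the dissertation's combined software/hardware mechanisms are not reproduced here:

```python
# Hypothetical sketch (not from the dissertation): the recovery-block
# scheme. A primary and an alternate version run under an acceptance test,
# with backward recovery to a checkpointed state between attempts.

def recovery_block(state, versions, acceptance_test):
    checkpoint = dict(state)            # establish recovery point
    for version in versions:            # primary first, then alternates
        try:
            result = version(dict(checkpoint))
            if acceptance_test(result):
                return result           # accepted: commit
        except Exception:
            pass                        # treat a crash as a failed version
        # backward recovery: checkpoint is restored, next alternate tried
    raise RuntimeError("all versions failed the acceptance test")

def primary(s):   return {"sqrt": s["x"] ** 0.5 - 1}   # deliberately faulty
def alternate(s): return {"sqrt": s["x"] ** 0.5}

accept = lambda r: abs(r["sqrt"] ** 2 - 9.0) < 1e-6
print(recovery_block({"x": 9.0}, [primary, alternate], accept))
```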
Model
Digital Document
Publisher
Florida Atlantic University
Description
Multistage interconnection networks (MINs) have become an important subset of the interconnection networks used to communicate between processors and memory modules in large-scale multiprocessor systems. Unfortunately, unique-path MINs lack fault tolerance. In this dissertation, a novel scheme for constructing fault-tolerant MINs is presented. We first partition the given MINs into even-sized partitions and show some fault-tolerant properties of the partitioned MINs. Using three stages of multiplexers/demultiplexers, an augmenting scheme that takes advantage of locality in program execution is then proposed to further improve the fault tolerance and performance of the partitioned MINs. The topological characteristics of augmented partitioned multistage interconnection networks (APMINs) are analyzed. Based on a switch fault model, simulations have been carried out to evaluate the full-access and dynamic-full-access capabilities of APMINs. The results show that the proposed scheme significantly improves the fault-tolerant capability of MINs. The cost effectiveness of this new scheme in terms of cost, full access, dynamic full access, locality, and average path length has also been evaluated; the new scheme is shown to be more cost effective for high switch failure rates and/or large networks. Analytical modeling techniques have been developed to evaluate the performance of the AP-Omega network and of AP-Omega-network-based multiprocessor systems. The performance of the Omega, modified Omega, and AP-Omega networks in terms of processor utilization and processor waiting time has been compared, and the results show that the new scheme indeed improves performance at both the network level and the system level. Finally, based on the reliability of serial/parallel network components, models for evaluating the terminal reliability and the network reliability of the AP-Omega network using upper- and lower-bound measures have also been proposed, and the results show that applying locality improves the reliability of APMINs.
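For context, a small sketch of destination-tag routing through an Omega network, the unique-path behavior that motivates the fault-tolerance scheme; the bit-level encoding of port numbers is an illustrative choice, not the dissertation's notation:

```python
# Hypothetical sketch (not from the dissertation): destination-tag routing
# through an N x N Omega network, i.e., log2(N) stages of perfect shuffle
# followed by 2x2 exchange switches. Each source-destination pair has
# exactly one path, which is why a single switch fault breaks full access.

def omega_route(src, dst, n_bits):
    """Return the port reached after each of the n_bits stages."""
    port, path = src, []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the port number left by one bit.
        port = ((port << 1) | (port >> (n_bits - 1))) & ((1 << n_bits) - 1)
        # 2x2 switch set by destination bit (MSB first): 0 = upper, 1 = lower.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        port = (port & ~1) | bit
        path.append(port)
    return path

# 8 x 8 Omega network (3 stages): route from input 3 to output 6.
print(omega_route(3, 6, 3))   # the final entry is the destination port
```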
Model
Digital Document
Publisher
Florida Atlantic University
Description
This research is aimed at the concept of a new switching node architecture for cell-switched Asynchronous Transfer Mode (ATM) networks. The proposed architecture has several distinguishing features compared with existing Banyan-based switching nodes. It has a cylindrical structure as opposed to the flat structure found in Banyans. The wrap-around property results in better link utilization than existing Banyans, besides reducing the average route length. The simplified digit-controlled routing found in Banyans is maintained. The cylindrical nature of the architecture results in pipeline activity. Such an architecture tends to sort the traffic toward higher addresses, eliminating the need for a front-end preprocessing node. An approximate Markov chain analysis of the performance of the switching node with single input buffers is presented. The analysis is used to compute the time delay distribution of a cell leaving the node. A simulation tool is used to validate the analytical model. The simulation model is free of the critical assumptions that are necessary to develop the analytical model. The analytical results are shown to closely match the simulation results, which confirms the validity of the simulation model. We then study the performance of the switching node for various input buffer sizes. Low throughput is observed with the single-input-buffered switching node; however, as the buffer size is increased from two to three, the throughput increases by more than 100%, with no appreciable increase in node delay. We conclude that the optimum buffer size for large throughput is three, and the maximum throughput with an offered load of 0.9 and buffer size three is 0.75. This limit is due to the head-of-line blocking phenomenon; a technique to overcome this inherent problem is presented. The several delays that a cell faces are analyzed and summarized below. The wait delay with buffer sizes one and two is high; however, it is negligible when the buffer size is increased beyond two, because a larger buffer reduces head-of-line blocking and more cells can move forward. Node delay and switching delay are comparable when the buffer size is greater than two. The delay is within the threshold range noted for real-time traffic. The delay is clock-rate dependent and can be minimized by running the switching node at a higher clock speed. The worst delay noted for a switched cell, for a node operating at a clock rate of 200 MHz, is 0.5 microseconds.
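A minimal slotted simulation, with illustrative parameters, of the head-of-line blocking effect described above; it is a toy counterpart, not the dissertation's Markov model or its switch topology:

```python
# Hypothetical sketch (not from the dissertation): a slotted simulation of
# an input-buffered N x N switch. Each slot: cells arrive with probability
# `load`, only head-of-line cells contend, and one winner per output is
# switched. Larger buffers reduce head-of-line blocking losses.
import random
from collections import deque

def simulate(n=8, buf=3, load=0.9, slots=20000, seed=1):
    random.seed(seed)
    queues = [deque() for _ in range(n)]
    switched = 0
    for _ in range(slots):
        for q in queues:                          # arrivals; full buffers drop
            if random.random() < load and len(q) < buf:
                q.append(random.randrange(n))     # random destination output
        claimed = set()
        for i in random.sample(range(n), n):      # random contention order
            q = queues[i]
            if q and q[0] not in claimed:         # head-of-line cell wins output
                claimed.add(q.popleft())
                switched += 1
    return switched / (slots * n)                 # throughput per port per slot

for buf in (1, 2, 3):
    print(buf, round(simulate(buf=buf), 3))
```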
Model
Digital Document
Publisher
Florida Atlantic University
Description
Recently, Artificial Neural Network (ANN) computing systems have become one of the most active and challenging areas of information processing. The successes of experimental neural computing systems in the fields of pattern recognition, process control, robotics, signal processing, expert systems, and functional analysis are most promising. However, due to a number of serious problems, only small, fully connected neural networks have been implemented to run in real time. The primary problem is that the execution time of a neural network increases exponentially as the network's size increases, because of the corresponding increase in the number of multiplications and interconnections, which makes it extremely difficult to implement medium- or large-scale ANNs in hardware. The Modular Grouped Weight Quantization (MGWQ) approach presented in this dissertation is an ANN design in which the number of multiplications and interconnections increases only linearly as the neural network's size increases. The secondary problems are related to scale-up capability, modularity, memory requirements, flexibility, performance, fault tolerance, technological feasibility, and cost; the MGWQ architecture also resolves these problems. In this dissertation, neural network characteristics and existing implementations using different technologies are described. Their shortcomings and problems are addressed, and solutions to these problems using the MGWQ approach are illustrated. The theoretical and experimental justifications for MGWQ are presented, and performance calculations for the MGWQ architecture are given. The mappings of the most popular neural network models to the proposed architecture are demonstrated, and system-level architecture considerations are discussed. The proposed ANN computing system is a flexible and realistic way to implement large fully connected networks. It offers very high performance using currently available technology. The performance of ANNs is measured in terms of interconnections per second (IC/S); the performance of the proposed system ranges from 10^11 to 10^14 IC/S. In comparison, SAIC's DELTA II ANN system achieves 10^7 IC/S, and a Cray X-MP achieves 5*10^7 IC/S.
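A small sketch of the IC/S figure of merit used above and of the connection count that drives the multiplication cost in a fully connected layer; the numbers are illustrative, and the MGWQ mapping itself is not shown here:

```python
# Hypothetical sketch (not from the dissertation): connection counts and
# the interconnections-per-second (IC/S) figure of merit.

def full_connections(neurons):
    # A fully connected layer of N neurons feeding N neurons has N * N
    # weights, hence N * N multiplications per network update.
    return neurons * neurons

def ic_per_second(connections, updates_per_sec):
    # IC/S = connections evaluated per update * updates per second.
    return connections * updates_per_sec

for n in (100, 1000, 10000):
    print(n, full_connections(n))
print(ic_per_second(full_connections(1000), 100))  # 10^8 IC/S in this toy case
```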
Model
Digital Document
Publisher
Florida Atlantic University
Description
Providing multiprocessor capability to the class of computers commonly referred to as personal workstations is the next evolutionary step in their development. Uniprocessor workstations limit the user in throughput, reliability, functionality, and architecture. Multiprocessor workstations have the potential of increasing system throughput, and a multiprocessor system with an expanded architecture derived from a set of heterogeneous processors gives the user a diverse application base within a single system. The replication and diversity offered in systems of this design, when coupled with fault-tolerant design techniques, enhance system reliability. A heterogeneous multiprocessor architecture is presented which combines loosely and tightly coupled configurations (multicomputer and multiprocessor). This architecture provides for incremental growth of the system, by either static or dynamic reconfiguration. The software view of the system is that of an object-oriented environment; the object-oriented approach is used to unify the heterogeneous nature of the system. The process is the unit of concurrency in the system, and cooperating concurrent processes are supported. A set of system primitives is provided to support the requirements of a heterogeneous multiprocessing environment. A virtual machine layer controls the distribution of processes and the allocation of resources in the system. A virtual network provides communication paths and resource sharing, and is designed to be bridged to an external physical network. The system requirements for a secure and reliable operating environment are incorporated into the design. This system utilizes "hardware porting" as a means of overcoming the lag of software support behind hardware advances: rather than software-porting an entire application base to a new system architecture, hardware porting brings the required instruction set architecture to the applications. This heterogeneous multiprocessor architecture builds on a popular system architecture, the IBM PS/2 with the Micro Channel system bus. Incorporating a second bus, the SCSI bus, as a system extension is explored.
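A minimal sketch of the virtual-machine-layer placement idea, assuming illustrative processor names and ISA labels; the thesis's actual system primitives are not specified in the abstract:

```python
# Hypothetical sketch (not from the thesis): a virtual machine layer that
# places processes on a heterogeneous processor set by the instruction set
# architecture they require, echoing the "hardware porting" idea.

class VirtualMachineLayer:
    def __init__(self, processors):
        self.processors = processors          # {processor name: ISA}
        self.placement = {}

    def spawn(self, process, required_isa):
        # Place the process on any processor offering the ISA it needs.
        for name, isa in self.processors.items():
            if isa == required_isa:
                self.placement[process] = name
                return name
        raise LookupError(f"no processor implements {required_isa}")

vm = VirtualMachineLayer({"cpu0": "x86", "cpu1": "x86", "cpu2": "68k"})
print(vm.spawn("spreadsheet", "x86"))
print(vm.spawn("legacy_app", "68k"))
```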