Fault-tolerant distributed shared memories

File
Contributors
Publisher
Florida Atlantic University
Date Issued
1993
Description
Distributed shared memory (DSM) implements a shared-memory programming interface on message-passing hardware. The shared-memory programming paradigm offers several advantages over the message-passing paradigm. DSM is recognized as an important technology for massively parallel computing. However, as the number of processors in a system increases, the probability of a failure increases. To be widely useful, the DSM must be able to tolerate failures. This dissertation presents a method of implementing fault-tolerant DSM (FTDSM) that is based on the idea of a snooper. The snooper monitors DSM protocol messages and keeps a backup of the current state of the DSM. The snooper can respond on behalf of failed processors. The snooper-based FTDSM is an improvement over existing FTDSMs because it is based on the efficient dynamic distributed manager DSM algorithm, does not require the repair of a failed processor to access the DSM, and does not query all nodes to rebuild the state of the DSM. Three snooper-based FTDSM systems are developed. The single-snooper (SS) FTDSM has one snooper and is restricted to a broadcast network. Additional snoopers are added in the multiple-snooper (MS) FTDSM to improve performance. Two-phase commit (2PC) protocols are developed to coordinate the activities of the snoopers, and a special data structure is used to store causality information to reduce the amount of snooper activity. Snooping is integrated with each processor in the integrated snooper (IS) FTDSM. The IS FTDSM is scalable because it is not restricted to a broadcast network. The concept of dynamic snooping is introduced for the IS FTDSM and several snooper migration algorithms are studied. Several recovery algorithms are developed to allow failed processors to rejoin the system. The properties of data structures used to locate owners and snoopers are studied and used to prove that the system can tolerate any single fault. A flexible method of integrating application-level recovery with the FTDSM is presented, and a reliability analysis is conducted using a Markov-chain modeling tool to show that the snooper-based FTDSM is a cost-effective way to improve the reliability of DSM.
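The snooper idea summarized in the description can be illustrated with a small sketch. This is not the dissertation's protocol: the class name `Snooper`, its methods, the message shape, and the node and page identifiers are all hypothetical, chosen only to show a process that mirrors ownership and page contents from observed messages and answers for an owner that has failed.

```python
from dataclasses import dataclass, field


@dataclass
class Snooper:
    """Hypothetical snooper: mirrors DSM state from the messages it observes."""
    owner: dict = field(default_factory=dict)   # page id -> owning node id
    backup: dict = field(default_factory=dict)  # page id -> last observed contents
    failed: set = field(default_factory=set)    # node ids known to have failed

    def observe_transfer(self, page, new_owner, data):
        """Record an ownership transfer seen on the (assumed) broadcast network."""
        self.owner[page] = new_owner
        self.backup[page] = data

    def mark_failed(self, node):
        """Note that a node has failed (failure detection itself is out of scope)."""
        self.failed.add(node)

    def handle_request(self, page, requester):
        """Answer a page request only when the recorded owner has failed."""
        current = self.owner.get(page)
        if current is None or current not in self.failed:
            return None  # a live owner answers; the snooper stays silent
        self.owner[page] = requester  # ownership passes to the requester
        return self.backup[page]


# Usage: P1 owns page 7, then fails; the snooper serves P2's request from its backup.
snooper = Snooper()
snooper.observe_transfer(page=7, new_owner="P1", data=b"v1")
snooper.mark_failed("P1")
reply = snooper.handle_request(page=7, requester="P2")
```

In this sketch the requester obtains the page without waiting for the failed node to be repaired and without polling every node, which are the two properties the description attributes to the snooper-based approach.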
Note

College of Engineering and Computer Science

Language
Type
Extent
253 p.
Identifier
12349
Additional Information
FAU Electronic Theses and Dissertations Collection
Thesis (Ph.D.)--Florida Atlantic University, 1993.
Date Backup
1993
Date Text
1993
Date Issued (EDTF)
1993
IID
FADT12349
Issuance
monographic
Person Preferred Name

Brown, Larry.
Graduate College
Physical Description

253 p.
application/pdf
Use and Reproduction
Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
http://rightsstatements.org/vocab/InC/1.0/
Origin Information

1993
monographic

Boca Raton, Fla.

Florida Atlantic University
Physical Location
Florida Atlantic University Libraries
Place

Boca Raton, Fla.
Sub Location
Digital Library
Title
Fault-tolerant distributed shared memories