TOPOLOGICAL DATA ANALYSIS FOR DATA SCIENCE: THE DELAUNAY-RIPS COMPLEX, TRIANGULATION STABILITIES, AND PROTEIN STABILITY PREDICTIONS

Publisher
Florida Atlantic University
Date Issued
2023
EDTF Date Created
2023
Description
Topological Data Analysis (TDA) is a relatively new field of research that utilizes topological notions to extract discriminating features from data. Within TDA, persistent homology (PH) is a robust method to compute multi-dimensional geometric and topological features of a dataset. Because these features are often stable under certain perturbations of the underlying data, are often discriminating, and can be used for visualization of structure in high-dimensional data and in statistical and machine learning modeling, PH has attracted the interest of researchers across scientific disciplines and in many industry applications. However, computational costs may present challenges to effectively using PH in certain data contexts, and theoretical stability results may not hold in practice. In this dissertation, we develop an algorithm that can reduce the computational burden of computing persistent homology on point cloud data. Naming it Delaunay-Rips (DR), we define, implement, and empirically test this computationally tractable simplicial complex construction for computing persistent homology of Euclidean point cloud data. We demonstrate the practical robustness of DR for persistent homology in comparison with other simplicial complexes in machine learning applications such as predicting sleep state from patient heart rate. To justify the theoretical stability of DR, we prove the stability of the Delaunay triangulation of a point cloud P under perturbations of the points of P. Specifically, we impose a notion of genericity on the points of P to ensure stability. In the final chapter, we contribute to the field of computational biology by taking a data-driven approach to learn topological features of designed proteins from their persistence diagrams. We find correlations between the learned topological features and biochemical features to investigate how protein structure relates to features identified by subject-matter experts.
We train several machine learning models to assess the performance of incorporating topological features into training with biochemical features. Using cover-tree differencing via entropy reduction (CDER), we identify distinguishing regions of the persistence diagrams of stable/unstable proteins. More notably, we find statistically significant improvement in classification performance (in terms of average precision score) for certain designed secondary structure topologies.
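The abstract names the Delaunay-Rips construction without defining it. As a rough illustration only, the sketch below assumes that DR keeps the simplices of the Delaunay triangulation and gives each one the Vietoris-Rips filtration value (the largest pairwise distance among its vertices). That reading is an assumption, not a definition taken from this record. The function names are hypothetical, the code handles only the plane, and the brute-force empty-circumcircle test stands in for a real triangulation library. It is not the author's implementation.

```python
from itertools import combinations
from math import dist


def circumcircle(a, b, c):
    """Return (center, squared radius) of the circle through a, b, c,
    or None when the three points are (numerically) collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Brute-force planar Delaunay triangulation: keep every triangle
    whose open circumdisk contains no other input point. Assumes the
    points are in general position (no four cocircular)."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        (ux, uy), r2 = cc
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-12
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


def delaunay_rips_filtration(points):
    """Map each simplex (a sorted tuple of point indices) to its assumed
    filtration value: 0 for vertices, length for edges, longest edge
    for triangles. Only Delaunay simplices are included."""
    triangles = delaunay_triangles(points)
    filtration = {(i,): 0.0 for i in range(len(points))}
    for tri in triangles:
        for u, v in combinations(tri, 2):
            filtration[(u, v)] = dist(points[u], points[v])
    for tri in triangles:
        filtration[tri] = max(filtration[e] for e in combinations(tri, 2))
    return filtration


# Four hypothetical points in general position.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 4.0)]
filtration = delaunay_rips_filtration(points)
for simplex, value in sorted(filtration.items(), key=lambda kv: (kv[1], kv[0])):
    print(simplex, round(value, 3))
```

On these four points the sketch yields 11 simplices (4 vertices, 5 edges, 2 triangles), whereas a full Vietoris-Rips complex on the same points would eventually contain all 6 edges, all 4 triangles, and the tetrahedron. That gap in simplex count is the kind of saving the abstract's "computationally tractable" claim refers to, and the general-position assumption in the code echoes the genericity condition mentioned for the stability result.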
Note

Includes bibliography.

Extent
160 p.
Identifier
FA00014311
Rights

Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.

Additional Information
Dissertation (PhD)--Florida Atlantic University, 2023.
FAU Electronic Theses and Dissertations Collection
Organizations
Person Preferred Name

Mishra, Amish

author

Graduate College
Physical Description

application/pdf
160 p.
Origin Information

2023
2023
Florida Atlantic University

Boca Raton, Fla.
