Data Science

Model
Digital Document
Publisher
Florida Atlantic University
Description
Topological Data Analysis (TDA) is a relatively new field of research that uses topological notions to extract discriminating features from data. Within TDA, persistent homology (PH) is a robust method for computing multi-dimensional geometric and topological features of a dataset. Because these features are often stable under certain perturbations of the underlying data, are often discriminating, and can be used both for visualizing structure in high-dimensional data and in statistical and machine learning modeling, PH has attracted the interest of researchers across scientific disciplines and in many industry applications. However, computational costs may present challenges to using PH effectively in certain data contexts, and theoretical stability results may not hold in practice. In this dissertation, we develop an algorithm that reduces the computational burden of computing persistent homology on point cloud data. We define, implement, and empirically test this computationally tractable simplicial complex construction, which we call Delaunay-Rips (DR), for computing persistent homology of Euclidean point cloud data. We demonstrate the practical robustness of DR for persistent homology in comparison with other simplicial complexes in machine learning applications such as predicting sleep state from patient heart rate. To justify the theoretical stability of DR, we prove the stability of the Delaunay triangulation of a point cloud P under perturbations of the points of P. Specifically, we impose a notion of genericity on the points of P to ensure stability. In the final chapter, we contribute to the field of computational biology by taking a data-driven approach to learn topological features of designed proteins from their persistence diagrams. We find correlations between the learned topological features and biochemical features to investigate how protein structure relates to features identified by subject-matter experts.
We train several machine learning models to assess how incorporating topological features alongside biochemical features in training affects performance. Using cover-tree differencing via entropy reduction (CDER), we identify distinguishing regions of the persistence diagrams of stable/unstable proteins. Notably, we find a statistically significant improvement in classification performance (in terms of average precision score) for certain designed secondary structure topologies.
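As a rough illustration of what persistent homology computes on a point cloud, the sketch below finds the 0-dimensional persistence pairs (connected components) of a Vietoris-Rips filtration using union-find. It is not the Delaunay-Rips construction described in the abstract, whose details are not given here. The function name `h0_persistence`, the sample `cloud`, and the convention that an edge enters the filtration at the full Euclidean distance between its endpoints are all assumptions made for this example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for 0-dimensional persistent homology.

    Assumed convention: every point is born at 0, and the edge (i, j)
    enters the filtration at the Euclidean distance between i and j.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge, so one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))

    # One component survives at every scale.
    pairs.append((0.0, math.inf))
    return pairs


# Hypothetical sample: two well-separated clusters of points in the plane.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(cloud)
```

In the resulting diagram, short-lived pairs correspond to points merging within a cluster, while the one long-lived finite pair records the scale at which the two clusters join. Higher-dimensional features (loops, voids) need the full boundary-matrix reduction, which is where the size of the simplicial complex drives the computational cost the abstract refers to.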
Model
Digital Document
Publisher
Florida Atlantic University
Description
This dissertation focuses on the development of data-driven and physics-based modeling for two distinct structural engineering applications: estimation of time-varying response variables and control of unwanted lateral vibration. In the first part, I propose a machine learning (ML)-based surrogate model to directly predict dynamic responses over an entire mechanical system during operation. The design of any mechanical system, as well as of structural health monitoring systems, requires transient vibration analysis. However, traditional methods and modeling calculations are time- and resource-consuming. ML approaches are particularly promising for scientific and engineering problems that involve processes that are not completely understood, or for which it is computationally infeasible to run numerical or analytical models at the desired resolutions in space and time. In this research, an ML-based surrogate for the finite element analysis (FEA) approach is developed to forecast the time-varying response, i.e., the displacement, of a two-dimensional truss structure. Various ML regression algorithms, including decision trees and deep neural networks, are developed to predict displacement across the truss structure, and their efficiencies are investigated. ML algorithms have been combined with FEA in preliminary attempts to address issues in static mechanical systems.
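The surrogate idea can be sketched in a few lines: sample an expensive solver at some inputs, fit a cheap regressor to those samples, and query the regressor instead of the solver. The example below is a minimal sketch under loud assumptions, not the dissertation's method. A decaying cosine (`toy_response`) stands in for an FEA displacement history, and a small hand-written one-dimensional regression tree stands in for the decision-tree models mentioned in the abstract. All names and parameter values are hypothetical.

```python
import math


def toy_response(t, u0=1.0, zeta=0.05, wn=2.0 * math.pi):
    """Decaying cosine used as a cheap stand-in for an expensive solver."""
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    return u0 * math.exp(-zeta * wn * t) * math.cos(wd * t)


def fit_tree(xs, ys, depth):
    """Fit a 1-D regression tree by greedy squared-error splits.

    A leaf is a float (the mean target); an internal node is a tuple
    (threshold, left_subtree, right_subtree).
    """
    n = len(xs)
    mean = sum(ys) / n
    if depth == 0 or n < 2:
        return mean
    order = sorted(range(n), key=lambda i: xs[i])
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    total = sum(ys)
    left_sum, best_score, best_k = 0.0, None, None
    for k in range(1, n):
        left_sum += ys[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between identical inputs
        right_sum = total - left_sum
        # Minimising squared error is equivalent to maximising this score.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best_score is None or score > best_score:
            best_score, best_k = score, k
    if best_k is None:
        return mean
    threshold = 0.5 * (xs[best_k - 1] + xs[best_k])
    return (
        threshold,
        fit_tree(xs[:best_k], ys[:best_k], depth - 1),
        fit_tree(xs[best_k:], ys[best_k:], depth - 1),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def rmse(predict_fn, xs, ys):
    """Root-mean-square error of predict_fn over the given samples."""
    return math.sqrt(
        sum((predict_fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    )


# "Run the solver" on a training grid, then fit the surrogate to the samples.
train_t = [0.01 * i for i in range(200)]
train_u = [toy_response(t) for t in train_t]
tree = fit_tree(train_t, train_u, depth=8)

# Compare against a constant (mean) predictor on held-out time points.
test_t = [0.01 * i + 0.005 for i in range(200)]
test_u = [toy_response(t) for t in test_t]
baseline = sum(train_u) / len(train_u)
tree_rmse = rmse(lambda t: predict(tree, t), test_t, test_u)
baseline_rmse = rmse(lambda t: baseline, test_t, test_u)
```

A real surrogate for a truss would take load and geometry parameters as additional inputs and predict displacements at many nodes. The single time input here only keeps the sketch short.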