Machine learning

Model
Digital Document
Publisher
Florida Atlantic University
Description
A common topological data analysis approach used in the experimental sciences involves creating machine learning pipelines that incorporate discriminating topological features derived from persistent homology (PH) of data samples, encoded in persistence diagrams (PDs) and associated topological feature vectors. Often the most computationally demanding step is computing PH through an algorithmic process known as boundary matrix reduction. In this work, we introduce several methods to generate topological feature vectors from unreduced boundary matrices. We compared the performance of classifiers trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several benchmark ML datasets. We discovered that models trained on PDs built from unreduced diagrams can perform on par with, and even outperform, those trained on fully-reduced diagrams. This observation suggests that machine learning pipelines that incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.
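For readers unfamiliar with boundary matrix reduction, the following is a minimal sketch of the standard left-to-right column reduction over Z/2, the step the abstract describes as the most computationally demanding. It is a textbook illustration, not the authors' feature-extraction method; the function name, the set-based column encoding, and the filtered-triangle example are assumptions made for this sketch.

```python
def reduce_boundary_matrix(columns):
    """Standard column reduction over Z/2 (illustrative sketch).

    columns[j] is the set of row indices holding a 1 in column j,
    with simplices ordered by their position in the filtration.
    Returns the reduced columns and the (birth, death) index pairs.
    """
    reduced = [set(col) for col in columns]
    pivot_owner = {}  # lowest nonzero row index -> column that owns it
    pairs = []
    for j, col in enumerate(reduced):
        # While another column already owns this column's lowest 1,
        # add that column (mod 2) to cancel it.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]
        if col:
            pivot_owner[max(col)] = j
            pairs.append((max(col), j))
    return reduced, pairs


# Hypothetical example: a filtered triangle.
# Indices 0-2 are vertices, 3-5 are edges, 6 is the 2-simplex.
columns = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
reduced, pairs = reduce_boundary_matrix(columns)
print(pairs)
```

In this example the third edge (column 5) reduces to zero, which marks the creation of a 1-cycle that the triangle (column 6) later fills in.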
Model
Digital Document
Publisher
Florida Atlantic University
Description
Maintaining security in IoT systems depends on intrusion detection, since these networks' susceptibility to cyber-attacks is growing. Based on the IoT23 dataset, this study explores the use of several machine learning (ML) and deep learning (DL) models, along with hybrid models, for binary and multi-class intrusion detection. The standalone ML and DL models used were Random Forest (RF), Extreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). Furthermore, two voting-based hybrid classifiers were created by combining the machine learning techniques RF, XGBoost, AdaBoost, KNN, and SVM: one for binary classification and the other for multi-class classification. All models were evaluated using precision, recall, accuracy, and F1-score, and their performance was compared. This work explains how hybrid and standalone ML and DL techniques could improve an Intrusion Detection System (IDS) in terms of accuracy and scalability in the Internet of Things (IoT).
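As a rough illustration of what a voting-based hybrid classifier does, the sketch below combines the per-sample predictions of five base models by majority vote. The abstract does not say whether hard or soft voting was used, so majority (hard) voting is an assumption here, and the function name and prediction lists are hypothetical placeholders rather than results from the study.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote across models (illustrative sketch).

    predictions is a list of per-model label lists, all the same length.
    Ties go to the label seen first among the tied labels.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of five base models on four traffic records.
rf = ["attack", "benign", "attack", "benign"]
xgb = ["attack", "benign", "benign", "benign"]
ada = ["benign", "benign", "attack", "attack"]
knn = ["attack", "attack", "attack", "benign"]
svm = ["attack", "benign", "benign", "benign"]

ensemble = hard_vote([rf, xgb, ada, knn, svm])
print(ensemble)
```

With an odd number of voters, binary classification cannot tie; in the multi-class case a tie-breaking rule such as the one noted in the docstring is needed.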
Model
Digital Document
Publisher
Florida Atlantic University
Description
IoBT stands for the Internet of Battlefield Things. This concept extends the principles of the Internet of Things (IoT) for military and defense use. IoBT integrates smart devices, sensors, and technology on the battlefield to improve situational awareness, communication, and decision-making in military operations. Sensitive military data typically includes information crucial to national security, such as the location of soldiers and equipment. Unauthorized access to location data may compromise operational confidentiality and impede the element of surprise in military operations. Therefore, ensuring the security of location data is crucial for the success and efficiency of military operations. We propose two systems to address this issue.
First, we propose a novel deception-based scheme to enhance the location-information security of IoBT nodes. The proposed scheme uses a novel encryption method, dummy IDs, and dummy packets technology. We develop a mathematical model to evaluate our scheme in terms of safety time (ST), probability of failure (PF), and the probability of identifying the real packet in each location information update (PIRP). Then, we develop NetLogo simulations to validate the mathematical model. The proposed scheme increases ST and reduces both PF and PIRP.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Businesses are the driving force behind economic systems and are the lifeline of the community, as they contribute to the prosperity and growth of the nation. Hence, it is important for businesses to succeed in the market. A business's success provides economic stability and sustainability that helps preserve resources for future generations. The success of a business is important not only to its owners but also to the regional or domestic economic system, and even to the global economy. Recent years have witnessed many newly emerging businesses with tremendous success, such as Google, Apple, and Facebook. Yet millions of businesses also fail or fade out within a rather short period of time. Finding patterns and factors connected to the rise and fall of businesses remains a long-standing question that puzzles many economists, entrepreneurs, and government officials. Recent advancements in artificial intelligence, especially machine learning, have given researchers the power to use data to model and predict business success. However, due to the data-driven nature of all machine learning methods, existing approaches are rather domain-driven and ad hoc in their design and validation, particularly in the field of business prediction. The main challenge of business success prediction is twofold: (1) identifying variables for defining business success; (2) feature selection and feature engineering based on three main categories (Investment, Business, and Market), each of which focuses on modeling a business from a particular perspective, such as sales, management, or innovation.
Model
Digital Document
Publisher
Florida Atlantic University
Description
In the modern data landscape, vast amounts of unlabeled data are continuously generated, necessitating the development of robust unsupervised techniques for handling such data. This is the case for analyses in the fraud detection and healthcare sectors, where data is often significantly imbalanced. This dissertation focuses on novel techniques for handling imbalanced data, with specific emphasis on a novel unsupervised class labeling technique for unlabeled fraud detection datasets and unlabeled cognitive datasets. Traditional supervised machine learning relies on labeled data, which is often expensive and difficult to create, particularly in domains requiring expert input. Additionally, such datasets suffer from challenges associated with class imbalance, where one class has significantly fewer examples than another, complicating model training and significantly reducing performance. The primary objectives of this dissertation include developing a novel unsupervised cleaning method and an innovative unsupervised class labeling method. We validate and evaluate our methods across various datasets, which include two Medicare fraud detection datasets, a credit card fraud detection dataset, and three datasets used for detecting cognitive decline.
Our unique approach involves using an unsupervised autoencoder to learn from dataset features and synthesize labels. Primarily targeting imbalanced datasets, but still effective for balanced datasets, our method calculates an error metric for each instance. This metric is used to distinguish between fraudulent and legitimate cases, allowing us to assign a binary class label. To further improve label generation, we integrate an unsupervised feature selection method that ranks and identifies the most important features without using class labels.
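The labeling idea described above, scoring each instance with an error metric and turning that score into a binary label, can be sketched as follows. This is only an illustration under stated assumptions: the mean squared reconstruction error, the rule that labels a fixed fraction of the highest-error instances as positive, and all function names and values are hypothetical, since the abstract does not specify the metric or the decision rule. The reconstructions are supplied by hand rather than produced by a trained autoencoder.

```python
def reconstruction_errors(inputs, reconstructions):
    """Mean squared error per instance between input and reconstruction."""
    return [
        sum((x - r) ** 2 for x, r in zip(row, rec)) / len(row)
        for row, rec in zip(inputs, reconstructions)
    ]


def label_by_error(errors, positive_fraction):
    """Label the highest-error instances as 1 (assumed rule, not the authors')."""
    k = max(1, round(positive_fraction * len(errors)))
    threshold = sorted(errors, reverse=True)[k - 1]
    return [1 if e >= threshold else 0 for e in errors]


# Hypothetical feature rows and the outputs an autoencoder might return.
inputs = [[1.0, 2.0], [1.0, 2.0], [5.0, 9.0], [1.0, 2.0]]
reconstructions = [[1.0, 2.0], [1.1, 2.0], [2.0, 5.0], [0.9, 2.1]]

errors = reconstruction_errors(inputs, reconstructions)
labels = label_by_error(errors, positive_fraction=0.25)
print(labels)
```

The intuition is that an autoencoder trained mostly on majority-class instances reconstructs rare, atypical instances poorly, so a large error suggests the minority class.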
Model
Digital Document
Publisher
Florida Atlantic University
Description
Dynamical systems play a pivotal role across various scientific domains, encompassing disciplines from physics to biology and engineering. The long-term behavior of these systems hinges on the structure of their attractors, with many exhibiting multistability characterized by multiple minimal attractors. Understanding the structure of these attractors and their corresponding basins is a central theme in dynamical systems theory.
In recent years, machine learning algorithms have emerged as potent tools for clustering, prediction, and modeling complex data. By harnessing the capabilities of neural networks along with techniques from topological data analysis, in particular persistent homology, we can construct surrogate models of system asymptotics. This approach also allows for the decomposition of phase space into polygonal regions and the identification of plausible attracting neighborhoods, facilitating homological Conley index computation at reduced computational expense compared to current methods. Through various illustrative examples, we demonstrate that sufficiently low training loss yields constructed neighborhoods whose homological Conley indices align with a priori knowledge of the dynamics.
Model
Digital Document
Publisher
Florida Atlantic University
Description
The aim of this dissertation is to achieve a thorough understanding and develop an algorithmic framework for a crucial aspect of autonomous and artificial intelligence (AI) systems: Data Analysis. In the current era of AI and machine learning (ML), "data" holds paramount importance. For effective learning tasks, it is essential to ensure that the training dataset is accurate and comprehensive. Additionally, during system operation, it is vital to identify and address faulty data to prevent potentially catastrophic system failures. Our research in data analysis focuses on creating new mathematical theories and algorithms for outlier-resistant matrix decomposition using L1-norm principal component analysis (PCA). L1-norm PCA has demonstrated robustness against irregular data points and will be pivotal for future AI learning and autonomous system operations.
This dissertation presents a comprehensive exploration of L1-norm techniques and their diverse applications. A summary of our contributions in this manuscript follows: Chapter 1 establishes the foundational mathematical notation and linear algebra concepts critical for the subsequent discussions, along with a review of the complexities of the current state-of-the-art in L1-norm matrix decomposition algorithms. In Chapter 2, we address the L1-norm error decomposition problem by introducing a novel method called "Individual L1-norm-error Principal Component Computation by 3-layer Perceptron" (Perceptron L1 error). Extensive studies demonstrate the efficiency of this greedy L1-norm PC calculator.
Model
Digital Document
Publisher
Florida Atlantic University
Description
This study focuses on developing optimization models to estimate missing precipitation data at twenty-two sites within the state of Kentucky. Various optimization formulations and regularization models are explored in this context. The performance of these models is evaluated using a range of performance measures and error metrics for handling missing records. The findings reveal that regularization models performed better than optimization models. This superiority is attributed to their ability to reduce model complexity while enhancing overall performance. The study underscores the significance of regularization techniques in improving the accuracy and efficiency of precipitation data estimation.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Computational tools grounded in algebraic topology, known collectively as topological data analysis (TDA), have been used for dimensionality reduction to preserve salient and discriminating features in data. This faithful but compressed representation of data through TDA's flagship method, persistent homology (PH), motivates its use to address the complexity, depth, and inefficiency issues present in privacy-preserving, homomorphic encryption (HE)-based machine learning (ML) models, which permit a data provider (often referred to as the Client) to outsource computational tasks on their encrypted data to a computationally superior but semi-honest party (the Server). This work introduces efforts to adapt the well-established TDA-ML pipeline to encrypted data, both to realize the benefits TDA can provide for HE's computational limitations and to bring HE's provable security to the sensitive data domains in which TDA has found success (e.g., sequence, gene expression, imaging). The privacy-protecting technologies that could emerge from this foundational work will lead to direct improvements in the accessibility and equitability of health care systems. ML promises to reduce biases and improve the accuracy of diagnoses, and enabling such models to act on sensitive biomedical data without exposing it will improve the trustworthiness of these systems.
Model
Digital Document
Publisher
Florida Atlantic University
Description
The Internet of Things (IoT) refers to a network of interconnected nodes constantly engaged in communication, data exchange, and the utilization of various network protocols. Previous research has demonstrated that IoT devices are highly susceptible to cyber-attacks, posing a significant threat to data security. This vulnerability is primarily attributed to their susceptibility to exploitation and their resource constraints. To counter these threats, Intrusion Detection Systems (IDS) are employed. This study aims to contribute to the field by enhancing IDS detection efficiency through the integration of Ensemble Learning (EL) methods with traditional Machine Learning (ML) and Deep Learning (DL) models. To bolster IDS performance, we initially utilize a binary ML classification approach to classify IoT network traffic as either normal or abnormal, employing EL methods such as Stacking and Voting. Once this binary ML model exhibits high detection rates, we extend our approach by incorporating an ML multi-class framework to classify attack types. This further enhances IDS performance by implementing the same EL methods. Additionally, for further enhancement and evaluation of the intrusion detection system, we employ DL methods, leveraging deep learning techniques, ensemble feature selection, and ensemble methods. Our DL approach is designed to classify IoT network traffic. This comprehensive approach encompasses various supervised ML and DL algorithms with ensemble methods. The proposed models are trained on TON-IoT network traffic datasets. The ensemble approaches are evaluated using a comprehensive set of metrics and compared for their effectiveness in addressing these classification tasks. The ensemble classifiers achieved higher accuracy rates compared to individual models, a result attributed to the diversity of learning mechanisms and strengths harnessed through ensemble learning.
By combining these strategies, we successfully improved prediction accuracy while minimizing classification errors. The outcomes of these methodologies underscore their potential to significantly enhance the effectiveness of the Intrusion Detection System.