Dong, Yuhong.

Relationships
Member of: Graduate College
Person Preferred Name
Dong, Yuhong.
Model
Digital Document
Publisher
Florida Atlantic University
Description
In today's competitive environment for software products, quality has become an increasingly important asset to software development organizations. Software quality models are tools for focusing efforts to find faults early in development, since delaying corrections can lead to higher costs. In this research, the classification Bayesian network modeling technique was used to predict software quality by classifying program modules as either fault-prone or not fault-prone. A general classification rule was applied to yield classification Bayesian Belief Network models. Six classification Bayesian Belief Network models were developed based on quality metrics data records of two very large window application systems. The fit data set was used to build each model and the test data set was used to evaluate it. The first two models used a median-based data clustering technique; the second two used the median as the critical value to cluster metrics with the Generalized Boolean Discriminant Function; and the third two used the Kolmogorov-Smirnov test to select the critical value to cluster metrics with the Generalized Boolean Discriminant Function. All six models used the product metrics (FAULT or CDCHURN) as predictors.
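The idea of using the Kolmogorov-Smirnov test to pick a critical value can be illustrated with a minimal sketch. It assumes the critical value of a metric is the point where the empirical distributions of the fault-prone and not fault-prone modules are farthest apart, and it uses a simple "any metric exceeds its critical value" rule as a stand-in for the discriminant function. The function names, metric names, and sample values are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right


def ks_critical_value(fault_prone, not_fault_prone):
    """Return (threshold, distance) where the two empirical CDFs differ most.

    Assumption: the critical value is the metric value that maximizes the
    Kolmogorov-Smirnov distance between the two groups of modules.
    """
    fp = sorted(fault_prone)
    nfp = sorted(not_fault_prone)
    best_c, best_d = None, -1.0
    # Only observed metric values can change either empirical CDF.
    for c in sorted(set(fp) | set(nfp)):
        cdf_fp = bisect_right(fp, c) / len(fp)
        cdf_nfp = bisect_right(nfp, c) / len(nfp)
        d = abs(cdf_nfp - cdf_fp)
        if d > best_d:
            best_c, best_d = c, d
    return best_c, best_d


def boolean_discriminant(module, critical_values):
    """Flag a module as fault-prone if any metric exceeds its critical value.

    Assumption: a simplified rule used here only for illustration.
    """
    return any(module[name] > c for name, c in critical_values.items())


# Hypothetical metric values for modules in a fit data set.
threshold, distance = ks_critical_value([8, 9, 12, 15], [1, 2, 3, 9])
flagged = boolean_discriminant({"loc": 10, "churn": 1}, {"loc": threshold, "churn": 5})
```

Replacing `ks_critical_value` with the median of a metric gives the median-based variant of the same clustering step.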
Model
Digital Document
Publisher
Florida Atlantic University
Description
An unsupervised learning algorithm for application-level intrusion detection, named Graph Sequence Learning Algorithm (GSLA), is proposed in this dissertation. Experiments demonstrate its effectiveness. As in most intrusion detection algorithms, GSLA must first learn the normal profile. The normal profile is built using a session learning method, which is combined with the one-way Analysis of Variance (ANOVA) method to determine the value of an anomaly threshold. In the proposed approach, a hash table is used to store a sparse data matrix, collected from a web transition log, in triple data format instead of as an n-by-n matrix. Furthermore, in GSLA, the sequence learning matrix can change dynamically with the volume of the data set. Therefore, this approach is more efficient, easier to manipulate, and saves memory space. To validate the effectiveness of the algorithm, extensive simulations have been conducted by applying GSLA to the homework submission system at our computer science and engineering department. The performance of GSLA is evaluated and compared with the traditional Markov Model (MM) and K-means algorithms. Specifically, three major experiments have been done: (1) A small data set is collected as sample data and applied to the GSLA, MM, and K-means algorithms to illustrate the operation of the proposed algorithm and demonstrate the detection of abnormal behaviors. (2) The Random Walk-Through sampling method is used to generate a larger sample data set, and the resulting anomaly scores are grouped into several clusters in order to visualize and demonstrate the normal and abnormal behaviors with the K-means and GSLA algorithms. (3) Multiple professors' data sets are collected and used to build the normal profiles, and the ANOVA method is used to test for significant differences among the professors' normal profiles.
GSLA can be packaged as a module and plugged into an intrusion detection system (IDS) as its anomaly detection component.
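The storage idea in the abstract above, a hash table holding a sparse transition matrix as (row, column, value) triples, can be sketched as follows. This is an illustration under stated assumptions and not GSLA itself: the page names, the function names, and the simple "fraction of unseen transitions" score are hypothetical.

```python
from collections import defaultdict


def build_transition_table(sessions):
    """Count page-to-page transitions, keyed by (source, destination).

    Each dictionary entry corresponds to one (row, column, value) triple,
    so only transitions that actually occur take up memory.
    """
    counts = defaultdict(int)
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


def unseen_fraction(table, pages):
    """Fraction of a session's transitions absent from the normal profile.

    Assumption: a stand-in score for illustration only.
    """
    steps = list(zip(pages, pages[1:]))
    if not steps:
        return 0.0
    unseen = sum(1 for step in steps if step not in table)
    return unseen / len(steps)


# Hypothetical sessions from a web transition log.
normal_sessions = [
    ["login", "list", "submit"],
    ["login", "list", "logout"],
]
table = build_transition_table(normal_sessions)
score = unseen_fraction(table, ["login", "submit", "logout"])
```

Because the table grows only when a new transition appears, it adapts to the size of the data set without allocating a full n-by-n matrix up front.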