Khoshgoftaar, Taghi M.

Relationships
Member of: Thesis advisor
Person Preferred Name
Khoshgoftaar, Taghi M.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Software systems that control military radar systems must be highly reliable. A fault can compromise safety and security, and can even cause the deaths of military personnel. In this experiment we identify fault-prone software modules in a subsystem of a military radar system, the Joint Surveillance Target Attack Radar System (JSTARS). An earlier version was used in Operation Desert Storm to monitor ground movement. Product metrics were collected for different iterations of an operational prototype of the subsystem over a period of approximately three years. We used these metrics to train a decision tree model and to fit a discriminant model to classify each module as fault-prone or not fault-prone. The algorithm used to generate the decision tree model was TREEDISC, developed by the SAS Institute. The decision tree model is compared to the discriminant model.
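TREEDISC is a SAS procedure and the JSTARS metrics are not public, so the sketch below only illustrates the shape of the comparison described above: a decision tree versus a linear discriminant classifier trained on the same (here synthetic) product metrics, with misclassifications read off the confusion matrix. All variable names and data are illustrative assumptions, not the thesis's actual setup.

```python
# Illustrative sketch only: a decision tree vs. a discriminant model for
# fault-prone classification, using synthetic "product metrics" in place of
# the JSTARS data; scikit-learn stands in for TREEDISC/SAS.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))                      # six hypothetical product metrics
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 1).astype(int)  # 1 = fault-prone

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

for name, model in [("decision tree", tree), ("discriminant", lda)]:
    # Type I / Type II misclassifications come straight off the confusion matrix.
    print(name, "\n", confusion_matrix(y_te, model.predict(X_te)))
```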
Model
Digital Document
Publisher
Florida Atlantic University
Description
Accurately classifying the quality of software is a major problem in any software development project. Software engineers develop models that provide early estimates of quality metrics, which allow them to take action against emerging quality problems. The use of a neural network as a tool to classify programs as low, medium, or high risk for errors or changes is explored, using multiple software metrics as input. It is demonstrated that a neural network, trained using the back-propagation supervised learning strategy, produced the desired mapping between the static software metrics and the software quality classes. The neural network classification methodology is compared to the discriminant analysis classification methodology in this experiment. The comparison is based on two- and three-class predictive models developed using variables resulting from principal component analysis of software metrics.
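As a rough illustration of the experimental pattern described above (not the thesis's network or data), the sketch below feeds principal components of synthetic software metrics to both a back-propagation network and a discriminant classifier for a three-class risk prediction. The metric counts, component count, and network size are assumptions.

```python
# Sketch, not the thesis's model: principal components of synthetic software
# metrics feed both a back-propagation network and a discriminant classifier
# for a three-class (low/medium/high risk) prediction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))                          # ten hypothetical static metrics
risk = np.digitize(X[:, :3].sum(axis=1), [-1.5, 1.5])   # 0 = low, 1 = medium, 2 = high risk

Z = PCA(n_components=4).fit_transform(X)                # principal component scores as inputs

nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1)
lda = LinearDiscriminantAnalysis()
for name, model in [("neural net", nn), ("discriminant", lda)]:
    print(name, "mean CV accuracy:", round(cross_val_score(model, Z, risk, cv=5).mean(), 3))
```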
Model
Digital Document
Publisher
Florida Atlantic University
Description
Accurately predicting the quality of software is a major problem in any software development project. Software engineers develop models that provide early estimates of quality metrics, which allow them to take action against emerging quality problems. Most often the predictive models are based on multiple regression analysis, which becomes unstable when certain data assumptions are not met. Since neural networks require no such data assumptions, they are more appropriate for predicting software quality. This study proposes an improved neural network architecture that significantly outperforms multiple regression and other neural network attempts at modeling software quality. This is demonstrated by applying the approach to several large commercial software systems. After developing the neural network models, we develop regression models on the same data. We find that the neural network models surpass the regression models in terms of predictive quality on the data sets considered.
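The thesis's improved architecture and the commercial data sets are not reproduced here; the sketch below only demonstrates the kind of comparison described, fitting a small feed-forward network and a multiple linear regression to the same synthetic, mildly nonlinear quality target. The target definition and network sizes are assumptions.

```python
# Illustrative comparison only: a feed-forward network vs. multiple linear
# regression for predicting a continuous quality measure from synthetic metrics.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 8))
y = np.exp(0.6 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=800)  # nonlinear target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=2)
for name, model in [("regression", LinearRegression()),
                    ("neural net", MLPRegressor(hidden_layer_sizes=(16, 8),
                                                max_iter=5000, random_state=2))]:
    model.fit(X_tr, y_tr)
    print(name, "MAE:", round(mean_absolute_error(y_te, model.predict(X_te)), 3))
```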
Model
Digital Document
Publisher
Florida Atlantic University
Description
Since maintenance is the most expensive phase of the software life cycle, detecting most of the errors as early as possible in the software development effort can provide substantial savings. This study investigates the behavior of complexity metrics during testing and maintenance, and their relationship to modifications made to the software. Interface complexity causes most of the change activities during integration testing and maintenance, while size causes most of the changes during unit testing. Principal component analysis groups 16 complexity metrics into four domains. Changes in domain pattern are observed throughout the software life cycle. Using those domains as input, regression analysis shows that software complexity measures collected as early as the unit testing phase can identify and predict change prone modules. With a low rate of misclassification, discriminant analysis further confirms that complexity metrics provide a strong indication of the changes made to a module during testing and maintenance.
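As a small sketch of the analysis pipeline described above (assumed data, not the study's 16 metrics), the code below builds 16 correlated synthetic complexity measures from four latent domains, extracts four principal components, and regresses change counts on the resulting domain scores.

```python
# Sketch under assumed data: grouping 16 synthetic complexity metrics into four
# "domains" with principal component analysis, then regressing change activity
# on the domain scores. Metric definitions and loadings are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
base = rng.normal(size=(400, 4))                        # four latent domains
X = np.hstack([base + 0.2 * rng.normal(size=(400, 4)) for _ in range(4)])  # 16 correlated metrics
changes = 3 * base[:, 0] + base[:, 2] + rng.normal(size=400)               # change activity

pca = PCA(n_components=4).fit(X)
print("variance explained by 4 domains:", round(pca.explained_variance_ratio_.sum(), 3))

domains = pca.transform(X)
print("R^2 of change model:", round(LinearRegression().fit(domains, changes).score(domains, changes), 3))
```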
Model
Digital Document
Publisher
Florida Atlantic University
Description
Advanced system bus architectures such as the Micro Channel and the EISA bus support bus mastering, which allows the I/O subsystems attached to the bus to arbitrate for and take control of the bus to perform data transfers independently of the system processor. I/O subsystems that can control, or master, the system bus are called bus masters. The IBM Subsystem Control Block (SCB) architecture defines interrupt-driven as well as peer-to-peer I/O protocols for performing data transfers to and from the bus masters. In previous studies, the performance of the SCB protocols was evaluated in network server environments using simulation models. The main drawback of these studies is that the server system is modeled in considerable detail, but the network and the clients are not considered. In this study, we develop models that simulate a complete network file server environment in which a single file server based on the SCB architecture provides file service to a variable number of clients on a token-ring network. We then evaluate the performance of the SCB protocols using the results obtained from the simulations.
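The SCB protocol details and token-ring timing are not reproduced here; the sketch below is only a minimal hand-rolled discrete-event loop of the general client/server file-service pattern the study simulates, with a single server, a variable client population, and exponential service and think times. All parameters are assumptions.

```python
# Minimal discrete-event sketch (not the SCB model itself): one file server
# serves requests from n clients; mean response time is reported as the client
# count grows. Service and think times are illustrative assumptions.
import heapq
import random

def simulate(n_clients, service_time=0.004, think_time=0.05, horizon=200.0, seed=0):
    rng = random.Random(seed)
    events = [(rng.expovariate(1 / think_time), c) for c in range(n_clients)]  # (arrival, client)
    heapq.heapify(events)
    server_free_at = 0.0
    total_resp, completed = 0.0, 0
    while events:
        t, c = heapq.heappop(events)
        if t > horizon:
            break
        start = max(t, server_free_at)                   # wait if the server is busy
        server_free_at = start + rng.expovariate(1 / service_time)
        total_resp += server_free_at - t
        completed += 1
        # the client "thinks", then issues its next request
        heapq.heappush(events, (server_free_at + rng.expovariate(1 / think_time), c))
    return total_resp / completed

for n in (4, 16, 64):
    print(n, "clients -> mean response", round(simulate(n), 4), "s")
```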
Model
Digital Document
Publisher
Florida Atlantic University
Description
One of the important problems which software engineers face is how to determine which software reliability model should be used for a particular system. Some recent attempts to compare different models used complementary graphical and analytical techniques. These techniques require an excessive amount of time for plotting the data and running the analyses, and they remain rather subjective as to which model is best. A simpler technique is therefore needed that yields a less subjective measure of goodness of fit. The Akaike Information Criterion (AIC) is proposed as a new approach for selecting the best model. The performance of the AIC is measured by Monte Carlo simulation and by comparison to published data sets. The AIC chooses the correct model 95% of the time.
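The AIC is computed as AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood; the candidate with the lowest AIC is preferred. The sketch below is a worked illustration of that selection step in miniature, not the thesis's reliability models: inter-failure times are simulated from a Weibull distribution, two candidates are fit by maximum likelihood, and the AIC picks between them. The distributions and parameters are assumptions.

```python
# Worked AIC illustration on simulated inter-failure times (assumed distributions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
times = stats.weibull_min.rvs(c=1.5, scale=10.0, size=200, random_state=rng)

candidates = {
    "exponential": (stats.expon, 1),      # (distribution, number of free parameters)
    "weibull": (stats.weibull_min, 2),
}
for name, (dist, k) in candidates.items():
    params = dist.fit(times, floc=0)                     # MLE with location fixed at 0
    loglik = dist.logpdf(times, *params).sum()
    print(name, "AIC =", round(2 * k - 2 * loglik, 1))   # lower AIC wins
```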
Model
Digital Document
Publisher
Florida Atlantic University
Description
Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of misclassification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class imbalance, which classifiers are most robust to the problem, and which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
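None of the seven techniques from the study is reproduced here; the sketch below only shows the experimental pattern, comparing minority-class recall for a decision tree (standing in for C4.5) with and without simple random oversampling on a synthetic imbalanced problem. The data and the oversampling helper are assumptions.

```python
# Sketch only: effect of naive random oversampling on minority-class recall.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 2.0).astype(int)    # rare positive class

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

def oversample(X, y, seed=5):
    # duplicate minority examples until the classes are balanced
    r = np.random.default_rng(seed)
    minority = np.where(y == 1)[0]
    extra = r.choice(minority, size=(y == 0).sum() - minority.size, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

for name, (Xs, ys) in [("no sampling", (X_tr, y_tr)), ("oversampled", oversample(X_tr, y_tr))]:
    clf = DecisionTreeClassifier(random_state=5).fit(Xs, ys)
    print(name, "minority recall:", round(recall_score(y_te, clf.predict(X_te)), 3))
```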
Model
Digital Document
Publisher
Florida Atlantic University
Description
The security of wireless networks has gained considerable importance due to the rapid proliferation of wireless communications. While computer network heuristics and rules are being used to control and monitor the security of Wireless Local Area Networks (WLANs), mining and learning the behaviors of network users can provide a deeper level of security analysis. The objective and contribution of this thesis is threefold: exploring the security vulnerabilities of the IEEE 802.11 standard for wireless networks; extracting features, or metrics, from a security point of view for modeling network traffic in a WLAN; and proposing a data mining-based approach to intrusion detection in WLANs. A clustering- and expert-based approach to intrusion detection in a wireless network is presented in this thesis. The case study data is obtained from a real-world WLAN and contains over one million records. Given the clusters of network traffic records, a distance-based heuristic measure is proposed for labeling clusters as either normal or intrusive. The empirical results demonstrate the promise of the proposed approach, laying the groundwork for a clustering-based framework for intrusion detection in computer networks.
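The thesis's features and heuristic are not reproduced here; the sketch below only illustrates the general clustering-then-labeling idea on synthetic traffic: records are clustered with k-means and clusters whose centroids lie unusually far from the overall centroid are flagged as intrusive. The threshold, feature count, and data are assumptions.

```python
# Illustrative sketch of distance-based cluster labeling, not the thesis's measure.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
normal = rng.normal(loc=0.0, scale=1.0, size=(950, 4))       # bulk of WLAN traffic
attack = rng.normal(loc=5.0, scale=0.5, size=(50, 4))        # small anomalous group
X = np.vstack([normal, attack])

km = KMeans(n_clusters=8, n_init=10, random_state=6).fit(X)
global_centroid = X.mean(axis=0)
dist = np.linalg.norm(km.cluster_centers_ - global_centroid, axis=1)

# clusters whose centroid is unusually far from the centre of the data are flagged
threshold = dist.mean() + 2 * dist.std()
for c in range(8):
    label = "intrusive" if dist[c] > threshold else "normal"
    print(f"cluster {c}: {np.sum(km.labels_ == c)} records, {label}")
```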
Model
Digital Document
Publisher
Florida Atlantic University
Description
This thesis expands upon an existing noise cleansing technique, polishing, enabling it to be used in the Software Quality Prediction domain, as well as any other domain where the data contains continuous values, as opposed to categorical data for which the technique was originally designed. The procedure is applied to a real world dataset with real (as opposed to injected) noise as determined by an expert in the domain. This, in combination with expert assessment of the changes made to the data, provides not only a more realistic dataset than one in which the noise (or even the entire dataset) is artificial, but also a better understanding of whether the procedure is successful in cleansing the data. Lastly, this thesis provides a more in-depth view of the process than previously available, in that it gives results for different parameters and classifier building techniques. This allows the reader to gain a better understanding of the significance of both model generation and parameter selection.
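As a rough sketch of the polishing idea on a continuous target (noise is corrected rather than discarded), the code below uses out-of-fold predictions to flag suspect instances and replaces their values with the predicted ones. Unlike the thesis, the noise here is injected purely to make the demo self-contained; the regressor, threshold, and data are all assumptions.

```python
# Rough sketch of polishing adapted to a continuous target; everything is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y_clean = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.2, size=500)
y = y_clean.copy()
noisy = rng.choice(500, size=25, replace=False)
y[noisy] += rng.normal(scale=5.0, size=25)               # inject gross noise for the demo only

# out-of-fold predictions flag suspect instances; their values are "polished"
# toward the prediction instead of being deleted
pred = cross_val_predict(RandomForestRegressor(random_state=7), X, y, cv=5)
residual = np.abs(y - pred)
suspect = residual > 3 * np.median(residual)
y_polished = np.where(suspect, pred, y)

print("suspect instances flagged:", suspect.sum())
print("mean abs. error vs. clean target before/after polishing:",
      round(np.abs(y - y_clean).mean(), 3), round(np.abs(y_polished - y_clean).mean(), 3))
```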
Model
Digital Document
Publisher
Florida Atlantic University
Description
In the literature, there has been limited research that systematically investigates the possibility of a hybrid approach that simply learns from the output of numerous base-level learners. We analyze a hybrid learning approach on systems that had previously been modeled with twenty-four different classifiers. Rather than relying on a single classifier's judgment, the approach takes into account the opinions of several learners. Moreover, clustering techniques were used to eliminate some base-level classifiers from the hybrid learner's input. We conducted three experiments, each with a different number of base-level classifiers. We empirically show that, under some conditions, the hybrid learning approach generally yields better performance than the best selected base-level learners and majority voting.
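A minimal sketch of the hybrid pattern described above: base-level classifiers produce predictions, a meta-level learner is trained on those outputs, and the result is compared with simple majority voting. Three base learners stand in for the twenty-four used in the study; the data, learners, and meta-learner are illustrative assumptions.

```python
# Sketch of learning from base-level classifier outputs vs. majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=8)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=8)

bases = [GaussianNB(), DecisionTreeClassifier(random_state=8), KNeighborsClassifier()]

# meta-features: out-of-fold predictions of each base learner on the training set
meta_tr = np.column_stack([cross_val_predict(b, X_tr, y_tr, cv=5) for b in bases])
for b in bases:
    b.fit(X_tr, y_tr)
meta_te = np.column_stack([b.predict(X_te) for b in bases])

hybrid = LogisticRegression().fit(meta_tr, y_tr)
vote = (meta_te.mean(axis=1) >= 0.5).astype(int)          # majority vote of the three

print("majority vote accuracy:", round(accuracy_score(y_te, vote), 3))
print("hybrid (stacked) accuracy:", round(accuracy_score(y_te, hybrid.predict(meta_te)), 3))
```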