Khoshgoftaar, Taghi M.

Relationships
Member of: Thesis advisor
Person Preferred Name
Khoshgoftaar, Taghi M.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Reliability of software systems is one of the major concerns in today's world, as computers have become an integral part of our lives. Society has become so dependent on reliable software systems that failures can damage a company's business, strain human relationships, or even endanger human lives. Software quality models are tools for focusing efforts to find faults early in development. In this experiment, we used classification tree modeling techniques to predict software quality by classifying program modules as either fault-prone or not fault-prone. We introduced the Classification And Regression Trees (CART) algorithm as a tool to generate classification trees. We focused our experiments on a very large telecommunications system, building quality models with a set of product and process metrics as independent variables.
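As a rough illustration of this approach, the sketch below builds a CART-style classification tree with scikit-learn (whose DecisionTreeClassifier implements an optimized CART). The metric names and data are hypothetical stand-ins, not the study's actual variables.

```python
# A minimal sketch of CART-style fault-proneness classification.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical module-level metrics: lines of code, cyclomatic complexity,
# and number of changes (a process metric).
X = rng.integers(1, 500, size=(1000, 3)).astype(float)
# Hypothetical label: 1 = fault-prone, 0 = not fault-prone.
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 50, 1000) > 400).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gini impurity and a depth limit keep the tree small and interpretable.
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("holdout accuracy:", tree.score(X_test, y_test))
```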
Model
Digital Document
Publisher
Florida Atlantic University
Description
Though software development has been evolving for over 50 years, the development of computer software systems has largely remained an art. Through the application of measurable and repeatable processes, efforts have been made to slowly transform the software development art into a rigorous engineering discipline. The potential gains are tremendous. Computer software pervades modern society in many forms. For example, the automobile, radio, television, telephone, refrigerator, and still camera have all been transformed by the introduction of computer-based controls. The quality of these everyday products is in part determined by the quality of the computer software running inside them. Therefore, the timely delivery of low-cost and high-quality software to enable these mass market products becomes very important to the long-term success of the companies building them. It is not surprising that managing the number of faults in computer software to competitive levels is a prime focus of the software engineering activity. In support of this activity, many models of software quality have been developed to help control the software development process and ensure that our goals of cost and quality are met on time. In this study, we focus on the software quality modeling activity. We improve existing static and dynamic methodologies and demonstrate new ones in a coordinated attempt to provide engineering methods applicable to the development of computer software. We will show how the power of separate predictive and classification models of software quality may be combined into one model; introduce a three-group fault classification model in the object-oriented paradigm; demonstrate a dynamic modeling methodology of the testing process and show how software product measures and software process measures may be incorporated as input to such a model; and demonstrate a relationship between software product measures and the testability of software. The following methodologies were considered: principal components analysis, multiple regression analysis, Poisson regression analysis, discriminant analysis, time series analysis, and neural networks. Commercial grade software systems are used throughout this dissertation to demonstrate concepts and validate new ideas. As a result, we hope to incrementally advance the state of the software engineering "art".
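To make one of the named methodologies concrete, here is a minimal Poisson regression sketch using statsmodels. The predictors are hypothetical product measures and the fault counts are simulated; the dissertation's actual models and data differ.

```python
# Poisson regression of per-module fault counts on software measures.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
loc = rng.uniform(10, 1000, n)        # hypothetical lines of code
churn = rng.uniform(0, 50, n)         # hypothetical code-churn measure

# Simulated fault counts whose rate grows with both measures.
rate = np.exp(-2.0 + 0.002 * loc + 0.03 * churn)
faults = rng.poisson(rate)

X = sm.add_constant(np.column_stack([loc, churn]))
model = sm.GLM(faults, X, family=sm.families.Poisson()).fit()
print(model.summary())
```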
Model
Digital Document
Publisher
Florida Atlantic University
Description
Providing high-quality software products is the common goal of all software engineers. Finding faults early can produce large savings over the software life cycle; software quality has therefore become a central subject in our research field. This thesis presents a series of studies on a very large legacy telecommunications system. The system has significantly more than ten million lines of code written in a high-level language similar to Pascal. Software quality models were developed to predict the class of each module as either fault-prone or not fault-prone. We used the SPRINT/SLIQ algorithm to build the classification tree models. We found that SPRINT/SLIQ, as an improved CART algorithm, can give us tree models that are more accurate, better balanced, and less prone to overfitting. We also found that software process metrics can significantly improve the predictive accuracy of software quality models.
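A key idea behind SPRINT and SLIQ is scalability: each numeric attribute is presorted once, and a single scan locates the best binary split. The toy sketch below illustrates that single-scan Gini split search; the real algorithms add attribute lists, class lists, and disk-resident processing, none of which is shown here.

```python
# Toy single-scan best-split search over one presorted numeric attribute.
import numpy as np

def gini(counts):
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts / total
    return 1.0 - np.sum(p * p)

def best_split(values, labels):
    """Scan a presorted attribute once, updating class counts on each side."""
    order = np.argsort(values)          # the one-time presort
    v, y = values[order], labels[order]
    left = np.zeros(2)
    right = np.bincount(y, minlength=2).astype(float)
    best = (np.inf, None)
    for i in range(len(v) - 1):
        left[y[i]] += 1
        right[y[i]] -= 1
        if v[i] == v[i + 1]:
            continue                    # split only between distinct values
        n_l, n_r = i + 1, len(v) - i - 1
        score = (n_l * gini(left) + n_r * gini(right)) / len(v)
        if score < best[0]:
            best = (score, (v[i] + v[i + 1]) / 2)
    return best

vals = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0])
labs = np.array([0, 0, 1, 0, 1, 1])
print(best_split(vals, labs))   # (weighted Gini, split threshold)
```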
Model
Digital Document
Publisher
Florida Atlantic University
Description
Reliability has become a very important and competitive factor for software products. Using software quality models based on software measurements provides a systematic and scientific way to detect software faults early and to improve software reliability. This thesis considers several classification techniques, including the Generalized Classification Rule and the MetaCost, Cost-Boosting, and AdaCost algorithms. We also introduce the weighted logistic regression algorithm and a method to evaluate the performance of classification models: ROC analysis. We focus our experiments on a very large legacy telecommunications system (LLTS) to build software quality models with principal components analysis. Two other data sets, CCCS and LTS, are also used in our experiments.
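The following sketch illustrates the weighted logistic regression idea together with ROC analysis, using scikit-learn's class weights rather than the thesis's own formulation; the data and the 5:1 cost ratio are illustrative.

```python
# Cost-weighted logistic regression with ROC analysis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1],  # faults are the rare class
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Missing a fault-prone module (class 1) costs 5x a false alarm.
clf = LogisticRegression(class_weight={0: 1, 1: 5}, max_iter=1000)
clf.fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, scores))
fpr, tpr, thresholds = roc_curve(y_te, scores)  # points on the ROC curve
```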
Model
Digital Document
Publisher
Florida Atlantic University
Description
Graphs are often used to depict an abstraction of software. A graph may be an abstraction of a software system, and a subgraph may represent a software module. Coupling and cohesion are attributes that summarize the degree of interdependence or connectivity among subsystems or within subsystems, respectively. When used in conjunction with measures of other attributes, coupling and cohesion can contribute to an assessment or prediction of software quality. Information theory is attractive to us because the design decisions embodied by the graph are information. Using information theory, we propose measures of the cohesion and coupling of a modular system and of the cohesion and coupling of each constituent module. These measures conform to the properties of cohesion and coupling defined by Briand, Morasca, and Basili, as applied to undirected graphs, and therefore belong to the families of measures called cohesion and coupling.
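As a toy illustration of the flavor of such measures (much simpler than, and not identical to, the thesis's definitions), one way to quantify the information in an undirected graph is the entropy of its node connection patterns:

```python
# Entropy of adjacency-row patterns: a toy information measure on a graph.
import math
from collections import Counter

def pattern_entropy(adj):
    """Shannon entropy (bits) of the distribution of node connection patterns,
    where each node's pattern is its row of the adjacency matrix."""
    patterns = Counter(tuple(row) for row in adj)
    n = len(adj)
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

# Undirected 4-node graph: the path 0-1-2-3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(pattern_entropy(adj))  # 2.0 bits: all four patterns are distinct
```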
Model
Digital Document
Publisher
Florida Atlantic University
Description
Software quality is crucial to both software makers and customers. In reality, however, improving quality and reducing costs are often at odds. Software quality modeling can help us detect fault-prone software modules based on software metrics, so that we can focus our limited resources on fewer modules and lower costs while still achieving high quality. In the present study, a tree classification modeling technique, TREEDISC, was applied to three case studies. Several major contributions have been made. First, preprocessing of the raw data was adopted to solve the computer memory problem and improve the models. Second, TREEDISC was thoroughly explored by examining the roles of its important parameters in modeling. Third, a generalized classification rule was introduced to balance misclassification rates and decrease Type II error, which is considered more costly than Type I error. Fourth, the certainty of classification was addressed. Fifth, TREEDISC modeling was validated over multiple releases of a software product.
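The generalized-classification-rule idea can be sketched as a cost-adjusted decision threshold: a module is labeled fault-prone whenever its odds of fault-proneness exceed a chosen cost ratio, accepting more Type I errors (false alarms) to reduce the costlier Type II errors (misses). The cost ratios below are illustrative, not the study's values.

```python
# A cost-adjusted classification rule over model-estimated probabilities.
import numpy as np

def classify(p_fault_prone, cost_ratio):
    """cost_ratio = cost(Type I) / cost(Type II); a smaller ratio flags
    more modules as fault-prone."""
    return (p_fault_prone / (1 - p_fault_prone) >= cost_ratio).astype(int)

p = np.array([0.10, 0.30, 0.45, 0.60, 0.90])  # hypothetical model outputs
print(classify(p, cost_ratio=1.0))   # plain 0.5 threshold: [0 0 0 1 1]
print(classify(p, cost_ratio=0.25))  # Type II is 4x costlier: [0 1 1 1 1]
```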
Model
Digital Document
Publisher
Florida Atlantic University
Description
Software quality models often have raw software metrics as the input data for predicting quality. Raw metrics are usually highly correlated with one another and thus may result in unstable models. Principal components analysis is a statistical method that can improve model stability. This thesis presents a series of studies on a very large legacy telecommunications system. The system has significantly more than ten million lines of code written in a high-level language similar to Pascal. Software quality models were developed to predict the class of each module as either fault-prone or not fault-prone. We found that the models based on principal components analysis were more robust than those based on raw metrics. We also found that software process metrics can significantly improve the predictive accuracy of software quality models.
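A sketch of the PCA step: highly correlated raw metrics are transformed into orthogonal principal components before modeling. The feature construction and downstream classifier here are illustrative, not the thesis's exact setup.

```python
# PCA over correlated raw metrics, feeding a classification tree.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
size = rng.uniform(10, 1000, (500, 1))
# Three raw metrics deliberately correlated with module size.
X = np.hstack([size,
               size * 0.1 + rng.normal(0, 5, (500, 1)),
               size * 0.05 + rng.normal(0, 3, (500, 1))])
y = (size[:, 0] + rng.normal(0, 100, 500) > 500).astype(int)

# Standardize, keep components explaining 95% of variance, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                      DecisionTreeClassifier(max_depth=3, random_state=0))
model.fit(X, y)
print("components kept:", model.named_steps["pca"].n_components_)
```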
Model
Digital Document
Publisher
Florida Atlantic University
Description
In today's competitive environment for software products, quality has become an increasingly important asset to software development organizations. Software quality models are tools for focusing efforts to find faults early in development, since delaying corrections can lead to higher costs. In this research, the classification tree modeling technique was used to predict software quality by classifying program modules as either fault-prone or not fault-prone. The S-Plus regression tree algorithm and a general classification rule were applied to yield classification tree models. Two classification tree models were developed based on four consecutive releases of a very large legacy telecommunications system. The first release was used as the training data set, and the subsequent three releases were used as evaluation data sets. The first model used twenty-four product metrics and four execution metrics as candidate predictors. The second model added fourteen process metrics as candidate predictors.
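The release-based validation design can be sketched as follows: fit a tree on the first release, then score each later release. The data is synthetic, and the five predictors stand in for the product, execution, and process metrics named above.

```python
# Fit on release 1, evaluate on releases 2-4.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

def make_release(n):
    X = rng.normal(0, 1, (n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 1).astype(int)
    return X, y

X_fit, y_fit = make_release(2000)            # release 1: training data
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_fit, y_fit)

for k in range(2, 5):                        # releases 2-4: evaluation data
    X_ev, y_ev = make_release(2000)
    print(f"release {k} accuracy: {tree.score(X_ev, y_ev):.3f}")
```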
Model
Digital Document
Publisher
Florida Atlantic University
Description
Collecting software metrics manually can be a tedious, inaccurate, and subjective task. Two new tools were developed to automate this process in a rapid, accurate, and objective way. The first tool, the Metrics Analyzer, evaluates 19 metrics at the function level from complete or partial systems written in C. The second tool, the Call Graph Generator, does not assess a metric directly but generates a call graph based on a complete or partial system written in C. The call graph is used as input to another tool (not considered here) that measures the coupling of a module, such as a function or a file. A case study analyzed the relationships among the metrics, including the coupling metric, using principal components analysis, which transformed the 19 metrics into eight principal components.
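A toy sketch of call-graph extraction from C source, in the spirit of the Call Graph Generator; a production tool would use a proper C parser rather than the naive regex scan shown here.

```python
# Naive call-graph extraction from a C snippet (illustration only).
import re

source = """
int helper(int x) { return x + 1; }
int work(int x)   { return helper(x) * 2; }
int main(void)    { return work(3); }
"""

# Find function definitions, then record which known functions each body calls.
defs = re.findall(r"\bint\s+(\w+)\s*\([^)]*\)\s*\{", source)
calls = {}
for name in defs:
    body = re.search(name + r"\s*\([^)]*\)\s*\{([^}]*)\}", source).group(1)
    calls[name] = [d for d in defs
                   if d != name and re.search(r"\b%s\s*\(" % d, body)]

print(calls)  # {'helper': [], 'work': ['helper'], 'main': ['work']}
```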
Model
Digital Document
Publisher
Florida Atlantic University
Description
This thesis presents the results of an empirical investigation of the applicability of genetic algorithms to a real-world problem in software reliability: the fault-prone module identification problem. The solution developed is an effective hybrid of genetic algorithms and neural networks. This approach (ENNs) was found to be superior, in terms of time, effort, and confidence in the optimality of results, to the common practice of searching manually for the best-performing net. Comparisons were made to discriminant analysis. On fault-prone, not-fault-prone, and overall classification, the lower error proportions of ENNs were found to be statistically significant. The robustness of ENNs follows from their superior performance over many data configurations. Given these encouraging results, it is suggested that ENNs have potential value in other software reliability problem domains, where genetic algorithms have been largely ignored. For future research, several plans are outlined for enhancing ENNs with respect to accuracy and applicability.
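A minimal sketch of the hybrid idea: a small genetic algorithm searching over neural network configurations instead of hand-tuning them. The GA here is deliberately crude (random initialization, truncation selection, mutation only), and the search space is a single hidden-layer size; the ENNs in the thesis are more sophisticated.

```python
# GA-driven search over neural network configurations.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

random.seed(0)
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

def fitness(hidden):
    """Mean cross-validated accuracy of one 'chromosome' (a layer size)."""
    net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=500,
                        random_state=0)
    return cross_val_score(net, X, y, cv=3).mean()

pop = [random.randint(2, 32) for _ in range(6)]      # initial population
for gen in range(3):
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:3]                             # truncation selection
    pop = parents + [max(2, p + random.randint(-4, 4)) for p in parents]
print("best hidden-layer size:", max(pop, key=fitness))
```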