Software measurement

Model
Digital Document
Publisher
Florida Atlantic University
Description
The reliability of software systems is a major concern in today's world, as computers have become an integral part of our lives. Society has become so dependent on reliable software systems that failures can damage a company's business, strain human relationships, or endanger human lives. Software quality models are tools for focusing efforts to find faults early in development. In this experiment, we used classification tree modeling techniques to predict software quality by classifying program modules as either fault-prone or not fault-prone. We introduced the Classification And Regression Trees (CART) algorithm as a tool to generate classification trees. We focused our experiments on a very large telecommunications system, building quality models with a set of product and process metrics as independent variables.
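As a hedged illustration of the modeling step, the sketch below fits a CART-style classification tree with scikit-learn's DecisionTreeClassifier, which implements an optimized CART variant; the file name and column names are hypothetical stand-ins for the study's metrics data.

```python
# Minimal sketch of a CART-style fault-proneness classifier. The CSV file
# and its column names are hypothetical; scikit-learn's DecisionTreeClassifier
# is an optimized CART variant, not necessarily the exact tool used here.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per module, metrics plus a fault label.
data = pd.read_csv("module_metrics.csv")
X = data.drop(columns=["fault_prone"])   # independent variables (metrics)
y = data["fault_prone"]                  # 1 = fault-prone, 0 = not fault-prone

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", min_samples_leaf=20)
tree.fit(X_train, y_train)
print("holdout accuracy:", tree.score(X_test, y_test))
```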
Model
Digital Document
Publisher
Florida Atlantic University
Description
Providing high-quality software products is the common goal of all software engineers, and finding faults early can produce large savings over the software life cycle. Software quality has therefore become a central subject in our research field. This thesis presents a series of studies on a very large legacy telecommunications system with significantly more than ten million lines of code, written in a high-level language similar to Pascal. Software quality models were developed to predict the class of each module as either fault-prone or not fault-prone. We used the SPRINT/SLIQ algorithm to build the classification tree models and found that SPRINT/SLIQ, as an improved CART algorithm, yields tree models with higher accuracy, better balance, and less overfitting. We also found that software process metrics can significantly improve the predictive accuracy of software quality models.
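Both SLIQ and SPRINT evaluate candidate splits with the Gini index. The following sketch, with invented toy data, shows that split-evaluation step in isolation; it illustrates the splitting criterion, not the algorithms' scalable attribute-list machinery.

```python
# Sketch of the Gini-index split evaluation at the heart of SLIQ/SPRINT,
# under simplifying assumptions (binary class label, single numeric split).
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(values, labels, threshold):
    """Weighted Gini impurity of splitting on `values <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy example: a lines-of-code metric vs. a fault-proneness label.
loc = [120, 45, 300, 80, 500, 60]
fp = [1, 0, 1, 0, 1, 0]
print(gini_split(loc, fp, threshold=100))  # lower is a better split
```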
Model
Digital Document
Publisher
Florida Atlantic University
Description
Reliability has become a very important competitive factor for software products. Using software quality models based on software measurements provides a systematic and scientific way to detect software faults early and to improve software reliability. This thesis considers several classification techniques, including the Generalized Classification Rule, the MetaCost algorithm, the Cost-Boosting algorithm, and the AdaCost algorithm. We also introduce the weighted logistic regression algorithm and a new method for evaluating the performance of classification models: ROC analysis. We focus our experiments on a very large legacy telecommunications system (LLTS) to build software quality models with principal components analysis. Two other data sets, CCCS and LTS, are also used in our experiments.
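For the ROC analysis, a minimal sketch using scikit-learn's roc_curve and roc_auc_score is shown below; the labels and scores are invented, and each point on the curve corresponds to one classification threshold.

```python
# Hedged sketch of ROC analysis for a two-group software quality model;
# the scores and labels below are invented for illustration.
from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probabilities of membership in the fault-prone class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
# Each (fpr, tpr) pair is one operating point; choosing a threshold trades
# Type I errors (false alarms) against Type II errors (missed faults).
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```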
Model
Digital Document
Publisher
Florida Atlantic University
Description
Graphs are often used to depict an abstraction of software. A graph may be an abstraction of a software system, and a subgraph may represent a software module. Coupling and cohesion are attributes that summarize the degree of interdependence or connectivity among subsystems and within subsystems, respectively. When used in conjunction with measures of other attributes, coupling and cohesion can contribute to an assessment or prediction of software quality. Information theory is attractive to us because the design decisions embodied by the graph are information. Using information theory, we propose measures of the cohesion and coupling of a modular system and of the cohesion and coupling of each constituent module. These measures conform to the properties of cohesion and coupling defined by Briand, Morasca, and Basili, as applied to undirected graphs, and therefore belong to the families of measures called cohesion and coupling.
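As a loose illustration of the information-theoretic flavor of such measures (not the thesis's exact definitions), the sketch below computes the total information of node linkage patterns, i.e. rows of an undirected graph's adjacency matrix; nodes with rare connection patterns carry more information.

```python
# A hedged sketch of an information-theoretic graph measure: the information
# content of node "linkage patterns" (rows of the adjacency matrix). This
# illustrates the flavor of such measures, not the thesis's exact definitions.
import math
from collections import Counter

def pattern_information(adj):
    """Total information (bits) of node linkage patterns in an undirected graph.

    adj is a symmetric 0/1 adjacency matrix given as a list of rows.
    Nodes with rare linkage patterns contribute more information.
    """
    n = len(adj)
    patterns = [tuple(row) for row in adj]
    counts = Counter(patterns)
    return sum(-math.log2(counts[p] / n) for p in patterns)

# Toy system: a path graph on four nodes (0-1-2-3); all patterns are
# distinct, so each node contributes log2(4) = 2 bits, for 8 bits total.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(pattern_information(adj))
```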
Model
Digital Document
Publisher
Florida Atlantic University
Description
In today's competitive environment for software products, quality has become an increasingly important asset to software development organizations. Software quality models are tools for focusing efforts to find faults early in development, since delaying corrections can lead to higher costs. In this research, the classification tree modeling technique was used to predict software quality by classifying program modules as either fault-prone or not fault-prone. The S-Plus regression tree algorithm and a general classification rule were applied to yield classification tree models. Two classification tree models were developed based on four consecutive releases of a very large legacy telecommunications system: the first release was used as the training data set, and the subsequent three releases were used as evaluation data sets. The first model used twenty-four product metrics and four execution metrics as candidate predictors; the second model added fourteen process metrics as candidate predictors.
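A minimal sketch of how such a general classification rule might be applied to regression-tree output follows; the cutoff value and the cost reasoning are illustrative assumptions, not the study's calibrated parameters.

```python
# Sketch of a threshold-based general classification rule applied to
# regression-tree output: a module is labeled fault-prone when its predicted
# fault count exceeds a cutoff chosen on the training release. The cutoff
# value and the cost reasoning here are illustrative assumptions.
def classify(predicted_faults, cutoff):
    """Map predicted fault counts to two quality classes."""
    return ["fault-prone" if f > cutoff else "not fault-prone"
            for f in predicted_faults]

# Predicted fault counts from a regression tree (invented values).
preds = [0.1, 2.7, 0.4, 5.2, 1.1]

# When missing a faulty module is much costlier than a false alarm,
# the cutoff is lowered so that more modules are flagged for review.
print(classify(preds, cutoff=1.0))
# -> ['not fault-prone', 'fault-prone', 'not fault-prone',
#     'fault-prone', 'fault-prone']
```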
Model
Digital Document
Publisher
Florida Atlantic University
Description
Collecting software metrics manually can be a tedious, inaccurate, and subjective task. Two new tools were developed to automate this process in a rapid, accurate, and objective way. The first tool, the Metrics Analyzer, evaluates 19 metrics at the function level from complete or partial systems written in C. The second tool, the Call Graph Generator, does not assess a metric directly but generates a call graph based on a complete or partial system written in C. The call graph is used as input to another tool (not considered here) that measures the coupling of a module, such as a function or a file. A case study analyzed the relationships among the metrics, including the coupling metric, using principal component analysis, which transformed the 19 metrics into eight principal components.
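The PCA step can be illustrated with a short sketch; the metric values below are random stand-ins, and scikit-learn's PCA takes the place of whatever statistical package the case study used.

```python
# Minimal sketch of the principal component analysis step: reducing 19
# correlated function-level metrics to eight components. Metric values are
# randomly generated stand-ins for real measurements.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
metrics = rng.random((500, 19))          # 500 functions x 19 metrics (fake)

scaled = StandardScaler().fit_transform(metrics)   # PCA assumes zero mean
pca = PCA(n_components=8)
components = pca.fit_transform(scaled)

print(components.shape)                          # (500, 8)
print(pca.explained_variance_ratio_.sum())       # variance retained
```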
Model
Digital Document
Publisher
Florida Atlantic University
Description
Increasing aggression through cyber terrorism poses a constant threat to information security in our day-to-day lives. Implementing effective intrusion detection systems (IDSs) is an essential task, given our great dependence on networked computers for the operational control of various infrastructures. Building effective IDSs, unfortunately, has remained an elusive goal owing to the great technical challenges involved, and data mining techniques are increasingly being applied in attempts to overcome these difficulties. This thesis presents a comparative study of traditional "direct" approaches to classification and the recently explored "indirect" approaches, which use class binarization and combiner techniques, for intrusion detection. We evaluate and compare the performance of IDSs based on various data mining algorithms in the context of a well-known network intrusion evaluation data set. We show empirically that data mining algorithms, when applied using the indirect classification approach, yield better intrusion detection models.
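As a rough sketch of indirect classification via class binarization, the example below decomposes a multi-class intrusion problem into one-vs-rest binary problems using scikit-learn's OneVsRestClassifier; this stands in for the thesis's specific binarization and combiner techniques, and the data are invented.

```python
# Hedged sketch of "indirect" classification via class binarization: a
# multi-class intrusion problem is decomposed into one-vs-rest binary
# problems whose outputs a combiner reconciles. OneVsRestClassifier stands
# in for the thesis's specific techniques; the feature data are invented.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 10))                       # fake connection features
y = rng.integers(0, 4, size=300)                # 0 = normal, 1-3 = attack types

indirect = OneVsRestClassifier(DecisionTreeClassifier(max_depth=5))
indirect.fit(X, y)
print(indirect.predict(X[:5]))                  # combined multi-class labels
```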
Model
Digital Document
Publisher
Florida Atlantic University
Description
This thesis presents two new noise-filtering techniques that improve the quality of training datasets by removing noisy data. The training dataset is first split into subsets, and a base learner is induced on each of these splits; the predictions are combined so that an instance is identified as noisy if it is misclassified by a certain number of base learners. The Multiple-Partitioning Filter combines several classifiers on each split, while the Iterative-Partitioning Filter uses only one base learner but goes through multiple iterations. The amount of noise removed is varied by tuning the filtering level or the number of iterations. Empirical studies on a high-assurance software project compare the effectiveness of our noise-removal approaches with two other filters, the Cross-Validation Filter and the Ensemble Filter. Our studies suggest that using several base classifiers, as well as performing several iterations with a conservative scheme, may improve the efficiency of the filter.
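The partition-and-vote idea common to these filters can be sketched as follows; the base learner, number of partitions, and vote threshold are illustrative assumptions rather than the exact configurations studied.

```python
# Sketch of the partition-and-vote idea behind the noise filters: split the
# training set into partitions, induce a base learner on each partition, and
# flag an instance as noisy when at least `min_votes` learners misclassify
# it. The learner and thresholds are illustrative assumptions.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def partition_filter(X, y, n_splits=5, min_votes=3):
    votes = np.zeros(len(y), dtype=int)
    splitter = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for _, part_idx in splitter.split(X):        # each fold is one partition
        learner = DecisionTreeClassifier(max_depth=4)
        learner.fit(X[part_idx], y[part_idx])    # induce on this partition
        votes += (learner.predict(X) != y)       # one vote per misclassification
    return votes >= min_votes                    # True = flagged as noisy

rng = np.random.default_rng(2)
X = rng.random((200, 6))
y = rng.integers(0, 2, size=200)
noisy = partition_filter(X, y)
print("flagged", noisy.sum(), "of", len(y), "instances")
```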
Model
Digital Document
Publisher
Florida Atlantic University
Description
Maintaining superior quality and reliability in software systems is important nowadays. Software quality modeling detects fault-prone modules and, given limited resources and budget, enables us to achieve high quality in a software system by focusing on fewer modules. Tree-based modeling is a simple and effective method for predicting fault-proneness in software systems. In this thesis, we introduce the TREEDISC modeling technique with a three-group classification rule to predict the quality of software modules. A general classification rule is applied and validated, and the three impact parameters (number of groups, minimum leaf size, and significance level) are thoroughly evaluated. An optimization procedure is conducted and empirical results are presented, along with conclusions about the impact factors and the robustness of our approach. The TREEDISC modeling technique with three-group classification has proved to be an efficient and convincing method for software quality control.
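A toy sketch of a three-group classification rule is given below; the risk scores and thresholds are invented, whereas in the thesis the grouping comes from calibrated TREEDISC models.

```python
# Illustrative sketch of a three-group classification rule: modules are
# assigned to high-, medium-, and low-risk groups by two thresholds on a
# model's predicted risk. The scores and thresholds here are invented.
def three_group(risk_scores, low_cut, high_cut):
    groups = []
    for r in risk_scores:
        if r >= high_cut:
            groups.append("high-risk")
        elif r >= low_cut:
            groups.append("medium-risk")
        else:
            groups.append("low-risk")
    return groups

scores = [0.05, 0.42, 0.91, 0.18, 0.67]
print(three_group(scores, low_cut=0.25, high_cut=0.60))
# -> ['low-risk', 'medium-risk', 'high-risk', 'low-risk', 'high-risk']
```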
Model
Digital Document
Publisher
Florida Atlantic University
Description
Maintaining superior quality and reliability of software systems is an important issue in software reliability engineering. Software quality estimation models based on software metrics provide a systematic and scientific way to detect fault-prone modules, enabling us to achieve high quality in software systems by focusing on high-risk modules within limited resources and budget. In previous work, classification models for software quality usually classified modules into two groups, fault-prone or not fault-prone. This thesis presents a new technique for classifying modules into three groups: high-risk, medium-risk, and low-risk. Unlike other classification techniques, the new technique calibrates three-group models according to the resources available. The proposed three-group classification method proved to be efficient and useful for resource utilization in software quality control.
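The resource-calibration idea can be sketched as follows: the fractions of modules assigned to each risk group are derived from the inspection budget. The fractions and scores below are invented for illustration.

```python
# Hedged sketch of calibrating a three-group model to available resources:
# the fractions of modules placed in each group reflect the budget (e.g.
# enough reviewers for 10% intensive and 20% moderate inspection).
# Percentages and risk scores are invented for illustration.
import numpy as np

def calibrate_groups(risk_scores, high_frac=0.10, medium_frac=0.20):
    scores = np.asarray(risk_scores)
    order = np.argsort(-scores)               # most risky modules first
    n = len(scores)
    n_high = int(high_frac * n)
    n_medium = int(medium_frac * n)
    groups = np.full(n, "low-risk", dtype=object)
    groups[order[:n_high]] = "high-risk"
    groups[order[n_high:n_high + n_medium]] = "medium-risk"
    return groups

scores = np.random.default_rng(3).random(20)
print(calibrate_groups(scores))               # 2 high, 4 medium, 14 low
```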