Computer software--Quality control

Model
Digital Document
Publisher
Florida Atlantic University
Description
Managers of software development need to know which components of a system are fault-prone. If this can be determined early in the development cycle, resources can be allocated more effectively and significant costs can be reduced. Case-Based Reasoning (CBR) is a simple and efficient methodology for building software quality models that can provide such early information to managers. Our research focuses on two case studies. The first study analyzes source files, classifying them as fault-prone or not fault-prone, and also predicts the number of faults in each file. The second study analyzes the fault removal process and creates models that predict the outcome of software inspections.
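The abstract does not give the details of the CBR models; the following is a minimal sketch of the general idea, retrieving the most similar past cases by Euclidean distance over standardized metrics. The metric columns, fault-proneness threshold, and data are purely illustrative assumptions.

import numpy as np

def cbr_predict(case_library, fault_counts, query, k=3):
    """Classify a source file and estimate its fault count by
    retrieving the k most similar past cases (Euclidean distance
    over standardized software metrics)."""
    # Standardize metrics so no single metric dominates the distance.
    mu, sigma = case_library.mean(axis=0), case_library.std(axis=0) + 1e-12
    lib = (case_library - mu) / sigma
    q = (query - mu) / sigma

    # Retrieve the k nearest past cases.
    dist = np.linalg.norm(lib - q, axis=1)
    nearest = np.argsort(dist)[:k]

    # Estimate faults as the mean of the neighbors' fault counts;
    # classify as fault-prone if the estimate crosses a threshold
    # (the threshold of 1.0 is an assumption for illustration).
    estimate = fault_counts[nearest].mean()
    return estimate, estimate >= 1.0

# Hypothetical case library: rows are files, columns are metrics
# (e.g. lines of code, cyclomatic complexity).
cases = np.array([[120, 4], [800, 25], [90, 2], [1500, 40]], dtype=float)
faults = np.array([0, 3, 0, 7], dtype=float)
print(cbr_predict(cases, faults, np.array([700.0, 20.0])))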
Model
Digital Document
Publisher
Florida Atlantic University
Description
In software engineering, software quality has become a topic of major concern. It is also recognized that the role of a maintenance organization is to understand and estimate the cost of maintenance releases of software systems. Planning the next release so as to maximize the increase in functionality and the improvement in quality is essential to successful maintenance management. With the growing collection of software in organizations, this cost is becoming substantial. In this research we compared two software quality models: we examined whether a model built on the entire system and used to predict a subsystem yields similar, better, or worse classification results than a model built on that subsystem and used to predict the same subsystem. We used the Classification And Regression Tree (CART) algorithm to build the classification models. The case study is based on a very large telecommunications system.
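A minimal sketch of the system-versus-subsystem comparison described above, using scikit-learn's DecisionTreeClassifier (an optimized CART implementation) as a stand-in for the dissertation's CART models. The metric features, labels, and split sizes are illustrative assumptions, not data from the telecommunications case study.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical module metrics for a large system: [LOC, change count, fan-out].
X = rng.normal(size=(600, 3))
y = (X.sum(axis=1) + rng.normal(scale=0.5, size=600) > 1).astype(int)

# Treat the first 100 modules as the subsystem; split it for fitting/evaluation.
X_sub, y_sub = X[:100], y[:100]
X_fit, y_fit, X_eval, y_eval = X_sub[:50], y_sub[:50], X_sub[50:], y_sub[50:]

# Model 1: CART trained on the rest of the system, applied to the subsystem.
system_model = DecisionTreeClassifier(min_samples_leaf=10).fit(X[100:], y[100:])

# Model 2: CART trained on the subsystem's own data.
sub_model = DecisionTreeClassifier(min_samples_leaf=5).fit(X_fit, y_fit)

# Compare classification accuracy on held-out subsystem modules.
print("system-level model accuracy:", system_model.score(X_eval, y_eval))
print("subsystem model accuracy:   ", sub_model.score(X_eval, y_eval))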
Model
Digital Document
Publisher
Florida Atlantic University
Description
The ability to efficiently prevent faults in large software systems is a very important concern of software project managers. Successful testing allows us to build quality software systems. Unfortunately, it is not always possible to test a system effectively due to time, resource, or other constraints. A critical bug may cause catastrophic consequences, such as loss of life or of very expensive equipment. We can facilitate testing by finding where faults are more likely to be hidden. Case-Based Reasoning (CBR) is one of many methodologies that make this process faster and cheaper by discovering faults early in the software life cycle; it predicts the quality of a system by identifying fault-prone modules. We employ the SMART tool to facilitate CBR, using product and process metrics as independent variables. The study found that CBR is a robust tool capable of carrying out software quality prediction on its own with acceptable results. We also show that CBR's weaknesses do not hinder its effectiveness in finding misclassified modules.
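Classification models like the one above are commonly judged by their two misclassification rates. The abstract does not state how misclassified modules were counted, so the following is a small illustrative helper under the usual convention in this literature: Type I errors are false alarms and Type II errors are missed fault-prone modules. The example labels are hypothetical.

import numpy as np

def misclassification_rates(actual, predicted):
    """Type I: not-fault-prone modules flagged as fault-prone (false alarms).
    Type II: fault-prone modules missed (usually the costlier error)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    type1 = np.mean(predicted[actual == 0] == 1)
    type2 = np.mean(predicted[actual == 1] == 0)
    return type1, type2

# Hypothetical labels: 1 = fault-prone, 0 = not fault-prone.
actual    = [0, 0, 1, 1, 0, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 1, 0, 1]
print(misclassification_rates(actual, predicted))  # (0.25, 0.25)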
Model
Digital Document
Publisher
Florida Atlantic University
Description
Development of reliable, high-quality software requires study and understanding at each step of the development process. A basic assumption in the field of software measurement is that metrics of internal software attributes somehow relate to the intrinsic difficulty of understanding a program. Measuring the information content of a program attempts to quantify the comprehension task indirectly. Information-theory-based software metrics are attractive because they quantify the amount of information within a well-defined framework. However, most such metrics have been proposed with little reference to measurement theory fundamentals, and empirical validation of predictive quality models has been lacking. This dissertation proves that representative information-theory-based software metrics can be "meaningful" components of software quality models in the context of measurement theory. To this end, members of a major class of metrics are shown to be regular representations of Minimum Description Length or Variety of software attributes, and to be on an interval scale. An empirical validation case study is presented that predicted faults in modules based on Operator Information. This metric is closely related to Harrison's Average Information Content Classification, which is the entropy of the operators. New general methods for calculating synthetic complexity at the system and module levels are presented, quantifying the joint information of an arbitrary set of primitive software measures. Since not all kinds of information are equally relevant to software quality factors, components of synthetic module complexity are also defined. Empirical case studies illustrate the potential usefulness of the proposed synthetic metrics. A metrics database is often the key to a successful ongoing software metrics program. The contribution of any proposed metric is defined in terms of measured variation using information theory, irrespective of the metric's usefulness in quality models. This is of interest when full validation is not practical. Case studies illustrate the method.
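Since the abstract defines Harrison's metric as the entropy of a module's operators, the quantity is straightforward to compute: the Shannon entropy of the operator frequency distribution. A short sketch follows; the operator stream is a hypothetical example, and tokenization details are assumptions.

import math
from collections import Counter

def operator_entropy(operators):
    """Shannon entropy (in bits) of a module's operator distribution,
    in the spirit of Harrison's average information content metric."""
    counts = Counter(operators)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical operator stream extracted from a module's token sequence.
ops = ["=", "+", "=", "if", "+", "*", "=", "if", "+", "="]
print(round(operator_entropy(ops), 3))  # about 1.846 bits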
Model
Digital Document
Publisher
Florida Atlantic University
Description
One goal of software engineers is to produce software products. An additional goal, that software production must lead to profit, subjects producers to the forces of the software product market. This market demands high-quality products and tight cycles in the delivery of new and enhanced products. These market conditions motivate the search for engineering methods that help software producers ship products more quickly, at lower cost, and with fewer defects. The control of software defects is key to meeting these market conditions; thus, many software engineering tasks are concerned with software defects. This study considers two sources of variation in the distribution of software defects: software complexity and enhancement activity. Multivariate techniques treat defect activity, software complexity, and enhancement activity as related multivariate concepts. The applied techniques include principal components analysis, canonical correlation analysis, discriminant analysis, and multiple regression analysis. The objective of this study is to improve our understanding of software complexity and software enhancement activity as sources of variation in defect activity, and to apply this understanding to produce predictive and discriminant models useful during testing and maintenance tasks. These models serve to support critical software engineering decisions.
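One common way the listed techniques combine is to replace correlated raw metrics with orthogonal principal components and regress defect counts on those components. The sketch below illustrates that pipeline with scikit-learn; the metric groups, dimensions, and synthetic data are illustrative assumptions, not the study's data.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical data: 200 modules with correlated complexity and
# enhancement metrics (e.g. LOC, cyclomatic complexity, changed lines).
complexity = rng.normal(size=(200, 4))
enhancement = complexity[:, :2] + rng.normal(scale=0.5, size=(200, 2))
X = np.hstack([complexity, enhancement])
defects = X @ rng.uniform(0.2, 1.0, size=6) + rng.normal(scale=0.5, size=200)

# Principal components turn the correlated metrics into orthogonal
# factors; defect activity is then regressed on those factors.
pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)
model = LinearRegression().fit(Z, defects)
print("variance explained:", pca.explained_variance_ratio_.round(2))
print("R^2 on components: ", round(model.score(Z, defects), 2))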
Model
Digital Document
Publisher
Florida Atlantic University
Description
The key to developing high-quality software is the measurement and modeling of software quality. In practice, software measurements are often used as a resource for modeling and comprehending the quality of software. This is accomplished by a software quality model that is trained using software metrics and defect data from similar, previously developed systems; the model is then applied to estimate the quality of the target software project. Such an approach assumes that defect data are available for all program modules in the training data. Various practical issues can cause defect data from previously developed systems to be limited or entirely unavailable. This dissertation presents innovative and practical techniques for addressing the problem of software quality analysis when defect data are limited or completely absent. The proposed techniques for software quality analysis without defect data include an expert-based approach with unsupervised clustering and an expert-based approach with semi-supervised clustering. The proposed techniques for software quality analysis with limited defect data include a semi-supervised classification approach based on the Expectation-Maximization algorithm and an expert-based approach with semi-supervised clustering. Empirical case studies of software measurement datasets obtained from multiple NASA software projects are used to present and evaluate the different techniques. The empirical results demonstrate the attractiveness, benefit, and definite promise of the proposed techniques, which are invaluable to the software quality practitioner challenged by the absence or limited availability of defect data from previous software development experiences.
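To make the limited-defect-data setting concrete, here is a tiny hard-label EM-style loop (closer to self-training than to the dissertation's exact algorithm, which is not specified in the abstract): fit a classifier on the few labeled modules, then alternately impute labels for the unlabeled modules and refit on everything. The Gaussian naive Bayes base model and all data are assumptions for illustration.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def em_semi_supervised(X_lab, y_lab, X_unlab, iters=10):
    """Sketch of semi-supervised classification with an EM-style loop:
    (E) the current model imputes labels for unlabeled modules;
    (M) the model is refit on labeled plus imputed data."""
    X_all = np.vstack([X_lab, X_unlab])
    clf = GaussianNB().fit(X_lab, y_lab)
    for _ in range(iters):
        y_unlab = clf.predict(X_unlab)                      # E-step (hard labels)
        clf = GaussianNB().fit(X_all, np.concatenate([y_lab, y_unlab]))  # M-step
    return clf

rng = np.random.default_rng(2)
X_lab = rng.normal(size=(30, 2))
y_lab = (X_lab[:, 0] > 0).astype(int)       # few labeled modules
X_unlab = rng.normal(size=(300, 2))         # many unlabeled modules
model = em_semi_supervised(X_lab, y_lab, X_unlab)
print(model.predict(np.array([[1.5, 0.0], [-1.5, 0.0]])))  # likely [1 0]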
Model
Digital Document
Publisher
Florida Atlantic University
Description
The performance accuracy of software quality estimation models is influenced by several factors, two of the most important being the performance of the prediction algorithm and the quality of the data. This dissertation addresses these two factors and consists of two components: (1) a proposed genetic algorithm (GA) based optimization of software quality models for accuracy enhancement, and (2) a proposed partitioning- and rule-based filter (PRBF) for noise detection toward improvement of data quality. We construct a generalized framework for our embedded GA-optimizer and instantiate it for three optimization problems in software quality engineering: parameter optimization for case-based reasoning (CBR) models; module-rank optimization for module-order modeling (MOM); and structural optimization for our multi-strategy classification modeling approach, denoted RB2CBL. Empirical case studies using software measurement data from real-world software systems were performed for each optimization problem. The GA-based approaches improved software quality prediction accuracy, highlighting the practical benefits of using GAs to solve optimization problems in software engineering. The proposed noise detection approach, PRBF, was empirically evaluated using data categorized into two classes. Empirical studies on artificially corrupted datasets and on datasets with known (natural) noise demonstrated that PRBF can effectively detect both artificial and natural noise. The proposed filter is a stable and robust technique, and always provided optimal or near-optimal noise detection results. In addition, it is applicable to datasets with nominal and numerical attributes, as well as those with missing values. The PRBF technique supports two methods of noise detection: class noise detection and cost-sensitive noise detection. The former is easy to use and needs no parameter settings, while the latter is suited to applications where each class has a specific misclassification cost. PRBF can also be applied iteratively to investigate the two general types of data noise: attribute noise and class noise.
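To illustrate GA-based parameter optimization for a CBR model (the first instantiation above), here is a minimal real-valued GA that evolves distance weights and the neighborhood size k for a weighted kNN classifier, with validation accuracy as the fitness. The encoding, operators (truncation selection, averaging crossover, Gaussian mutation), and data are all assumptions for illustration, not the dissertation's GA-optimizer.

import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: module metrics and fault-prone labels.
X = rng.normal(size=(120, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_fit, y_fit, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def knn_accuracy(weights, k):
    """Fitness: validation accuracy of a weighted-distance kNN (CBR) model."""
    d = np.linalg.norm((X_val[:, None, :] - X_fit[None, :, :]) * weights, axis=2)
    votes = y_fit[np.argsort(d, axis=1)[:, :k]].mean(axis=1) >= 0.5
    return (votes == y_val).mean()

# Minimal GA: each chromosome encodes three metric weights plus k.
pop = rng.uniform(0, 1, size=(20, 4))
for gen in range(30):
    fit = np.array([knn_accuracy(c[:3], 1 + int(c[3] * 9)) for c in pop])
    parents = pop[np.argsort(fit)[-10:]]                      # keep the top half
    children = (parents[rng.integers(0, 10, 20)] +            # averaging crossover
                parents[rng.integers(0, 10, 20)]) / 2
    children += rng.normal(scale=0.05, size=children.shape)   # Gaussian mutation
    pop = np.clip(children, 0, 1)

best = pop[np.argmax([knn_accuracy(c[:3], 1 + int(c[3] * 9)) for c in pop])]
print("best weights:", best[:3].round(2), "k =", 1 + int(best[3] * 9))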
Model
Digital Document
Publisher
Florida Atlantic University
Description
Software reliability engineering plays a vital role in managing and controlling software quality. As an important method of software reliability engineering, software quality estimation modeling is useful in defining a cost-effective strategy for achieving a reliable software system. By predicting the faults in a software system, software quality models can identify high-risk modules, which can then be targeted for reliability enhancement. Strictly speaking, software quality modeling not only aims at lowering the misclassification rate, but also takes into account the costs of different misclassifications and the available resources of a project. As a search-based algorithm, Genetic Programming (GP) can build a model without assuming its size, shape, or structure. It can flexibly tailor the fitness functions to the objectives chosen by the customers. Moreover, it can optimize several objectives simultaneously in the modeling process, yielding a set of multi-objective optimization solutions. This research focuses on building software quality estimation models using GP. Several GP-based models for predicting the class membership of each software module and for ranking modules by a quality factor were proposed. The first model, which categorizes modules as fault-prone or not fault-prone, was designed around the distinctive features of the software quality classification task and of GP. The second model provides quality-based ranking information for fault-prone modules. A decision-tree-based software classification model was also proposed, considering accuracy and simplicity simultaneously; this technique provides a new multi-objective optimization algorithm for building decision trees for real-world engineering problems, in which several trade-off objectives usually have to be considered at the same time. The fourth model finds multi-objective optimization solutions by considering both the expected cost of misclassification and the available resources. In addition, a new goal-oriented technique for building module-order models was proposed that directly optimizes several goals chosen by project analysts. The GP issues of bloat and overfitting were also addressed in our research. Data were collected from three industrial projects and used to validate the performance of the models. Results indicate that the proposed methods achieve useful performance, and that some of them can simultaneously optimize several different objectives of a software project management team.
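The expected cost of misclassification mentioned above is the natural fitness function for such cost-sensitive models: each error type is weighted by its cost and averaged over the modules. A small sketch follows; the cost values and labels are illustrative assumptions.

import numpy as np

def expected_cost(actual, predicted, c_fp=1.0, c_fn=10.0):
    """Expected cost of misclassification. Missing a fault-prone module
    (false negative) is typically weighted much more heavily than a
    false alarm; the costs used here are illustrative."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    fp = np.sum((actual == 0) & (predicted == 1))
    fn = np.sum((actual == 1) & (predicted == 0))
    return (c_fp * fp + c_fn * fn) / len(actual)

actual    = [0, 0, 1, 1, 0, 1]
predicted = [1, 0, 1, 0, 0, 1]
print(expected_cost(actual, predicted))  # (1*1 + 10*1) / 6 ~= 1.83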
Model
Digital Document
Publisher
Florida Atlantic University
Description
The primary aim of software engineering is to produce quality software that is delivered on time, within budget, and fulfills all its requirements. A timely estimation of software quality can serve as a prerequisite for achieving high reliability of software-based systems. More specifically, software quality assurance efforts can be prioritized to target the program modules that are most likely to have a high number of faults. Software quality estimation models are generally of two types: classification models, which predict the membership of modules in two or more quality-based classes, and quantitative prediction models, which estimate the number of faults (or some other software quality factor) likely to occur in software modules. In the literature, a variety of techniques have been developed for software quality estimation, most of which are suited to either prediction or classification but not both, e.g., multiple linear regression (prediction only) and logistic regression (classification only).
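The two model types and the two named techniques are easy to show side by side. The sketch below fits multiple linear regression to predict fault counts and logistic regression to classify fault-proneness on the same hypothetical metric data; the features and data generation are assumptions for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(4)

# Hypothetical module metrics and fault counts.
X = rng.normal(size=(150, 3))
faults = np.maximum(0, 2 * X[:, 0] + X[:, 1] + rng.normal(size=150)).round()

# Quantitative prediction: estimate the number of faults per module.
reg = LinearRegression().fit(X, faults)

# Classification: fault-prone (any faults) versus not fault-prone.
clf = LogisticRegression().fit(X, (faults > 0).astype(int))

module = np.array([[1.2, 0.3, -0.5]])
print("predicted faults:", reg.predict(module).round(1))
print("P(fault-prone):  ", clf.predict_proba(module)[0, 1].round(2))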
Model
Digital Document
Publisher
Florida Atlantic University
Description
Modern people are becoming more and more dependent on computers in their daily lives. Most industries, from automobiles, avionics, oil, and telecommunications to banking, stocks, and pharmaceuticals, require computers to function. As the required tasks become more complex, the complexity of computer software and hardware has increased dramatically, and as a consequence the possibility of failure increases. As the requirements for and dependence on computers grow, so does the possibility of crises caused by computer failures. High reliability is an important attribute of almost any software system, so software developers seek ways to forecast and improve quality before release. Since many quality factors cannot be measured until after the software becomes operational, software quality models are developed to predict quality factors from measurements collected earlier in the life cycle. Because information in the early life cycle of software development is incomplete, software quality models with fuzzy characteristics usually perform better, since fuzzy concepts deal with phenomena that are vague in nature. This study focuses on the use of fuzzy logic in software reliability engineering. The discussion includes fuzzy expert systems and their application to early risk assessment; interval prediction using fuzzy regression modeling; fuzzy rule extraction for fuzzy classification and its use in software quality models; and fuzzy identification, including the extraction of both rules and membership functions from fuzzy data, with application to software project cost estimation. The following methodologies were considered: nonparametric discriminant analysis, Z-tests and paired t-tests, neural networks, fuzzy linear regression, fuzzy nonlinear regression, fuzzy classification with the maximum matched method, fuzzy identification with fuzzy clustering, and fuzzy projection. Commercial software systems and the COCOMO database are used throughout this dissertation to demonstrate the usefulness of the concepts and to validate the new ideas.
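To make the fuzzy-expert-system idea concrete, here is a toy rule base for early risk assessment using triangular membership functions, min for AND, and a simple weighted defuzzification. All membership shapes, rules, and numbers are illustrative assumptions, not the dissertation's system.

def tri(x, a, b, c):
    """Triangular membership function rising from a to a peak at b,
    then falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk(loc, churn):
    """Toy fuzzy rule base for early risk assessment:
      R1: IF size is large AND churn is high THEN risk is high
      R2: IF size is small THEN risk is low"""
    large = tri(loc, 300, 1000, 5000)
    high_churn = tri(churn, 5, 20, 60)
    small = tri(loc, -1, 0, 400)
    r1 = min(large, high_churn)   # AND via minimum
    r2 = small
    # Defuzzify by weighting representative risk levels (0.9 high, 0.1 low).
    return (0.9 * r1 + 0.1 * r2) / (r1 + r2) if (r1 + r2) else 0.5

print(round(risk(loc=900, churn=25), 2))   # the "high risk" rule dominates
print(round(risk(loc=150, churn=2), 2))    # the "low risk" rule dominates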