Genetic programming (Computer science)

Model
Digital Document
Publisher
Florida Atlantic University
Description
Three major problems make Genetic Programming unfeasible or impractical
for real world problems.
The first is excessive time complexity. In nature, the evolutionary process
can take millions of years, a time frame that is clearly not acceptable for the solution
of problems on a computer. In order to apply Genetic Programming to real world
problems, it is essential that its efficiency be improved.
The second is called overfitting (where results are inaccurate outside the
training data). In a paper[36] for the Federal Reserve Bank, authors Neely and
Weller state “a perennial problem with using flexible, powerful search procedures
like Genetic Programming is overfitting, the finding of spurious patterns in the data.
Given the well-documented tendency for the genetic program to overfit the data it
is necessary to design procedures to mitigate this.”
The third is the difficulty of determining optimal control parameters for the
Genetic Programming process. Control parameters govern the evolutionary process. They include settings such as the size of the population and the number of generations
to be run. In his book[45], Banzhaf describes this problem: “The bad
news is that Genetic Programming is a young field and the effect of using various
combinations of parameters is just beginning to be explored.”
We address these problems by implementing and testing a number of novel
techniques and improvements to the Genetic Programming process. We conduct
experiments using data sets of various degrees of difficulty to demonstrate success
with a high degree of statistical confidence.
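As a rough illustration of what such control parameters look like in practice, the sketch below groups them in one Python structure. Only the population size and the number of generations come from the description above; the crossover and mutation rates, the class name `GPControlParameters`, and the helper `fitness_evaluations` are assumptions added for illustration, not the thesis's implementation. The helper gives a crude estimate of run cost, which hints at why parameter choice and time complexity are linked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPControlParameters:
    """Hypothetical container for Genetic Programming control parameters."""
    population_size: int          # named in the description above
    generations: int              # named in the description above
    crossover_rate: float = 0.9   # assumption: illustrative value only
    mutation_rate: float = 0.05   # assumption: illustrative value only


def fitness_evaluations(params: GPControlParameters) -> int:
    """Crude cost estimate: one fitness evaluation per individual per generation."""
    return params.population_size * params.generations


small = GPControlParameters(population_size=500, generations=50)
large = GPControlParameters(population_size=5000, generations=100)

print(fitness_evaluations(small))  # 25000
print(fitness_evaluations(large))  # 500000
```

Under this simple cost model, a tenfold larger population run for twice as many generations needs twenty times as many fitness evaluations.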
Publisher
Florida Atlantic University
Description
Genetic Programming is an evolutionary technique for searching through the space of S-expressions for programs that represent optimal or acceptable solutions to a given problem. Genetic Programming often has difficulty in finding the appropriate numeric constants to use in leaf nodes of the S-expressions. This thesis describes the use of local search algorithms to search for numeric constants that will improve the S-expressions found by Genetic Programming. Three methods, Multi-Dimensional Hill Climbing, Vector Hill Climbing, and Numeric Mutation are combined with Genetic Programming to create hybrid systems. The performance of these hybrid systems is analyzed and future directions for improving Genetic Programming with the use of hybrid systems are discussed.
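The idea of tuning leaf constants with a local search can be sketched in Python as follows. This is a generic coordinate-wise hill climber with step halving, offered only as an illustration; it is not claimed to match the thesis's Multi-Dimensional Hill Climbing, Vector Hill Climbing, or Numeric Mutation methods, whose details are not given here. The names `hill_climb_constants`, `sum_squared_error`, and `evolved_expr` are hypothetical, and the evolved S-expression is stood in for by a plain Python function.

```python
from typing import Callable, List, Sequence, Tuple

Expr = Callable[[float, Sequence[float]], float]
Sample = Tuple[float, float]


def sum_squared_error(expr: Expr, constants: Sequence[float],
                      data: Sequence[Sample]) -> float:
    """Fitness: total squared error of the expression over the training data."""
    return sum((expr(x, constants) - y) ** 2 for x, y in data)


def hill_climb_constants(expr: Expr, constants: Sequence[float],
                         data: Sequence[Sample], step: float = 1.0,
                         min_step: float = 1e-6,
                         max_iters: int = 10_000) -> Tuple[List[float], float]:
    """Perturb one constant at a time; keep a change only if the error drops."""
    best = list(constants)
    best_err = sum_squared_error(expr, best, data)
    iters = 0
    while step > min_step and iters < max_iters:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                err = sum_squared_error(expr, candidate, data)
                if err < best_err:
                    best, best_err = candidate, err
                    improved = True
                    break  # move on to the next constant
        if not improved:
            step /= 2.0  # no neighbour was better: search more finely
        iters += 1
    return best, best_err


# Stand-in for an evolved expression (+ (* c0 x) c1) with poorly chosen constants.
def evolved_expr(x: float, c: Sequence[float]) -> float:
    return c[0] * x + c[1]


data = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
tuned, err = hill_climb_constants(evolved_expr, [0.0, 0.0], data)
print(tuned, err)
```

In a hybrid system of the kind described, a step like this would presumably be applied to selected individuals between generations, leaving the tree structure to the evolutionary search and the constants to the local search.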
Publisher
Florida Atlantic University
Description
The accuracy of software quality estimation models is influenced by several factors, two of the most important being the performance of the prediction algorithm and the quality of the data. This dissertation addresses these two factors, and consists of two components: (1) a proposed genetic algorithm (GA) based optimization of software quality models for accuracy enhancement, and (2) a proposed partitioning- and rule-based filter (PRBF) for noise detection toward improvement of data quality. We construct a generalized framework of our embedded GA-optimizer, and instantiate the GA-optimizer for three optimization problems in software quality engineering: parameter optimization for case-based reasoning (CBR) models; module rank optimization for module-order modeling (MOM); and structural optimization for our multi-strategy classification modeling approach, denoted RB2CBL. Empirical case studies using software measurement data from real-world software systems were performed for the optimization problems. The GA-optimization approaches improved software quality prediction accuracy, highlighting the practical benefits of using GA for solving optimization problems in software engineering. The proposed noise detection approach, PRBF, was empirically evaluated using data categorized into two classes. Empirical studies on artificially corrupted datasets and datasets with known (natural) noise demonstrated that PRBF can effectively detect both artificial and natural noise. The proposed filter is a stable and robust technique, and always provided optimal or near-optimal noise detection results. In addition, it is applicable to datasets with nominal and numerical attributes, as well as those with missing values. The PRBF technique supports two methods of noise detection: class noise detection and cost-sensitive noise detection. The former is an easy-to-use method and does not need parameter settings, while the latter is suited for applications where each class has a specific misclassification cost. PRBF can also be used iteratively to investigate the two general types of data noise: attribute and class noise.
Publisher
Florida Atlantic University
Description
Software reliability engineering plays a vital role in managing and controlling software quality. As an important method of software reliability engineering, software quality estimation modeling is useful in defining a cost-effective strategy to achieve a reliable software system. By predicting the faults in a software system, software quality models can identify high-risk modules, which can then be targeted for reliability enhancements. Strictly speaking, software quality modeling not only aims at lowering the misclassification rate, but also takes into account the costs of different misclassifications and the available resources of a project. As a new search-based algorithm, Genetic Programming (GP) can build a model without assuming the size, shape, or structure of a model. It can flexibly tailor the fitness functions to the objectives chosen by the customers. Moreover, it can optimize several objectives simultaneously in the modeling process, and thus, a set of multi-objective optimization solutions can be obtained. This research focuses on building software quality estimation models using GP. Several GP-based models for predicting the class membership of each software module and ranking the modules by a quality factor were proposed. The first model, which categorizes modules as fault-prone or not fault-prone, was proposed by considering the distinguishing features of the software quality classification task and GP. The second model provided quality-based ranking information for fault-prone modules. A third, decision tree-based software classification model was also proposed by considering accuracy and simplicity simultaneously. This new technique provides a new multi-objective optimization algorithm to build decision trees for real-world engineering problems, in which several trade-off objectives usually have to be taken into account at the same time. The fourth model was built to find multi-objective optimization solutions by considering both the expected cost of misclassification and available resources. Also, a new goal-oriented technique of building module-order models was proposed by directly optimizing several goals chosen by project analysts. Two known issues of GP, bloat and overfitting, were also addressed in our research. Data collected from three industrial projects were used to validate the performance of the models. Results indicate that our proposed methods can achieve useful performance results. Moreover, some proposed methods can simultaneously optimize several different objectives of a software project management team.
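To make the notion of a set of multi-objective solutions concrete, the Python sketch below filters candidate models down to the non-dominated (Pareto-optimal) ones when two objectives are both minimized. It is a generic illustration, not the dissertation's algorithm. The function names `dominates` and `pareto_front` are hypothetical, and the objective values, read here as (expected cost of misclassification, resources required), are made-up numbers.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and better on at least one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (expected misclassification cost, resources required) per candidate model.
candidates = [
    (0.30, 10.0),
    (0.25, 12.0),
    (0.40, 8.0),
    (0.35, 12.0),  # dominated by (0.30, 10.0)
    (0.30, 11.0),  # dominated by (0.30, 10.0)
]

front = pareto_front(candidates)
print(front)  # [(0.3, 10.0), (0.25, 12.0), (0.4, 8.0)]
```

Each surviving candidate represents a different trade-off, so the final choice among them would be left to the project's analysts rather than to the search itself.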