Regression analysis

Model
Digital Document
Publisher
Florida Atlantic University
Description
The purpose of this study is to determine what factors could influence an economic agent's decision to travel or vacation in Florida. This study measures this decision by analyzing the state Division of Tourism's visitor estimates in light of changes in national gross domestic product (GDP), non-aviation gasoline prices, average airfares, and exchange rates. The data were compiled quarterly from 1980 to 1993 and analyzed by employing Translog and Cobb-Douglas demand functional forms in regression analysis. Based upon the regression results, the Cobb-Douglas functional form best represents what has historically occurred in the real economy and is consistent with generally accepted microeconomic demand theory. The Cobb-Douglas results reveal that an economic agent's future income expectations, measured by GDP levels, have a significant influence on Florida visitor estimates and play a role in the decision to vacation in Florida.
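For reference, the two demand specifications named in this abstract are usually written in the standard textbook forms below. This is a sketch only: the abstract does not give the thesis's exact specification, and the symbols V (visitor estimates), Y (GDP), P (gasoline price), F (average airfare), and E (exchange rate) are labels introduced here for illustration. The Cobb-Douglas form is linear in logarithms, so each slope is an elasticity.

```latex
\begin{aligned}
\text{Cobb-Douglas:}\quad \ln V_t &= \beta_0 + \beta_1 \ln Y_t + \beta_2 \ln P_t + \beta_3 \ln F_t + \beta_4 \ln E_t + \varepsilon_t \\
\text{Translog:}\quad \ln V_t &= \beta_0 + \sum_{i=1}^{4} \beta_i \ln X_{it}
  + \tfrac{1}{2} \sum_{i=1}^{4} \sum_{j=1}^{4} \gamma_{ij} \ln X_{it} \ln X_{jt} + \varepsilon_t
\end{aligned}
```

Here (X_1, X_2, X_3, X_4) = (Y, P, F, E). Setting every γ_ij = 0 recovers the Cobb-Douglas equation, in which β_1 is the income (GDP) elasticity of visitor demand.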
Model
Digital Document
Publisher
Florida Atlantic University
Description
In this study we correlate academic and non-academic descriptors with Organic Chemistry final grades for students enrolled at a Florida public university. Using multiple regression analysis, the following predictors are analyzed for a sample of 904 students: age, gender, ethnicity, academic classification, SAT scores, major, overall grade point average (GPA), semesters elapsed between courses, institution where General Chemistry was taken, prerequisite grades, and number of math and science courses taken with their respective grades. Results indicate strong correlations between the Organic Chemistry final grade and both GPA and the General Chemistry final grade. Additionally, Organic Chemistry final grades correlate with academic course load and the type of institution where General Chemistry was studied. We believe these results can be employed as a tool for advising students in planning their academic programs.
Model
Digital Document
Publisher
Florida Atlantic University
Description
The detection of abnormal blood cells and particles in a blood test is essential in medical diagnosis. The detection rules, which are usually implemented in widely used automated hematology analyzers, are therefore critical for the health and even the lives of millions of people. This thesis focuses on generating such detection rules using a supervised machine learning algorithm. The first part of this thesis studies the hematology data and surveys popular classification algorithms. In the second part, the selected algorithm, CART, is implemented with carefully chosen parameters. In the third part, a modification of the algorithm, logical pruning with the Enclose the Normal principle, is applied. To extend the algorithm and achieve better performance, I developed and implemented the idea of decision tree combinations. The research proved successful, achieving good performance and reasonable detection rules.
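CART grows a tree by repeatedly choosing the binary split that most reduces node impurity. The minimal Python sketch below shows one such step for a single numeric feature, assuming the Gini index (one of CART's standard impurity measures). The feature values and labels are hypothetical, and the thesis's parameter choices, logical pruning, Enclose the Normal principle, and tree combinations are not reproduced.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best split 'x <= t' on one feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical measurements for one analyzer channel (illustrative only).
xs = [4.0, 4.5, 5.0, 10.0, 11.0, 12.0]
ys = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
print(best_split(xs, ys))  # (7.5, 0.0)
```

A full tree applies this search to every feature at every node and recurses on the two children; pruning then trims branches that do not generalize.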
Model
Digital Document
Publisher
Florida Atlantic University
Description
The cross-validated classification accuracy of predictive discriminant analysis (PDA) and logistic regression (LR) models was compared for the two-group classification problem. Thirty-four real data sets varying in number of cases, number of predictor variables, degree of group separation, relative group size, and equality of group covariance matrices were employed for the comparison. PDA models were built based on assumptions of multivariate normality and equal covariance matrices, and cases were classified using Tatsuoka's (1988, p. 351) minimum chi-square rule. LR models were built using the International Mathematical and Statistical Library (IMSL) subroutine Categorical Generalized Linear Model (CTGLM), available with the 32-bit Microsoft Fortran v4.0 Powerstation. CTGLM uses an iterative nonlinear approximation technique (Newton-Raphson) to determine maximum likelihood estimates of model parameters. The group with the higher log-likelihood was taken as the LR prediction. Cross-validated hit-rate accuracy of PDA and LR models was estimated using the leave-one-out procedure. McNemar's (1947) statistic for correlated proportions was used in the statistical comparisons of PDA and LR hit-rate estimates for separate-group and total-sample proportions (z = 2.58, α = .01). Total-sample and separate-group cross-validated classification accuracy obtained by PDA was not significantly different from that obtained by LR in any of the 31 data sets for which maximum likelihood estimates of LR model parameters could be calculated. This was true regardless of the assumptions made about population sizes (i.e., equal or unequal). Neither theoretical nor data-based considerations were helpful in predicting these results.
Although it does not appear from these data to make a difference which classification model is used, use of the method described in this study for comparing PDA and LR models will enable researchers to select the optimal classification model for a specific data set, regardless of data conditions.
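McNemar's statistic compares two classifiers scored on the same cases by using only the discordant cases: b, the cases one model classifies correctly and the other misses, and c, the reverse. The minimal Python sketch below computes the z form without continuity correction, z = (b - c) / sqrt(b + c), against the critical value z = 2.58 (α = .01, two-tailed) quoted above. Whether the study applied a continuity correction is not stated, and the hit/miss vectors are hypothetical stand-ins for the leave-one-out predictions of PDA and LR.

```python
import math

CRITICAL_Z = 2.58  # two-tailed critical value at alpha = .01


def mcnemar_z(pda_correct, lr_correct):
    """McNemar z for correlated proportions (no continuity correction)."""
    if len(pda_correct) != len(lr_correct):
        raise ValueError("both models must be scored on the same cases")
    # b: PDA right, LR wrong; c: LR right, PDA wrong. Concordant cases are ignored.
    b = sum(1 for p, l in zip(pda_correct, lr_correct) if p and not l)
    c = sum(1 for p, l in zip(pda_correct, lr_correct) if l and not p)
    if b + c == 0:
        return 0.0  # the models agree on every case
    return (b - c) / math.sqrt(b + c)


# Hypothetical leave-one-out hits (True = case classified correctly), 60 cases.
pda = [True] * 40 + [True] * 8 + [False] * 3 + [False] * 9
lr = [True] * 40 + [False] * 8 + [True] * 3 + [False] * 9
z = mcnemar_z(pda, lr)
print(round(z, 3), abs(z) > CRITICAL_Z)  # 1.508 False
```

Because only discordant cases enter the statistic, two models with similar hit rates that err on the same cases produce a small z, which matches the study's finding of no significant PDA/LR differences.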
Model
Digital Document
Publisher
Florida Atlantic University
Description
The primary aim of software engineering is to produce quality software that is delivered on time, within budget, and fulfils all its requirements. A timely estimation of software quality can serve as a prerequisite for achieving high reliability of software-based systems. More specifically, software quality assurance efforts can be prioritized to target the program modules that are most likely to have a high number of faults. Software quality estimation models are generally of two types: a classification model that predicts the class membership of modules in two or more quality-based classes, and a quantitative prediction model that estimates the number of faults (or some other software quality factor) likely to occur in software modules. In the literature, a variety of techniques have been developed for software quality estimation, most of which are suited to either prediction or classification but not both, e.g., multiple linear regression (prediction only) and logistic regression (classification only).
Model
Digital Document
Publisher
Florida Atlantic University
Description
The focus of this thesis is to statistically model violent crime rates against population for the United States over the years 1960-2009. This question is of interest because population trends follow different patterns in individual states. We propose a method that employs cubic spline regression modeling. First, we introduce a minimum/maximum algorithm that identifies potential knots. Then we employ least squares estimation to estimate the regression coefficients of the cubic spline model with the knots chosen by the minimum/maximum algorithm. We then use best subsets regression to aid in model selection, choosing the model that minimizes the Bayesian Information Criterion (BIC). Finally, we present the adjusted R² as a measure of the overall goodness of fit of the selected model. Among the fifty states and Washington, D.C., 42 of 51 showed an adjusted R² greater than 90%. We also present an overall model for the United States. In addition, we show further applications of our algorithm to data that show a nonlinear association. It is hoped that our method can serve as a unified model for violent crime rates in future years.
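The pipeline described above (candidate knots, least squares fit, best subsets by BIC, adjusted R²) can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: `extrema_knots` is a hypothetical reading of the minimum/maximum algorithm (interior points where the series changes direction), the spline uses a truncated power basis, BIC uses the Gaussian form n ln(RSS/n) + p ln n, and the demo data are synthetic, not the thesis's crime data.

```python
from itertools import combinations

import numpy as np


def extrema_knots(x, y):
    """Hypothetical min/max rule: interior x where y switches direction."""
    d = np.sign(np.diff(y))
    idx = [i for i in range(1, len(y) - 1)
           if d[i - 1] != 0 and d[i] != 0 and d[i - 1] != d[i]]
    return x[idx]


def spline_basis(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, (x - k)_+^3 per knot."""
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit(x, y, knots):
    """Least squares fit; returns (coefficients, adjusted R^2, BIC)."""
    X = spline_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2_adj = 1 - (rss / tss) * (n - 1) / (n - p)
    bic = n * np.log(rss / n) + p * np.log(n)
    return beta, r2_adj, bic


def best_subset(x, y, candidates, max_knots=3):
    """Exhaustive search over knot subsets, keeping the minimum-BIC model."""
    best = None
    for k in range(min(max_knots, len(candidates)) + 1):
        for subset in combinations(candidates, k):
            _, r2_adj, bic = fit(x, y, list(subset))
            if best is None or bic < best[2]:
                best = (list(subset), r2_adj, bic)
    return best


# Synthetic demo series; t = years since 1960 keeps the cubic terms well scaled.
rng = np.random.default_rng(0)
t = np.arange(50, dtype=float)
y = 200 + 150 * np.sin(t / 8) + rng.normal(0, 5, t.size)
candidates = extrema_knots(t, y)
knots, r2_adj, bic = best_subset(t, y, candidates, max_knots=3)
print(knots, round(r2_adj, 3), round(bic, 1))
```

The `max_knots` cap is an added assumption that keeps the exhaustive subset search small; the number of subsets grows combinatorially with the number of candidate knots.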
Model
Digital Document
Publisher
Florida Atlantic University
Description
For a segmented regression system with an unknown change-point over two domains of a predictor, a new empirical likelihood ratio test statistic is proposed to test the null hypothesis of no change. The proposed method is nonparametric and relaxes the assumption about the error distribution. Under the null hypothesis of no change, the proposed test statistic is shown empirically to follow a Gumbel distribution, with location and scale parameters that are robust across various parameter settings and error distributions. Under the alternative hypothesis of a change-point, comparisons with two other methods (Chen's SIC method and Muggeo's SEG method) show that the proposed method performs better when the slope change is small. A power analysis is conducted to illustrate the performance of the test. The proposed method is also applied to two real datasets: the plasma osmolality dataset and the gasoline price dataset.
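A common way to write a two-segment model with change-point τ is the continuous broken-line form below. This is an assumption for illustration: the abstract does not say whether the thesis requires continuity at τ, and its empirical likelihood statistic is not reproduced here. In this form, no change means a zero slope difference β_2. The Gumbel distribution reported for the test statistic under H_0 has the CDF G with location μ and scale σ.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 (x_i - \tau)_+ + \varepsilon_i,
  \qquad (u)_+ = \max(u, 0), \\
H_0 &: \beta_2 = 0 \ \text{(no change)}, \qquad
H_1 : \beta_2 \neq 0 \ \text{for some } \tau \in \bigl(x_{(1)}, x_{(n)}\bigr), \\
G(s) &= \exp\!\left\{ -\exp\!\left[ -\frac{s - \mu}{\sigma} \right] \right\}.
\end{aligned}
```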
Model
Digital Document
Publisher
Florida Atlantic University
Description
Every day, business owners make important decisions in an effort to increase productivity. Smaller, family-owned companies, however, are at a financial disadvantage compared with larger corporations. Through the analysis of one small business, Gardens Pool Supply, we provide the owners with answers to questions on how to reduce costs and increase profits.