Combinatorial group theory

Model
Digital Document
Publisher
Florida Atlantic University
Description
Class imbalance tends to cause inferior performance in data mining learners,
particularly with regard to predicting the minority class, which generally imposes
a higher misclassification cost. This work explores the benefits of using genetic
algorithms (GA) to develop classification models which are better able to deal with
the problems encountered when mining datasets which suffer from class imbalance.
Using GA, we evolve configuration parameters suited to skewed datasets for three
different learners: artificial neural networks, C4.5 decision trees, and RIPPER. We
also propose a novel technique called evolutionary sampling which works to remove
noisy and unnecessary duplicate instances so that the sampled training data will
produce a superior classifier for the imbalanced dataset. Our GA fitness function
uses metrics appropriate for dealing with class imbalance, in particular the area
under the ROC curve. We perform extensive empirical testing on these techniques
and compare the results with seven existing sampling methods.
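The fitness function above relies on the area under the ROC curve (AUC) because plain accuracy rewards ignoring the minority class. As a minimal sketch (not the GA or the authors' code), AUC can be computed from classifier scores with the rank-sum (Mann-Whitney) formulation; the function name `auc` and the toy data are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: classifier scores (higher = more likely minority/positive).
    labels: 1 for the minority (positive) class, 0 otherwise.
    A tie between a positive and a negative score counts as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked in the right order.
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 2 minority vs 6 majority instances.
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.35, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
print(auc(scores, labels))  # 11 of 12 pairs ordered correctly
```

A classifier that always predicts the majority class would be 75% accurate on this toy data but cannot separate the classes at all, which is why a ranking measure such as AUC is the more natural fitness signal here.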
Model
Digital Document
Publisher
Florida Atlantic University
Description
The main objective of this thesis was to find the full automorphism groups of
finite Desarguesian planes. A set of homologies was used to generate the automorphism
group when the order of the plane was prime. When the order was a prime
power $p^a$, $a \neq 1$, the Frobenius automorphism was added to the set of homologies,
and then the full automorphism group was generated. The Frobenius automorphism
was found by using the planar ternary ring derived from a coordinatization of the
plane.
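For a plane of prime-power order $p^a$, the Frobenius map $x \mapsto x^p$ is a field automorphism of $\mathrm{GF}(p^a)$, and applying it to coordinates yields a collineation that the homologies alone do not produce. The sketch below only checks the field-level properties for the toy case $\mathrm{GF}(2^3)$ (a plane of order 8); the modulus $x^3 + x + 1$ and all function names are assumptions, and it does not reproduce the thesis's planar-ternary-ring derivation.

```python
# GF(2^3) = GF(2)[x]/(x^3 + x + 1); elements are 3-bit integers,
# bit i holding the coefficient of x^i. Addition is XOR.
MOD = 0b1011  # x^3 + x + 1, irreducible over GF(2)
DEG = 3


def gf_mul(a, b):
    """Multiply in GF(2^3): shift-and-add, reducing by MOD when degree hits 3."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << DEG):
            a ^= MOD
    return result


def frobenius(a):
    """Frobenius map a -> a^p with p = 2."""
    return gf_mul(a, a)


def frob_power(a, k):
    """Apply the Frobenius map k times."""
    for _ in range(k):
        a = frobenius(a)
    return a


field = range(1 << DEG)
additive = all(frobenius(a ^ b) == frobenius(a) ^ frobenius(b) for a in field for b in field)
multiplicative = all(frobenius(gf_mul(a, b)) == gf_mul(frobenius(a), frobenius(b))
                     for a in field for b in field)
bijective = sorted(frobenius(a) for a in field) == list(field)
# Smallest k with a^(2^k) = a for every a: the order of Frobenius, equal to the exponent a = 3.
order = next(k for k in range(1, DEG + 1) if all(frob_power(a, k) == a for a in field))
print(additive, multiplicative, bijective, order)  # True True True 3
```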
Model
Digital Document
Publisher
Florida Atlantic University
Description
Imbalanced class distributions typically cause poor classifier performance on the minority class, which also tends to be the class with the highest cost of misclassification. Data sampling is a common solution to this problem, and numerous sampling techniques have been proposed to address it. Prior research examining the performance of these techniques has been narrow and limited. This work uses thorough empirical experimentation to compare the performance of seven existing data sampling techniques using five different classifiers and four different datasets. The work addresses which sampling techniques produce the best performance in the presence of class imbalance, which classifiers are most robust to the problem, and which sampling techniques perform better or worse with each classifier. Extensive statistical analysis of these results is provided, in addition to an examination of the qualitative effects of the sampling techniques on the types of predictions made by the C4.5 classifier.
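To make "data sampling" concrete, here is a minimal sketch of two of the simplest widely used methods, random undersampling and random oversampling. The abstract does not name its seven techniques, so these are not claimed to match them; the function names, the `ratio` parameter, and the toy data are assumptions.

```python
import random


def random_undersample(X, y, minority=1, ratio=1.0, seed=0):
    """Randomly discard majority-class instances until
    |majority| = ratio * |minority| (or keep all if already smaller)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    chosen = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority instances until both classes are equal in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    chosen = list(range(len(y))) + extra
    return [X[i] for i in chosen], [y[i] for i in chosen]


X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 minority, 8 majority
Xu, yu = random_undersample(X, y)
Xo, yo = random_oversample(X, y)
print(yu.count(1), yu.count(0))  # 2 2
print(yo.count(1), yo.count(0))  # 8 8
```

Undersampling discards possibly useful majority data, while oversampling only repeats existing minority instances; comparing such trade-offs across classifiers is what the study above sets out to do.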
Model
Digital Document
Publisher
Florida Atlantic University
Description
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, the level of class noise, and the distribution of class noise among the classes affect the results. An empirical analysis of the efficiency of classifier-based noise detection appears first. Subsequently, an intelligent data sampling technique based on noise detection is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
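Classifier-based noise detection (classification filtering) flags an instance as possibly mislabeled when a model trained without it predicts a different label. The following is a minimal leave-one-out sketch using a k-nearest-neighbor classifier; the choice of kNN, k = 3, the helper names, and the toy data are assumptions, not the dissertation's setup.

```python
def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote of the k nearest training instances (ties -> smallest label)."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(sorted(votes), key=lambda label: votes[label])


def classification_filter(X, y, k=3):
    """Leave-one-out classification filter: flag instance i if a classifier
    trained on all other instances predicts a label different from y[i]."""
    flagged = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            flagged.append(i)
    return flagged


# Two well-separated clusters; instance 2 sits in cluster 0 but is labeled 1.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
y = [0, 0, 1, 0, 1, 1, 1, 1]
print(classification_filter(X, y))  # [2]
```

The same mechanism shows why noise and imbalance interact: a genuine but isolated minority instance looks exactly like a mislabeled one to such a filter.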
Model
Digital Document
Publisher
Florida Atlantic University
Description
This dissertation contains the results of the candidate's research on the generalized discrete logarithm problem (GDLP) in non-abelian groups and its applications to cryptology. The projective special linear groups $\mathrm{PSL}(2, p)$, where $p$ is a prime, represented by matrices over the field of order $p$, are investigated as potential candidates for implementing the GDLP. Our results show that the GDLP with respect to specific pairs of $\mathrm{PSL}(2, p)$ generators is weak. In such cases the groups $\mathrm{PSL}(2, p)$ are not good candidates for cryptographic applications that rely on the hardness of the GDLP. Results are presented on generalizing existing cryptographic primitives and protocols based on the hardness of the GDLP in non-abelian groups. A special instance of a cryptographic primitive defined over the groups $\mathrm{SL}(2, 2^n)$, the Tillich-Zémor hash function, has been cryptanalyzed. In particular, an algorithm for constructing short collisions for any input parameter is presented. A series of mathematical results is developed to support the algorithm and to prove the existence of short collisions.
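For readers unfamiliar with the Tillich-Zémor hash, the sketch below follows its standard definition: each message bit selects one of two fixed matrices over $\mathrm{GF}(2^n)$, and the hash is their product, which always lies in $\mathrm{SL}(2, 2^n)$. The tiny field $\mathrm{GF}(2^7)$ with modulus $X^7 + X + 1$ is a toy assumption (practical parameters use $n$ around 130), and the collision-finding algorithm of the dissertation is not reproduced.

```python
N = 7
MOD = (1 << N) | 0b11  # X^7 + X + 1, irreducible over GF(2); toy size only


def gf_mul(a, b):
    """Multiply two elements of GF(2^N) represented as bit polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MOD
    return result


def mat_mul(M, P):
    """2x2 matrix product over GF(2^N); field addition is XOR."""
    (a, b), (c, d) = M
    (e, f), (g, h) = P
    return ((gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h)),
            (gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h)))


def det(M):
    """Determinant; in characteristic 2, ad - bc = ad + bc."""
    (a, b), (c, d) = M
    return gf_mul(a, d) ^ gf_mul(b, c)


X = 0b10                      # the field element X
A0 = ((X, 1), (1, 0))         # matrix for message bit 0
A1 = ((X, X ^ 1), (1, 1))     # matrix for message bit 1


def tz_hash(bits):
    """Tillich-Zemor hash: ordered product of A0/A1 over the message bits."""
    H = ((1, 0), (0, 1))
    for bit in bits:
        H = mat_mul(H, A1 if bit else A0)
    return H


h = tz_hash([1, 0, 1, 1, 0])
print(h, det(h))  # the determinant is always 1, so the hash lies in SL(2, 2^N)
```

Because the hash is a group product, finding a collision amounts to finding two distinct words in the generators $A_0, A_1$ that evaluate to the same matrix, which is where the group structure studied above becomes relevant.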
Model
Digital Document
Publisher
Florida Atlantic University
Description
Possibly the largest problem in bioinformatics is the sheer amount of data that must be sifted through to find useful information. This thesis shows that feature selection (a method of removing irrelevant and redundant information from a dataset) is a useful and even necessary technique for these large datasets. It also presents a new method for comparing classes to each other through their features, and it provides a thorough analysis of various feature selection techniques and classifiers in different bioinformatics scenarios. Overall, this thesis shows the importance of feature selection in bioinformatics.
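A filter-style feature ranker scores each feature independently and keeps the top-ranked ones. As one hedged illustration (not necessarily a technique used in the thesis), the sketch below ranks features by a signal-to-noise score, $|\mu_1 - \mu_0| / (\sigma_1 + \sigma_0)$, a measure often used for gene expression data; the function names and toy data are assumptions.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """|mean_1 - mean_0| / (std_1 + std_0) for one feature with binary labels."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # Perfectly constant within each class: infinitely informative if the means differ.
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def rank_features(X, y):
    """Return feature indices ordered from most to least relevant."""
    n_features = len(X[0])
    scores = [signal_to_noise([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Hypothetical data: 4 samples x 3 features; feature 0 separates the classes.
X = [[5.0, 1.0, 0.3],
     [5.2, 2.0, 0.1],
     [1.0, 1.5, 0.2],
     [1.1, 1.0, 0.4]]
y = [1, 1, 0, 0]
ranking = rank_features(X, y)
print(ranking)      # [0, 2, 1]
print(ranking[:2])  # keep the top-2 features
```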
Model
Digital Document
Publisher
Florida Atlantic University
Description
One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low-quality data. This dissertation shows that low-quality data can also impact the effectiveness of feature selection, and it considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking technique's output on a clean dataset and its outputs on the same dataset corrupted with different combinations of noise level and noise distribution. It then considers the classification performance of models built with a subset of the original features obtained after applying feature ranking techniques to noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
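Two ideas above lend themselves to a short sketch: combining several rankings into an ensemble ranking, and measuring how well a top-k feature subset from clean data agrees with one from noisy data. The sketch assumes mean-position aggregation and Kuncheva's consistency index as the agreement measure; neither is claimed to be the dissertation's exact choice, and the rankings shown are hypothetical.

```python
def aggregate_rankings(rankings):
    """Ensemble ranking by mean position: each ranking lists feature indices
    from best to worst; a lower total position is better (ties -> lower index)."""
    n = len(rankings[0])
    position_sum = [0] * n
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            position_sum[feature] += pos
    return sorted(range(n), key=lambda f: (position_sum[f], f))


def kuncheva_index(subset_a, subset_b, n_features):
    """Kuncheva's consistency index for two equal-size feature subsets:
    1 for identical subsets, about 0 for chance-level overlap."""
    k = len(subset_a)
    if len(subset_b) != k or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(set(subset_a) & set(subset_b))
    return (r * n_features - k * k) / (k * (n_features - k))


# Hypothetical rankings of 6 features from three rankers on clean data.
clean_rankings = [
    [0, 1, 2, 3, 4, 5],
    [1, 0, 2, 4, 3, 5],
    [0, 2, 1, 3, 5, 4],
]
ensemble = aggregate_rankings(clean_rankings)  # [0, 1, 2, 3, 4, 5]
noisy_ranking = [0, 3, 1, 5, 2, 4]             # hypothetical ranking after injecting class noise
print(kuncheva_index(ensemble[:3], noisy_ranking[:3], n_features=6))  # 2 of 3 shared -> 1/3
```

Repeating the comparison over many noise levels and noise distributions gives the kind of stability profile the noise-based analysis describes.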
Model
Digital Document
Publisher
Florida Atlantic University
Description
The minimal logarithmic signature conjecture states that in any finite simple group there are subsets $A_i$, $1 \le i \le s$, such that the size $|A_i|$ of each $A_i$ is a prime or 4 and each element of the group has a unique expression as a product $\prod_{i=1}^{s} a_i$ of elements $a_i \in A_i$. Logarithmic signatures have been used in the construction of several cryptographic primitives since the late 1970s [3, 15, 17, 19, 16]. The conjecture has been shown to be true for various families of simple groups, including cyclic groups, $A_n$, $\mathrm{PSL}_n(q)$ when $\gcd(n, q-1)$ is 1, 4, or a prime, and several sporadic groups [10, 9, 12, 14, 18]. This dissertation is devoted to proving that the conjecture is true for a large class of simple groups of Lie type called classical groups. The methods developed use the structure of these groups as isometry groups of bilinear or quadratic forms. A large part of the construction is also based on the Bruhat and Levi decompositions of parabolic subgroups of these groups. In this dissertation the conjecture is shown to be true for the following families of simple groups: the projective special linear groups $\mathrm{PSL}_n(q)$, the projective symplectic groups $\mathrm{PSp}_{2n}(q)$ for all $n$ and $q$ a prime power, and the projective orthogonal groups of positive type $P\Omega^{+}_{2n}(q)$ for all $n$ and $q$ an even prime power. In the process, the existence of minimal logarithmic signatures (MLSs) is also proven for the linear groups $\mathrm{GL}_n(q)$, $\mathrm{PGL}_n(q)$, $\mathrm{SL}_n(q)$, the symplectic groups $\mathrm{Sp}_{2n}(q)$ for all $n$ and $q$ a prime power, and the orthogonal groups of plus type $O^{+}_{2n}(q)$ for all $n$ and $q$ an even prime power. The constructions in most of these cases provide cyclic MLSs. Using the relationship between finite groups of Lie type and groups with a split BN-pair, it is also shown that every finite group of Lie type can be expressed as a disjoint union of sets, each of which has an MLS.
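The defining property (every group element written uniquely as a product $a_1 a_2 \cdots a_s$ with $a_i \in A_i$) can be checked by brute force for small groups. The sketch below does this for a cyclic group $\mathbb{Z}_n$, where the group operation is addition mod $n$; the example $\mathbb{Z}_{12}$ is not simple and only illustrates the definition and the "prime or 4" block-size condition. Function names and blocks are assumptions for illustration.

```python
from itertools import product


def is_logarithmic_signature(blocks, n):
    """True if every g in Z_n has exactly one expression
    g = a_1 + ... + a_s (mod n) with a_i in blocks[i]."""
    counts = [0] * n
    for choice in product(*blocks):
        counts[sum(choice) % n] += 1
    return all(c == 1 for c in counts)


def is_prime(m):
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


# Z_12 with a mixed-radix decomposition: 12 = 2 * 2 * 3.
blocks = [[0, 1], [0, 2], [0, 4, 8]]
print(is_logarithmic_signature(blocks, 12))                  # True
print(all(is_prime(len(A)) or len(A) == 4 for A in blocks))  # True: block sizes 2, 2, 3
print(is_logarithmic_signature([[0, 1], [0, 1]], 4))         # False: 1 arises twice, 3 never
```

For non-abelian groups the same brute-force check works with the group multiplication in place of modular addition, but it quickly becomes infeasible, which is why structural constructions such as those above are needed.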
Model
Digital Document
Publisher
Florida Atlantic University
Description
A logarithmic signature (LS) for a finite group $G$ is an ordered tuple $\alpha = [A_1, A_2, \ldots, A_n]$ of subsets $A_i$ of $G$ such that every element $g \in G$ can be expressed uniquely as a product $g = a_1 a_2 \cdots a_n$, where $a_i \in A_i$. Logarithmic signatures were defined by Magliveras in the late 1970s for arbitrary finite groups in the context of cryptography. They were also studied for abelian groups by Hajós in the 1930s. The length of an LS is defined to be $\ell(\alpha) = \sum_{i=1}^{n} |A_i|$. It can be easily seen that for a group $G$ of order $\prod_{j=1}^{k} p_j^{m_j}$, the length of any LS for $G$ satisfies $\ell(\alpha) \ge \sum_{j=1}^{k} m_j p_j$. An LS for which this lower bound is achieved is called a minimal logarithmic signature (MLS). The MLS conjecture states that every finite simple group has an MLS. If the conjecture is true, then every finite group has an MLS. The conjecture has been shown to be true by a number of researchers for a few classes of finite simple groups; however, the problem is still wide open. This dissertation addresses the MLS conjecture for the classical simple groups. In particular, it is shown that MLSs exist for the symplectic groups $\mathrm{Sp}_{2n}(q)$, the orthogonal groups $O^{\pm}_{2n}(q')$, and the corresponding simple groups $\mathrm{PSp}_{2n}(q)$ and $\Omega^{\pm}_{2n}(q')$ for all $n \in \mathbb{N}$, prime powers $q$, and even prime powers $q'$. The existence of an MLS is also shown for all unitary groups $\mathrm{GU}_n(q)$ for all odd $n$ and $q = 2^s$, under the assumption that an MLS exists for $\mathrm{GU}_{n-1}(q)$. The methods used are very general and algorithmic in nature and may be useful for studying all finite simple groups of Lie type and possibly also the sporadic groups. The blocks of logarithmic signatures constructed in this dissertation have cyclic structure and provide a sort of cyclic decomposition for these classical groups.
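The lower bound $\ell(\alpha) \ge \sum_j m_j p_j$ holds because the block sizes multiply to $|G|$, and replacing a block of composite size $xy$ by blocks of sizes $x$ and $y$ never increases the length, since $xy \ge x + y$ for $x, y \ge 2$. The sketch below computes the bound from a group order and the length of a given LS; the helper names are assumptions, and the $\mathbb{Z}_{12}$ blocks are a toy example.

```python
def prime_factorization(order):
    """Return {p: m} with order = prod p**m, by trial division."""
    factors, p = {}, 2
    while p * p <= order:
        while order % p == 0:
            factors[p] = factors.get(p, 0) + 1
            order //= p
        p += 1
    if order > 1:
        factors[order] = factors.get(order, 0) + 1
    return factors


def mls_length(order):
    """Lower bound sum_j m_j * p_j on the length of any LS for a group of this order."""
    return sum(m * p for p, m in prime_factorization(order).items())


def ls_length(blocks):
    """Length l(alpha) = sum_i |A_i| of a logarithmic signature alpha = [A_1, ..., A_n]."""
    return sum(len(A) for A in blocks)


print(mls_length(60))   # |A_5| = 2^2 * 3 * 5       -> 4 + 3 + 5 = 12
print(mls_length(168))  # |PSL(2,7)| = 2^3 * 3 * 7  -> 6 + 3 + 7 = 16

# The mixed-radix LS [{0,1}, {0,2}, {0,4,8}] of Z_12 meets the bound, so it is an MLS.
z12_blocks = [[0, 1], [0, 2], [0, 4, 8]]
print(ls_length(z12_blocks), mls_length(12))  # 7 7
```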
Model
Digital Document
Publisher
Florida Atlantic University
Description
Collaborative filtering (CF), a very successful recommender system technique, is one application of data mining to incomplete data. The main objective of CF is to make accurate recommendations from highly sparse user rating data. My contributions to this research topic include proposing the frameworks of imputation-boosted collaborative filtering (IBCF) and imputed neighborhood-based collaborative filtering (INCF). We also proposed a model-based CF technique, TAN-ELR CF, and two hybrid CF algorithms, sequential mixture CF and joint mixture CF. Empirical results show that our proposed CF algorithms have very good predictive performance. In investigating the application of imputation techniques to mining incomplete data, we proposed imputation-helped classifiers and VCI predictors (voting on classifications from imputed learning sets), both of which resulted in significant improvement in classification performance on incomplete data over conventional machine-learned classifiers, including kNN, neural network, one rule, decision table, SVM, logistic regression, decision tree (C4.5), random forest, and decision list (PART), as well as the well-known Bagging predictors. The main imputation techniques involved in these algorithms are EM (expectation maximization) and BMI (Bayesian multiple imputation).
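The imputation-boosted idea can be sketched in two steps: fill the sparse rating matrix with an imputation method, then run a standard neighborhood CF predictor on the filled matrix. In this minimal sketch, simple item-mean imputation stands in for the EM and Bayesian multiple imputation named above, and a Pearson-weighted user-based predictor stands in for the CF step; all function names and ratings are hypothetical.

```python
from math import sqrt


def impute_item_means(R):
    """Fill missing ratings (None) with the item's mean observed rating."""
    n_items = len(R[0])
    means = []
    for j in range(n_items):
        observed = [row[j] for row in R if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_items)] for row in R]


def pearson(u, v):
    """Pearson correlation of two equal-length rating vectors (0 if undefined)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u)) * sqrt(sum((b - mv) ** 2 for b in v))
    return num / den if den else 0.0


def observed_mean(row):
    vals = [r for r in row if r is not None]
    return sum(vals) / len(vals)


def predict(R, user, item, k=2):
    """User-based prediction; similarities are computed on the imputed matrix,
    deviations use the neighbors' actual ratings for the target item."""
    F = impute_item_means(R)
    sims = {v: pearson(F[user], F[v])
            for v in range(len(R)) if v != user and R[v][item] is not None}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    num = sum(sims[v] * (R[v][item] - observed_mean(R[v])) for v in neighbors)
    den = sum(abs(sims[v]) for v in neighbors)
    base = observed_mean(R[user])
    return base + num / den if den else base


R = [
    [5, 4, None, 1],
    [4, 5, 4, 1],
    [1, 2, 1, 5],
    [5, 5, 5, 2],
]
print(round(predict(R, user=0, item=2), 2))  # between the most similar users' ratings
```

Imputing first gives every pair of users a full overlap of items, so similarity estimates no longer depend on the handful of co-rated items that sparse data would otherwise allow.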