Stability analysis of feature selection approaches with low quality data

File
Contributors
Publisher
Florida Atlantic University
Date Issued
2011
Description
One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking technique's output on a clean dataset and its outputs on the same dataset corrupted with different combinations of noise level and noise distribution. It then considers the classification performance of models built with a subset of the original features obtained by applying feature ranking techniques to noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
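The noise-based stability analysis described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the ranker (absolute Pearson correlation), the agreement measure (top-k overlap), the uniform label flipping, and all function names are assumptions chosen for illustration. The record does not state which ranking techniques, similarity measures, or noise distributions the dissertation actually uses.

```python
import numpy as np

def rank_features(X, y):
    """Illustrative ranker (assumption): order features by |Pearson correlation| with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-scores)  # feature indices, best first

def inject_class_noise(y, noise_level, rng):
    """Flip the 0/1 labels of a randomly chosen fraction `noise_level` of instances.

    Simplification: noise is spread uniformly over both classes.
    """
    y_noisy = y.copy()
    n_flip = int(round(noise_level * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

def top_k_overlap(rank_a, rank_b, k):
    """Fraction of the top-k features shared by two rankings (1.0 means identical subsets)."""
    return len(set(rank_a[:k].tolist()) & set(rank_b[:k].tolist())) / k

def noise_stability(X, y, noise_level, k, n_runs=10, seed=0):
    """Mean agreement between the clean ranking and rankings obtained from noisy labels."""
    rng = np.random.default_rng(seed)
    clean = rank_features(X, y)
    overlaps = [
        top_k_overlap(clean, rank_features(X, inject_class_noise(y, noise_level, rng)), k)
        for _ in range(n_runs)
    ]
    return float(np.mean(overlaps))
```

Calling noise_stability(X, y, noise_level=0.2, k=10) on a binary-labelled dataset returns a value between 0 and 1; values closer to 1 indicate that the ranker selects nearly the same features despite the injected class noise.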
Note

by Wilker Altidor.

Language
Type
Form
Extent
xix, 235 p. : ill. (some col.)
Identifier
748562609
OCLC Number
748562609
Additional Information
Thesis (Ph.D.)--Florida Atlantic University, 2011.
Includes bibliography.
Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web.
Date Backup
2011
Date Text
2011
Date Issued (EDTF)
2011
IID
FADT3174501
Issuance
monographic
Person Preferred Name

Altidor, Wilker.
Graduate College
Physical Description

electronic
xix, 235 p. : ill. (some col.)
Title Plain
Stability analysis of feature selection approaches with low quality data
Use and Reproduction
http://rightsstatements.org/vocab/InC/1.0/
Origin Information


Boca Raton, Fla.

monographic
Florida Atlantic University
2011
Physical Location
FBoU FAUER
Place

Boca Raton, Fla.
Title
Stability analysis of feature selection approaches with low quality data