Classification techniques for noisy and imbalanced data

Publisher
Florida Atlantic University
Date Issued
2009
Description
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, level of class noise, and distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices.
Note

by Amri Napolitano.

Extent
xvii, 218 p. : ill.
Identifier
501313430
OCLC Number
501313430
Additional Information
Thesis (Ph.D.)--Florida Atlantic University, 2009.
Includes bibliography.
Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
IID
FADT369201
Issuance
monographic
Person Preferred Name

Napolitano, Amri E.
Graduate College
Physical Description

electronic
xvii, 218 p. : ill.
Use and Reproduction
http://rightsstatements.org/vocab/InC/1.0/
Origin Information


Boca Raton, Fla.

monographic
Florida Atlantic University
2009
Physical Location
FBoU FAUER