Generalized Feature Embedding Learning for Clustering and Classification

Publisher
Florida Atlantic University
Date Issued
2018
EDTF Date Created
2018
Description
Data comes in many different shapes and sizes. In real-life applications, the
data under study commonly has features of varied data types, including numerical,
categorical, and text. Most machine learning algorithms require their input in
numeric form. Therefore, data that is not originally numerical must be transformed
before it can be used as input to these algorithms.
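As a minimal sketch of such a transformation, the snippet below one-hot encodes a categorical feature and joins it with a numeric one. The toy values and the helper name `one_hot_encode` are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def one_hot_encode(column):
    """Return (matrix, categories) for a list of categorical values."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    matrix = np.zeros((len(column), len(categories)), dtype=float)
    for row, value in enumerate(column):
        matrix[row, index[value]] = 1.0  # mark the category of this sample
    return matrix, categories

# Hypothetical toy data: one numeric and one categorical feature.
ages = np.array([[23.0], [35.0], [51.0]])
colors = ["red", "blue", "red"]

color_matrix, color_names = one_hot_encode(colors)
X = np.hstack([ages, color_matrix])  # fully numeric design matrix
```

Each category becomes its own 0/1 column, so the number of columns grows with the number of distinct values, which is one source of the high dimensionality discussed next.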
Along with this transformation, it is common for the data we study to have many
features relative to the number of samples. It is often desirable to reduce
the number of features used to train a model in order to eliminate noise and reduce
training time. This problem of high dimensionality can be approached through
feature selection, feature extraction, or feature embedding. Feature selection seeks to
identify the most essential variables in a dataset, leading to a parsimonious model
and high-performing results, while feature extraction and embedding are techniques
that apply a mathematical transformation to map the data into a new representation space. As a
byproduct of using a new representation, we are able to reduce the dimension greatly
without sacrificing performance. Oftentimes, using embedded features yields a gain in performance.
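The difference between selection and extraction can be illustrated with a small sketch. It assumes variance ranking as a stand-in selection criterion and principal component projection as a stand-in extraction method; neither is claimed to be the method used in the dissertation, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # hypothetical data: 50 samples, 6 features
X[:, 4:] *= 0.01               # make the last two features nearly constant

k = 2

# Feature selection: keep k of the ORIGINAL columns (here, highest variance).
selected = np.argsort(X.var(axis=0))[::-1][:k]
X_selected = X[:, selected]

# Feature extraction: build k NEW columns as linear combinations of all
# features by projecting centered data onto the top-k principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:k].T
```

Both results have `k` columns, but only the selected columns remain directly interpretable as original variables.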
Though extraction and embedding methods may be powerful for isolated machine
learning problems, they do not always generalize well. Therefore, we are motivated
to illustrate a methodology that can be applied to any data type with little
pre-processing. The methods we develop can be applied in unsupervised, supervised,
incremental, and deep learning contexts. Using 28 benchmark datasets that include
different data types as examples, we construct a framework that can be applied to
general machine learning tasks.
The techniques we develop contribute to the field of dimension reduction and
feature embedding. Using this framework, we make additional contributions to eigendecomposition
by creating an objective matrix that includes three main components.
The first is a class-partitioned row and feature product representation
of one-hot encoded data. The second is the derivation of a weighted adjacency matrix
based on class label relationships. Finally, by taking the inner product of these
components, we are able to condition the one-hot encoded data generated from the
original data prior to eigenvector decomposition. The use of class partitioning and
adjacency enables subsequent projections of the data to be trained more effectively
when compared side by side with baseline algorithm performance. Along with this improved
performance, we can adjust the dimension of the projected data arbitrarily.
We also show how these dense vectors may be used to
order the features of generic data for deep learning.
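The abstract does not give the exact form of the objective matrix, so the following is only a simplified illustration of the general idea: one-hot encoded data is conditioned by a class-label adjacency matrix before eigendecomposition, and the leading eigenvectors define a projection of adjustable dimension. The functions `class_adjacency` and `embed`, the 1/class-size weighting, and the toy data are all assumptions made for this sketch.

```python
import numpy as np

def class_adjacency(labels):
    """Assumed weighting: A[i, j] = 1 / (class size) if i and j share a label, else 0."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    counts = same.sum(axis=1, keepdims=True)  # class size for each row
    return same / counts

def embed(Z, labels, k):
    """Project one-hot data Z (n x d) onto the top-k eigenvectors of Z^T A Z."""
    A = class_adjacency(labels)
    M = Z.T @ A @ Z                 # d x d matrix conditioned on class labels
    M = (M + M.T) / 2.0             # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    W = eigvecs[:, order]
    return Z @ W, W

# Hypothetical toy data: two binary categorical features, one-hot encoded.
Z = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)
labels = [0, 0, 1, 1]

embedding, W = embed(Z, labels, k=2)   # k sets the output dimension
```

Changing `k` changes the dimension of the resulting dense vectors without touching the encoding step, which mirrors the adjustable dimension described above.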
In this dissertation, we examine a general approach to dimension reduction and
feature embedding that utilizes a class-partitioned row and feature representation, a
weighted approach to instance similarity, and an adjacency representation. This general
approach has applications in unsupervised, supervised, online, and deep learning.
In our experiments on 28 benchmark datasets, we show significant performance gains
in clustering, classification, and training time.
Note

Includes bibliography.

Extent
128 p.
Identifier
FA00013063
Additional Information
Dissertation (Ph.D.)--Florida Atlantic University, 2018.
FAU Electronic Theses and Dissertations Collection
Extension


FAU

Person Preferred Name

Golinko, Eric David

author

Graduate College
Physical Description

application/pdf
128 p.
Use and Reproduction
Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
http://rightsstatements.org/vocab/InC/1.0/
Origin Information

2018
2018
Florida Atlantic University

Boca Raton, Fla.

Physical Location
Florida Atlantic University Libraries
Place

Boca Raton, Fla.
Sub Location
Digital Library