Text Mining

Model
Digital Document
Publisher
Florida Atlantic University
Description
The amount of textual data produced every minute on the internet is extremely high. Processing this tremendous volume of mostly unstructured data is not straightforward. But the enormous amount of useful information it contains motivates scientists to investigate efficient and effective techniques and algorithms for discovering meaningful patterns. Social network applications give people around the world opportunities to stay in contact and share their valuable knowledge through features such as chat, comments, and discussion boards. People usually do not pay much attention to spelling or grammatical accuracy in everyday conversations, so extracting information from such datasets is more complicated. Text mining can be a solution to this problem. Text mining is a knowledge discovery process used to extract patterns from natural language. Applying text mining techniques to social networking websites can reveal a significant amount of information. Text mining in conjunction with social networks can be used to find the general opinion on a particular subject, human thinking patterns, and group identification. In this study, we investigate machine learning methods for textual data in six chapters.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Many current application domains of machine learning and artificial intelligence
involve knowledge discovery from text, such as sentiment analysis, document
ontology, and spam detection. Humans have years of experience and training with
language, enabling them to understand complicated, nuanced text passages with relative
ease. A text classifier attempts to emulate or replicate this knowledge so that
computers can discriminate between concepts encountered in text; however, learning
high-level concepts from text, such as those found in many applications of text
classification, is difficult due to the many challenges associated with text mining
and classification. Recently, classifiers trained using artificial neural networks have
been shown to be effective for a variety of text mining tasks. Convolutional neural
networks have been trained to classify text from character-level input, automatically
learning high-level abstract representations and avoiding the need for human-engineered
features.
This dissertation proposes two new techniques for character-level learning:
log(m) character embedding and convolutional window classification. Log(m) embedding
is a new character-vector representation for text data that is more compact and
memory efficient than previous embedding vectors. Convolutional window classification
is a technique for classifying long documents, i.e., documents with lengths
exceeding the input dimension of the neural network. Additionally, we investigate the
performance of convolutional neural networks combined with long short-term memory
networks, explore how document length impacts classification performance, and
compare the performance of neural networks against non-neural network-based learners
in text classification tasks.
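
The abstract does not spell out either technique, so the following Python sketch is only an illustration under stated assumptions, not the dissertation's implementation. It assumes that a log(m)-sized character vector can be obtained by writing each character's index in binary, and that a long document can be handled by cutting it into fixed-size windows, classifying each window, and taking a majority vote. The function names, the reserved index 0 for unknown characters, the space padding, and the voting rule are all hypothetical choices made for this example.

```python
from collections import Counter


def binary_char_codes(alphabet):
    """Assumed scheme: encode each character's index as a fixed-width bit vector.

    Indices start at 1 so that the all-zero vector can stand for unknown
    characters. For an alphabet of m characters the width is
    ceil(log2(m + 1)) bits, which equals m.bit_length().
    """
    width = max(1, len(alphabet).bit_length())
    codes = {}
    for index, char in enumerate(alphabet, start=1):
        # Most significant bit first.
        codes[char] = [(index >> bit) & 1 for bit in reversed(range(width))]
    return codes, width


def split_into_windows(text, window_size, stride):
    """Cut text into windows of exactly window_size characters.

    The final window is padded with spaces (an assumption) so every window
    matches the fixed input dimension of the classifier.
    """
    if len(text) <= window_size:
        return [text.ljust(window_size)]
    windows = []
    for start in range(0, len(text) - window_size + stride, stride):
        windows.append(text[start:start + window_size].ljust(window_size))
    return windows


def classify_document(text, classify_window, window_size, stride):
    """Label a long document by majority vote over its window labels.

    classify_window is a hypothetical callable standing in for a trained
    network that maps one fixed-size window to a label.
    """
    votes = Counter(
        classify_window(window)
        for window in split_into_windows(text, window_size, stride)
    )
    return votes.most_common(1)[0][0]
```

With a 26-letter alphabet this encoding needs 5 values per character instead of the 26 a one-hot vector would use, which is the kind of memory saving the abstract alludes to. Other aggregation rules, such as averaging per-window class probabilities, would fit the same windowing scheme.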