Unsupervised Text Classification as Topic Annotation

Bharati, Nitu

Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/8967

Title:	Unsupervised Text Classification as Topic Annotation
Authors:	Bharati, Nitu
Keywords:	Text Classification;Identify
Issue Date:	Nov-2011
Publisher:	Pulchowk Campus
Institute Name:	Institute of Engineering
Level:	Masters
Citation:	Masters of Science in Information and Communication Engineering,
Abstract:	Multi-class text classification is the task of assigning a given text to one or more classes taken from a set of predefined classes. A class can be the topic of a text; for example, the class of a text about a movie can be "entertainment". In this research I investigate unsupervised learning to accurately identify the topic of a given text. The cost of labeling a large amount of data and the availability of a huge amount of unlabeled data make unsupervised learning an ideal choice. The probabilistic algorithm used for text classification can be termed topic modeling and is capable of extracting multiple topics from a single document. The LDA model used in this report exploits co-occurrence patterns of words in documents to extract semantically meaningful probabilistic clusters of words called topics. Each of those clusters is labeled using the significant terms selected in each cluster. Semantic distance between the significant terms from the clusters and Wikipedia documents is measured to identify labels for each cluster.
Description:	Multi-class text classification is the task of assigning a given text to one or more classes taken from a set of predefined classes.
URI:	https://elibrary.tucl.edu.np/handle/123456789/8967
Appears in Collections:	Electronics and Computer Engineering
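
The pipeline outlined in the abstract (fit LDA on unlabeled text, take the significant terms of each topic, then pick the closest reference document as the label) can be illustrated with a minimal sketch. This is not the thesis implementation. The toy corpus, the `references` dictionary standing in for Wikipedia documents, the function names, the hyperparameters, and the use of cosine similarity over term counts in place of the thesis's semantic distance measure are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_topics(topic_word, references, top_n=5):
    """Label each topic with the reference text closest to its top terms."""
    labels = []
    for counts in topic_word:
        top = Counter({w: c for w, c in counts.most_common(top_n) if c > 0})
        best = max(references, key=lambda name: cosine(top, Counter(references[name])))
        labels.append(best)
    return labels


# Toy corpus and hypothetical reference texts (stand-ins for Wikipedia documents).
docs = [
    "film actor director movie scene film".split(),
    "movie actor film cinema director".split(),
    "match team goal player league team".split(),
    "player goal match coach team".split(),
]
references = {
    "entertainment": "film movie actor cinema director".split(),
    "sports": "team player match goal league coach".split(),
}

doc_topic, topic_word = fit_lda(docs, num_topics=2)
labels = label_topics(topic_word, references)
print(labels)
```

Each row of `doc_topic` holds per-topic token counts for one document, so a single text can carry more than one topic, as the abstract notes. On a corpus this small the sampled topics may not separate cleanly; the sketch only shows how the steps connect.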

Files in This Item:

File	Description	Size	Format
066MSI612pdf.pdf		920.16 kB	Adobe PDF

