Please use this identifier to cite or link to this item:
https://elibrary.tucl.edu.np/handle/123456789/8967
Title: | Unsupervised Text Classification as Topic Annotation |
Authors: | Bharati, Nitu |
Keywords: | Text Classification;Identify |
Issue Date: | Nov-2011 |
Publisher: | Pulchowk Campus |
Institute Name: | Institute of Engineering |
Level: | Masters |
Citation: | Masters of Science in Information and Communication Engineering, |
Abstract: | Multi-class Text Classification is the task of classifying a given text into one or more classes taken from a set of predefined classes. A class can be the topic of a text; for example, the class of a text about a movie can be "entertainment". In this research I investigate unsupervised learning to accurately identify the topic of a given text. The cost of labeling a large amount of data and the availability of a huge amount of unlabeled data make unsupervised learning an ideal choice. The probabilistic algorithm used for text classification can be termed topic modeling and is capable of extracting multiple topics from a single document. The LDA model used in this report exploits co-occurrence patterns of words in documents to extract semantically meaningful probabilistic clusters of words called topics. Each of these clusters is labeled using the significant terms selected from it. The semantic distance between the significant terms of each cluster and Wikipedia documents is measured to identify a label for that cluster. |
Description: | Multi-class Text Classification is the task of classifying a given text into one or more classes taken from a set of predefined classes. |
URI: | https://elibrary.tucl.edu.np/handle/123456789/8967 |
Appears in Collections: | Electronics and Computer Engineering |
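
The labeling step described in the abstract (comparing a topic's significant terms against reference documents) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: cosine similarity over raw word counts stands in for the semantic distance measure, which the abstract does not specify; the two short reference texts are invented placeholders for Wikipedia documents; and the names `bag_of_words`, `cosine_similarity`, and `label_topic` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term counts from a lowercase, whitespace-split text (a crude tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_topic(significant_terms, reference_docs):
    """Return the best-matching label and the score of every candidate label."""
    topic_vec = Counter(t.lower() for t in significant_terms)
    scores = {
        label: cosine_similarity(topic_vec, bag_of_words(text))
        for label, text in reference_docs.items()
    }
    return max(scores, key=scores.get), scores


# Invented placeholder texts standing in for Wikipedia documents (assumption).
reference_docs = {
    "entertainment": "a movie or film is shown in a cinema with an actor and a director",
    "sports": "a football match is played by a team and each player tries to score a goal",
}

# Hypothetical significant terms of one topic cluster, e.g. as an LDA model might return.
best_label, scores = label_topic(["movie", "actor", "film", "director"], reference_docs)
print(best_label, scores)
```

In this toy setup the cluster is assigned the label whose reference text shares the most vocabulary with its significant terms. A real pipeline would need proper tokenization and a distance measure that captures meaning rather than exact word overlap.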
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
066MSI612pdf.pdf | | 920.16 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.