Word Embedding Based Feature Extraction for Nepali News Classification

Chaudhary, Ramesh Kumar

Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/10997

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chaudhary, Ramesh Kumar	-
dc.date.accessioned	2022-06-09T05:03:31Z	-
dc.date.available	2022-06-09T05:03:31Z	-
dc.date.issued	2019	-
dc.identifier.uri	https://elibrary.tucl.edu.np/handle/123456789/10997	-
dc.description.abstract	A major challenge in topic classification (TC) is the high dimensionality of the feature space. Feature extraction (FE) therefore plays a vital role in topic classification in particular and in text mining in general. FE based on cosine similarity scores is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can otherwise be impossible to process further. In this study, TF-IDF (Term Frequency-Inverse Document Frequency) term weighting is used to extract features. Selecting relevant features and determining how to encode them have a large impact on a machine learning method's ability to learn a good model. Count-based feature extraction is compared with word-to-vector feature extraction techniques for Nepali news classification. The results show that the word-to-vector techniques classify well when the number of classes is small, but their performance decreases drastically for large sample sizes. The count-based technique, on the other hand, performs nearly consistently for any number of classes. Overall, TF-IDF performs far better than both word-to-vector techniques.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Department of Computer Science and Information Technology	en_US
dc.subject	Topic classification	en_US
dc.subject	Feature extraction	en_US
dc.subject	Cosine similarity score	en_US
dc.subject	TF-IDF	en_US
dc.title	Word Embedding Based Feature Extraction for Nepali News Classification	en_US
dc.type	Thesis	en_US
local.institute.title	Central Department of Computer Science and Information Technology	en_US
local.academic.level	Masters	en_US
Appears in Collections:	Computer Science & Information Technology

Files in This Item:

File	Description	Size	Format
final thesis.pdf		1.1 MB	Adobe PDF
