Word Embedding Based Feature Extraction for Nepali News Classification

dc.contributor.author: Chaudhary, Ramesh Kumar
dc.date.accessioned: 2022-06-09T05:03:31Z
dc.date.available: 2022-06-09T05:03:31Z
dc.date.issued: 2019
dc.description.abstract: A major challenge in topic classification (TC) is the high dimensionality of the feature space. Feature extraction (FE) therefore plays a vital role in topic classification in particular and in text mining in general. FE based on cosine similarity score is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can otherwise be impossible to process further. In this study, TF-IDF (Term Frequency Inverse Document Frequency) term weighting is used to extract features. Selecting relevant features and determining how to encode them for a machine learning method have a vast impact on that method's ability to learn a good model. Count-based feature extraction methods are compared with word-to-vector feature extraction techniques for Nepali news classification. The results show good classification performance with the word-to-vector feature extraction techniques for a small number of classes, but performance decreases drastically for large sample sizes. In contrast, the count-based technique shows nearly consistent classification performance for any number of classes. The overall performance of TF-IDF is far better than that of both word-to-vector techniques.
dc.identifier.uri: https://hdl.handle.net/20.500.14540/10997
dc.language.iso: en_US
dc.publisher: Department of Computer Science and Information Technology
dc.subject: Topic classification
dc.subject: Feature extraction
dc.subject: Cosine similarity score
dc.subject: TF-IDF
dc.title: Word Embedding Based Feature Extraction for Nepali News Classification
dc.type: Thesis
local.academic.level: Masters
local.institute.title: Central Department of Computer Science and Information Technology
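
The abstract names TF-IDF term weighting and cosine similarity score as the count-based side of the comparison. The following is a minimal sketch of those two ideas only, not the thesis's implementation: the smoothed IDF formula, the function names tfidf_vectors and cosine, and the toy English tokens (standing in for tokenized Nepali news text) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant); it is always positive.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: two politics-like documents and one sports-like one.
docs = [
    "election vote parliament".split(),
    "parliament vote budget".split(),
    "football match goal".split(),
]
vocab, vecs = tfidf_vectors(docs)
same_topic = cosine(vecs[0], vecs[1])
other_topic = cosine(vecs[0], vecs[2])
```

In this toy corpus the two documents that share terms score a higher cosine similarity than the pair with no terms in common, which is the property a similarity-based feature extraction step relies on. Each vector has one dimension per vocabulary term, which is why the feature space grows so quickly on real news corpora.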

Files

Original bundle

Name: final thesis.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format
