Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/11151
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dahal, Yub Raj | -
dc.date.accessioned | 2022-06-13T06:25:03Z | -
dc.date.available | 2022-06-13T06:25:03Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | https://elibrary.tucl.edu.np/handle/123456789/11151 | -
dc.description.abstract | Clustering is an important technique for separating data into categories based on feature similarity, and it belongs to the unsupervised class of machine learning algorithms. Among the many clustering algorithms, three representative ones, namely K-Means, X-Means and Expectation Maximization (EM), are evaluated on the Nepali news clustering problem in this research work. News clustering is the task of categorizing news articles into groups that share similar interests. The clustering algorithms are compared on cluster evaluation metrics and execution time. The evaluation metrics used are the Dunn index, the Davies-Bouldin (DB) index and the Calinski-Harabasz (CH) index, and execution time comprises clustering time and training time. TF-IDF is used as the news embedding representation. The algorithms are also evaluated with reduced feature dimensions obtained by applying PCA. When selecting the winning algorithm and setting, lower values of the DB index, training time and clustering time are better, while higher values of the CH index and Dunn index are better. Based on the evaluation results, we draw the following conclusions about the winning algorithms and strategies in each setting. When the feature dimension is high (>= 10000), K-Means performs better than the others. When PCA is applied to reduce the feature space, the EM algorithm performs better than the others. With the reduced feature space, K-Means still performs better than the X-Means clustering algorithm. | en_US
dc.language.iso | en_US | en_US
dc.publisher | Department of Computer Science and Information Technology | en_US
dc.subject | News clustering | en_US
dc.subject | Natural language processing | en_US
dc.subject | K-Means | en_US
dc.subject | PCA | en_US
dc.title | Comparative Study of Clustering Algorithms for Nepali News | en_US
dc.type | Thesis | en_US
local.institute.title | Central Department of Computer Science and Information Technology | en_US
local.academic.level | Masters | en_US
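The abstract ranks clusterings with three internal validity indices: a lower DB index and higher CH and Dunn indices indicate better clustering. The sketch below is not taken from the thesis; it is a minimal pure-Python illustration assuming the common textbook definitions and Euclidean distance. Dunn is taken as the smallest distance between points in different clusters divided by the largest within-cluster diameter; DB as the average over clusters of the worst ratio (scatter_i + scatter_j) / centroid distance, where scatter is the mean point-to-centroid distance; and CH as between-cluster over within-cluster dispersion, scaled by (n - k)/(k - 1). The function names are hypothetical, and the code assumes at least two clusters and more points than clusters. Other Dunn variants (for example, centroid-based separation) exist and give different values.

```python
import math
from itertools import combinations


def _group(points, labels):
    """Group points into lists by cluster label (insertion order preserved)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def _centroid(cluster):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def dunn_index(points, labels):
    """Smallest between-cluster point distance / largest within-cluster
    diameter (Euclidean). Higher is better."""
    clusters = _group(points, labels)
    max_diameter = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    min_separation = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster collapses to a single location
    return min_separation / max_diameter


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid
    distance ratio. Lower is better."""
    clusters = _group(points, labels)
    centroids = [_centroid(c) for c in clusters]
    # scatter = average distance of a cluster's points to its centroid
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def calinski_harabasz_index(points, labels):
    """Between-cluster dispersion / within-cluster dispersion, each divided
    by its degrees of freedom. Higher is better."""
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _centroid(points)
    between = sum(len(c) * math.dist(_centroid(c), overall) ** 2 for c in clusters)
    within = 0.0
    for c in clusters:
        m = _centroid(c)
        within += sum(math.dist(p, m) ** 2 for p in c)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    good = [0, 0, 0, 1, 1, 1]  # follows the two visually separate groups
    bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups
    # the "good" labelling should score higher Dunn/CH and lower DB
    for name, labels in (("good", good), ("bad", bad)):
        print(name,
              dunn_index(pts, labels),
              davies_bouldin_index(pts, labels),
              calinski_harabasz_index(pts, labels))
```

To mirror the comparison described in the abstract, `points` would be the TF-IDF vectors (or their PCA projection) and `labels` the output of K-Means, X-Means or EM. Note that the pairwise Dunn computation is quadratic in the number of documents. scikit-learn ships `davies_bouldin_score` and `calinski_harabasz_score`, but not a Dunn index.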
Appears in Collections: Computer Science & Information Technology

Files in This Item:
File | Description | Size | Format
thesis (4).pdf | | 985.28 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.