Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN

dc.contributor.author: Maharjan, Aman
dc.date.accessioned: 2022-04-28T06:55:51Z
dc.date.available: 2022-04-28T06:55:51Z
dc.date.issued: 2018
dc.description.abstract: Automated document clustering is the process of grouping documents into a small set of meaningful and coherent collections. This research evaluates the K-Means, Mini-Batch K-Means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms on Nepali documents using four performance measures: Homogeneity, Completeness, V-Measure, and Silhouette Coefficient. Feature extraction is done using Term Frequency-Inverse Document Frequency (TFIDF) alone and using TFIDF combined with Latent Semantic Indexing (LSI). The empirical results show that Mini-Batch K-Means performs better when using TFIDF only, and K-Means performs better when using TFIDF + LSI. Similarly, in time-constrained environments, the clustering time of Mini-Batch K-Means is better than that of the other two algorithms.
dc.identifier.uri: https://hdl.handle.net/20.500.14540/10020
dc.language.iso: en_US
dc.publisher: Department of Computer Science and Information Technology
dc.subject: Nepali document clustering
dc.subject: Mini-Batch K-Means
dc.subject: DBSCAN
dc.subject: Machine learning
dc.title: Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN
dc.type: Thesis
local.academic.level: Masters
local.institute.title: Central Department of Computer Science and Information Technology
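
Three of the four performance measures named in the abstract (Homogeneity, Completeness, and V-Measure) compare cluster assignments against known class labels. The following is a minimal sketch of their standard entropy-based definitions, not the thesis's own implementation. The function names and the integer labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(labels), in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) from two paired label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (_, g), c in joint.items()
    )


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, v_measure) for paired labels.

    classes  : ground-truth category of each document
    clusters : cluster id assigned to each document
    """
    if len(classes) != len(clusters) or not classes:
        raise ValueError("classes and clusters must be non-empty and equal in length")

    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)

    # Homogeneity: each cluster contains documents of a single class.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    )
    # Completeness: all documents of a class land in the same cluster.
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    )
    # V-Measure: harmonic mean of homogeneity and completeness.
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v


if __name__ == "__main__":
    # Hypothetical labels: four documents, two true categories.
    true_classes = [0, 0, 1, 1]
    over_split = [0, 1, 2, 3]  # every document in its own cluster
    print(v_measure(true_classes, over_split))
```

In the usage example, placing every document in its own cluster keeps each cluster pure (high homogeneity) but splits each class across clusters (lower completeness), which is the trade-off V-Measure summarizes.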
Files
Original bundle
Name: thesis.pdf
Size: 608.61 KB
Format: Adobe Portable Document Format