Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN
dc.contributor.author | Maharjan, Aman | |
dc.date.accessioned | 2022-04-28T06:55:51Z | |
dc.date.available | 2022-04-28T06:55:51Z | |
dc.date.issued | 2018 | |
dc.description.abstract | Automated document clustering is the process of grouping documents into a small number of meaningful and coherent collections. This research evaluates the K-Means, Mini-Batch K-Means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms on Nepali documents using four performance measures: Homogeneity, Completeness, V-Measure, and Silhouette Coefficient. Feature extraction is done using Term Frequency–Inverse Document Frequency (TFIDF) alone and TFIDF combined with Latent Semantic Indexing (LSI). The empirical results show that Mini-Batch K-Means performs better when using TFIDF only, and K-Means performs better when using TFIDF + LSI. Similarly, in time-constrained environments, the clustering time of Mini-Batch K-Means is better than that of the other two algorithms. | en_US |
dc.identifier.uri | https://hdl.handle.net/20.500.14540/10020 | |
dc.language.iso | en_US | en_US |
dc.publisher | Department of Computer Science and Information Technology | en_US |
dc.subject | Nepali document clustering | en_US |
dc.subject | Mini-Batch K-Means | en_US |
dc.subject | DBSCAN | en_US |
dc.subject | Machine learning | en_US |
dc.title | Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN | en_US |
dc.type | Thesis | en_US |
local.academic.level | Masters | en_US |
local.institute.title | Central Department of Computer Science and Information Technology | en_US |