Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN

dc.contributor.author	Maharjan, Aman
dc.date.accessioned	2022-04-28T06:55:51Z
dc.date.available	2022-04-28T06:55:51Z
dc.date.issued	2018
dc.description.abstract	Automated document clustering is the process of grouping documents into a small set of meaningful and coherent collections. This research evaluates the K-Means, Mini-Batch K-Means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms on Nepali documents using four performance measures: Homogeneity, Completeness, V-Measure, and Silhouette Coefficient. Feature extraction is done using Term Frequency-Inverse Document Frequency (TFIDF) alone and using TFIDF combined with Latent Semantic Indexing (LSI). The empirical results show that Mini-Batch K-Means performs better when using TFIDF only, and K-Means performs better when using TFIDF + LSI. Similarly, in time-constrained environments, the clustering time of Mini-Batch K-Means is better than that of the other two algorithms.	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.14540/10020
dc.language.iso	en_US	en_US
dc.publisher	Department of Computer Science and Information Technology	en_US
dc.subject	Nepali document clustering	en_US
dc.subject	Mini-Batch K-Means	en_US
dc.subject	DBSCAN	en_US
dc.subject	Machine learning	en_US
dc.title	Nepali Document Clustering using K-Means, Mini-Batch K-Means, and DBSCAN	en_US
dc.type	Thesis	en_US
local.academic.level	Masters	en_US
local.institute.title	Central Department of Computer Science and Information Technology	en_US
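
Note on the evaluation measures: the abstract names Homogeneity, Completeness, and V-Measure but does not give their formulas. The sketch below follows the standard entropy-based definitions of these measures and is not taken from the thesis itself. The function names and the toy label lists are illustrative assumptions, and the Silhouette Coefficient is left out because it needs the document vectors, not only the labels.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), using natural logarithms."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), computed from the joint label counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Return (homogeneity, completeness, v_measure) for two labelings.

    classes:  reference category of each document (assumed to be known)
    clusters: cluster id assigned to each document by the algorithm
    """
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    # Homogeneity: each cluster should contain documents of a single class.
    homogeneity = 1.0 if h_classes == 0 else (
        1.0 - conditional_entropy(classes, clusters) / h_classes)
    # Completeness: all documents of a class should share one cluster.
    completeness = 1.0 if h_clusters == 0 else (
        1.0 - conditional_entropy(clusters, classes) / h_clusters)
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    # V-Measure is the harmonic mean of homogeneity and completeness.
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy example: two classes split into four singleton clusters.
print(v_measure([0, 0, 1, 1], [0, 1, 2, 3]))
```

In the toy example every cluster is pure, so homogeneity is 1, while each class is spread over two clusters, so completeness is lower. This is the kind of trade-off the three label-based measures are meant to expose when comparing clustering algorithms.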
