Nepali Document Clustering using DBSCAN and OPTICS Algorithm
Department of Computer Science & Information Technology
Abstract
Automated document clustering is the process of grouping documents into a small number of
meaningful collections based on their similarity. This research evaluates two density-based
clustering algorithms, Density-Based Spatial Clustering of Applications with Noise
(DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), on a
Nepali dataset using four performance metrics: Homogeneity, Completeness, V-Measure and
Silhouette Coefficient. Feature extraction combines Term Frequency-Inverse Document
Frequency (TF-IDF) with Latent Semantic Indexing (LSI). On these metrics, DBSCAN produces
slightly better clusters than OPTICS, and it also requires less processing time.
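The abstract does not give the implementation. Purely as an illustration, here is a minimal, standard-library Python sketch of the DBSCAN step and of the Homogeneity, Completeness and V-Measure metrics. It assumes that documents have already been converted into TF-IDF + LSI vectors and that Euclidean distance is used. Toy 2-D points stand in for those vectors, and the names dbscan and v_measure, along with the eps and min_pts values, are hypothetical and not the author's code. OPTICS and the Silhouette Coefficient are not shown. In practice, scikit-learn's TfidfVectorizer, TruncatedSVD, DBSCAN, OPTICS and its clustering metric functions cover the full pipeline.

```python
import math
from collections import Counter

NOISE = -1  # label for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch (Euclidean distance assumed): one label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # density-reachable but not core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

def entropy(labels):
    """Shannon entropy H of a labelling (natural log)."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())

def conditional_entropy(a, b):
    """H(A | B) for two parallel labellings a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((m / n) * math.log(m / b_counts[bk]) for (_, bk), m in joint.items())

def v_measure(true_labels, pred_labels):
    """Return (homogeneity, completeness, V-measure) with beta = 1."""
    h_c, h_k = entropy(true_labels), entropy(pred_labels)
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(true_labels, pred_labels) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(pred_labels, true_labels) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v

if __name__ == "__main__":
    # Toy 2-D points standing in for TF-IDF + LSI document vectors (illustrative only).
    points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
    topics = [0, 0, 0, 1, 1, 1, 2]  # hypothetical ground-truth categories
    labels = dbscan(points, eps=0.5, min_pts=2)
    print(labels)                     # [0, 0, 0, 1, 1, 1, -1]
    print(v_measure(topics, labels))  # (1.0, 1.0, 1.0)
```

In this sketch, noise points (label -1) are scored as one extra cluster when the metrics are computed. scikit-learn's homogeneity_score, completeness_score and v_measure_score likewise treat -1 as an ordinary label.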