Browsing by Subject "Clustering"
Item
Comparative Study of K-means, Expectation-Maximization and Density Based Clustering Algorithm
(Department of Computer Science and Information Technology, 2018) Upadhaya, Deepa
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. This dissertation, entitled "Comparative Study of K-means, Expectation-Maximization and Density Based Clustering Algorithm", is an application of data mining that uses heart disease and thyroid disease datasets. A wide range of algorithms is available for clustering, and this research presents a comparative study of three of them. In the experiments, the accuracy of each algorithm and the time it takes are evaluated by comparing results on the heart disease and thyroid disease datasets, which were obtained from the UCI and KEEL repositories and processed with the WEKA tool. In total, 597 records from the heart disease dataset and 3772 records from the thyroid disease dataset are used. The heart disease dataset has 14 attributes and the thyroid disease dataset has 30. Expectation-Maximization clustering and density based clustering take more time to form clusters on both datasets. The simple K-means algorithm forms clusters in less time and with higher accuracy than the other two algorithms on both datasets.

Item
Nepali Document Clustering using DBSCAN and OPTICS Algorithm
(Department of Computer Science & Information Technology, 2018) Maharjan, Prabin
Automated document clustering is the process of grouping documents into a small set of meaningful collections based on the similarity between them. This research evaluates two density based clustering algorithms, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), on a Nepali dataset using four performance metrics: Homogeneity, Completeness, V-Measure and Silhouette Coefficient. Feature extraction is done using a combination of Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI). The results on these four metrics show that the clustering produced by DBSCAN is slightly better than that produced by OPTICS. DBSCAN also requires less processing time.
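
Three of the metrics named in the abstract above (Homogeneity, Completeness and V-Measure) can be computed directly from a list of true class labels and a list of cluster assignments. The sketch below is only an illustration of the standard entropy-based definitions, not the code used in the dissertation. The function names are hypothetical, and it assumes that noise points (often labelled -1 by DBSCAN implementations) are simply counted as one more cluster.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given), computed from joint label counts."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_classes, cluster_ids):
    """Return (homogeneity, completeness, v_measure) for one clustering.

    Assumption for this sketch: every distinct value in cluster_ids,
    including a noise label such as -1, is treated as its own cluster.
    """
    h_classes = entropy(true_classes)
    h_clusters = entropy(cluster_ids)
    # Homogeneity: each cluster contains members of a single class.
    if h_classes == 0:
        homogeneity = 1.0
    else:
        homogeneity = 1.0 - conditional_entropy(true_classes, cluster_ids) / h_classes
    # Completeness: all members of a class land in the same cluster.
    if h_clusters == 0:
        completeness = 1.0
    else:
        completeness = 1.0 - conditional_entropy(cluster_ids, true_classes) / h_clusters
    total = homogeneity + completeness
    # V-Measure is the harmonic mean of homogeneity and completeness.
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v
```

Calling `v_measure(true_classes, cluster_ids)` once per algorithm gives three of the four scores on which two clusterings of the same documents can be compared. The Silhouette Coefficient is not covered here because it needs the document vectors themselves, not just the labels.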