Browsing by Subject "Clustering algorithm"
Analysis of MST based clustering algorithm with different threshold values
(Department of Computer Science and Information Technology, 2016) Pant, Lalit
Clustering analysis has been an emerging research issue in data mining due to its variety of applications. Many algorithms have been proposed so far; however, each algorithm has its own merits and demerits and cannot work in every real situation. MST based clustering algorithms have been widely used due to their ability to detect clusters with irregular boundaries. In this dissertation, an MST based clustering algorithm is analyzed using different threshold values on the MST and measured by a validity index. Given the MST over a data set, the edges of the MST are selected or rejected in the process of forming the clusters, depending on the threshold value. The validity index is the ratio of intra-cluster distance to inter-cluster distance. The thresholds are taken as the mean, the standard deviation, and the mean + standard deviation of the MST. These thresholds are evaluated by the validity index. The smallest value of the validity index is selected as indicating the best clustering and the best threshold value. The algorithm has been tested on randomly generated data sets as well as real-world data sets.
Keywords: Clustering Algorithm, MST, Validity Index, Threshold Values

News clustering system based on text mining
(Department of Computer Science and Information Technology, 2016) Shahi, Deni
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. This dissertation, entitled "News Clustering System based on Text Mining", is an implementation of data mining in which similar English-language articles from different newspapers are grouped together. In this work, documents are retrieved from different newspapers' sites using a crawler, i.e. Information Extraction (IE), and then document preprocessing is applied. A parser parses the data into article headings and corresponding links; the headings are then split into individual terms and a list of distinct terms is maintained. The Porter stemming algorithm is then applied over the collection of distinct terms. Stemming reduces the vocabulary size (i.e. the number of terms is minimized). The TF-IDF of each heading is calculated. This process represents each heading in an n-dimensional vector space (n is the number of distinct terms in the collection). Finally, the K-means algorithm is implemented to group the news. The efficiency of the K-means clustering algorithm has been analyzed for different values of the initial number of cluster seeds (K) and different numbers of iterations (I). The analysis is based on seven days of news data. The experimental results show that clustering is efficient with 12 initial cluster seeds (K=12). The number of iterations needed to reach constant cluster centers in K-means depends on the number of data sets, and running time is directly proportional to the number of iterations and the number of initial cluster seeds.
Keywords: Data Mining, Information Extraction, Document Preprocessing, Porter Stemming Algorithm, TF-IDF, K-means Clustering Algorithm
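
The threshold procedure described in the MST based clustering abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code. The function names are hypothetical, Prim's algorithm over Euclidean distances is an assumed way to build the MST, and the validity index is assumed to be the mean squared distance of points to their cluster centroid divided by the smallest squared distance between centroids; the abstract only states that it is a ratio of intra-cluster to inter-cluster distance.

```python
import math
import statistics

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def clusters_from_threshold(n, edges, threshold):
    """Keep MST edges with weight <= threshold; return one cluster label per point."""
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path compression
            x = root[x]
        return x
    for w, i, j in edges:
        if w <= threshold:
            root[find(i)] = find(j)
    return [find(i) for i in range(n)]

def validity_index(points, labels):
    """Assumed definition: mean squared intra distance / min squared inter distance."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return math.inf               # a single cluster has no inter-cluster distance
    cents = {lab: tuple(sum(c) / len(g) for c in zip(*g)) for lab, g in groups.items()}
    intra = sum(math.dist(p, cents[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)
    cs = list(cents.values())
    inter = min(math.dist(a, b) ** 2 for i, a in enumerate(cs) for b in cs[i + 1:])
    return intra / inter

def best_threshold(points):
    """Score the three thresholds named in the abstract; the smallest index wins."""
    edges = mst_edges(points)
    weights = [w for w, _, _ in edges]
    mean, std = statistics.mean(weights), statistics.pstdev(weights)
    candidates = {"mean": mean, "std": std, "mean+std": mean + std}
    scores = {name: validity_index(points, clusters_from_threshold(len(points), edges, t))
              for name, t in candidates.items()}
    return min(scores, key=scores.get), scores
```

Calling `best_threshold(points)` on a list of coordinate tuples returns the name of the lowest-scoring threshold together with all three scores, so the candidates can be compared side by side.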
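
The heading pipeline described in the news clustering abstract (split into terms, weight by TF-IDF, group with K-means) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the crawler and Porter stemming steps are omitted (lowercased whitespace tokens stand in for stems), TF-IDF is assumed to be term frequency times log(N/df), the first K vectors are used as the initial seeds, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(headings):
    """Turn non-empty headings into TF-IDF vectors over the distinct-term list."""
    docs = [h.lower().split() for h in headings]       # stand-in for stemming
    vocab = sorted({t for d in docs for t in d})        # list of distinct terms
    df = Counter(t for d in docs for t in set(d))       # document frequency per term
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def kmeans(vectors, k, max_iter=100):
    """Plain K-means; stops early once cluster assignments no longer change."""
    centers = [list(v) for v in vectors[:k]]            # assumption: first k vectors as seeds
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break                                       # centers are now constant
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:                                 # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, iteration

headings = [
    "stock market rises",
    "football team wins",
    "stock market falls",
    "football team loses",
]
vocab, vectors = tfidf_vectors(headings)
labels, iterations = kmeans(vectors, k=2)
```

Returning the iteration count alongside the labels makes it straightforward to record how the number of iterations to convergence varies with K, which is the quantity the abstract analyzes.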