TUCL Repository :: Browsing by Subject "Natural language processing"

Comparative Study of Clustering Algorithms for Nepali News
(Department of Computer Science and Information Technology, 2019) Dahal, Yub Raj
Clustering is an important technique for separating data into categories based on feature similarity. Clustering belongs to the unsupervised class of machine learning algorithms. Of the many clustering algorithms available, three representative ones, namely K-means, X-means and Expectation Maximization (EM), are evaluated on the Nepali news clustering problem in this research work. News clustering is the task of grouping news articles that share similar topics. The algorithms are compared on cluster evaluation metrics and execution time. The evaluation metrics used are the Dunn index, the Davies-Bouldin (DB) index and the Calinski-Harabasz (CH) index. Execution time includes both clustering time and training time. TF-IDF is used as the news embedding representation. The algorithms are also evaluated with reduced feature dimensions obtained by applying PCA. An algorithm and setting is judged better when its DB index, training time and clustering time are lower and its CH index and Dunn index are higher. Based on the evaluation results, the winning algorithms under different conditions are as follows. When the feature dimension is high (10,000 or more), K-means performs better than the others. When PCA is applied to reduce the feature space, the EM algorithm performs better than the others. With the reduced feature space, K-means still performs better than the X-means clustering algorithm.
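
The pipeline described above (TF-IDF vectors, K-means, and the three internal validity indices) can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names and the four tokenized toy documents are hypothetical, TF-IDF is weighted as raw term count times ln(N/df) (one common variant; the abstract does not state the weighting), K-means is seeded by deterministic farthest-first initialization, the Dunn index uses the closest cross-cluster pair of points over the widest cluster diameter, and X-means, EM and PCA are not shown. It requires Python 3.8+ for `math.dist`.

```python
# Illustrative sketch; function names and toy data are hypothetical.
import math
from collections import Counter

def tfidf_matrix(docs):
    """Dense TF-IDF rows for tokenized docs, weighted as raw tf * ln(N / df)."""
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(len(docs) / df[t]) for t in vocab])
    return rows

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest existing seed
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # update step: move each non-empty cluster's centroid to its members' mean
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def groups_of(points, labels):
    """Group points by cluster label, in order of first appearance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())

def dunn_index(points, labels):
    """Closest cross-cluster pair / widest cluster diameter (higher is better)."""
    groups = groups_of(points, labels)
    diameter = max(math.dist(a, b) for g in groups for a in g for b in g)
    separation = min(math.dist(a, b) for i, g in enumerate(groups)
                     for h in groups[i + 1:] for a in g for b in h)
    return separation / diameter if diameter > 0 else math.inf

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) (lower is better)."""
    groups = groups_of(points, labels)
    cents = [mean(g) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    return sum(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def calinski_harabasz(points, labels):
    """Between- vs within-cluster dispersion, scaled by degrees of freedom (higher is better)."""
    groups = groups_of(points, labels)
    n, k = len(points), len(groups)
    overall, cents = mean(points), [mean(g) for g in groups]
    between = sum(len(g) * math.dist(c, overall) ** 2 for g, c in zip(groups, cents))
    within = sum(math.dist(p, c) ** 2 for g, c in zip(groups, cents) for p in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy "news" (already tokenized): two sports items, two politics items
docs = [["cricket", "match", "win"], ["cricket", "team", "match"],
        ["election", "vote", "party"], ["party", "election", "minister"]]
X = tfidf_matrix(docs)
labels, _ = kmeans(X, k=2)
print(labels)  # [0, 0, 1, 1]
print(round(dunn_index(X, labels), 3), round(davies_bouldin(X, labels), 3),
      round(calinski_harabasz(X, labels), 3))  # 1.225 1.0 2.0
```

On this toy data the sports and politics items separate cleanly. Farthest-first seeding is used here only to make the result reproducible; a random initialization can converge to a poorer split on so few points.
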
Performance analysis of Naive Bayes and support vector machine algorithm on classification of Nepali opinion text
(Department of Computer Science and Information Technology, 2022) Shrestha, Nishchhal
An opinion is an individual's subjective expression about something: a view, emotion or sentiment. Opinions help individuals and organizations make decisions about particular things. Opinion classification is the process of analyzing views or opinions using natural language processing techniques. Naïve Bayes and Support Vector Machine (SVM) are supervised machine learning algorithms for classification. Most research on opinion classification has been done in English, but opinion classification in Nepali is also important because the amount of Nepali text is increasing rapidly in the form of blogs, reviews and newspaper opinion columns. In this study, Nepali sentences were collected from the opinion sections of various national newspapers' online portals. Both algorithms were implemented in Python with the NLTK library, and their outputs were analyzed using performance metrics. On the preprocessed data, SVM achieved an accuracy of 85%, higher than the 83% accuracy of Naïve Bayes. The accuracy of both algorithms improved with preprocessing compared with the unpreprocessed data. The study concluded that SVM was the best model, with higher values on the performance metrics, and recommends it over Naïve Bayes for opinion classification of Nepali text data.
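
As a rough illustration of the Naïve Bayes side of this comparison, here is a from-scratch multinomial Naïve Bayes classifier with Laplace (add-one) smoothing in plain Python. It is a sketch under assumptions, not the study's NLTK-based code: the function names, the `pos`/`neg` labels and the four-sentence toy training set (Nepali words, glossed in a comment) are hypothetical, no preprocessing is applied, words unseen in training are ignored at prediction time, and the SVM classifier is not shown.

```python
# Illustrative sketch; function names and toy data are hypothetical.
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token frequencies
    for doc, lab in zip(docs, labels):
        word_counts[lab].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Toy training data: "service good", "excellent work", "service bad", "poor work"
train = [["सेवा", "राम्रो"], ["उत्कृष्ट", "काम"], ["सेवा", "नराम्रो"], ["खराब", "काम"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train, labels)
print(predict_nb(model, ["राम्रो", "काम"]))  # pos
print(predict_nb(model, ["खराब", "सेवा"]))   # neg
```

Working in log space avoids underflow when many word probabilities are multiplied, and the smoothing term keeps a word that never appeared with a class from forcing that class's probability to zero.
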
