Browsing by Subject "Feature extraction"
Item Hybrid Feature Selection and Feature Extraction Based Ensemble Method in Classification (Department of Computer Science and Information Technology, 2015) Pandey, Rajesh
Ensemble methods have been called the most influential development in data mining and machine learning in the past decade. The idea of ensemble learning is to employ multiple learners and combine their predictions. In this thesis, a novel method is proposed to build an ensemble of classifiers based on feature selection (random selection and Relief) and feature extraction (principal component analysis). The feature selection process chooses an optimal subset of features according to an objective function, whereas the feature extraction process maps the high-dimensional dataset into a lower-dimensional one using linear combinations of the original features. These feature selection and extraction methods help to produce a diverse as well as accurate set of ensemble classifiers. The proposed method is compared with Bagging, AdaBoost, feature selection based NN, feature extraction based NN and plain NN on 22 benchmark datasets. The proposed method outperformed the other algorithms with the following distribution: NN (14 cases), Random-NN (13 cases), Relief-NN (15 cases), PCA-NN (19 cases), AdaBoost (14 cases), Bagging (15 cases).
Keywords: Ensemble methods, feature selection, feature extraction, Relief, Principal component analysis, AdaBoost, Bagging, NN, Random-NN, Relief-NN, PCA-NN

Item Off-line Nepali Handwritten Character Recognition Using MLP and RBF Neural Networks (Department of Computer Science & Information Technology, 2012) Pant, Ashok Kumar
Off-line Nepali handwriting recognition based on neural networks is described in this research work. Recognizing off-line handwriting with a high classification rate requires a good set of features as a descriptor of the image.
Two important categories of features, geometric and statistical, are described for extracting information from character images. Directional features are extracted from the geometry of the skeletonized character image, and statistical features are extracted from its pixel distribution. The research is primarily concerned with the problem of isolated handwritten character recognition for the Nepali language. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) classifiers are used for classification. The principal contributions presented here are preprocessing, feature extraction, and the MLP and RBF classifiers. Another important contribution is the creation of a benchmark dataset for off-line Nepali handwriting. There are three datasets, for Nepali handwritten numerals, vowels and consonants respectively. The numeral dataset contains a total of 288 samples for each of the 10 classes of Nepali numerals, the vowel dataset contains 221 samples for each of the 12 classes of Nepali vowels, and the consonant dataset contains 205 samples for each of the 36 classes of Nepali consonants. The strength of this research is its efficient feature extraction and comprehensive classification schemes, which yield a recognition accuracy of 94.44% on the numeral dataset, 86.04% on the vowel dataset and 80.25% on the consonant dataset.
Keywords: Off-line handwriting recognition, Image processing, Neural networks, Multilayer perceptron, Radial basis function, Preprocessing, Feature extraction, Nepali handwritten datasets

Item Performance Analysis of Nepali Text Classification using Back Propagation and Naive Bayes Algorithm (Department of Computer Science & Information Technology, 2014) Maharjan, Jamuna
Automated document classification is the task of assigning a given document to some class of interest.
Text classification is a subset of document classification, since a document can be text, image, music, etc. Document classification has many applications in library science, information science, computer science and other fields. It can be used for intellectual categorization of documents, document indexing, spam filtering, email routing, language identification, genre classification, etc. The problem of automated document classification can be solved in a supervised, unsupervised or semi-supervised way. Most learning and classification algorithms use document attributes and human inference to learn and classify given documents. In this dissertation work, many Natural Language Processing (NLP) techniques are used for document processing and attribute selection, and two learning-based classification techniques are used: Artificial Neural Network (ANN) and Naive Bayes Classifier. An ANN is a biologically inspired model of a learning system, and the Naive Bayes Classifier is a probability-based classification technique. For the evaluation of the system, we have created Nepali text datasets for five classes of documents: Business, Crime, Education, Health and Sports. There are two separate datasets for training and testing of the system. The training set contains a total of 1253 documents, with 243 for Business, 147 for Crime, 250 for Education, 270 for Health and 343 for Sports. Similarly, the testing dataset contains a total of 89 documents, with 19 for Business, 20 for Crime, 12 for Education, 19 for Health and 19 for Sports. Training and testing are done by splitting the training set into two sets while keeping the testing set unique. Experimental results show that the feed-forward multilayer perceptron based neural network classifier has a lower classification error rate than the Naive Bayes based classifier. The MLP classification system has an average system accuracy rate of 87.55%, system error rate of 12.44%, precision rate of 80.29%, recall rate of 93.41% and f-score rate of 86.55%.
Similarly, the Naive Bayes classification system has an average system accuracy rate of 87.09%, system error rate of 12.90%, precision rate of 79.37%, recall rate of 93.87% and f-score rate of 86.05%.
Keywords: Automated Document Categorization, Text Classification, Natural language processing, Nepali language, Preprocessing, Feature extraction, Artificial Neural Networks, Multilayer Perceptron, Naive Bayes Classifier

Item Word Embedding Based Feature Extraction for Nepali News Classification (Department of Computer Science and Information Technology, 2019) Chaudhary, Ramesh Kumar
A major challenge in topic classification (TC) is the high dimensionality of the feature space. Therefore, feature extraction (FE) plays a vital role in topic classification in particular and text mining in general. FE based on cosine similarity score is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can otherwise be impossible to process further. In this study, TF-IDF (Term Frequency Inverse Document Frequency) term weighting is used to extract features. Selecting relevant features and determining how to encode them for a machine learning method have a vast impact on the method's ability to extract a good model. Count-based feature extraction methods are compared with word-to-vector feature extraction techniques for Nepali news classification. The results show good classification performance with the word-to-vector feature extraction techniques for a small number of classes, but performance drops drastically for large sample sizes. On the other hand, the count-based technique shows nearly consistent classification performance for any number of classes. The overall performance of TF-IDF is far better than that of both word-to-vector techniques.
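
The TF-IDF term weighting named in the last abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `tfidf`, the toy English token lists, and the specific variant (relative term frequency times the natural log of N / document frequency, with no smoothing) are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(N / df), no smoothing or normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Placeholder documents (hypothetical, not from the Nepali news dataset).
docs = [
    "market prices rise".split(),
    "team wins match".split(),
    "market team news".split(),
]
vectors = tfidf(docs)
```

Under this variant, a term that appears in fewer documents ("prices") receives a higher weight than one shared across documents ("market"), and a term present in every document receives a weight of zero.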