TUCL Repository :: Browsing by Subject "Feature extraction"

Browsing by Subject "Feature extraction"

Now showing 1 - 4 of 4

Hybrid Feature Selection and Feature Extraction Based Ensemble Method in Classification
(Department of Computer Science and Information Technology, 2015) Pandey, Rajesh
Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. The idea of ensemble learning is to employ multiple learners and combine their predictions. In this thesis, a novel method is proposed to build an ensemble of classifiers based on feature selection (random selection and Relief) and feature extraction (principal component analysis, PCA). The feature selection process chooses an optimal subset of features according to an objective function, whereas the feature extraction process maps the high-dimensional dataset into a lower-dimensional dataset using linear combinations of the original features. These feature selection and extraction methods help to produce a diverse as well as accurate set of ensemble classifiers. The proposed method is compared with Bagging, AdaBoost, feature selection based NN, feature extraction based NN and plain NN on 22 benchmark datasets. The proposed method outperformed the other algorithms in the following number of cases: NN (14 cases), Random-NN (13 cases), Relief-NN (15 cases), PCA-NN (19 cases), AdaBoost (14 cases), Bagging (15 cases). Keywords: Ensemble methods, feature selection, feature extraction, Relief, Principal component analysis, AdaBoost, Bagging, NN, Random-NN, Relief-NN, PCA-NN
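The Relief scoring step and the "combine their predictions" step can be sketched in plain Python. This is a minimal illustration, not the thesis's implementation: it assumes a two-class dataset with numeric features, visits every instance once instead of sampling m instances at random, and combines ensemble members by simple majority vote. The names relief_weights and majority_vote are hypothetical.

```python
from collections import Counter


def relief_weights(X, y):
    """Relief feature weights for a two-class dataset (hypothetical helper).

    Every instance is visited once (instead of m random samples), so the
    result is deterministic. Each class needs at least two instances.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    span = []
    for j in range(d):
        col = [row[j] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i, r in enumerate(X):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda other: dist(r, other))    # nearest same-class neighbor
        miss = min(misses, key=lambda other: dist(r, other))  # nearest other-class neighbor
        for j in range(d):
            # Reward features that separate r from the nearest miss,
            # penalize features that separate r from the nearest hit.
            w[j] += (diff(j, r, miss) - diff(j, r, hit)) / n
    return w


def majority_vote(predictions):
    """predictions[m][i] is member m's label for sample i; return one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]  # feature 0 separates the classes
y = [0, 0, 1, 1]
print(relief_weights(X, y))  # feature 0 gets a positive weight, feature 1 a negative one
print(majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]]))  # [0, 1, 1]
```

In a full ensemble of this kind, each member would be trained on a different feature view (a random subset, the top-weighted Relief features, or a PCA projection) and majority_vote would merge their test-set predictions.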
Off-line Nepali Handwritten Character Recognition Using MLP and RBF Neural Networks
(Department of Computer Science & Information Technology, 2012) Pant, Ashok Kumar
An off-line Nepali handwriting recognition system based on neural networks is described in this research work. Recognizing off-line handwriting with a high classification rate requires a good set of features to describe the character image. Two important categories of features are described for extracting information from character images: geometric and statistical features. Directional features are extracted from the geometry of the skeletonized character image, and statistical features are extracted from its pixel distribution. The research is primarily concerned with the problem of isolated handwritten character recognition for the Nepali language. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) classifiers are used for classification. The principal contributions presented here are preprocessing, feature extraction and the MLP and RBF classifiers. Another important contribution is the creation of benchmark datasets for off-line Nepali handwriting. There are three datasets: Nepali handwritten numerals, Nepali handwritten vowels and Nepali handwritten consonants. The numeral dataset contains 288 samples for each of the 10 classes of Nepali numerals, the vowel dataset contains 221 samples for each of the 12 classes of Nepali vowels, and the consonant dataset contains 205 samples for each of the 36 classes of Nepali consonants. The strengths of this research are efficient feature extraction and comprehensive classification schemes, which yield recognition accuracies of 94.44% on the numeral dataset, 86.04% on the vowel dataset and 80.25% on the consonant dataset. Keywords: Off-line handwriting recognition, Image processing, Neural networks, Multilayer perceptron, Radial basis function, Preprocessing, Feature extraction, Nepali handwritten datasets
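The two feature families can be illustrated on a binary image stored as a list of rows (1 = ink). The abstract does not give the exact feature definitions, so this sketch assumes simple stand-ins: zone densities (the fraction of ink pixels in each cell of a zones x zones grid) as the statistical features, and a normalized count of links between adjacent skeleton pixels in four stroke directions as the directional features. It also assumes the image is already binarized and skeletonized; zone_densities and direction_histogram are hypothetical names.

```python
def zone_densities(img, zones=2):
    """Statistical features: fraction of ink pixels in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    feats = []
    for zi in range(zones):
        for zj in range(zones):
            r0, r1 = zi * h // zones, (zi + 1) * h // zones
            c0, c1 = zj * w // zones, (zj + 1) * w // zones
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


# Row/column offsets for four undirected stroke directions on an 8-connected grid:
# horizontal, vertical, diagonal down, diagonal up.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, 1)]


def direction_histogram(skel):
    """Directional features: share of adjacent skeleton-pixel pairs along each direction."""
    h, w = len(skel), len(skel[0])
    counts = []
    for dr, dc in DIRECTIONS:
        n = 0
        for r in range(h):
            for c in range(w):
                rr, cc = r + dr, c + dc
                if skel[r][c] and 0 <= rr < h and 0 <= cc < w and skel[rr][cc]:
                    n += 1
        counts.append(n)
    total = sum(counts) or 1
    return [n / total for n in counts]


# A 4x4 skeleton consisting of a single vertical stroke in column 1.
skel = [[0, 1, 0, 0] for _ in range(4)]
print(zone_densities(skel))       # [0.5, 0.0, 0.5, 0.0]
print(direction_histogram(skel))  # [0.0, 1.0, 0.0, 0.0]
```

Concatenating the two lists gives one fixed-length feature vector per character, which is the kind of input an MLP or RBF classifier expects.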
Performance Analysis of Nepali Text Classification using Back Propagation and Naive Bayes Algorithm
(Department of Computer Science & Information Technology, 2014) Maharjan, Jamuna
Automated document classification is the task of assigning a given document to a class of interest. Text classification is a subset of document classification, since a document can be text, an image, music, etc. Document classification has many applications in library science, information science, computer science and other fields. It can be used for intellectual categorization of documents, indexing of documents, spam filtering, email routing, language identification, genre classification, etc. The problem of automated document classification can be solved in a supervised, unsupervised or semi-supervised way. Most learning and classification algorithms use document attributes and human inference to learn and classify documents. In this dissertation, many Natural Language Processing (NLP) techniques are used for document processing and attribute selection, and two learning-based classification techniques are used: an Artificial Neural Network (ANN) and a Naive Bayes classifier. An ANN is a biologically inspired model of a learning system, and the Naive Bayes classifier is a probability-based classification technique. For the evaluation of the system, we created Nepali text datasets for five classes of documents: Business, Crime, Education, Health and Sports. There are two separate datasets for training and testing. The training set contains 1253 documents in total: 243 Business, 147 Crime, 250 Education, 270 Health and 343 Sports. Similarly, the testing set contains 89 documents in total: 19 Business, 20 Crime, 12 Education, 19 Health and 19 Sports. Training and testing are done by splitting the training set into two sets while keeping the testing set separate. Experimental results show that the feed-forward multilayer perceptron (MLP) neural network classifier has a lower classification error rate than the Naive Bayes classifier.
The MLP classification system has an average system accuracy of 87.55%, a system error rate of 12.44%, precision of 80.29%, recall of 93.41% and an F-score of 86.55%. Similarly, the Naive Bayes classification system has an average system accuracy of 87.09%, a system error rate of 12.90%, precision of 79.37%, recall of 93.87% and an F-score of 86.05%. Keywords: Automated Document Categorization, Text Classification, Natural language processing, Nepali language, Preprocessing, Feature extraction, Artificial Neural Networks, Multilayer Perceptron, Naive Bayes Classifier
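The Naive Bayes side of the comparison and the reported precision, recall and F-score can be sketched as follows. This assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing over pre-tokenized documents; the abstract does not state which Naive Bayes variant or smoothing was used. The tokens are English placeholders rather than Nepali text, and train_nb, predict_nb and precision_recall_f1 are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing; docs are lists of tokens."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior = {c: math.log(k / len(docs)) for c, k in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(token_counts[c].values())
        # Add-one smoothing keeps a word unseen in class c from zeroing out P(doc | c).
        log_lik[c] = {t: math.log((token_counts[c][t] + 1) / (total + len(vocab)))
                      for t in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    log_prior, log_lik = model

    def score(c):
        return log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])

    return max(log_prior, key=score)


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F-score (harmonic mean of the two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


docs = [["goal", "match", "team"], ["team", "win", "goal"],
        ["market", "share", "price"], ["price", "profit", "market"]]
labels = ["Sports", "Sports", "Business", "Business"]
model = train_nb(docs, labels)
print(predict_nb(model, ["goal", "team"]))     # Sports
print(predict_nb(model, ["price", "market"]))  # Business
print(precision_recall_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"], "A"))
```

Working in log space avoids floating-point underflow when a document has many tokens; averaging precision_recall_f1 over the five classes would give system-level averages of the kind reported above.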
Word Embedding Based Feature Extraction for Nepali News Classification
(Department of Computer Science and Information Technology, 2019) Chaudhary, Ramesh Kumar
A major challenge in topic classification (TC) is the high dimensionality of the feature space. Therefore, feature extraction (FE) plays a vital role in topic classification in particular and text mining in general. FE based on cosine similarity scores is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which can otherwise be impossible to process further. In this study, TF-IDF (Term Frequency Inverse Document Frequency) term weighting is used to extract features. Selecting relevant features and determining how to encode them for a machine learning method have a large impact on that method's ability to extract a good model. Count-based feature extraction methods are compared with word-to-vector feature extraction techniques for Nepali news classification. The results show that feature extraction based on word-to-vector gives good classification performance for a small number of classes, but its performance decreases drastically for large sample sizes. By contrast, the count-based technique shows nearly consistent performance for any number of classes. Overall, TF-IDF performs far better than both word-to-vector techniques.
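TF-IDF weighting and the cosine similarity score mentioned in this abstract can be sketched in plain Python. TF-IDF has several variants and the abstract does not say which one was used; this sketch assumes length-normalized term frequency multiplied by idf = log(N / df), with placeholder English tokens standing in for tokenized Nepali news text. tfidf and cosine are hypothetical helper names.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns (sorted vocabulary, one TF-IDF vector per document)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # number of documents containing each term
    idf = {t: math.log(n / df[t]) for t in vocab}  # a term found in every document gets weight 0
    vectors = []
    for d in docs:
        tf = Counter(d)
        length = len(d) or 1
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [["cricket", "match", "won"], ["cricket", "match", "lost"], ["budget", "tax", "bank"]]
vocab, vecs = tfidf(docs)
print(vocab)                     # ['bank', 'budget', 'cricket', 'lost', 'match', 'tax', 'won']
print(cosine(vecs[0], vecs[1]))  # positive: the two sports stories share "cricket" and "match"
print(cosine(vecs[0], vecs[2]))  # 0.0: no terms in common
```

Each document becomes a vector with one dimension per vocabulary term, which is why the feature space grows with the corpus and why reducing its dimensionality matters for topic classification.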
