Automatic Text Summarization System for Nepali Language Based on Sentence Extraction
Publisher
Department of Computer Science and I.T.
Abstract
Automatic text summarization is a general problem in the Natural Language Processing (NLP)
community. It has attracted considerable attention recently as the amount of information
available worldwide, online and offline, keeps growing. As the volume and availability of data
increase, information becomes redundant and scattered. There is therefore a need for an effective
and powerful tool to summarize text documents automatically. Much research has been done
for English and other European languages, with high performance reported. The Nepali language,
however, has received little research attention in this field.
In this dissertation, a method is proposed for summarizing Nepali text documents
automatically using sentence extraction techniques. The approach involves four stages:
text preprocessing, feature extraction, sentence scoring and ranking,
and summary generation. The proposed system is tested on various datasets collected from
different sources such as books, newspapers, articles, and reports. Automated evaluation techniques
are used to validate the system summaries against manual summaries. Overall, the proposed
system achieves 79.18% precision, 71.77% recall, and a 75.02%
F-score. The cosine similarity measure gives an overall similarity of 91.16% between the manual
and system summaries.
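
The four stages and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not list the features used, so sentences are scored here by average term frequency as a stand-in. The stop-word list, the function names, and the choice to compute precision and recall over whole sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a full Nepali list.
STOP_WORDS = {"र", "को", "मा", "छ", "हो"}


def split_sentences(text):
    """Preprocessing: split on the danda (।) and on ? or !."""
    parts = re.split(r"[।?!]", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Preprocessing: whitespace tokens minus stop words (no stemming)."""
    return [w for w in sentence.split() if w not in STOP_WORDS]


def summarize(text, k=2):
    """Score sentences, rank them, and return the top k in document order."""
    sentences = split_sentences(text)
    # Feature extraction (assumed feature): document-level term frequency.
    tf = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(tf[w] for w in words) / len(words) if words else 0.0

    # Sentence scoring and ranking.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Summary generation: restore the original sentence order.
    return [sentences[i] for i in sorted(ranked[:k])]


def precision_recall_f(system, reference):
    """Sentence-level overlap between system and manual summaries (assumed unit)."""
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    p = overlap / len(sys_set) if sys_set else 0.0
    r = overlap / len(ref_set) if ref_set else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Calling `summarize(document, k)` returns the `k` highest-scoring sentences in their original order. Passing that list together with a manual summary's sentence list to `precision_recall_f` gives the three scores for one document. How the dissertation aggregates such scores across its datasets is not stated in the abstract.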