A Comparative Study of Naive Bayesian Spam Filtering Using Word Distribution and Trigrams

Department of Computer Science and I.T.

Abstract

This study compares two Naive Bayesian spam filters that differ in how they tokenize email text. It focuses on the reliability and accuracy of word-based tokenization versus trigram-based tokenization. Both filters are implemented with the same classifier and trainer. The results show that word-based spam filtering performs better when the number of pre-categorized emails available for training is limited and when the resources available for classification are also limited. When sufficient resources and training emails are available, the results suggest that trigram-based spam filtering is better because of its higher reliability and accuracy.
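The sketch below illustrates the experimental setup the abstract describes: one Naive Bayes trainer and classifier, with only the tokenizer swapped. Several details are assumptions rather than the study's method. It treats "trigrams" as character trigrams (the abstract does not say whether they are character- or word-level), uses a multinomial model with Laplace (add-one) smoothing, and uses the hypothetical names `word_tokens`, `trigram_tokens`, and `NaiveBayes`. The four training emails are invented toy data, not data from the study.

```python
import math
import re
from collections import Counter


def word_tokens(text):
    """Word-based tokenization: lowercase runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def trigram_tokens(text):
    """Assumed character-trigram tokenization over the lowercased text."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing.

    The tokenizer is passed in, so the same trainer and classifier
    are reused unchanged for both tokenization schemes.
    Assumes both classes receive at least one training email.
    """

    def __init__(self, tokenize):
        self.tokenize = tokenize
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count the email itself (for the prior) and each of its tokens.
        self.doc_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def classify(self, text):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in self.tokenize(text):
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
training = [
    ("win free money now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
]

for tokenize in (word_tokens, trigram_tokens):
    nb = NaiveBayes(tokenize)
    for text, label in training:
        nb.train(text, label)
    print(tokenize.__name__, nb.classify("free money offer"))
```

Because only the `tokenize` argument changes between the two runs, any difference in accuracy can be attributed to tokenization. Character trigrams also produce many more tokens per email than words, so they need more training data and more memory. This is consistent with the trade-off the abstract reports.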
