Features Reweighting and Similarity Coefficient Based Method for Email Spam Filtering
- 1 Faculty of Engineering, Karary University, Omdurman, Sudan
- 2 Department of Foundation, Inaya Medical College, Sudan
- 3 University of Science and Technology - Khartoum, Sudan
Abstract
Spamming floods the Internet with many copies of the same message in an attempt to force it on people who would not otherwise choose to receive it. Determining whether an incoming email is spam has therefore become an important problem. One of the main characteristics of the spam filtering problem is the high dimensionality of its feature space, which makes a dimensionality reduction stage necessary. This study addresses that aspect of spam detection by examining the effect of reweighting features. The work began by applying similarity coefficients to the dataset, then reweighting the features and applying the same similarity coefficients to the reweighted dataset. Finally, the results before and after reweighting were compared with each other and with a feature selection method. The objectives of this study are to examine the Cosine and Dice similarity coefficients and to study the effect of important features on other features through the reweighting process. The most important results are that reweighting did not improve the success rate of either coefficient (Cosine or Dice), whereas the feature selection method improved detection with Cosine.
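The two coefficients and the reweighting step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas or the weighting scheme used in the study, so this assumes the standard real-valued forms of the Cosine and Dice coefficients and a simple per-feature multiplicative weight; the function names, the example vectors, and the weight values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine coefficient: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dice(a, b):
    """Dice coefficient (real-valued form): 2 * dot(a, b) / (|a|^2 + |b|^2)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b)
    return 2 * dot / denom if denom else 0.0


def reweight(vec, weights):
    """Scale each feature by its weight (assumed multiplicative scheme)."""
    return [x * w for x, w in zip(vec, weights)]


# Hypothetical feature vectors for two emails and hypothetical feature weights.
email_a = [1, 0, 1]
email_b = [1, 1, 0]
weights = [3, 1, 1]  # emphasise the first feature

before = (cosine(email_a, email_b), dice(email_a, email_b))
after = (
    cosine(reweight(email_a, weights), reweight(email_b, weights)),
    dice(reweight(email_a, weights), reweight(email_b, weights)),
)
print("before reweighting:", before)
print("after reweighting:", after)
```

In this toy case, increasing the weight of the shared feature raises both similarity scores. Whether such a change helps classification depends on the data, which is the question the study evaluates.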
DOI: https://doi.org/10.3844/ajassp.2017.983.993
Copyright: © 2017 Ali Ahmed, Ammar Ahmed E. Elhadi and Ahmed Osman Ali Elsiddig. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Spam
- Spam Filtering
- Feature Selection
- Similarity Coefficient