Bayesian spam filters, a type of content-based scoring filter, analyze the contents of an email and calculate the probability that the message is spam. They build up a list of characteristics (typically words) associated with spam and with legitimate email. Their advantage is that they build this list themselves from the messages they see, rather than depending on a manually maintained list.
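The list-building step can be sketched in Python. This is a minimal illustration, not any particular product's implementation: it assumes a message's characteristics are simply its lowercase words, and the names `tokenize` and `train` are hypothetical. For each word it counts how many spam messages and how many legitimate messages contain it.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in a message (simplifying assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def train(spam_msgs, ham_msgs):
    """Count how many spam and how many legitimate messages contain each token."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(tokenize(msg))  # each token counted once per message
    for msg in ham_msgs:
        ham_counts.update(tokenize(msg))
    return spam_counts, ham_counts


# Illustrative training data, not a real corpus.
spam = ["Claim your Nigeria lottery prize now", "Lottery winner, contact us"]
ham = ["Textile order confirmation", "Meeting about the textile samples"]
spam_counts, ham_counts = train(spam, ham)
print(spam_counts["lottery"], ham_counts["textile"])  # 2 2
```

Counting each word once per message (rather than every occurrence) is one design choice among several; it keeps a single message that repeats a word many times from dominating the statistics.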
Bayesian spam filters try, more or less, to emulate how you identify spam yourself. One look at an email usually tells you whether it is genuine or spam, and the chance that you mistake a legitimate message for spam is close to zero. Ideally, spam filters would work the same way; Bayesian spam filters at least aim in that direction.
Spam Filtering
Suppose the word ‘textile’ often appears in your legitimate mail but never in your spam. The filter then treats ‘textile’ as having essentially zero probability of indicating spam. On the other hand, the words ‘Nigeria’ and ‘lottery’ appear often, and at times almost exclusively, in spam, thanks to the 419 scams made famous out of Nigeria and elsewhere in Africa.
For a Bayesian spam filter, these two words are therefore very strong spam indicators, with a spam probability approaching 100 percent.
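One common way to turn such counts into a per-word spam probability is sketched below. It assumes spam and legitimate mail are equally likely a priori, and it clamps results to a hypothetical range of 0.01 to 0.99 so that no single word acts as absolute proof. The function name, its parameters, and the counts in the example are illustrative only.

```python
def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham,
                          floor=0.01, ceiling=0.99):
    """Estimate P(spam | word) from per-class document counts.

    Assumes equal prior odds of spam and legitimate mail; clamps the result
    to [floor, ceiling] so one word is never treated as certain proof.
    """
    spam_freq = spam_counts.get(word, 0) / n_spam  # share of spam containing the word
    ham_freq = ham_counts.get(word, 0) / n_ham     # share of legitimate mail containing it
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: treated as neutral (an assumption)
    p = spam_freq / (spam_freq + ham_freq)
    return min(ceiling, max(floor, p))


# Illustrative counts out of 50 spam and 50 legitimate messages.
spam_counts = {"nigeria": 40, "lottery": 45}
ham_counts = {"textile": 30}
for w in ("lottery", "textile", "hello"):
    print(w, word_spam_probability(w, spam_counts, ham_counts, n_spam=50, n_ham=50))
# lottery 0.99
# textile 0.01
# hello 0.5
```

Without the clamp, ‘lottery’ would score exactly 1.0 and ‘textile’ exactly 0.0, and a single such word could then override every other piece of evidence in a message.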
Whenever you receive a new message, the Bayesian spam filter analyzes it and uses these individual characteristics to calculate the probability that it is spam. If a message contains both ‘textile’ and ‘Nigeria’ or ‘lottery’, the evidence points in opposite directions, and the filter cannot tell from those words alone whether the message is genuine or spam. It then weighs the message's other characteristics to decide whether to classify it as spam or legitimate.
Bayesian Spam Filters – Adapting Automatically
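A common way to combine the individual word probabilities p_1 to p_n into a single verdict is the naive Bayes combining rule P = ∏p_i / (∏p_i + ∏(1 − p_i)), which assumes the words occur independently and that spam and legitimate mail are equally likely a priori. The sketch below (the name `combine` is hypothetical) evaluates the rule in log space so that long messages do not underflow. With one strong legitimate word and one strong spam word, as in the ‘textile’ plus ‘lottery’ case, the result is about 0.5, so the decision rests on the remaining words.

```python
import math


def combine(probabilities):
    """Combine per-word spam probabilities into one message-level probability.

    Naive Bayes rule under independence and equal priors:
        P = prod(p) / (prod(p) + prod(1 - p))
    computed from log-sums to avoid underflow on long messages.
    """
    log_spam = sum(math.log(p) for p in probabilities)
    log_ham = sum(math.log(1.0 - p) for p in probabilities)
    diff = log_spam - log_ham  # log of prod(p) / prod(1 - p)
    # Numerically stable logistic function of diff.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


print(round(combine([0.01, 0.99]), 3))        # 0.5: 'textile' and 'lottery' cancel out
print(round(combine([0.01, 0.99, 0.99]), 3))  # 0.99: a second spam word tips the balance
```

Real filters often restrict the product to the most decisive words in a message rather than all of them; that refinement is omitted here.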
Once the message has been classified, it can be used to train the filter further. In the scenario above:
• If the message is classified as spam, the probability that the word ‘textile’ indicates legitimate mail is lowered.
• If the message is classified as legitimate, the spam probability of ‘Nigeria’ or ‘lottery’ (whichever appeared) is lowered accordingly.
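The retraining described in the list above can be sketched as updating per-class word counts with each classified message. This is a minimal sketch under the same simplifying assumption (a message's characteristics are its lowercase words); `TrainableCounts`, `learn`, `unlearn`, and `correct` are hypothetical names. Learning a message as spam adds a spam count to each of its words, including ‘textile’, which raises that word's spam probability; learning it as legitimate adds legitimate counts to ‘Nigeria’ and ‘lottery’, lowering theirs. `correct` models a user overriding the filter's own decision.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in a message (simplifying assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class TrainableCounts:
    """Hypothetical helper holding per-class document counts for a Bayesian filter."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        """Add one classified message to the training counts."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def unlearn(self, text, is_spam):
        """Remove a message previously learned with the given label (assumed to exist)."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.subtract(tokens)
            self.n_spam -= 1
        else:
            self.ham_counts.subtract(tokens)
            self.n_ham -= 1

    def correct(self, text, was_spam):
        """User override: move a message from the label it was learned with to the other one."""
        self.unlearn(text, was_spam)
        self.learn(text, not was_spam)


counts = TrainableCounts()
msg = "Textile order: your Nigeria lottery shipment"
counts.learn(msg, is_spam=True)     # the filter's own decision
counts.correct(msg, was_spam=True)  # the user marks it as legitimate
print(counts.spam_counts["lottery"], counts.ham_counts["lottery"])  # 0 1
```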
The advantage of Bayesian spam filters is that they adapt themselves, learning both from their own decisions and from any manual corrections the user makes. This automatic adaptability is excellent for individual email users: most spam messages share very similar, at times identical, characteristics, whereas the characteristics of legitimate mail differ for each individual.