A week ago, I had a little schnapps idea - maybe it would be better to consider email Spam until proven otherwise. Perhaps we shouldn't concentrate on identifying Spam, but rather on identifying real mail.
As it happens, my private email a) gets a lot of Spam, because it receives messages addressed to several different accounts which are published on the net, and b) is delivered to two separate computers, each running its own copy of the latest version of Thunderbird. This makes it possible to compare the traditional "Mark Spam" method with the "Mark Good" method. The tool is the same; the only difference is which mail gets put into the filter.
Mark Good is really simple - if the email is one that I want, I click on the "Junk" button. Thunderbird marks it as junk and deletes it. It also learns to recognize "junk" (which, in this setup, is the mail I want) and automatically deletes similar messages in the future. I then recover the good messages from the Trash and really delete everything left in my Inbox.
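To make the idea concrete, here is a toy sketch of a two-class Bayesian filter where the "marked" class is simply whatever I click Junk on. This is not Thunderbird's actual filter: the class name `MarkedClassFilter`, the add-one smoothing, and the sample messages are all made up for illustration, and I am assuming both classes get trained explicitly.

```python
import math
from collections import Counter


class MarkedClassFilter:
    """Toy naive Bayes filter (hypothetical, not Thunderbird's code).

    "marked" is whatever the user clicks Junk on: Spam in the Mark Spam
    setup, wanted mail in the Mark Good setup.
    """

    def __init__(self) -> None:
        self.tokens = {"marked": Counter(), "unmarked": Counter()}
        self.messages = {"marked": 0, "unmarked": 0}

    def train(self, text: str, marked: bool) -> None:
        label = "marked" if marked else "unmarked"
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def is_marked(self, text: str) -> bool:
        vocab = set(self.tokens["marked"]) | set(self.tokens["unmarked"])
        total_messages = sum(self.messages.values())
        scores = {}
        for label in ("marked", "unmarked"):
            # Log prior, smoothed so an untrained class never yields log(0).
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            n_tokens = sum(self.tokens[label].values())
            for token in text.lower().split():
                # Add-one smoothing; the extra +1 reserves room for unseen words.
                count = self.tokens[label][token]
                score += math.log((count + 1) / (n_tokens + len(vocab) + 1))
            scores[label] = score
        return scores["marked"] > scores["unmarked"]


# Mark Good: the wanted mail is the class that gets marked.
mark_good_filter = MarkedClassFilter()
mark_good_filter.train("lunch tomorrow with anna", marked=True)
mark_good_filter.train("project meeting notes attached", marked=True)
mark_good_filter.train("cheap pills buy now", marked=False)
mark_good_filter.train("win money now click here", marked=False)

print(mark_good_filter.is_marked("meeting with anna tomorrow"))  # True
print(mark_good_filter.is_marked("buy cheap pills now"))         # False
```

Switching to Mark Spam only means flipping the `marked` flags during training; the classifier code stays the same, which is the whole point of the comparison.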
In some ways, the results of the two approaches are quite similar. In others, the differences are quite dramatic, and given a choice, I much prefer the "Mark Good" approach.
As I have not used my home computer in a while (that it runs at 500 MHz and my notebook at 2 GHz might explain the infrequent usage), there were 2310 messages waiting for download yesterday. Using Thunderbird's Bayesian Spam filter in its current training state, I downloaded all the messages and let the filter do its thing. Per thousand messages, there were 124 "really good" messages (i.e. those that I, using my human judgement, care about). Of those, 60 -- nearly half -- were classified as bad (Spam), and 19 Spam messages were left in the inbox. All in all, 79 of 1000 messages were put in the wrong category.
A week ago, I started using the Mark Good approach on my laptop. Before starting, I reset the learning database and used a day's worth of messages to train the Spam filter (by identifying good email as Spam in the UI: good mail gets put in the Junk folder, everything else stays in the Inbox). During the 6 1/2 days since, I have received 935 messages. Normalizing again, 102 per 1000 were "really good". Of those, 98 were recognized properly and only 4 were left together with the Spam; in total, only 63 per 1000 were misclassified.
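For anyone who wants to check the arithmetic, here is a small sketch that recomputes the error figures from the per-1000 counts quoted above. The `Run` class and its field names are my own invention; the 59 "false good" messages for Mark Good follow from 63 misclassified minus the 4 good mails left among the Spam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Per-1000 counts for one filtering approach (hypothetical helper)."""

    name: str
    really_good: int  # wanted messages per 1000
    false_good: int   # Spam that ended up with the good mail
    false_spam: int   # good mail that ended up with the Spam

    @property
    def wrongly_filtered(self) -> int:
        return self.false_good + self.false_spam

    @property
    def error_rate(self) -> float:
        return self.wrongly_filtered / 1000

    @property
    def good_mail_lost(self) -> float:
        """Fraction of the wanted mail that landed among the Spam."""
        return self.false_spam / self.really_good


mark_good = Run("Mark Good", really_good=102, false_good=59, false_spam=4)
mark_spam = Run("Mark Spam", really_good=124, false_good=19, false_spam=60)

for run in (mark_good, mark_spam):
    print(f"{run.name}: {run.error_rate:.1%} wrongly filtered, "
          f"{run.good_mail_lost:.1%} of good mail lost")
# Mark Good: 6.3% wrongly filtered, 3.9% of good mail lost
# Mark Spam: 7.9% wrongly filtered, 48.4% of good mail lost
```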
I'm not sure if the difference between 6.3% and 7.9% is really that significant, but what is dramatic is how the two approaches handle their mistakes differently.
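One rough way to check that is a two-proportion z-test on the two error rates. This is only a sketch under generous assumptions: it treats every message as an independent sample and ignores that the two filters saw different mail and had different amounts of training. The function name is my own.

```python
import math


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Error rates and message counts as reported: Mark Good first, then Mark Spam.
z, p_value = two_proportion_z_test(0.063, 935, 0.079, 2310)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these inputs, z comes out at roughly 1.6 and the p-value a little above 0.1, so under these assumptions the gap in overall error rate would not pass the usual 5% threshold.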
Mark Good put 59 Spam mails together with the good ones. Mark Spam put 60 good emails together with the Spam. Which is preferable? To my eyes, no contest: it is much easier to weed 59 Spams out of the 156 messages considered good than to find 60 Goods among the 936 messages considered Spam.
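To put a number on that sorting effort, here is a small sketch using the folder sizes from the results table (156 messages considered good under Mark Good, 936 considered Spam under Mark Spam). The function `review_burden` and its two measures are my own choice of illustration, not an established metric.

```python
def review_burden(folder_size: int, misplaced: int) -> dict[str, float]:
    """Describe how hard it is to find the misplaced messages in a folder."""
    return {
        # Share of the folder that does not belong there.
        "misplaced_share": misplaced / folder_size,
        # Messages to look through, on average, per misplaced message found.
        "scanned_per_find": folder_size / misplaced,
    }


good_folder = review_burden(folder_size=156, misplaced=59)  # Mark Good: Spam among the good
spam_folder = review_burden(folder_size=936, misplaced=60)  # Mark Spam: good among the Spam

for name, burden in (("Mark Good", good_folder), ("Mark Spam", spam_folder)):
    print(f"{name}: {burden['misplaced_share']:.1%} misplaced, "
          f"about {burden['scanned_per_find']:.1f} messages scanned per find")
# Mark Good: 37.8% misplaced, about 2.6 messages scanned per find
# Mark Spam: 6.4% misplaced, about 15.6 messages scanned per find
```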
Only one thing troubles me about the Mark Good approach: what about the 4 in 1000 real mails that get classified as Spam? Would I really make the effort to go find these mails, or would they just get lost in the Spam?
| | Experiment "Mark Good" | Control "Mark Spam" |
|---|---|---|
| Messages Processed | 935 | 2310 |
| per 1000 Messages: | 1000 | 1000 |
| Really Good | 102 | 124 |
| Considered Good | 156 | 64 |
| Considered Spam | 844 | 936 |
| False Good | 59 | 19 |
| False Spam | 4 | 60 |
| Wrongly filtered | 63 | 79 |
What's next? Well, it would be great if someone would verify these findings. Even better if someone from the Thunderbird group would look into this and maybe build this approach into a beta version...?