Bogofilter seems to fail decoding base64
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bogofilter | Confirmed | Undecided | Unassigned |
bogofilter (Ubuntu) | Fix Released | High | Loïc Minier |
Lucid | Fix Released | High | Loïc Minier |
Bug Description
Binary package hint: bogofilter-bdb
Description: Ubuntu 8.04.1
Release: 8.04
Package: bogofilter-bdb
Source-Package: bogofilter
Version: 1.1.5-2ubuntu5
Over the last few days I received a lot of similar spam that passed bogofilter classified as ham. Even after tagging a lot of these mails (>50) as spam, this did not improve, neither for already tagged mails nor for new mails.
Looking at the raw mail source, I found that the mails, although plain text in the cp1251 charset, were base64 encoded. I therefore first assumed that bogofilter might be unable to handle base64 encoding. But base64 support has actually been integrated since version 0.10, so it should still be present in 1.1.5-2ubuntu5, which I have installed here.
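For readers who want to see what such a mail contains once decoded, here is a minimal sketch using only Python's standard `email` package. It is not part of bogofilter. The function name `decoded_text` and the sample message are made up for illustration, and the sketch assumes a single-part text/plain message like the ones described above.

```python
import base64
from email import message_from_bytes
from email.header import decode_header, make_header
from typing import Tuple


def decoded_text(raw: bytes) -> Tuple[str, str]:
    """Return (subject, body) of a single-part message as Unicode text."""
    msg = message_from_bytes(raw)
    # Undo RFC 2047 encoding in the Subject header, if any.
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    # decode=True undoes the Content-Transfer-Encoding (base64 here).
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "ascii"
    return subject, payload.decode(charset, errors="replace")


# Hypothetical sample: cp1251 text, base64 encoded, as in the reported spam.
sample = b"".join([
    b"Subject: =?windows-1251?B?",
    base64.b64encode("Привет".encode("cp1251")),
    b"?=\r\n",
    b"MIME-Version: 1.0\r\n",
    b"Content-Type: text/plain; charset=windows-1251\r\n",
    b"Content-Transfer-Encoding: base64\r\n",
    b"\r\n",
    base64.b64encode("Купите сейчас".encode("cp1251")),
    b"\r\n",
])

subject, text = decoded_text(sample)
print(subject)
print(text)
```

A decoded copy produced this way is the kind of input that can be compared against the original encoded mail.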
A brief test brought up the following:
Test:
I tagged one of the spam mails in a new database using "bogofilter -s" and compared the database contents (retrieved via "bogoutil -d") with those of another new database where I tagged the same mail, but with decoded body and subject.
Result:
In the first DB only information on header fields was present. In the second DB there was also information regarding the body of the mail.
Thus I conclude that bogofilter did not manage to decode the mail - whereas KMail does this flawlessly.
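The comparison above can be repeated mechanically. The following is a rough sketch, assuming each line of the "bogoutil -d" output has the form "token spam-count ham-count date"; that format is an assumption here, and the dump excerpts and helper names (`dump_tokens`, `only_in_second`) are invented for illustration.

```python
def dump_tokens(dump_text: str) -> set:
    """Collect the token column from a textual database dump.

    Assumes every line ends with three whitespace-separated fields
    (spam count, ham count, date); everything before them is the token.
    """
    tokens = set()
    for line in dump_text.splitlines():
        parts = line.rsplit(None, 3)
        if len(parts) == 4:
            tokens.add(parts[0])
    return tokens


def only_in_second(first_dump: str, second_dump: str) -> list:
    """Tokens present in the second dump but missing from the first."""
    return sorted(dump_tokens(second_dump) - dump_tokens(first_dump))


# Hypothetical excerpts: database trained on the encoded mail ...
encoded_db = """\
.MSG_COUNT 1 0 20081001
subj:example 1 0 20081001
"""
# ... and database trained on the manually decoded copy.
decoded_db = """\
.MSG_COUNT 1 0 20081001
subj:example 1 0 20081001
offer 1 0 20081001
"""

missing = only_in_second(encoded_db, decoded_db)
print(missing)
```

If decoding worked, the list of tokens found only in the second database should be empty or nearly so; a long list of body tokens points to the behaviour described in this report.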
I attach an mbox folder with a selection of mails.
Changed in bogofilter:
status: Fix Committed → Confirmed
Hi Christian,
thanks for providing some samples of faulty messages and a hint to the cause of the problem.
Apparently bogofilter (including upstream version 1.2.0) indeed has issues with decoding the message bodies; at least bogolexer doesn't come up with body tokens.
WRT Ubuntu Core Developers as maintainers, please forward such reports upstream. We're not actively monitoring distributor package bugs, so this isn't ever gonna get fixed unless you forward reports on short notice. Letting reports linger for half a year isn't useful.