Ubuntu
bogofilter package

Bogofilter seems to fail decoding base64

Bug #320829 reported by Christian Frommeyer on 2009-01-24

This bug affects 1 person

Affects	Status	Importance	Assigned to
Bogofilter	Confirmed	Undecided	Unassigned
bogofilter (Ubuntu)	Fix Released	High	Loïc Minier
Lucid	Fix Released	High	Loïc Minier

Bug Description

Binary package hint: bogofilter-bdb

Description: Ubuntu 8.04.1
Release: 8.04
Package: bogofilter-bdb
Source-Package: bogofilter
Version: 1.1.5-2ubuntu5

Over the last few days I received a lot of similar spam that bogofilter let through classified as ham. Even after I tagged many of these mails (>50) as spam, the classification did not improve, neither for the already tagged mails nor for new ones.

Looking at the raw mail text, I found that the mails, although plain text in the cp1251 charset, were base64 encoded. I therefore first assumed that bogofilter might be unable to handle base64 encoding. However, base64 support has been integrated since version 0.10 and should therefore still be present in 1.1.5-2ubuntu5, which I have installed here.
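
For illustration, a minimal Python sketch of the kind of message described here: a text/plain part in cp1251, transported as base64. The addresses, subject and body text are made up, and the charset is spelled `windows-1251` (a common name for cp1251) as an assumption about the original headers. It only shows what a correct decode looks like, using Python's standard `email` package rather than bogofilter.

```python
import base64
from email import message_from_bytes

# Hypothetical sample body; the real spam mails are in the attached mbox.
body_text = "Привет"

raw = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=windows-1251\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.encodebytes(body_text.encode("cp1251")).replace(b"\n", b"\r\n")
)

msg = message_from_bytes(raw)

# decode=True undoes the base64 transfer encoding and returns bytes.
payload = msg.get_payload(decode=True)

# The charset from Content-Type tells us how to turn those bytes into text.
charset = msg.get_content_charset()
decoded = payload.decode(charset)

print(msg["Content-Transfer-Encoding"], charset)
print(decoded)
```

A filter that skips the first step only ever sees the base64 characters, which carry no usable words.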

A brief test brought up the following:

Test:
I registered one of the spam mails in a new database using "bogofilter -s" and compared the database contents (retrieved via "bogoutil -d") with those of another new database in which I registered the same mail, but with the body and subject already decoded.

Result:
In the first DB only information on header fields was present. In the second DB there was also information regarding the body of the mail.
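
The comparison of the two dumps can be sketched in Python as follows. This is only an illustration: the dump layout (token, spam count, ham count, date per line), the `prefix:` tagging of header tokens, and the dot-prefixed bookkeeping entries are assumptions about what "bogoutil -d" prints, and the sample dumps are invented, not taken from the attached mails.

```python
def parse_dump(text):
    """Map token -> (spam_count, ham_count) from a whitespace-separated dump."""
    tokens = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tokens[fields[0]] = (int(fields[1]), int(fields[2]))
    return tokens


def body_tokens(tokens):
    """Keep tokens that look like body words.

    Assumption: header tokens carry a "prefix:" tag and bookkeeping
    entries start with a dot, so everything else came from the body.
    """
    return {t for t in tokens if ":" not in t and not t.startswith(".")}


# Invented sample dumps: database 1 saw the base64 mail,
# database 2 saw the same mail with the body already decoded.
dump_encoded = (
    ".MSG_COUNT 1 0 20090124\n"
    "subj:offer 1 0 20090124\n"
    "from:example.com 1 0 20090124\n"
)
dump_decoded = dump_encoded + "cheap 1 0 20090124\npills 1 0 20090124\n"

only_in_decoded = body_tokens(parse_dump(dump_decoded)) - body_tokens(parse_dump(dump_encoded))
print(sorted(only_in_decoded))
```

An empty set of body tokens in the first database, next to a non-empty one in the second, is the pattern reported above.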

Thus I conclude that bogofilter did not manage to decode the mail - whereas KMail does this flawlessly.

I attach an mbox folder with a selection of mails.

Related branches

lp:ubuntu/lucid/bogofilter

Christian Frommeyer (debian-frommeyer) wrote on 2009-01-24:

MBox with several of the problematic emails (39.3 KiB, text/plain)

Matthias Andree (matthias-andree) wrote on 2009-07-30:

Hi Christian,

thanks for providing some samples of faulty messages and a hint to the cause of the problem.

Apparently bogofilter (including upstream version 1.2.0) indeed has issues with decoding the message bodies; at least bogolexer doesn't come up with body tokens.

WRT Ubuntu Core Developers as maintainers, please forward such reports upstream. We're not actively monitoring distributor package bugs, so this isn't ever gonna get fixed unless you forward reports on short notice. Letting reports linger for half a year isn't useful.

Changed in bogofilter:
status:	New → Confirmed
Changed in bogofilter (Ubuntu):
status:	New → Confirmed

Matthias Andree (matthias-andree) wrote on 2009-07-30:

This is an upstream bogofilter bug.

The lexer (which extracts words from messages) misattributes part of the base64 message part to the header and splits the long base64 line into two pieces. It trashes part of the first piece and then drops it on the floor. The second piece is properly attributed to the body, but it was not split off at a four-character boundary, so the base64 decoder is out of sync and produces garbage.
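
Why a split away from a four-character boundary is fatal can be shown with a small Python sketch. This is not bogofilter's code; `decode_from` and the sample text are made up for the illustration. Base64 maps every 3 bytes to a group of 4 characters, so a decoder that starts in the middle of a group pairs up the wrong bits for the rest of the stream.

```python
import base64


def decode_from(encoded: bytes, offset: int) -> bytes:
    """Decode a base64 stream starting at `offset`, using whole 4-char groups only."""
    chunk = encoded[offset:]
    chunk = chunk[: len(chunk) - len(chunk) % 4]  # drop a trailing partial group
    return base64.b64decode(chunk)


plain = b"cheap pills for sale now"   # 24 bytes, so no "=" padding is needed
encoded = base64.b64encode(plain)     # 32 base64 characters

aligned = decode_from(encoded, 4)     # starts on a group boundary
misaligned = decode_from(encoded, 2)  # starts in the middle of a group

print(aligned)     # readable: the text minus its first 3 bytes
print(misaligned)  # bytes that no longer contain any of the original words
```

Losing the first piece alone would only cost a few tokens; it is the misalignment of the remainder that wipes out every body token.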

Sorry for that.

Matthias Andree (matthias-andree) wrote on 2009-07-31:

Fixed in bogofilter's upstream Subversion repository. Relevant commit is r6848. Unfortunately, it's non-trivial, so that revision may have to be backported manually.

Thanks, Christian, for the test cases.

Changed in bogofilter:
status:	Confirmed → Fix Committed

Matthias Andree (matthias-andree) wrote on 2009-07-31:

r6850 is also required to fix up an indentation issue in r6848.

Matthias Andree (matthias-andree) wrote on 2009-08-02:

bogofilter 1.2.1 has just been released. It fixes this bug and a quoted-printable bug where =\r sequences (in ANSI C escape notation) at line ends were not recognized. Please upgrade or backport the fixes.
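
A rough Python sketch of the quoted-printable issue, under the assumption that "=\r at line ends" refers to soft line breaks on CRLF-terminated lines ("=" followed by "\r\n"). The function names and sample text are hypothetical and this is not bogofilter's decoder; it only contrasts a decoder that recognizes "=\n" alone with one that also accepts "=\r\n".

```python
import re


def decode_hex_escapes(data: bytes) -> bytes:
    """Replace =XX escapes with the byte they encode."""
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def decode_qp_lf_only(data: bytes) -> bytes:
    """Handles soft line breaks only when the line ends in a bare LF."""
    return decode_hex_escapes(data.replace(b"=\n", b""))


def decode_qp(data: bytes) -> bytes:
    """Handles soft line breaks for both LF and CRLF line endings."""
    return decode_hex_escapes(re.sub(rb"=\r?\n", b"", data))


# A word split by a soft line break in a CRLF-terminated message.
sample = b"cheap pi=\r\nlls =3D bargain\r\n"

print(decode_qp_lf_only(sample))  # the word stays broken in two
print(decode_qp(sample))          # the word is rejoined
```

With the LF-only variant the split word never reaches the tokenizer in one piece, which is the practical cost of missing the "=\r" case.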

Matthias Andree (matthias-andree) on 2010-04-07

Changed in bogofilter:
status:	Fix Committed → Confirmed

Loïc Minier (lool) wrote on 2010-04-10:

Actually this bug uncovers an important issue with parsing of the first line of the body; bumping to high.

Changed in bogofilter (Ubuntu):
status:	Confirmed → Fix Committed
importance:	Undecided → Medium
assignee:	nobody → Loïc Minier (lool)
Changed in bogofilter (Ubuntu Lucid):
importance:	Medium → High

Launchpad Janitor (janitor) wrote on 2010-04-14:

This bug was fixed in the package bogofilter - 1.2.1-0ubuntu1

---------------
bogofilter (1.2.1-0ubuntu1) lucid; urgency=low

  * New upstream bugfix release; LP: #557468.
    - Fixes parsing of the first line of the body in MIME messages;
      LP: #320829.
 -- Loic Minier <email address hidden>  Sat, 10 Apr 2010 11:08:53 +0200