exception (utf8 codec) in pipermail

Bug #1778363 reported by Olivier GERARD
This bug affects 2 people
Affects: GNU Mailman
Status: Incomplete
Importance: Undecided
Assigned to: Mark Sapiro

Bug Description

After upgrading Mailman from 2.1.23 to 2.1.26-1 on Debian, everything appeared to go smoothly: the list's mbox is updated, but the archives are not.
For every message, the error log shows:

Jun 23 19:59:03 2018 (20419) SHUNTING: 1529776742.660616+f4a3eea82ed27ce3f481064f194162863c62b280
Jun 23 21:35:24 2018 (20419) Uncaught runner exception: 'utf8' codec can't decode byte 0xaa in position 26: invalid start byte
Jun 23 21:35:24 2018 (20419) Traceback (most recent call last):
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/ArchRunner.py", line 77, in _dispose
    mlist.ArchiveMail(msg)
  File "/var/lib/mailman/Mailman/Archiver/Archiver.py", line 214, in ArchiveMail
    h.processUnixMailbox(f)
  File "/var/lib/mailman/Mailman/Archiver/pipermail.py", line 596, in processUnixMailbox
    self.add_article(a)
  File "/var/lib/mailman/Mailman/Archiver/pipermail.py", line 640, in add_article
    author = fixAuthor(article.decoded['author'])
  File "/var/lib/mailman/Mailman/Archiver/pipermail.py", line 63, in fixAuthor
    while i>0 and (L[i-1][0] in lowercase or
UnicodeDecodeError: 'utf8' codec can't decode byte 0xaa in position 26: invalid start byte

The error is always the same. I have checked the shunted messages and the mbox itself
and found no 0xaa byte in either.
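One hypothesis, not confirmed anywhere in this thread, would explain why no 0xaa byte turns up in the messages: it may come from the lookup table rather than from the mail. In Python 2, string.lowercase is locale-dependent, and a test such as L[i-1][0] in lowercase with a unicode character on the left forces the byte string on the right to be decoded with the interpreter's default encoding. Assuming a Latin-1 locale in which string.lowercase is the 26 ASCII letters followed by bytes such as 0xaa, the first non-ASCII byte sits at position 26, as in the log. The Python 3 sketch below reproduces only that decoding step with an explicit .decode('utf-8'); the sample table is an assumption, not data from the reporter's system.

```python
# Sketch of the decoding step only. ASSUMPTION: under a Latin-1 locale,
# Python 2's string.lowercase starts with the 26 ASCII letters followed by
# non-ASCII lowercase bytes such as 0xaa; this sample table is illustrative.
ASCII_LOWER = b"abcdefghijklmnopqrstuvwxyz"
LATIN1_LOWER_SAMPLE = ASCII_LOWER + b"\xaa\xb5\xba\xdf"


def explain_decode_failure(raw: bytes, encoding: str = "utf-8"):
    """Try to decode raw bytes; report where decoding fails, if it does.

    Returns None on success, otherwise (offending_byte, position, reason).
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return raw[exc.start], exc.start, exc.reason
    return None


if __name__ == "__main__":
    # The ASCII-only table decodes cleanly.
    print(explain_decode_failure(ASCII_LOWER))
    # The sample Latin-1 table fails on its first non-ASCII byte.
    byte, pos, reason = explain_decode_failure(LATIN1_LOWER_SAMPLE)
    print(f"byte 0x{byte:02x} in position {pos}: {reason}")
```

Running it prints None for the ASCII-only table and then "byte 0xaa in position 26: invalid start byte", which mirrors the wording of the logged error. 0xaa is a UTF-8 continuation byte, so it can never start a character.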

Mark Sapiro (msapiro) wrote:

This may have something to do with the archive database. You can try the script at https://www.msapiro.net/scripts/hddump to dump the database for the affected period with --verbose, and look at the values of 'author' and decoded['author']. Is there anything unusual in those, or anything with a number like 'Doe, John 3rd'?
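To make the question about author values more concrete, here is a hypothetical, simplified Python 3 re-creation of the kind of "Last, First" reordering that fixAuthor appears to perform. Only the membership test on the first character of the preceding word is taken from the traceback; the particle list, the comma shortcut and the output format are assumptions, and the real pipermail.py code may differ.

```python
# Hypothetical, simplified "Last, First" canonicalizer modelled only on the
# loop visible in the traceback; this is NOT Mailman's actual fixAuthor.
from string import ascii_lowercase

# ASSUMED list of lowercase surname particles (illustrative only).
SMALL_NAME_PARTS = {"van", "von", "der", "de"}


def fix_author(author: str) -> str:
    """Rewrite 'First Middle Last' as 'Last, First Middle'."""
    if "," in author:
        # Assume a name containing a comma is already 'Last, First'.
        return author
    parts = author.split()
    i = len(parts) - 1
    if i <= 0:
        # Zero or one word: nothing to reorder.
        return author
    # Walk left over lowercase particles ("van", "de", ...) so they stay
    # attached to the surname. This is the analogue of
    # `L[i-1][0] in lowercase`; in Python 3 both operands are str, so no
    # implicit decoding happens here.
    while i > 0 and (parts[i - 1][0] in ascii_lowercase
                     or parts[i - 1].lower() in SMALL_NAME_PARTS):
        i -= 1
    surname = " ".join(parts[i:])
    given = " ".join(parts[:i])
    return f"{surname}, {given}" if given else author
```

In this sketch, fix_author("Ludwig van Beethoven") gives "van Beethoven, Ludwig", "Doe, John 3rd" is returned unchanged because it already contains a comma, and "John Doe 3rd" becomes "3rd, John Doe", showing how a trailing number can end up treated as the surname.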

If you can post one of the shunted message files, or email it to <email address hidden> if you'd rather not post it, I'll see if I can reproduce this. Also, please see https://wiki.list.org/x/12812344.

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
status: New → Incomplete
Jonathan Tullett (j+launchpad-net) wrote:

I am experiencing the same issue when migrating from 2.1.16 (on Ubuntu 14.04) to 2.1.26 (on Ubuntu 18.04).

Jonathan Tullett (j+launchpad-net) wrote:

As this was a blocker for the migration, I rsynced the /usr/lib/mailman directory from the 2.1.16 installation onto the new machine, and bin/arch --wipe worked again.

Nothing I looked at in the author or decoded author values looked out of place.

Mark Sapiro (msapiro) wrote:

Are you saying that, given the same input mbox file, 'bin/arch --wipe' throws the UnicodeDecodeError with Mailman 2.1.26 but not with 2.1.16? If so, that's strange, as there are no changes between 2.1.16 and 2.1.26 in pipermail.py in the area of 'fixAuthor', and there don't seem to be any Debian patches in this area either.
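One way to check a claim like this on a particular system is to compare the fixAuthor function between the two installations' copies of pipermail.py. The sketch below uses Python 3's difflib; the extraction heuristic (a def at column 0 that ends at the next unindented line) is an assumption that suits simple module-level functions and is not a general Python parser.

```python
# Sketch: diff one top-level function between two versions of a source file.
import difflib
import re
from typing import List


def extract_function(source: str, name: str) -> List[str]:
    """Return the lines of a top-level `def name(...)` block, or [] if absent."""
    lines = source.splitlines(keepends=True)
    start = None
    for idx, line in enumerate(lines):
        if re.match(rf"def {re.escape(name)}\b", line):
            start = idx
            break
    if start is None:
        return []
    end = len(lines)
    for idx in range(start + 1, len(lines)):
        line = lines[idx]
        # Heuristic: the block ends at the next non-blank, unindented line.
        if line.strip() and not line[0].isspace():
            end = idx
            break
    return lines[start:end]


def diff_function(old_src: str, new_src: str, name: str) -> str:
    """Unified diff of function `name` between two source texts ('' if equal)."""
    return "".join(difflib.unified_diff(
        extract_function(old_src, name),
        extract_function(new_src, name),
        fromfile="old", tofile="new"))
```

For example, diff_function(open(old_path).read(), open(new_path).read(), "fixAuthor"), where old_path and new_path are hypothetical paths to the two copies; an empty result means the function text is identical. Identical code does not by itself rule out environmental differences between the two machines.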
