addAttachment() crashes with UnicodeDecodeError:

Bug #353805 reported by Martin Pitt
28
This bug affects 4 people
Affects Status Importance Assigned to Milestone
launchpadlib
High
Leonard Richardson

Bug Description

The retracer just crashed with

Traceback (most recent call last):
  File "/tmp/tmp0Lq7Ny/usr/bin/apport-retrace", line 569, in <module>
    crashdb.update(crashid, report)
  File "/usr/lib/python2.6/dist-packages/apport/crashdb_impl/launchpad.py", line 270, in update
    is_patch=False)
  File "/usr/lib/python2.6/dist-packages/launchpadlib/resource.py", line 337, in __call__
    args[key] = simplejson.dumps(value, cls=DatetimeJSONEncoder)
  File "/var/lib/python-support/python2.6/simplejson/__init__.py", line 243, in dumps
    **kw).encode(obj)
  File "/var/lib/python-support/python2.6/simplejson/encoder.py", line 360, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 627: unexpected code byte

on
            bug.addAttachment(comment='',
                    #content_type=?
                    data=report['ThreadStacktrace'],
                    description='ThreadStacktrace.txt (retraced)',
                    filename='ThreadStacktrace.txt',
                    is_patch=False)

when processing bug 353433.

Revision history for this message
Martin Pitt (pitti) wrote :

Reproducer (works on staging):

b=lp.bugs[89040]
b.addAttachment(comment='', data=open('thread.txt').read(), description='test353805', filename='test353805', is_patch=False)

This apparently happens because something in launchpadlib assumes that data is UTF-8 encoded. While stack traces are by and large text, they can have binary stuff in it, such as this one.

Revision history for this message
Martin Pitt (pitti) wrote :

This even happens if I supply an explicit content type:

>>> b.addAttachment(comment='', data=open('/home/martin/thread.txt').read(), description='test353805', filename='test353805', is_patch=False, content_type='application/octet-stream')

with application/octet-stream it shouldn't make any assumptions about the content at least.

Also, in the Launchpad web UI I could attach thread.txt just fine, and it's even treated as text/plain (which is good for stack traces).

Is there a quick workaround for this?

Revision history for this message
Martin Pitt (pitti) wrote :

Easier reproducer without external file:

>>> b=lp.bugs[89040]
>>> b.addAttachment(comment='', data='"]\xb6"\n', description='test353805', filename='test353805', is_patch=False)

I work around this with

  data=report['Stacktrace'].decode('UTF-8', 'replace')

for now, but I'd like to revert this soon, since it is destructive.

Revision history for this message
Leonard Richardson (leonardr) wrote :

Obviously launchpadlib should not be running binary data through simplejson. The question is how launchpadlib is supposed to know that binary data is appropriate for that particular argument, 'data'.

The 'data' argument is published as an IBytes field in Launchpad. If the WADL for that field specified its type as 'binary', launchpadlib and wadllib could be programmed to handle it correctly. We'd probably have to change lazr.restful, wadllib, and lazr.restfulclient, but all of the changes would be fairly small. I'd prefer to wait on this until we have the lazr.restful and lazr.restfulclient separations complete.

Changed in launchpadlib:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Leonard Richardson (leonardr) wrote :

I got whacked by this bug, so I'm going to start fixing it tomorrow.

Revision history for this message
Leonard Richardson (leonardr) wrote :

Before we close out this bug, we should add a launchpadlib branch containing an integration test. All the tests written so far are unit tests.

Changed in launchpadlib:
assignee: nobody → Leonard Richardson (leonardr)
status: Triaged → In Progress
Revision history for this message
Leonard Richardson (leonardr) wrote :

Cross-posted from launchpad-users in case anyone interested is watching this bug.

The fix for this bug means that anyone who's currently happy sending *non-binary data* into addAttachment, will have to upgrade to the new version of launchpadlib. I'd like to know if anyone is sending plain text diffs or crash reports to addAttachment. If so, how difficult would it be for you to upgrade? This way I can decide whether to put in backwards compatibility code.

Revision history for this message
Martin Pitt (pitti) wrote : Re: [Bug 353805] Re: addAttachment() crashes with UnicodeDecodeError:

Leonard Richardson [2009-05-11 20:03 -0000]:
> The fix for this bug means that anyone who's currently happy sending
> *non-binary data* into addAttachment, will have to upgrade to the new
> version of launchpadlib. I'd like to know if anyone is sending plain
> text diffs or crash reports to addAttachment. If so, how difficult would
> it be for you to upgrade? This way I can decide whether to put in
> backwards compatibility code.

For me and apport this would be okay, just takes some work to backport
the packages to hardy/intrepid/jaunty.

Revision history for this message
Fernando Casanova (fcasanova) wrote :

Hi,

I'm interested in a fix for this bug. I would like to add binary attachments to launchpad, but this bug is preventing me from doing it.

Please correct me if I'm saying something *VERY* stupid.
For backwards compatibility I would suggest to leave the current addAttachment as it is and add 'addAttachmentBinary', or something like that, where we don't pass binary data to simplejson.

Rgds,

Fer

Revision history for this message
Leonard Richardson (leonardr) wrote :

The backwards compatibility problem is not that launchpadlib needs to do different things for binary and non-binary data. The problem is that when we change the web service, existing installations of launchpadlib will stop working. As people upgrade to the new version the problem will go away. The question is whether it's acceptable to tell anyone who has problems to upgrade to the latest version.

Revision history for this message
Michael Hofmann (mh21) wrote :

The same problem can be seen in Bug #408605, with project_release.addFile(). Most release tarballs are actually binary :-).

Revision history for this message
Michael Hofmann (mh21) wrote :

Maybe an additional data_base64/file_content_base64 parameter?

Revision history for this message
Leonard Richardson (leonardr) wrote :

This should be fixed in launchpadlib 1.5.1. In fact there's a script in samples/ for creating a release file.

Changed in launchpadlib:
status: In Progress → Fix Committed
Changed in launchpadlib:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers