datedir2amqp aborts 'UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 5: unexpected code byte'

Bug #960784 reported by Steve McInerney
This bug affects 1 person
Affects                          Status   Importance  Assigned to  Milestone
Python OOPS Date-dir repository  Triaged  Low         Unassigned
python-oops-datedir2amqp         Triaged  Low         Unassigned

Bug Description

A copy of an oops that triggers the failure is on carob:~spm/OOPS-4e169a85a619c554dae80a4e3f1a2c27

Traceback (most recent call last):
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp/bin/datedir2amqp", line 26, in <module>
    oops_datedir2amqp.main()
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/oops_datedir2amqp/main.py", line 82, in main
    repo.republish(publisher)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/oops_datedir_repo-0.0.10-py2.6.egg/oops_datedir_repo/repository.py", line 170, in republish
    report = serializer.read(report_file)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/oops_datedir_repo-0.0.10-py2.6.egg/oops_datedir_repo/serializer.py", line 53, in read
    return serializer_bson.read(StringIO(content))
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/oops_datedir_repo-0.0.10-py2.6.egg/oops_datedir_repo/serializer_bson.py", line 65, in read
    report = bson.loads(fp.read())
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/__init__.py", line 75, in loads
    return decode_document(data, 0)[1]
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 240, in decode_document
    base, name, value = decode_element(data, base)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 232, in decode_element
    return decode_func(data, base)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 253, in decode_document_element
    base, value = decode_document(data, base)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 240, in decode_document
    base, name, value = decode_element(data, base)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 232, in decode_element
    return decode_func(data, base)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 163, in decode_string_element
    base, name = decode_cstring(data, base + 1)
  File "/srv/oops-datedir2amqp/python-oops-datedir2amqp-rev-4/eggs/bson-0.3.2-py2.6.egg/bson/codec.py", line 121, in decode_cstring
    return (base + length, buf.getvalue().decode("utf8"))
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 5: unexpected code byte
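
The last frames of the traceback can be reproduced in isolation. The following is a minimal sketch in Python 3, not the code of the bson 0.3.2 egg: `decode_cstring` here is a hypothetical stand-in that reads a NUL-terminated name and decodes it as UTF-8, and the sample bytes are invented for illustration. Byte 0x85 is binary 10000101, a UTF-8 continuation byte, so it cannot appear without a preceding lead byte. Python 2.6 reports this as "unexpected code byte", while later versions say "invalid start byte".

```python
def decode_cstring(data, base):
    """Return (next_offset, text) for the NUL-terminated string at `base`.

    Hypothetical helper that mirrors the shape of the failing call only.
    """
    end = data.index(b"\x00", base)  # position of the terminating NUL
    return end + 1, data[base:end].decode("utf-8")


good = b"topic\x00"      # invented sample: a valid element name
bad = b"topic\x85\x00"   # invented sample: stray 0x85 at offset 5

good_result = decode_cstring(good, 0)

failure = None
try:
    decode_cstring(bad, 0)
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first undecodable byte in the slice
    failure = (exc.start, bad[exc.start])

print(good_result)
print(failure)
```

The `start` attribute of the exception gives the offset within the string being decoded, which corresponds to the "position 5" in the error message above, not to an offset within the whole oops file.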

Steve McInerney (spm)
tags: added: canonical-losa-lp
Changed in python-oops-datedir2amqp:
status: New → Triaged
importance: Undecided → Critical
Robert Collins (lifeless) wrote :

I get a similar error with the mongo BSON library:
>>> bson.BSON(s).decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/robertc/source/launchpad/oops-datedir2amqp/working/eggs/pymongo-2.1.1-py2.7-linux-x86_64.egg/bson/__init__.py", line 539, in decode
    (document, _) = _bson_to_dict(self, as_class, tz_aware)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 5: invalid start byte

I wonder if this is a damaged file and not worth handling (low incident rate, should examine when it happens), or whether it's a genuine bug.

Changed in python-oops-datedir-repo:
status: New → Triaged
importance: Undecided → Critical
Robert Collins (lifeless) wrote :

So this is horribly meta: the culprit file is itself reporting a UnicodeDecodeError. I wonder if we have some meta-bug that leads to bad BSON? That would be a BSON encoding error on the server side though, rather than an oops bug.

I'm going to mark this Low and attach the oops file; if it happens again or we can determine a cause, we can tackle it at the root. I'm uncomfortable just catching this exception for now.
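
For reference, this is roughly what "just catching this exception" could look like. It is a sketch under stated assumptions, not the oops_datedir_repo API: `republish_tolerant`, `read_report` and `publish` are hypothetical names, and the caller supplies the reader and publisher. To keep the low incident rate visible rather than hidden, the sketch logs each undecodable file and returns the list so the files can be examined.

```python
import logging

log = logging.getLogger("republish-sketch")


def republish_tolerant(paths, read_report, publish):
    """Publish every readable report; return the paths that failed to decode.

    All three parameters are hypothetical: `paths` is an iterable of report
    locations, `read_report` parses one, and `publish` sends it on.
    """
    undecodable = []
    for path in paths:
        try:
            report = read_report(path)
        except UnicodeDecodeError as exc:
            # Do not abort the whole run; record the file for later study.
            log.warning("skipping %s: %s", path, exc)
            undecodable.append(path)
            continue
        publish(report)
    return undecodable
```

Only `UnicodeDecodeError` is caught, so any other parsing failure still aborts the run as before.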

Changed in python-oops-datedir-repo:
importance: Critical → Low
Changed in python-oops-datedir2amqp:
importance: Critical → Low
Robert Collins (lifeless) wrote :

The offending oops. It's an anonymous request from Googlebot to +login, so there are no personal details or anything else to worry about.
