UnicodeDecodeError from broken package descriptions

Bug #1053749 reported by Mathias Burén on 2012-09-21
This bug affects 72 people
Affects                            Importance  Assigned to
dpkg (Ubuntu)                      High        Unassigned
  Quantal                          High        Unassigned
  Raring                           High        Unassigned
ubuntu-drivers-common (Ubuntu)     High        Martin Pitt
  Quantal                          High        Unassigned
  Raring                           High        Martin Pitt

Bug Description

Attempting to launch software-properties-gtk results in this:

$ software-properties-gtk
gpg: /tmp/tmpsw0n10/trustdb.gpg: trustdb created
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 162, in packages_for_modalias
    cache_map = packages_for_modalias.cache_maps[apt_cache_hash]
KeyError: 3989481

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/software-properties-gtk", line 103, in <module>
    app = SoftwarePropertiesGtk(datadir=options.data_dir, options=options, file=file)
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 178, in __init__
    self.init_drivers()
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 1097, in init_drivers
    self.devices = detect.system_device_drivers()
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 415, in system_device_drivers
    for pkg, pkginfo in system_driver_packages(apt_cache).items():
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 319, in system_driver_packages
    for p in packages_for_modalias(apt_cache, alias):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 164, in packages_for_modalias
    cache_map = _apt_cache_modalias_map(apt_cache)
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 129, in _apt_cache_modalias_map
    m = package.candidate.record['Modaliases']
  File "/usr/lib/python3/dist-packages/apt/package.py", line 429, in record
    return Record(self._records.record)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 114: invalid continuation byte

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: software-properties-gtk 0.92.6
ProcVersionSignature: Ubuntu 3.5.0-15.22-generic 3.5.4
Uname: Linux 3.5.0-15-generic x86_64
ApportVersion: 2.5.2-0ubuntu4
Architecture: amd64
Date: Fri Sep 21 08:54:17 2012
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120905.2)
PackageArchitecture: all
SourcePackage: software-properties
UpgradeStatus: No upgrade log present (probably fresh install)

Mathias Burén (mathias-buren) wrote :

/etc/apt/sources.list.d$ for I in *.list;do echo $I;cat $I;echo;done
google-chrome.list
### THIS FILE IS AUTOMATICALLY CONFIGURED ###
# You may comment out this entry, but any other modifications may be lost.
deb http://dl.google.com/linux/chrome/deb/ stable main

mozillateam-firefox-next-quantal.list
deb http://ppa.launchpad.net/mozillateam/firefox-next/ubuntu quantal main

virtualbox.list
deb http://download.virtualbox.org/virtualbox/debian quantal contrib

webupd8team-java-quantal.list
deb http://ppa.launchpad.net/webupd8team/java/ubuntu quantal main
deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu quantal main

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in software-properties (Ubuntu):
status: New → Confirmed
Jan Henke (jhe) wrote :

Push, this bug is a serious breaker. It needs to be fixed in quantal asap!

Moving and rebuilding cache did not work for me.

/var/cache/apt/pkgcache.bin is still the same after rebuild.

The error still looks like this when starting the app.

http://paste.ubuntu.com/1290050/

$ software-properties-gtk
gpg: /tmp/tmp1_cuxf/trustdb.gpg: trustdb created
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 162, in packages_for_modalias
    cache_map = packages_for_modalias.cache_maps[apt_cache_hash]
KeyError: 3315533

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/software-properties-gtk", line 103, in <module>
    app = SoftwarePropertiesGtk(datadir=options.data_dir, options=options, file=file)
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 178, in __init__
    self.init_drivers()
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 1097, in init_drivers
    self.devices = detect.system_device_drivers()
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 415, in system_device_drivers
    for pkg, pkginfo in system_driver_packages(apt_cache).items():
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 319, in system_driver_packages
    for p in packages_for_modalias(apt_cache, alias):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 164, in packages_for_modalias
    cache_map = _apt_cache_modalias_map(apt_cache)
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 129, in _apt_cache_modalias_map
    m = package.candidate.record['Modaliases']
  File "/usr/lib/python3/dist-packages/apt/package.py", line 429, in record
    return Record(self._records.record)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 114: invalid continuation byte

$ grep-available -r . | iconv -f utf-8 -t ucs-2le > /dev/null; echo $?
iconv: illegal input sequence at position 226269
1

$ for F in /var/lib/apt/lists/*Packages; do iconv -f utf-8 -t ucs-2le $F > /dev/null || echo $F; done
$ echo $?
0

Steve Langasek (vorlon) wrote :

This issue has been reported on IRC today. The problem seems to trace back to a locally-installed package with a non-utf8 maintainer field:

Package: davmail
Maintainer: Micka�l Guessant <email address hidden>

Of course, this package fails to comply with Debian policy, but that clearly didn't stop the user from being able to install it - which means the data is in the system and we need to be able to cope with it.

I'm not sure if this needs to be fixed in ubuntu-drivers or in python-apt. Reassigning to ubuntu-drivers for the moment.
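
To illustrate why one stray byte is enough to produce this exact message, here is a minimal sketch. The byte string is a made-up stand-in for the Maintainer field, assuming the name was written as ISO-8859-1 (where "ë" is the single byte 0xeb); it is not copied from the actual davmail control file.

```python
# Hypothetical stand-in for the offending field, assuming ISO-8859-1 encoding.
raw = b"Maintainer: Micka\xebl Guessant"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # In UTF-8, 0xeb announces a three-byte sequence, but the next byte is a
    # plain ASCII "l", so the decoder reports an invalid continuation byte.
    error = exc
    print(exc)

# Decoding with the encoding the text was actually written in succeeds.
print(raw.decode("iso-8859-1"))
```

The reported position is the byte offset of 0xeb within the decoded buffer, which would explain why different users see different positions for different packages.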

Changed in software-properties (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
affects: software-properties (Ubuntu) → ubuntu-drivers-common (Ubuntu)
Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: New → Triaged
importance: Undecided → High

$ grep-available -r . | iconv -f utf-8 -t ucs-2le 1> /dev/null
iconv: illegal input sequence at position 226269

$ grep-available -r . | head -c 226300 | tail -n 1
Maintainer: Micka�l Guessant <email address hidden>

$ grep-available -r . | head -c 226300 | tail -n 6

Package: davmail
Priority: extra
Section: mail
Installed-Size: 5401
Maintainer: Micka�l Guessant <email address hidden>

$ aptitude show davmail
Package: davmail
State: installed
Automatically installed: no
Version: 3.9.9-1976-1
Priority: extra
Section: mail
Maintainer: Micka?l Guessant <email address hidden>
Architecture: all
Uncompressed Size: 5,531 k
Depends: openjdk-7-jre | openjdk-6-jre | sun-java6-jre, libswt-gtk-3-java | libswt-gtk-3.6-java | libswt-gtk-3.5-java | libswt-gtk-3.4-java
Description: DavMail POP/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway
 Ever wanted to get rid of Outlook ? DavMail is a POP/IMAP/SMTP/Caldav/Carddav/LDAP exchange gateway allowing users to use any mail/calendar client (e.g. Thunderbird
 with Lightning or Apple iCal) with an Exchange server, even from the internet or behind a firewall through Outlook Web Access. DavMail now includes an LDAP gateway to
 Exchange global address book and user personal contacts to allow recipient address completion in mail compose window and full calendar support with attendees free/busy
 display. The main goal of DavMail is to provide standard compliant protocols in front of proprietary Exchange. This means LDAP for global address book, SMTP to send
 messages, IMAP to browse messages on the server in any folder, POP to retrieve inbox messages only, Caldav for calendar support and Carddav for personal contacts sync.
 Thus any standard compliant client can be used with Microsoft Exchange. DavMail gateway is implemented in java and should run on any platform. Releases are tested on
 Windows, Linux (Ubuntu) and Mac OSX. Tested successfully with the Iphone (gateway running on a server).

 http://davmail.sourceforge.net

Editing /var/lib/dpkg/available to remove the odd character had no effect on fixing the root cause.

$ grep-status -r . | iconv -f utf-8 -t ucs-2le 1> /dev/null; echo $?
iconv: illegal input sequence at position 223829
1

Removing the odd character from /var/lib/dpkg/status as well FIXES THE ISSUE. Huge thanks to slangasek in #ubuntu-devel for the troubleshooting.

mondhs (mondhs) wrote :

I confirm that cleaning up /var/lib/dpkg/status fixed the issue.

As I do not know the right way to do so, I opened the file and saved it with gedit:

sudo gedit /var/lib/dpkg/status

gedit warned me a couple of times that I could corrupt the file, but it actually fixed software-properties-gtk for me.

Thanks to Kristian for the hint.

akanewsted (akanewsted) wrote :

Opened with gedit and saved.. also worked for me,

thank you mondhs

Martin Pitt (pitti) on 2012-10-21
Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: Triaged → Invalid
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Triaged → Invalid
teranex (teranex) wrote :

I had this problem with the eid-mw and eid-viewer packages. These are provided by the Belgian government for our electronic passports, so I could have guessed that those would be the problem... Anyway, I edited both files, changed the 'ë' to 'e', and it fixed the problem.

rod singleton (rod40cool) wrote :

Confirmed fixed for me too, using gedit as per #19. Davmail was the culprit for me.

Thanks Kristian & mondhs

xyloman (xyloman) wrote :

Davmail was also the issue for me. Opening /var/lib/dpkg/status in gedit and saving it resolved the issue with opening software-properties-gtk.

Jan-Åke Larsson (jalar) wrote :

Davmail was the issue for me too. Has anyone cared to tell Mr Guessant?

Martin Pitt (pitti) on 2012-10-23
Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: Invalid → Confirmed
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Invalid → Confirmed
Martin Pitt (pitti) on 2012-10-23
affects: ubuntu-drivers-common (Ubuntu Raring) → dpkg (Ubuntu Raring)
summary: - software-properties-gtk cannot launch
+ installing davmail breaks /var/lib/dpkg/available

Arguably dpkg could be enforcing the policy requirement that all package fields be UTF8-encoded. However, that doesn't help users who have already installed this package - dpkg isn't going to scrub this data for already-seen packages. The consumers of this data really need to cope with the wrong encodings.
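
As a rough illustration of what "coping" could look like on the consumer side, the sketch below decodes a field tolerantly instead of crashing. `decode_field` is a hypothetical helper name, not part of python-apt or ubuntu-drivers-common, and replacing bad bytes with U+FFFD is only one possible policy.

```python
def decode_field(raw: bytes) -> str:
    """Decode a control-file field, tolerating non-UTF-8 input.

    Bytes that are not valid UTF-8 become U+FFFD (the replacement
    character) rather than raising UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")


# Valid UTF-8 passes through unchanged.
print(decode_field("Mickaël".encode("utf-8")))
# A stray ISO-8859-1 byte is replaced instead of aborting the program.
print(decode_field(b"Micka\xebl"))
```

The trade-off is that the original byte is lost in the decoded string, which is usually acceptable for display-only fields such as Maintainer.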

Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: New → Confirmed
importance: Undecided → High
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: New → Confirmed
importance: Undecided → High
Martin Pitt (pitti) on 2012-10-23
Changed in dpkg (Ubuntu Quantal):
status: Confirmed → Invalid
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Confirmed → Triaged
Martin Pitt (pitti) on 2012-10-23
Changed in ubuntu-drivers-common (Ubuntu Raring):
assignee: nobody → Martin Pitt (pitti)
Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: Confirmed → Triaged
Changed in dpkg (Ubuntu Raring):
status: Confirmed → Invalid
Martin Pitt (pitti) on 2012-10-23
summary: - installing davmail breaks /var/lib/dpkg/available
+ UnicodeDecodeError from broken package descriptions

I dropped him (Mickaël Guessant, davmail) a note.

Mickaël Guessant (mguessan) wrote :

Confirmed: it's a bug in ant-deb-task which does not force file encoding => target control file encoding depends on build platform encoding

Mickaël Guessant (mguessan) wrote :

Fixed for next release: force UTF-8 file.encoding at build time
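
The same pitfall can be shown in Python, purely as an analogy: the real fix was made in the Java/Ant build, and nothing below reflects ant-deb-task itself. The sketch writes the same text once with an encoding standing in for a non-UTF-8 build host and once with UTF-8 forced explicitly.

```python
import os
import tempfile

maintainer = "Maintainer: Mickaël Guessant\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "control")

    # Stand-in for a build host whose default encoding is ISO-8859-1.
    with open(path, "w", encoding="iso-8859-1", newline="\n") as fh:
        fh.write(maintainer)
    with open(path, "rb") as fh:
        latin1_bytes = fh.read()

    # Forcing UTF-8 makes the output independent of the build platform.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(maintainer)
    with open(path, "rb") as fh:
        utf8_bytes = fh.read()

print(latin1_bytes)  # contains the lone byte 0xeb
print(utf8_bytes)    # contains the two-byte sequence 0xc3 0xab
```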

pabroome@gmail.com (pabroome) wrote :

This worked for me too; I just had to edit the status file and save it as UTF-8. Many, many thanks!

Paul

Rofko (lukejtmason) wrote :

I have had this problem twice in a couple of days - first with davmail, which I just removed using apt-get, and then with another programme, Scrivener (beta for Linux, but very reputable), which placed illegal characters in the same way. Removed them and saved - resolved the problem.
Rfk

Martin Pitt (pitti) on 2012-11-07
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Triaged → Fix Committed
dir schneid (d-schneid) on 2012-11-08
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Fix Committed → Fix Released
Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Fix Released → Fix Committed
Martin Pitt (pitti) wrote :

I forgot to close the bug in the changelog:

ubuntu-drivers-common (1:0.2.72) raring; urgency=low

  [ Matthias Klose ]
  * Build-depend on python3-all.

  [ Dmitrijs Ledkovs ]
  * Use /usr/bin/python3 shebang.

  [ Martin Pitt ]
  * debian/tests/system: Fix duplicate output of error message for test
    failures.
  * tests/ubuntu_drivers.py, test_devices_detect_plugins(): Fix failure if
    special.py occurs first in the output. This bug was triggered by Python
    3.3's new hash randomization behaviour. (LP: #1071997)
  * UbuntuDrivers/detect.py: Fix UnicodeDecodeError crash when encountering a
    package with invalid UTF-8 encoding. Just skip those packages instead. Add
    test to tests/ubuntu_drivers.py.

 -- Martin Pitt <email address hidden> Wed, 07 Nov 2012 15:47:19 +0100
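
For readers wondering what "just skip those packages" amounts to, here is a rough sketch of the idea. It is not the actual change to UbuntuDrivers/detect.py: `modalias_map` is a hypothetical name, and the records are plain byte strings standing in for what python-apt would provide.

```python
def modalias_map(records):
    """Map package name -> Modaliases value, skipping undecodable records.

    `records` is a hypothetical iterable of (name, raw_bytes) pairs.
    """
    result = {}
    for name, raw in records:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Skip the broken package instead of aborting the whole scan.
            continue
        for line in text.splitlines():
            if line.startswith("Modaliases:"):
                result[name] = line.split(":", 1)[1].strip()
    return result


records = [
    ("good-driver", b"Package: good-driver\nModaliases: drv(pci:v000010DEd*)\n"),
    ("davmail", b"Package: davmail\nMaintainer: Micka\xebl Guessant\n"),
]
print(modalias_map(records))
```

Skipping is reasonable here because a package with a broken record is very unlikely to be a driver package that the detection code cares about.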

Changed in ubuntu-drivers-common (Ubuntu Raring):
status: Fix Committed → Fix Released
cyd (cyd) wrote :

Removed davmail and working

adamski99 (adamsomerville) wrote :

So I don't have any reference to davmail in /var/lib/dpkg/source or /var/lib/dpkg/available here on 12.10. Can someone suggest a way to find the offending character?

cheers

kenan (kenan23) on 2012-12-22
Changed in ubuntu-drivers-common (Ubuntu Quantal):
status: Triaged → Fix Released
Derek (bugs-m8y) wrote :

adamski99 - you can use the various iconv commands in the comments to locate the problematic line in your file, then edit it. You'd probably want to mention the package here, too.
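
As an alternative to the iconv pipelines, a small script can report the line number and package name directly. This is only a sketch: `find_invalid_utf8` is a made-up helper, and the demo runs on a throwaway sample file rather than the real /var/lib/dpkg/status.

```python
import os
import tempfile


def find_invalid_utf8(path):
    """Yield (line_number, package, raw_line) for lines that are not UTF-8."""
    package = None
    with open(path, "rb") as fh:  # binary mode: no implicit decoding
        for number, line in enumerate(fh, start=1):
            if line.startswith(b"Package:"):
                package = line[len(b"Package:"):].strip().decode("ascii", "replace")
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                yield number, package, line.rstrip(b"\n")


# Demo on a throwaway file; on a real system, pass /var/lib/dpkg/status.
sample = (b"Package: hello\nMaintainer: Jane Doe\n\n"
          b"Package: davmail\nMaintainer: Micka\xebl Guessant\n")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
hits = list(find_invalid_utf8(tmp.name))
os.unlink(tmp.name)
print(hits)
```

Reading the status file this way does not modify it and should not need root, assuming the file is world-readable as it normally is.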

Personally, it was davmail. Thanks for the fix.

Xavier Claessens (zdra) wrote :

For Belgians: this bug happens if you install the beid packages provided by the government, because the maintainer's name is not valid UTF-8.

Schlomo Schapiro (sschapiro) wrote :

Extremely annoying. I can imagine that most "users" actually have no chance of fixing this!

My problem is that the error remains even after fixing the bad davmail packager. Any ideas what else to check?

Dustin Falgout (lots0logs) wrote :

I am also experiencing this issue on Mint Linux 14 Cinnamon. I tried finding invalid characters in the files listed but there were none. I saved each file with gedit, which did not help. I did have davmail installed for a day, but I uninstalled it before this error started.

software-properties-gtk --debug
Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 9: reading configurations from ~/.fonts.conf is deprecated.
gpg: /tmp/tmpoqfkhc/trustdb.gpg: trustdb created
ENABLED COMPS: {'import', 'main', 'backport', 'upstream'}
INTERNET COMPS: {'import', 'main', 'backport', 'upstream'}
MAIN SOURCES
 URI: http://packages.linuxmint.com/
 Comps: ['main', 'upstream', 'import', 'backport']
 Enabled: True
 Valid: True
 MatchURI: packages.linuxmint.com
 BaseURI: http://packages.linuxmint.com/

CHILD SOURCES
CDROM SOURCES
SOURCE CODE SOURCES
DISABLED SOURCES
ISV
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 162, in packages_for_modalias
    cache_map = packages_for_modalias.cache_maps[apt_cache_hash]
KeyError: 3953453

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/software-properties-gtk", line 103, in <module>
    app = SoftwarePropertiesGtk(datadir=options.data_dir, options=options, file=file)
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 178, in __init__
    self.init_drivers()
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 1097, in init_drivers
    self.devices = detect.system_device_drivers()
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 415, in system_device_drivers
    for pkg, pkginfo in system_driver_packages(apt_cache).items():
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 319, in system_driver_packages
    for p in packages_for_modalias(apt_cache, alias):
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 164, in packages_for_modalias
    cache_map = _apt_cache_modalias_map(apt_cache)
  File "/usr/lib/python3/dist-packages/UbuntuDrivers/detect.py", line 129, in _apt_cache_modalias_map
    m = package.candidate.record['Modaliases']
  File "/usr/lib/python3/dist-packages/apt/package.py", line 429, in record
    return Record(self._records.record)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 74: invalid continuation byte

serpass (serpass) wrote :

I had this problem caused by "zygrib" in Xubuntu (voyager) 12.10.

Paul Anderson (paulimach) wrote :

I have a slightly different bug: instead of position 114 it is position 796, like doug above who has position 74. That must have something to do with it?

Brandon Raabe (brandocorp) wrote :

Wanted to post and confirm that I had the same problem, and the fix in posts 16 and 17 fixed this for me. Wanted to say thanks!

Using some of the above suggestions, I am still unable to identify the problematic character.

grep-status -r . | iconv -f utf-8 -t ucs-2le 1> /dev/null; echo $?
iconv: illegal input sequence at position 1907586
1

Sooo... What file do I need to investigate?

I don't have /var/lib/dpkg/source, and /var/lib/dpkg/available doesn't have a line number 1907586.
Possibly 1907586 is a character number rather than a line number, but I don't know how to seek to character number.
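
On the question of seeking: the position iconv prints appears to be a byte offset into whatever it was reading (here the output of grep-status, not the file on disk), which matches the earlier use of head -c in this report. Assuming that, a minimal sketch for turning a byte offset into a line number and line looks like this; `line_at_offset` is a hypothetical helper and the data is a made-up sample.

```python
def line_at_offset(data: bytes, offset: int):
    """Return (line_number, line) for the line containing byte `offset`."""
    # Lines are 1-indexed: count the newlines that precede the offset.
    line_number = data.count(b"\n", 0, offset) + 1
    start = data.rfind(b"\n", 0, offset) + 1
    end = data.find(b"\n", offset)
    if end == -1:
        end = len(data)
    return line_number, data[start:end]


data = b"Package: davmail\nPriority: extra\nMaintainer: Micka\xebl Guessant\n"
offset = data.index(b"\xeb")  # stand-in for the position iconv reported
print(line_at_offset(data, offset))
```

To apply it to the same stream iconv saw, one option is to save the output of `grep-status -r .` to a file first and read that file in binary mode.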

It seems the odd characters do in fact have a valid UTF-8 encoding, but for some reason they have been encoded incorrectly. I was able to fix them as follows:

cat /var/lib/dpkg/status | iconv -c -f utf-8 -t utf-8 > /tmp/status.fixed
cat /var/lib/dpkg/available | iconv -c -f utf-8 -t utf-8 > /tmp/available.fixed

Now you still have to replace the originals with the fixed copies. In my case, there were about 100 offending packages:

hwdata ("Noël Köthe" -> "Noël Köthe")
shared-mime-info ("Sebastian Dröge" -> "Sebastian Dröge")
glines
...

I have the impression there is a structural root cause for this, it's not just about a rare and obscure package with a rogue character.
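
For reference, the effect of `iconv -c -f utf-8 -t utf-8` can be roughly mirrored in Python as below. This is a sketch under the assumption that dropping invalid bytes is what is wanted; `strip_invalid_utf8` is a made-up name, and the sample data is invented.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8, keeping everything else."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


sample = (b"Maintainer: Micka\xebl Guessant\n"        # stray ISO-8859-1 byte
          b"Maintainer: No\xc3\xabl K\xc3\xb6the\n")  # already valid UTF-8
print(strip_invalid_utf8(sample))
```

Valid UTF-8 passes through byte for byte, so a filter like this only alters lines that contain stray bytes. Text that is valid UTF-8 but was encoded twice is left untouched and would need a different repair.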

Neil Danziger (dnzgr) wrote :

I also was affected by this bug, caused by the third party package Scrivener (see comment #30 above by Rofko (lukejtmason)), and resolved it by removing the improperly encoded characters from /var/lib/dpkg/status.

g.bruno (g-bruno) wrote :

After upgrading from Ubuntu 12.04 LTS to 14.04.1 LTS I have the same error, slightly different:

root@amd8:/home/helmut# software-properties-gtk
Traceback (most recent call last):
  File "/usr/bin/software-properties-gtk", line 101, in <module>
    app = SoftwarePropertiesGtk(datadir=options.data_dir, options=options, file=file)
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 169, in __init__
    self.show_keys()
  File "/usr/lib/python3/dist-packages/softwareproperties/gtk/SoftwarePropertiesGtk.py", line 846, in show_keys
    for key in self.apt_key.list():
  File "/usr/lib/python3/dist-packages/softwareproperties/AptAuth.py", line 75, in list
    for line in p:
  File "/usr/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1440: invalid start byte

I tried the methods described above, both gedit and "cat /var/lib/dpkg/available | iconv -c -f utf-8 -t utf-8 > /tmp/available.fixed", but the error is still present. I did not install davmail etc.

Can anyone help me? Ubuntu 14.04.1 is quite new, perhaps there are other reasons.
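
This traceback goes through AptAuth.py rather than the dpkg files, so editing /var/lib/dpkg/status would not be expected to help. Assuming the loop there iterates over the text output of an external command, the sketch below shows the failure mode and a tolerant alternative. The child process is a stand-in that prints a made-up line containing the ISO-8859-1 byte 0xfc; it is not the command AptAuth.py actually runs.

```python
import subprocess
import sys

# Stand-in child process; it emits the byte 0xfc ("ü" in ISO-8859-1),
# which can never start a UTF-8 sequence.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'uid  J\\xfcrgen Example\\n')"]

# Capture raw bytes so that decoding is under our control.
raw = subprocess.run(child, stdout=subprocess.PIPE, check=True).stdout

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# Tolerant decoding keeps the listing usable; bad bytes become U+FFFD.
lines = raw.decode("utf-8", errors="replace").splitlines()
print(strict_error.reason)
print(lines)
```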
