Comment 6 for bug 1543899

Ulli Horlacher (framstag) wrote :

This bug is still there in Ubuntu 16.04.1!

root@diaspora:~# lsb_release -d
Description: Ubuntu 16.04.1 LTS

root@diaspora:~# update-apt-xapian-index -vf
(...)
Rebuilding Xapian index... 0%Traceback (most recent call last):
  File "/usr/sbin/update-apt-xapian-index", line 111, in <module>
    indexer.rebuild(opts.pkgfile)
  File "/usr/lib/python3/dist-packages/axi/indexer.py", line 758, in rebuild
    self.buildIndex(dbdir, generator)
  File "/usr/lib/python3/dist-packages/axi/indexer.py", line 733, in buildIndex
    for doc in documents:
  File "/usr/lib/python3/dist-packages/axi/indexer.py", line 580, in gen_documents_apt
    yield self.get_document_from_apt(pkg)
  File "/usr/lib/python3/dist-packages/axi/indexer.py", line 543, in get_document_from_apt
    addon.obj.index(document, pkg)
  File "/usr/share/apt-xapian-index/plugins/descriptions.py", line 108, in index
    self.indexer.index_text_without_positions(version.raw_description)
  File "/usr/lib/python3/dist-packages/apt/package.py", line 499, in raw_description
    return self._records.long_desc
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 48: invalid continuation byte
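
For illustration, the same kind of error can be reproduced in isolation. This is only a sketch: the sample bytes below are made up (they are not the actual package description from the traceback), and try_decode is a hypothetical helper. It shows that byte 0xe5 is a valid character in ISO-8859-15 but, followed by a plain ASCII byte, is rejected by the UTF-8 decoder:

```python
# Made-up sample text, NOT the real package description from the traceback.
# In ISO-8859-15 the letter a-with-ring (U+00E5) is the single byte 0xe5.
sample = "Bokm\u00e5l".encode("iso-8859-15")


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the decoder's error reason."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


print(try_decode(sample, "iso-8859-15"))  # decodes cleanly
print(try_decode(sample, "utf-8"))        # fails on byte 0xe5
```

In UTF-8, 0xe5 starts a three-byte sequence, so the ASCII byte after it is not a valid continuation byte.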

I found a workaround:

root@diaspora:~# LC_ALL=en_US.utf8 update-apt-xapian-index -vf
(...)
Writing value information to /var/lib/apt-xapian-index/values.
Writing prefix information to /var/lib/apt-xapian-index/prefixes.
Writing documentation to /var/lib/apt-xapian-index/README.
root@diaspora:~#

==> no more UTF errors!

(I also have to modify /etc/cron.weekly/apt-xapian-index!)

update-apt-xapian-index cannot handle a non-UTF-8 locale! I have:

root@diaspora:~# locale
LANG=en_US.ISO-8859-15
LANGUAGE=en_US:en
LC_CTYPE="en_US.ISO-8859-15"
LC_NUMERIC="en_US.ISO-8859-15"
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_US.ISO-8859-15"
LC_MONETARY="en_US.ISO-8859-15"
LC_MESSAGES="en_US.ISO-8859-15"
LC_PAPER="en_US.ISO-8859-15"
LC_NAME="en_US.ISO-8859-15"
LC_ADDRESS="en_US.ISO-8859-15"
LC_TELEPHONE="en_US.ISO-8859-15"
LC_MEASUREMENT="en_US.ISO-8859-15"
LC_IDENTIFICATION="en_US.ISO-8859-15"
LC_ALL=

Switching my system completely to en_US.utf8 is NOT an option for me, for
several reasons.
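
The workaround only needs the locale override for one process, not for the whole system. A minimal sketch of that idea, assuming Python 3 and a generated en_US.utf8 locale; run_with_utf8_locale is a hypothetical helper, not part of apt-xapian-index:

```python
import os
import subprocess
import sys


def run_with_utf8_locale(argv, locale_name="en_US.utf8"):
    """Hypothetical helper: run argv with LC_ALL overridden for the child only."""
    env = dict(os.environ)       # copy, so the caller's environment is untouched
    env["LC_ALL"] = locale_name  # same override as the LC_ALL=... prefix in the shell
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True)


# Harmless demonstration: the child process reports the LC_ALL value it sees.
result = run_with_utf8_locale(
    [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"])
print(result.stdout.strip())
```

Calling run_with_utf8_locale(["update-apt-xapian-index", "-vf"]) as root would correspond to the command line from the workaround, while the rest of the system keeps en_US.ISO-8859-15.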