src:texinfo fails to import (importer) or download (pull-debian-source) with ASCII decoding issue

Bug #1700846 reported by Nish Aravamudan on 2017-06-27
Affects | Importance | Assigned to
usd-importer | Undecided | Robie Basak
ubuntu-dev-tools (Ubuntu) | Undecided | Robie Basak

Bug Description

06/27/2017 12:50:58 - DEBUG:Updating importer/ubuntu/breezy-devel to importer/ubuntu/breezy-security
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'af4fca4aa53c2a05132803d35e451a8e0caf44fe', <ChangelogField.version: 1>) {} = 4.7-2.2ubuntu2.1..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'af4fca4aa53c2a05132803d35e451a8e0caf44fe', <ChangelogField.previous_version: 2>) {} = 4.7-2.2ubuntu2..
06/27/2017 12:51:00 - DEBUG:Executing: sh -c 'echo 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4:debian/changelog | git cat-file --batch --follow-symlinks | sed -n '1{/^[^ ]* blob/!{p;q1}};2,$p' | dpkg-parsechangelog -l- -n1 -SVersion'
06/27/2017 12:51:00 - DEBUG:Executing: sh -c 'echo 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4:debian/changelog | git cat-file --batch --follow-symlinks | sed -n '1{/^[^ ]* blob/!{p;q1}};2,$p' | dpkg-parsechangelog -l- -n1 -o1 -SVersion'
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'f32b9f41e23679b8375bc37842fcbf584ac824f3', <ChangelogField.maintainer: 3>) {} = Kees Cook <email address hidden>..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'f32b9f41e23679b8375bc37842fcbf584ac824f3', <ChangelogField.date: 4>) {} = Fri, 3 Nov 2006 17:08:46 -080..
06/27/2017 12:51:00 - DEBUG:Executing: git commit-tree f32b9f41e23679b8375bc37842fcbf584ac824f3 -p af4fca4aa53c2a05132803d35e451a8e0caf44fe -p 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4 -F /tmp/tmpgzlnl4_t
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, '3be8275ab70e9085b0bb52d544e59d6f58966802', <ChangelogField.version: 1>) {} = 4.8.dfsg.1-3..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, '3be8275ab70e9085b0bb52d544e59d6f58966802', <ChangelogField.previous_version: 2>) {} = 4.8.dfsg.1-2..
06/27/2017 12:51:04 - INFO:Importing patches-unapplied 4.8.dfsg.1-4 to ubuntu/feisty
/usr/lib/python3/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
06/27/2017 12:51:11 - DEBUG:EUC-JP Japanese prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:EUC-KR Korean prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:CP949 Korean prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:EUC-TW Taiwan prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:utf-8 not active
06/27/2017 12:51:11 - DEBUG:CP932 Japanese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-JP not active
06/27/2017 12:51:11 - DEBUG:GB2312 Chinese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-KR not active
06/27/2017 12:51:11 - DEBUG:CP949 not active
06/27/2017 12:51:11 - DEBUG:Big5 Chinese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-TW not active
06/27/2017 12:51:11 - DEBUG:windows-1251 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:KOI8-R Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:MacCyrillic Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM866 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM855 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-7 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1253 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Bulgairan confidence = 0.01
06/27/2017 12:51:11 - DEBUG:windows-1251 Bulgarian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:TIS-620 Thai confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-9 Turkish confidence = 0.7729647837244535
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1251 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:KOI8-R Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:MacCyrillic Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM866 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM855 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-7 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1253 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Bulgairan confidence = 0.01
06/27/2017 12:51:11 - DEBUG:windows-1251 Bulgarian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:TIS-620 Thai confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-9 Turkish confidence = 0.7729647837244535
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:20 - DEBUG:Executing: git checkout --orphan master
06/27/2017 12:51:20 - DEBUG:Executing: git reset --hard
06/27/2017 12:51:20 - DEBUG:Executing: git clean -f -d
Traceback (most recent call last):
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1094, in import_publishes
    import_func(srcpkg_information)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 753, in import_unapplied_spi
    GitUbuntuDsc(spi.dsc_pathname),
  File "/home/nacc/work/usd-importer/gitubuntu/dsc.py", line 22, in __init__
    super(GitUbuntuDsc, self).__init__(dscf)
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 1251, in __init__
    self._bytes(s, encoding) for s in sequence)
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 649, in split_gpg_and_payload
    for line in sequence:
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 1251, in <genexpr>
    self._bytes(s, encoding) for s in sequence)
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 314: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nacc/work/usd-importer/bin/git-ubuntu", line 18, in <module>
    main()
  File "/home/nacc/work/usd-importer/gitubuntu/__main__.py", line 203, in main
    args.func(args)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1240, in main
    ubuntu_head_versions=ubuntu_head_versions)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1110, in import_publishes
    raise GitUbuntuImportException(msg) from e
gitubuntu.importer.GitUbuntuImportException: Unable to import patches-unapplied 4.8.dfsg.1-4 to ubuntu
06/27/2017 12:51:21 - INFO:Leaving /tmp/tmpt998hlld as directed


Nish Aravamudan (nacc) wrote :

Robie, can you look at this and see if you can reproduce? I see it on my bastion and on my laptop with 17.10.

Robie Basak (racb) on 2017-06-28
tags: added: import-edge-case
Robie Basak (racb) wrote :

Reproduced on 16.04.

Robie Basak (racb) wrote :

"pull-debian-source -d texinfo 4.8.dfsg.1-4" also fails with a similar error.

Robie Basak (racb) wrote :

However "pull-lp-source -d texinfo 4.8.dfsg.1-4" succeeds.

Nish Aravamudan (nacc) wrote :

On 17.10:

$ pull-debian-source -d texinfo 4.8.dfsg.1-4
pull-debian-source: Downloading texinfo version 4.8.dfsg.1-4
pull-debian-source: Using rmadison for component determination
pull-debian-source: Guessing component from most recent upload
/usr/lib/python2.7/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
pull-debian-source: Error: Signature on texinfo_4.8.dfsg.1-4.dsc could not be verified
pull-debian-source: Error: Failed to download: Signature on texinfo_4.8.dfsg.1-4.dsc could not be verified

$ pull-lp-source -d texinfo 4.8.dfsg.1-4
pull-lp-source: Downloading texinfo version 4.8.dfsg.1-4
/usr/lib/python2.7/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
pull-lp-source: Downloading texinfo_4.8.dfsg.1.orig.tar.gz from archive.ubuntu.com (1.837 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1.orig.tar.gz from launchpad.net (1.837 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1-4.diff.gz from archive.ubuntu.com (0.097 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1-4.diff.gz from launchpad.net (0.097 MiB)

Robie Basak (racb) wrote :

The cause is that the original uploaded dsc contains invalid UTF-8:

Uploaders: Frank K<FC>ster <email address hidden>
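
A minimal sketch of how the offending byte can be located, assuming the dsc contents have already been read as bytes. The helper name `find_invalid_utf8` and the shortened sample line are hypothetical, not part of the importer or python-debian:

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return exc.start, data[exc.start]
    return None


# Hypothetical, shortened version of the Uploaders line: 0xFC is how
# Latin-1 encodes "ü", but it is not a valid UTF-8 start byte.
sample = b"Uploaders: Frank K\xfcster\n"
print(find_invalid_utf8(sample))
```

For this sample the helper reports byte 0xFC, the same byte value named in the traceback above.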

Changed in usd-importer:
assignee: nobody → Robie Basak (racb)
status: New → In Progress
Robie Basak (racb) wrote :

Debian policy 3.8.1.0 first mandated UTF-8 in control files in March 2009. The dsc file in question is from November 2006. So it was valid by the policy that applied at the time. This suggests that we must be able to handle non-UTF-8 correctly for historical source packages.

Since the original encoding is undefined, perhaps decoding with errors='replace' would be appropriate.
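
A rough illustration of what errors='replace' would do to such a field, using only the built-in bytes.decode. The function name `decode_lossy` is hypothetical, and this is not the fix that was actually applied in the importer:

```python
def decode_lossy(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for undecodable bytes
    instead of raising UnicodeDecodeError."""
    return data.decode("utf-8", errors="replace")


# The Latin-1 encoded name decodes without an exception, but the
# original character is lost: 0xFC becomes U+FFFD.
legacy = b"Frank K\xfcster"
print(decode_lossy(legacy))

# Well-formed UTF-8 input is unaffected.
modern = "Frank K\u00fcster".encode("utf-8")
print(decode_lossy(modern))
```

The trade-off is that replacement is lossy: the original byte cannot be recovered from the decoded string, so anything derived from the decoded text will differ from the raw dsc.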

Robie Basak (racb) wrote :

It looks like pull-debian-source already opens the dsc in binary mode, but it is still a Python 2 script, and in that case the encoding autodetection appears to fail. Converting pull-debian-source to Python 3 with no other direct change fixes it.

So we need two fixes: one for the importer, and one in ubuntu-dev-tools.

summary: - src:texinfo fails to import with ASCII decoding issue
+ src:texinfo fails to import (importer) or download (pull-debian-source)
+ with ASCII decoding issue
Changed in ubuntu-dev-tools (Ubuntu):
assignee: nobody → Robie Basak (racb)
status: New → In Progress
Robie Basak (racb) on 2017-06-30
tags: added: hash-abi-break
Nish Aravamudan (nacc) on 2017-07-14
Changed in usd-importer:
status: In Progress → Fix Released