Debian imports crash on non-UTF-8 filenames

Bug #1917449 reported by Colin Watson
This bug affects 2 people
Affects           Status        Importance  Assigned to   Milestone
Launchpad itself  Fix Released  Critical    Colin Watson

Bug Description

The Debian importer recently started crashing like this:

2021-03-01 16:27:08 ERROR Unhandled exception
Traceback (most recent call last):
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/services/scripts/base.py", line 108, in log_unhandled_exceptions_func
    return func(self, *args, **kw)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/services/scripts/base.py", line 376, in lock_and_run
    self.run(use_web_security=use_web_security, isolation=isolation)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/services/scripts/base.py", line 108, in log_unhandled_exceptions_func
    return func(self, *args, **kw)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/services/scripts/base.py", line 330, in run
    self.main()
  File "/srv/debian-import.launchpad.net/production/launchpad/scripts/gina.py", line 87, in main
    run_gina(self.options, self.txn, target_section)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/runner.py", line 98, in run_gina
    import_sourcepackages(distro, packages_map, package_root, importer_handler)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/runner.py", line 164, in import_sourcepackages
    distro, source, package_root, importer_handler)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/runner.py", line 128, in attempt_source_package_import
    distro, source, package_root, importer_handler)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/runner.py", line 181, in do_one_sourcepackage
    source_data.process_package(distro, package_root)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/packages.py", line 261, in process_package
    self.do_package(distro_name, archive_root)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/packages.py", line 385, in do_package
    archive_root)
  File "/srv/debian-import.launchpad.net/production/launchpad-rev-cd61f0bfc5208dd4b58a15e953892eaabba1e0b8/lib/lp/soyuz/scripts/gina/packages.py", line 153, in read_dsc
    shutil.rmtree(source_dir)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 1: ordinal not in range(128)
2021-03-01 16:27:08 INFO OOPS-c56cc0966e277b161de975f9c778a41e

This is due to a combination of two changes:

 * https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/398367, deployed to production on 2021-02-24: probably line 1364 of that diff, which made the package and version attributes (among others) of SourcePackageData Unicode objects;
 * https://tracker.debian.org/news/1234379/accepted-aspell-is-0511-0-1-source-into-unstable/, which introduced a non-UTF-8 file name.

This combination triggered https://bugs.python.org/issue24672, roughly as follows:

  $ dpkg-source -x aspell-is_0.51.1-0-1.dsc
  $ python2 -c 'import shutil; shutil.rmtree(u"aspell-is-0.51.1-0")'
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/usr/lib/python2.7/shutil.py", line 264, in rmtree
      fullname = os.path.join(path, name)
    File "/usr/lib/python2.7/posixpath.py", line 73, in join
      path += '/' + b
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 1: ordinal not in range(128)
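
The same failure can be sketched without the aspell-is source package. The script below is a minimal, hypothetical reproduction; the temporary directory and the file name `b"\xedx"` are illustrative choices, not taken from the package. It builds a tree containing one file whose name is not valid UTF-8, then calls `shutil.rmtree` on a unicode path to it. On Python 2, `os.listdir(u"...")` returns undecodable names as byte strings, so `os.path.join` is expected to raise the `UnicodeDecodeError` shown above; Python 3 decodes such names with the `surrogateescape` error handler, so the call should succeed there. It assumes a POSIX filesystem that accepts arbitrary bytes in file names (e.g. ext4 on Linux) and an ASCII temporary directory path.

```python
import os
import shutil
import tempfile


def build_tree():
    """Create <tmp>/pkg/ holding one file whose name is not valid UTF-8.

    Returns (base, text_pkg): the temporary base directory as a byte string
    and the package directory as a unicode/text path, mirroring the unicode
    source_dir that reached shutil.rmtree in the importer.
    """
    # A bytes prefix makes mkdtemp return bytes (str on Python 2, bytes on 3.5+).
    base = tempfile.mkdtemp(prefix=b"issue24672-")
    pkg = os.path.join(base, b"pkg")
    os.mkdir(pkg)
    # 0xED followed by "x" is neither valid ASCII nor valid UTF-8.
    with open(os.path.join(pkg, b"\xedx"), "wb"):
        pass
    # Assumes the temporary directory path itself is ASCII.
    return base, pkg.decode("ascii")


def rmtree_text_path():
    """Return True if shutil.rmtree accepted the text path, or False if it
    raised UnicodeDecodeError as in the traceback above (Python 2)."""
    base, text_pkg = build_tree()
    try:
        shutil.rmtree(text_pkg)
        return True
    except UnicodeDecodeError:
        return False
    finally:
        # Clean up using the byte-string path, which works on both versions.
        shutil.rmtree(base, ignore_errors=True)


if __name__ == "__main__":
    print(rmtree_text_path())
```

Under Python 2 this should print `False`; under Python 3 it should print `True`.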

To avoid this Python 2 bug, we'll need to make sure that the path we pass to shutil.rmtree is a str (byte string) object, not a unicode object: os.listdir then returns byte-string names, so os.path.join never has to decode them.
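
One way to do that is sketched below under stated assumptions: a hypothetical wrapper (`rmtree_native` and `to_native_bytes` are illustrative names, not the actual Launchpad change) that converts a text path to a byte string before calling `shutil.rmtree`. It uses `os.fsencode` when it exists (Python 3) and otherwise encodes with `sys.getfilesystemencoding()`, falling back to UTF-8 if that returns `None`. The `demo()` function is likewise only an illustration and assumes a filesystem that accepts non-UTF-8 names and an ASCII temporary directory path.

```python
import os
import shutil
import sys
import tempfile


def to_native_bytes(path):
    """Return path as a byte string (str on Python 2, bytes on Python 3).

    Hypothetical helper for illustration.
    """
    if isinstance(path, bytes):  # on Python 2, bytes is an alias for str
        return path
    fsencode = getattr(os, "fsencode", None)  # only exists on Python 3.2+
    if fsencode is not None:
        # Round-trips surrogate-escaped (originally undecodable) names.
        return fsencode(path)
    # Python 2: under the C locale this is ASCII, which is still enough for
    # ASCII directory names such as aspell-is-0.51.1-0.
    return path.encode(sys.getfilesystemencoding() or "utf-8")


def rmtree_native(path):
    """Like shutil.rmtree, but always passes a byte-string path."""
    shutil.rmtree(to_native_bytes(path))


def demo():
    """Build a tree with a non-UTF-8 file name, remove it via a text path,
    and report whether the directory still exists afterwards."""
    base = tempfile.mkdtemp(prefix=b"rmtree-native-")  # bytes on 2 and 3.5+
    with open(os.path.join(base, b"\xedx"), "wb"):
        pass
    rmtree_native(base.decode("ascii"))  # assumes an ASCII temporary path
    return os.path.exists(base)
```

Here `demo()` should return `False` (the tree was removed) on either Python version. Encoding the top-level path, rather than trying to decode every name inside the tree, means undecodable names never need to be interpreted at all.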


Colin Watson (cjwatson)
summary: - Debian imports crash on non-ASCII filenames
+ Debian imports crash on non-UTF-8 filenames
Colin Watson (cjwatson)
Changed in launchpad:
status: In Progress → Fix Committed
Colin Watson (cjwatson)
Changed in launchpad:
status: Fix Committed → Fix Released