pbr chokes when description contains unicode characters

Bug #1704472 reported by Sorin Sbarnea on 2017-07-14
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
pbr | Fix Released | High | Herve Beraud |

Bug Description

Reproducible even with the latest pbr: pbr-3.1.1

python3.6 setup.py egg_info [20:41:39]
ERROR:root:Error parsing
Traceback (most recent call last):
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/core.py", line 111, in pbr
    attrs = util.cfg_to_args(path, dist.script_args)
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/util.py", line 251, in cfg_to_args
    kwargs = setup_cfg_to_setup_kwargs(config, script_args)
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/util.py", line 315, in setup_cfg_to_setup_kwargs
    value += description_file.read().strip() + '\n\n'
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 33762: ordinal not in range(128)

Sorin Sbarnea (ssbarnea) wrote :

This is a serious issue because the description is very likely to contain Unicode characters if you append the automatically generated ChangeLog to it.

Jan Vlčinský (jan-vlcinsky) wrote :

The problem is that the file is opened without an explicit encoding, so decoding depends on the default encoding of the current locale, which varies from one environment to another.

The fix is easy: open the file using "utf-8" encoding.

See http://git.openstack.org/cgit/openstack-dev/pbr/tree/pbr/util.py?h=3.1.1#n313 where

`description_file = open(filename)`

should change to:

`description_file = open(filename, encoding="utf-8")`

This assumes the file is encoded in UTF-8, which seems reasonable (though it would be good to document that assumption).

Another option would be to read the file encoding from somewhere else, such as the `description-content-type` field, but that seems like too much effort for little benefit.
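A minimal sketch of the failure and the proposed fix, for Python 3 only. The file name and contents are made up for illustration, and passing `encoding="ascii"` explicitly is an assumption that stands in for an environment whose default encoding is ASCII (for example a C/POSIX locale):

```python
import os
import tempfile

# Hypothetical description file containing a non-ASCII character
# (U+2019, whose UTF-8 encoding starts with byte 0xe2).
text = "It\u2019s a changelog entry\n"
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "README.rst")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Simulate open(filename) under an ASCII default encoding.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    failed = True
    error_message = str(exc)

# The proposed fix: state the encoding explicitly.
with open(path, encoding="utf-8") as f:
    fixed = f.read()
```

With the explicit encoding, the result no longer depends on the locale of the machine running `setup.py`.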

Ben Nemec (bnemec) on 2018-04-27
Changed in pbr:
status: New → Confirmed
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/564874

Changed in pbr:
assignee: nobody → Ben Nemec (bnemec)
status: Confirmed → In Progress
Changed in pbr:
assignee: Ben Nemec (bnemec) → Herve Beraud (herveberaud)

Reviewed: https://review.opendev.org/564874
Committed: https://git.openstack.org/cgit/openstack/pbr/commit/?id=3b102a551bb2518682a0da4e6065feeb7f20807a
Submitter: Zuul
Branch: master

commit 3b102a551bb2518682a0da4e6065feeb7f20807a
Author: Ben Nemec <email address hidden>
Date: Fri Apr 27 20:11:53 2018 +0000

    Read description file as utf-8

    Currently pbr fails if the description file contains unicode
    characters. To fix this we need to open the description file as
    utf-8 explicitly. Since open() in Python 2 doesn't support an
    encoding parameter, use io.open() which works on both 2 and 3.

    Co-Authored-By: Hervé Beraud <email address hidden>

    Change-Id: I1bee502ac84b474cc9db5523d2437a8c0a861c00
    Closes-Bug: 1704472
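
The approach described in the commit message can be sketched as follows. This is an illustration only, not the actual pbr code: `read_description` is a hypothetical helper, and the file name and contents are invented. It relies on `io.open()` accepting an `encoding` argument on both Python 2 and Python 3 (on Python 3 it is the same object as the builtin `open()`).

```python
import io
import os
import tempfile


def read_description(filename):
    """Hypothetical helper: return the file's text decoded as UTF-8."""
    # io.open() takes encoding= on Python 2 as well as Python 3.
    with io.open(filename, encoding="utf-8") as description_file:
        return description_file.read().strip() + u"\n\n"


# Write a sample description file containing non-ASCII characters.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "README.rst")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Jan Vl\u010dinsk\u00fd reported this\n")

value = read_description(path)
```

The helper always returns text (`unicode` on Python 2, `str` on Python 3), so callers do not need to branch on the interpreter version.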

Changed in pbr:
status: In Progress → Fix Released

This issue was fixed in the openstack/pbr 5.3.0 release.
