PBR

pbr chokes when description contains unicode characters

Bug #1704472 reported by Sorin Sbarnea
This bug affects 2 people
Affects: PBR
Status: Fix Released
Importance: High
Assigned to: Herve Beraud

Bug Description

Reproducible even with the latest pbr release, pbr-3.1.1:

python3.6 setup.py egg_info
ERROR:root:Error parsing
Traceback (most recent call last):
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/core.py", line 111, in pbr
    attrs = util.cfg_to_args(path, dist.script_args)
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/util.py", line 251, in cfg_to_args
    kwargs = setup_cfg_to_setup_kwargs(config, script_args)
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pbr/util.py", line 315, in setup_cfg_to_setup_kwargs
    value += description_file.read().strip() + '\n\n'
  File "/Users/ssbarnea/.pyenv/versions/3.6.0/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 33762: ordinal not in range(128)
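
For illustration, here is a minimal sketch of the same failure mode. It assumes the description contains a typographic character such as U+2019 (right single quotation mark), whose UTF-8 encoding starts with the byte 0xe2 seen in the traceback. The sample text and the helper `ascii_decode_error` are hypothetical:

```python
def ascii_decode_error(raw):
    """Return the UnicodeDecodeError message from decoding raw as ASCII, or '' if it decodes."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# Hypothetical description text containing U+2019, stored on disk as UTF-8.
text = "pbr\u2019s README"
raw = text.encode("utf-8")

# The first byte of the multi-byte sequence is 0xe2, as in the traceback.
print(hex(raw[3]))
# Decoding with the ASCII codec fails the same way pbr did.
print(ascii_decode_error(raw))
# Decoding with the correct codec round-trips the original text.
print(raw.decode("utf-8") == text)
```

Only the position differs from the report (3 here versus 33762 in the real description).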

Tags: unicode
Sorin Sbarnea (ssbarnea) wrote :

This is a serious issue, because Unicode characters are very likely to end up in the description if you append the automatically generated ChangeLog to it.

Jan Vlčinský (jan-vlcinsky) wrote :

The problem is that the file is opened without specifying an encoding, so the result depends on the current locale's default encoding, which is definitely not deterministic.

The fix is easy: open the file using "utf-8" encoding.

See http://git.openstack.org/cgit/openstack-dev/pbr/tree/pbr/util.py?h=3.1.1#n313 where

`description_file = open(filename)`

should be changed to:

`description_file = open(filename, encoding="utf-8")`
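
A minimal Python 3 sketch of why the explicit encoding matters. Without `encoding=`, Python 3's `open()` uses the locale's preferred encoding (what `locale.getpreferredencoding(False)` reports on the versions discussed here), so the same file can decode differently on different machines. The temporary file, its contents, and the helper `read_description` are hypothetical:

```python
import locale
import os
import tempfile


def read_description(filename):
    """Read a description file as UTF-8, independent of the current locale."""
    with open(filename, encoding="utf-8") as description_file:
        return description_file.read().strip()


# Hypothetical description file containing an em dash (U+2014), written as UTF-8.
fd, path = tempfile.mkstemp(suffix=".rst")
with os.fdopen(fd, "w", encoding="utf-8") as fh:
    fh.write("Changelog \u2014 first release\n")

# Without encoding=..., open() would fall back to this locale-dependent value
# (for example ASCII under a C/POSIX locale on Python 3.6).
print("locale default:", locale.getpreferredencoding(False))

# With the explicit encoding, the result is the same on every machine.
description = read_description(path)
os.remove(path)
print(description)
```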

This assumes the file is encoded in UTF-8, which seems reasonable (but it would be good to document that assumption).

Another option would be to read the file encoding from somewhere else, such as the `description-content-type` field, but this seems like too much effort for little advantage.
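
For comparison, a rough sketch of the alternative that was not adopted: take the `charset` parameter from a content-type value such as `text/markdown; charset=UTF-8` and fall back to UTF-8 when none is given. The helper `charset_from_content_type` is hypothetical and is not part of pbr. It relies only on the standard library's `email.message.Message.get_content_charset`, which returns the charset lowercased, or the fallback if the parameter is absent:

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a MIME content type, or default if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns the fallback
    # argument when the header has no charset parameter.
    return msg.get_content_charset(default)


# Usage: the result could then be passed as open(filename, encoding=...).
print(charset_from_content_type("text/markdown; charset=UTF-8"))
print(charset_from_content_type("text/x-rst"))
```

As the comment notes, this adds parsing and an extra failure mode (unknown charsets) for little practical gain over assuming UTF-8.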

Ben Nemec (bnemec)
Changed in pbr:
status: New → Confirmed
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to pbr (master)

Fix proposed to branch: master
Review: https://review.openstack.org/564874

Changed in pbr:
assignee: nobody → Ben Nemec (bnemec)
status: Confirmed → In Progress
Changed in pbr:
assignee: Ben Nemec (bnemec) → Herve Beraud (herveberaud)
OpenStack Infra (hudson-openstack) wrote : Fix merged to pbr (master)

Reviewed: https://review.opendev.org/564874
Committed: https://git.openstack.org/cgit/openstack/pbr/commit/?id=3b102a551bb2518682a0da4e6065feeb7f20807a
Submitter: Zuul
Branch: master

commit 3b102a551bb2518682a0da4e6065feeb7f20807a
Author: Ben Nemec <email address hidden>
Date: Fri Apr 27 20:11:53 2018 +0000

    Read description file as utf-8

    Currently pbr fails if the description file contains unicode
    characters. To fix this we need to open the description file as
    utf-8 explicitly. Since open() in Python 2 doesn't support an
    encoding parameter, use io.open() which works on both 2 and 3.

    Co-Authored-By: Hervé Beraud <email address hidden>

    Change-Id: I1bee502ac84b474cc9db5523d2437a8c0a861c00
    Closes-Bug: 1704472
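
A sketch of the approach the commit message describes, assuming the description is read in a single helper. The name `read_description_file` is hypothetical and is not pbr's actual function. `io.open()` accepts `encoding=` on Python 2.6+ and Python 3, and on Python 3 `io.open` is the same object as the built-in `open()`:

```python
import io
import os
import tempfile


def read_description_file(filename):
    """Read a description file as UTF-8 on both Python 2 and Python 3."""
    # Unlike Python 2's built-in open(), io.open() accepts encoding=.
    with io.open(filename, encoding="utf-8") as description_file:
        return description_file.read().strip()


# Usage with a hypothetical UTF-8 file containing U+00E9 ("é").
fd, path = tempfile.mkstemp()
with io.open(fd, "w", encoding="utf-8") as fh:
    fh.write(u"caf\u00e9\n")
text = read_description_file(path)
os.remove(path)
print(text)
```

Because the decode no longer depends on the locale, the result is the same under a C/POSIX locale as under a UTF-8 one.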

Changed in pbr:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/pbr 5.3.0

This issue was fixed in the openstack/pbr 5.3.0 release.
