"first argument must be string or compiled pattern" with python 2.7.12-7

Bug #1644003 reported by Jelmer Vernooij on 2016-11-22
This bug affects 16 people
Affects: Bazaar; Importance: High; Assigned to: Unassigned

Bug Description

bzr's monkeypatching of various functions in the 're' module breaks when it's used with python 2.7.12-7 (in Debian):

% LANG=C LC_ALL=C bzr pull
first argument must be string or compiled pattern
first argument must be string or compiled pattern
first argument must be string or compiled pattern
first argument must be string or compiled pattern
bzr: ERROR: exceptions.TypeError: first argument must be string or compiled pattern

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 930, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 1121, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 697, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/builtins.py", line 1210, in run
    self.outf.write(gettext("Using saved parent location: %s\n") % display_url)
  File "/usr/lib/python2.7/dist-packages/bzrlib/lazy_import.py", line 117, in __call__
    return obj(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/i18n.py", line 40, in gettext
    install()
  File "/usr/lib/python2.7/dist-packages/bzrlib/i18n.py", line 92, in install
    _translations = install_translations(lang)
  File "/usr/lib/python2.7/dist-packages/bzrlib/i18n.py", line 114, in install_translations
    fallback=True)
  File "/usr/lib/python2.7/gettext.py", line 554, in translation
    t = _translations.setdefault(key, class_(fp))
  File "/usr/lib/python2.7/gettext.py", line 255, in __init__
    self._parse(fp)
  File "/usr/lib/python2.7/gettext.py", line 391, in _parse
    self.plural = c2py(plural)
  File "/usr/lib/python2.7/gettext.py", line 177, in c2py
    result, nexttok = _parse(_tokenize(plural))
  File "/usr/lib/python2.7/gettext.py", line 114, in _parse
    nexttok = next(tokens)
  File "/usr/lib/python2.7/gettext.py", line 85, in _tokenize
    for mo in re.finditer(_token_pattern, plural):
  File "/usr/lib/python2.7/re.py", line 190, in finditer
    return _compile(pattern, flags).finditer(string)
  File "/usr/lib/python2.7/re.py", line 247, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern

bzr 2.8.0dev1 on python 2.7.12 (Linux-4.8.0-1-amd64-x86_64-with-debian-
    stretch-sid)
arguments: ['/usr/bin/bzr', 'pull']
plugins: bash_completion[2.8.0dev1], changelog_merge[2.8.0dev1],
    etckeeper[unknown], fastimport[0.14.0dev], grep[2.8.0dev1],
    launchpad[2.8.0dev1], netrc_credential_store[2.8.0dev1],
    news_merge[2.8.0dev1], po_merge[2.8.0dev1], weave_fmt[2.8.0dev1]
encoding: 'ascii', fsenc: 'utf8', lang: 'C'

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.


Vincent Ladeuil (vila) on 2016-11-25
summary: - breaks with bzr 2.7.12-7
+ breaks with python 2.7.12-7
Vincent Ladeuil (vila) on 2016-11-25
Changed in bzr:
importance: Undecided → High
status: New → Confirmed

> bzr's monkeypatching of various functions in the 're' module breaks when it's used with python 2.7.12-7 (in Debian):

I can't reproduce :-/

$ bzr --version
Bazaar (bzr) 2.8.0dev1
  Python interpreter: /usr/bin/python 2.7.12
  Python standard library: /usr/lib/python2.7
  Platform: Linux-4.4.0-47-generic-x86_64-with-debian-stretch-sid
  bzrlib: /usr/lib/python2.7/dist-packages/bzrlib

Setting LANG and LC_ALL makes no difference here.

Do you have more details about your diagnosis (what is that first argument that is neither a string nor a compiled pattern?) or a better way to reproduce?

Changed in bzr:
importance: High → Undecided
status: Confirmed → Incomplete
Russel Winder (russel) wrote :

Things like bzr status etc. also work, but as soon as you try pull or incoming or some such command in a branch, it fails. This is a branch I have of a mainline on Launchpad. If I pull or, worse, ask for help on pull, I get the following. Thus the importance is Critical, as bzr is unusable.

|> bzr help pull
bzr: ERROR: exceptions.TypeError: first argument must be string or compiled pattern

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 930, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 1121, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 697, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 1138, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/builtins.py", line 4856, in run
    bzrlib.help.help(topic)
  File "/usr/lib/python2.7/dist-packages/bzrlib/help.py", line 54, in help
    outfile.write(source.get_help_text(shadowed_terms))
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 466, in get_help_text
    i18n.install() # Install i18n only for get_help_text for now.
  File "/usr/lib/python2.7/dist-packages/bzrlib/i18n.py", line 92, in install
    _translations = install_translations(lang)
  File "/usr/lib/python2.7/dist-packages/bzrlib/i18n.py", line 114, in install_translations
    fallback=True)
  File "/usr/lib/python2.7/gettext.py", line 554, in translation
    t = _translations.setdefault(key, class_(fp))
  File "/usr/lib/python2.7/gettext.py", line 255, in __init__
    self._parse(fp)
  File "/usr/lib/python2.7/gettext.py", line 391, in _parse
    self.plural = c2py(plural)
  File "/usr/lib/python2.7/gettext.py", line 177, in c2py
    result, nexttok = _parse(_tokenize(plural))
  File "/usr/lib/python2.7/gettext.py", line 114, in _parse
    nexttok = next(tokens)
  File "/usr/lib/python2.7/gettext.py", line 85, in _tokenize
    for mo in re.finditer(_token_pattern, plural):
  File "/usr/lib/python2.7/re.py", line 190, in finditer
    return _compile(pattern, flags).finditer(string)
  File "/usr/lib/python2.7/re.py", line 247, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern

bzr 2.8.0dev1 on python 2.7.12 (Linux-4.8.0-1-amd64-x86_64-with-debian-
    stretch-sid)
arguments: ['/usr/bin/bzr', 'help', 'pull']
plugins: bash_completion[2.8.0dev1], bookmarks[2.3.0dev], bzrtools[2.6.0],
    changelog_merge[2.8.0dev1], explorer[1.3.0], fastimport[0.14.0dev],
    grep[2.8.0dev1], launchpad[2.8.0dev1], netrc_credential_store[2.8.0de...


Vincent Ladeuil (vila) wrote :

@Russel: I do realize it's a critical issue for you, but without a way to reproduce it's hard to propose a fix or workarounds.

In both reports translations / locale are involved.

Jelmer is suggesting that the 're' monkeypatching is the culprit; you could verify that by commenting out the:

    re.compile = _real_re_compile

line in bzrlib/lazy_regex.py

Changed in bzr:
importance: Undecided → High
Vincent Ladeuil (vila) wrote :

In both reports translations / locale are involved so trying various locales may give more info to help diagnosing/reproducing the issue.

On Fri, Nov 25, 2016 at 05:35:05PM -0000, Vincent Ladeuil wrote:
> In both reports translations / locale are involved so trying various
> locales may give more info to help diagnosing/reproducing the issue.
Note that you need to trigger the regex compilation to reproduce, and
gettext needs to be involved. This seems to involve adding de_DE.UTF-8 to
 /etc/locale.gen and running locale-gen.

Jelmer
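For readers without a translated locale at hand, the gettext step from the traceback can be exercised on its own. This is only a sketch: it assumes an interpreter whose gettext module contains the plural-expression parser shown in the traceback (c2py, which tokenizes via re.finditer), and the plural expression below is the common two-form rule, picked purely for illustration.

```python
import gettext

# c2py() converts the C-style plural expression found in a catalog header
# into a Python function; tokenizing that expression is the step that
# reaches re.finditer in the traceback above.
plural = gettext.c2py("n != 1")

# Index of the plural form to use for a few counts.
forms = [plural(n) for n in (0, 1, 2)]
print(forms)
```

A catalog is only parsed when a translation file is actually found for the active locale, which would be consistent with the observation that a generated de_DE.UTF-8 locale is needed to reproduce.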

Reproduced \o/ Thanks Jelmer !

Adding 'de_DE.UTF-8 UTF-8' to /etc/locale.gen and running 'locale-gen' and then:

$ LANG=de_DE.UTF-8 LC_ALL=de_DE.UTF-8 ./bzr pull

I get the same issue.

@Russel:

Commenting out line 115 in bzrlib/lazy_regex.py:

=== modified file 'bzrlib/lazy_regex.py'
--- bzrlib/lazy_regex.py 2011-12-19 13:23:58 +0000
+++ bzrlib/lazy_regex.py 2016-11-26 10:32:53 +0000
@@ -112,7 +112,7 @@
     This overrides re.compile with lazy_compile. To restore the original
     functionality, call reset_compile().
     """
-    re.compile = lazy_compile
+#    re.compile = lazy_compile

 def reset_compile():

works around the issue:

LANG=de_DE.UTF-8 LC_ALL=de_DE.UTF-8 BZR_PDB=1 ./bzr pull
Gespeichertes übergeordnetes Verzeichnis wird verwendet: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/
No revisions or tags to pull.
bzr: warning: some compiled extensions could not be loaded; see <https://answers.launchpad.net/bzr/+faq/703>

Can you confirm this works for you while I dig for a fix?

Changed in bzr:
assignee: nobody → Vincent Ladeuil (vila)
Vincent Ladeuil (vila) on 2016-11-26
Changed in bzr:
status: Incomplete → Confirmed
Vincent Ladeuil (vila) wrote :

Confirming that, as Jelmer hinted, the issue is lazy_regex monkey-patching re.compile.

re.compile calls re._compile.

The issue is that gettext now uses re.finditer, which calls re._compile with a LazyRegex created by an earlier call to re.compile (for _token_pattern, in this case). I.e. it escapes the proxying.
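A minimal sketch of that failure mode, assuming a simplified stand-in for the proxy (the LazyPattern class below is hypothetical and is not bzrlib's LazyRegex). It leaves re.compile untouched and only shows what happens when such a proxy object is handed to a module-level re function:

```python
import re


class LazyPattern(object):
    """Hypothetical stand-in for a lazy regex proxy: compile on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._real = None

    def __getattr__(self, name):
        # Reached only for attributes the proxy itself lacks (finditer, ...).
        if self._real is None:
            self._real = re.compile(self._pattern, self._flags)
        return getattr(self._real, name)


lazy = LazyPattern(r"\d+")

# Calling methods on the proxy works: the real pattern is built on demand.
via_proxy = [m.group() for m in lazy.finditer("a1b22")]

# Passing the proxy to re.finditer bypasses it: re's internal compile step
# receives an object that is neither a string nor a compiled pattern.
try:
    list(re.finditer(lazy, "a1b22"))
    rejected = False
except TypeError:
    rejected = True
```

Code that only ever calls methods on the object returned by re.compile never notices the proxy; code that passes it back into the re module does.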

Changed in bzr:
status: Confirmed → In Progress
Vincent Ladeuil (vila) wrote :

More info:

The fix for https://bugs.python.org/issue28563 (https://hg.python.org/cpython/rev/e0cc3fadd7b3) is indeed what creates the issue.

$ ./bzr selftest test_i18n.TestInstall

catches the issue

=== modified file 'bzrlib/lazy_regex.py'
*** bzrlib/lazy_regex.py 2011-12-19 13:23:58 +0000
--- bzrlib/lazy_regex.py 2016-11-26 12:57:20 +0000
***************
*** 131,133 ****
--- 131,141 ----
      raise AssertionError(
          "re.compile has already been overridden as lazy_compile, but this would" \
          " cause infinite recursion")
+ # re.finditer get confused if it receives a LazyRegex
+ if getattr(re, 'finditer', None is not None):
+     def finditer_public(pattern, string, flags=0):
+         if isinstance(pattern, LazyRegex):
+             return pattern.finditer(string)
+         else:
+             return _real_re_compile(pattern, flags).finditer(string)
+     re.finditer = finditer_public

makes the test pass again.
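The dispatch idea in the patch can be tried outside bzrlib with a rough sketch like the one below. LazyPattern is a hypothetical stand-in for LazyRegex, and the wrapper is deliberately not assigned to re.finditer here, so the sketch has no global side effects:

```python
import re


class LazyPattern(object):
    """Hypothetical stand-in for a lazy regex proxy: compile on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._real = None

    def __getattr__(self, name):
        # Reached only for attributes the proxy itself lacks (finditer, ...).
        if self._real is None:
            self._real = re.compile(self._pattern, self._flags)
        return getattr(self._real, name)


def finditer_public(pattern, string, flags=0):
    """Route lazy proxies to their own finditer(); compile everything else."""
    if isinstance(pattern, LazyPattern):
        # The proxy compiles itself and runs the real pattern's finditer().
        return pattern.finditer(string)
    # Plain strings and real compiled patterns take the normal route.
    return re.compile(pattern, flags).finditer(string)


from_lazy = [m.group() for m in finditer_public(LazyPattern(r"\d+"), "a1b22")]
from_str = [m.group() for m in finditer_public(r"\d+", "a1b22")]
from_compiled = [m.group() for m in finditer_public(re.compile(r"\d+"), "a1b22")]
```

All three call styles yield the same matches, which is the property the wrapper needs to preserve for callers such as gettext.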

I'll prepare a branch asap.

Jelmer Vernooij (jelmer) wrote :

Thanks for the quick patch!

Russel Winder (russel) wrote :

I go to the LJC Open Conference for a day, and it is all fixed :-) Thanks for the persistence and efforts.

Is this likely to get pushed through the Debian Sid and Fedora Rawhide systems as a package update relatively quickly, or should I create a private Bazaar installation in case it takes a while?

I'm uploading to sid today, not sure about rawhide.

On 27 November 2016 08:09:41 GMT+00:00, Russel Winder <email address hidden> wrote:
>I go to the LJC Open Conference for a day, and it is all fixed :-)
>Thanks for the persistence and efforts.
>
>Is this likely to get pushed through the Debian Sid and Fedora Rawhide
>systems as a package update relatively quickly, or should I create a
>private Bazaar installation in case it takes a while?

Sid is my primary platform so that is great, thanks.

Bizarrely everything is still apparently working fine on Rawhide. Python 2.7.12 and Bazaar 2.7.0-14. I see though that the Bazaar package is the Fedora 25 one, it has not been rebuilt for Fedora 26 as yet.

On Sun, Nov 27, 2016 at 12:42:14PM -0000, Russel Winder wrote:
> Sid is my primary platform so that is great, thanks.
Sid upload just happened, barring any archive errors it should appear
in a couple of hours.

> Bizarrely everything is still apparently working fine on Rawhide. Python
> 2.7.12 and Bazaar 2.7.0-14. I see though that the Bazaar package is the
> Fedora 25 one, it has not been rebuilt for Fedora 26 as yet.
It's not 2.7.12 specifically that is problematic, but a patch that
was added in Debian revision 7. This is the relevant line from the
changelog:

    - Issue #28563: Fixed possible DoS and arbitrary code execution when handle
      plural form selections in the gettext module. The expression parser now
      supports exact syntax supported by GNU gettext.

Vincent already mentioned the related upstream bug earlier in this
Launchpad bug report.

Sid is now fixed, I can bzr pull again.

Thanks Vincent for fixing, and Jelmer for deploying.

I have now been affected by this bug on OS X using MacPorts:

bzr 2.6.0 on python 2.7.13 (Darwin-16.3.0-x86_64-i386-64bit)

sh-3.2# port installed | fgrep bzr
  bzr @2.6.0_0 (active)
sh-3.2# bzr --version
Bazaar (bzr) 2.6.0
  Python interpreter: /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python 2.7.13
  Python standard library: /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7
  Platform: Darwin-16.3.0-x86_64-i386-64bit
  bzrlib: /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/bzrlib
  Bazaar configuration: /var/root/.bazaar
  Bazaar log file: /var/root/.bzr.log

Can you check whether the Portfile contains this fix?

Jelmer Vernooij (jelmer) wrote :

This fix is not upstream yet, so it's unlikely to be in the ports file - unless they've cherry-picked it.

Vincent Ladeuil (vila) on 2017-01-15
Changed in bzr:
milestone: none → 2.7.1
Vincent Ladeuil (vila) on 2017-01-16
Changed in bzr:
status: In Progress → Fix Released
Will Thompson (wjt) wrote :

From the patch <http://bazaar.launchpad.net/~bzr-pqm/bzr/2.7/revision/6621>:

# re.finditer get confused if it receives a LazyRegex
if getattr(re, 'finditer', None is not None):
                               ^^^^^^^^^^^^

Surely the comparison should be outside the function call:

if getattr(re, 'finditer', None) is not None:
                                 ^^^^^^^^^^^

In practice this has the same effect: 'None is not None' => 'False' so getattr() returns False if the attr is not found, and something truthy if the attr is found. You could even omit it:

if getattr(re, 'finditer', None):
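The equivalence can be checked directly. A small sketch, where 'no_such_function' is a made-up attribute name used only to exercise the missing-attribute case:

```python
import re

# With the misplaced parenthesis, `None is not None` is evaluated first and
# its result (False) becomes getattr()'s default value.
default_in_patch = None is not None

missing_as_written = getattr(re, "no_such_function", None is not None)
missing_as_intended = getattr(re, "no_such_function", None) is not None

found_as_written = getattr(re, "finditer", None is not None)
found_as_intended = getattr(re, "finditer", None) is not None

# Both spellings take the same branch in an `if`, although the values differ:
# the patch's form yields the function object, the intended form yields True.
same_branch_when_missing = bool(missing_as_written) == bool(missing_as_intended)
same_branch_when_found = bool(found_as_written) == bool(found_as_intended)
```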

Vincent Ladeuil (vila) wrote :

@Will, indeed, see bug #1657238 and https://code.launchpad.net/~vila/bzr/1657238-wrong-import/

I was waiting for confirmation there but... nothing happened :-/ So I'll probably go ahead.

Works on my side, thanks.

Vincent Ladeuil (vila) on 2017-02-02
summary: - breaks with python 2.7.12-7
+ "first argument must be string or compiled pattern" with python
+ 2.7.12-7
information type: Public → Private Security
Changed in bzr:
assignee: Vincent Ladeuil (vila) → Francisco Windwillow (peedubs76)
Jelmer Vernooij (jelmer) on 2018-07-03
information type: Private Security → Public
Changed in bzr:
assignee: Francisco Windwillow (peedubs76) → nobody