undefined name 'unicode' in "wrong" sys.version_info branch

Bug #1585991 reported by Christian Boltz
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Pyflakes
New
Undecided
Unassigned

Bug Description

pyflakes makes it hard to check code that works on python 2 and 3 because some names are only defined in one version.

For example, take this little module:

import sys

def type_is_str(var):
    ''' returns True if the given variable is a str (or unicode string when using python 2)'''
    if type(var) == str:
        return True
    elif sys.version_info[0] < 3 and type(var) == unicode: # python 2 also uses the 'unicode' type
        return True
    else:
        return False

pyflakes3 errors out with
    undefined name 'unicode'
while pyflakes2 validates this code successfully (because unicode is a valid name in py2).

Can you make pyflakes a bit more intelligent so that it skips branches for the "wrong" version? Or, if that is too hard, add a way to skip a line marked with a special comment like
    # PYFLAKES:IGNORE
?

(The py3 testing was done with python3-pyflakes-1.2.3.)

Revision history for this message
asmeurer (asmeurer) wrote :

For what it's worth, your 'type_is_str" is wrong way to check if something is a string. You shouldn't use "type(x) == T". You should instead use "isinstance(x, T)". The more common way to do this is to define

if PY3:
    string_types = (str,)
else:
    string_types = (str, unicode) # You can also use basestring

and then check "isinstance(var, string_types)" in your code.

As to the issue itself, Pyflakes would have to become significantly smarter to be able to parse all the different ways that people check for Python 3. It would need to be able to parse "sys.version_info[0] < 3", and similar constructs in full enough generality to catch every way that people write it, but in a static way. I'm not saying it can't be done, but Pyflakes would have to be about 100x smarter than it is now to be able to handle this sort of thing. I recommend factoring all your Python 2/3 logic into a single file and then don't bother testing that file with pyflakes. Or use a library like six or future.

Revision history for this message
John Vandenberg (jayvdb) wrote :

In most cases you can do the following as a workaround for this pyflakes issue:

if PY3:
    unicode = str

Then unicode is a valid name, even if it isnt used.

IMO if pyflakes was to fix this, it should allow the project to define PYxx names (e.g. PY3) with an environment marker somewhere (tox.in), and then pyflakes can look for those simple names in the code being analyised.

Revision history for this message
asmeurer (asmeurer) wrote :

I'm not a fan of that workaround because most Python 2/3 names like that were the result of Python 3 removing a name (unicode, long, xrange, ...), and redefining them in Python 3 leads to a coding style where you write Python 2 code and use compatibility imports to make it work in Python 3. But I prefer to write Python 3 code and use compatibility imports to make it work in Python 2.

Revision history for this message
John Vandenberg (jayvdb) wrote :

Agreed, but it can work, especially for projects where that is the reality: used primarily on Python 2 for a decade, and Python 3 is supported but is second class.

I find comments in code distasteful in general, but necessary in practise, but I am very opposed to 'ígnore' style comments in pyflakes (use flake8 & flake8-putty if you want that).

What would be nice is a way to declare environment differences where they are needed via comments or another noop, with environment markers to safe guard.

I.e. for the problem in this ticket, inject builtins into the inner most scope at the point in the scope where it is needed only. E.g. (untested)

  import sys
  PY3 = sys.version_info[0] == 3

  def foo(a):
    if PY3:
      return isinstance(a, str)
    else:
      # pyflakes builtin: unicode, basestring, WindowsError; python_version < '3' and sys_platform == 'win32'
      return isinstance(a, basestring)

Then pyflakes can:
1. When that environment marker is True, verify all those builtins exist
2. When that environment marker is False, inject any missing builtins

That way, provided pyflakes is used on all relevant platforms to achieve 100% coverage of these platform specific branches, the declarative comment must be maintained or pyflakes will barf, and the builtins will exist.

The remaining weakness of that is when someone adds code below that if statement, and then uses ùnicode in code which executes in all environments -- pyflakes will not notice.

To provide a fool-proof solution, the injected builtins could only be registered at the top of a function, and the recommended pattern is to have alternative functions for each distinct environment.

i.e.

def _py2_foo(a):
  # pyflakes builtins: ... ; ...
  ...

def _py3_foo(a):
  # pyflakes builtins: ... ; ...
  ...

if PY2:
  foo = _py2_foo
else:
  foo = _py3_foo

Revision history for this message
John Vandenberg (jayvdb) wrote :

(...Comments in code *to assist tools*...)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.