Ubuntu Cloud Archive

Comment 14 for bug 1804062

OpenStack Infra (hudson-openstack) wrote on 2019-08-10: Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/665790
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc2963d2a0002d4ddacf50b1f80c470d5de7ec61
Submitter: Zuul
Branch: stable/queens

commit dc2963d2a0002d4ddacf50b1f80c470d5de7ec61
Author: Stephen Finucane <email address hidden>
Date: Wed Jun 12 15:10:59 2019 +0100

Fix double word hacking test

At present, 'pycodestyle' feeds the following string into Python's 'tokenize'
module:

["'This is the the best comment'"]

(note the added quotes because this isn't valid Python otherwise)

On previous versions of Python, this tokenizer would parse the string like so:

(3, "'This is the the best comment'", (1, 0), (1, 30), "'This is the the best comment'")
(0, '', (2, 0), (2, 0), '')

where (3 = 'STRING', 0 = 'ENDMARKER')

However, with the fix [1] backported to recent versions of Python, this now
resolves to:

(3, "'This is the the best comment'", (1, 0), (1, 30), "'This is the the best comment'")
(4, '', (1, 30), (1, 31), '')
(0, '', (2, 0), (2, 0), '')

where (3 = 'STRING', 4 = 'NEWLINE', 0 = 'ENDMARKER')
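The difference between the two token streams can be reproduced with the standard library alone. The following is a minimal sketch, not part of the original commit; the helper name `token_summary` is hypothetical. Which of the two streams it prints depends on whether the running interpreter carries the change from [1].

```python
import io
import tokenize


def token_summary(source):
    """Tokenize source text and return (type name, string, line) triples."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.line)
        for tok in tokenize.generate_tokens(readline)
    ]


# Same test string as in the commit message, without a trailing newline.
summary = token_summary("'This is the the best comment'")
for entry in summary:
    print(entry)

# True on interpreters that emit the implicit NEWLINE token, False otherwise.
has_implicit_newline = any(name == "NEWLINE" for name, _, _ in summary)
print("implicit NEWLINE token:", has_implicit_newline)
```

The numeric token types shown above map to names through `tokenize.tok_name`, which is why the sketch prints names rather than integers.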

Typically, 'pycodestyle' will run physical line checks on each line as it
parses the token:

https://github.com/PyCQA/pycodestyle/blob/2.5.0/pycodestyle.py#L2036

For the former case above, the line doesn't include a newline, which
means we never parse a 'NEWLINE' token with a logical line (the fifth
element of the token tuple) corresponding to our full line. This means
we don't run the physical line checks here, but that wasn't an issue
previously since there's a fallthrough case that handles tokens
remaining at the end of the parse:

https://github.com/PyCQA/pycodestyle/blob/2.5.0/pycodestyle.py#L2114-L2116

Unfortunately, because we now have an additional newline character to
parse, one that's on a separate line to our test string no less, we run
logical checks on it:

https://github.com/PyCQA/pycodestyle/blob/2.5.0/pycodestyle.py#L2105-L2107

This is an issue since the logical check wipes the stored tokens, meaning
we have nothing left to check when we get to the fallthrough case:

https://github.com/PyCQA/pycodestyle/blob/2.5.0/pycodestyle.py#L2012
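The interaction described above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not pycodestyle's actual code: `flagged_lines` and the `DOUBLE_WORD` regex are hypothetical, and the rule that tokens starting beyond the last input line are dropped is a modelling assumption. The token tuples are the ones quoted earlier in this message.

```python
import re
import tokenize

# Hypothetical stand-in for the double-word check; not nova's actual regex.
DOUBLE_WORD = re.compile(r"\b(\w+)\s+\1\b")


def flagged_lines(tokens, total_lines):
    """Simplified model of the checker loop; returns the lines it flags."""
    flagged, pending = [], []

    def check_physical(line):
        if DOUBLE_WORD.search(line):
            flagged.append(line)

    for tok in tokens:
        if tok[2][0] > total_lines:  # assumption: tokens past the input are dropped
            break
        pending.append(tok)
        if tok[0] == tokenize.NEWLINE:
            check_physical(tok[4])  # physical check on the token's own line
            pending.clear()         # the logical check wipes the stored tokens
    if pending:                     # fallthrough for tokens left at the end
        check_physical(pending[-1][4])
    return flagged


TEXT = "'This is the the best comment'"
STRING_TOKEN = (tokenize.STRING, TEXT, (1, 0), (1, 30), TEXT)
END_TOKEN = (tokenize.ENDMARKER, '', (2, 0), (2, 0), '')
without_extra_newline = [STRING_TOKEN, END_TOKEN]
with_extra_newline = [
    STRING_TOKEN,
    (tokenize.NEWLINE, '', (1, 30), (1, 31), ''),
    END_TOKEN,
]

print(flagged_lines(without_extra_newline, 1))  # full line reaches the fallthrough
print(flagged_lines(with_extra_newline, 1))     # nothing is left to check
```

In this model the extra 'NEWLINE' token carries an empty line, so the physical check sees nothing wrong, and the wipe leaves the fallthrough with no tokens to inspect.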

This fix changes things so that a newline is included (and also adds
quotes so it's valid Python, but that's mostly unrelated). This means we
end up with the following instead:

["'This is the the best comment'\n"]

On both Python without the bugfix and with it, this parses as:

(3, "'This is the the best comment'", (1, 0), (1, 30), "'This is the the best comment'\n")
(4, '\n', (1, 30), (1, 31), "'This is the the best comment'\n")
(0, '', (2, 0), (2, 0), '')

where (3 = 'STRING', 4 = 'NEWLINE', 0 = 'ENDMARKER')
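As a quick check of the newline-terminated input, the sketch below (not part of the original commit) tokenizes it with the standard 'tokenize' module and prints the token names plus the position of the 'NEWLINE' token.

```python
import io
import tokenize

# The test string with the trailing newline added by this change.
source = "'This is the the best comment'\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

names = [tokenize.tok_name[tok.type] for tok in tokens]
print(names)

# The explicit newline yields a NEWLINE token on the same line as the string.
newline_tok = tokens[1]
print(newline_tok.start, newline_tok.end, repr(newline_tok.line))
```

Because the newline is present in the input itself, the 'NEWLINE' token here does not rely on the interpreter adding one implicitly.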

This triggers the checks in 'pycodestyle' correctly:

https://github.com/PyCQA/pycodestyle/blob/2.5.0/pycodestyle.py#L2044-L2046

This isn't _really_ a fix since there's clearly still a bug in either
'pycodestyle' or Python (I think the latter, since it's adding a newline
to a file that explicitly doesn't have one), but the chances of us
hitting this bug in practice are rather low - you'd need to make a
mistake on the very last line of a file without a newline at the end,
which is something Vim, for example, won't even let you do without
setting special flags - and therefore it can be reasonably ignored.

[1] https://bugs.python.org/issue33899

Change-Id: Ia597594e0469c0e83d7ad22b0678390aaebaffe7
Signed-off-by: Stephen Finucane <email address hidden>
Closes-Bug: #1804062
(cherry picked from f545a25cc443c41dcd9bdd028064c28b53f56037)
(cherry picked from commit 0cb6106b83c33bded9e6cdec7737964c36be8de5)
(cherry picked from commit 4858074c89838eadeb9eaf9f39917e9fb90acd93)
