py35 unit test subunit.parser failures

Bug #1728640 reported by Eric Harney on 2017-10-30
24
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Cinder
Critical
Sean McGinnis

Bug Description

I see 4 hits for this error on Oct 22 and 23. It is currently happening on my Fedora 26 environment.

http://logstash.openstack.org/#/dashboard/file/logstash.json?query=(message:%20%5C%22subunit.parser%5C%22%20AND%20message:%20%5C%22FAILED%5C%22)%20AND%20(filename:console.txt%20OR%20filename:job-output.txt)%20AND%20build_status:%20FAILURE

{1} subunit.parser [0.000173s] ... FAILED

Captured Parser Error:
~~~~~~~~~~~~~~~~~~~~~~
    b'Short read - got 166 bytes, wanted 4262 bytes'

This typically ends in an error like this later on:
2017-10-23 20:57:40.884809 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/cinder/.tox/py35/lib/python3.5/site-packages/testtools/content.py", line 108, in iter_text
2017-10-23 20:57:40.884888 | ubuntu-xenial | raise ValueError("Not a text type %r" % self.content_type)
2017-10-23 20:57:40.884977 | ubuntu-xenial | ValueError: Not a text type application/octet-stream

Eric Harney (eharney) wrote :

Full tox -e py35 output

tags: added: tests
Sean McGinnis (sean-mcginnis) wrote :

Good, I had this on my list to look in to. Maybe trying to parse something that's compressed?

Eric Harney (eharney) wrote :

This occurred on my Fedora 26 machine just running

$ tox -e py35 -- cinder.tests.unit.api.contrib

Changed in cinder:
assignee: nobody → Sean McGinnis (sean-mcginnis)
status: New → In Progress

Reviewed: https://review.openstack.org/516777
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=57321ad60c82d2113ff8fb6f60dfb85af85cea00
Submitter: Zuul
Branch: master

commit 57321ad60c82d2113ff8fb6f60dfb85af85cea00
Author: Sean McGinnis <email address hidden>
Date: Tue Oct 31 15:05:12 2017 -0500

    Make test logging setup fixture disable future setup

    Our logging fixture sets up logging in a way that we need to capture
    things during tests. However, some of our tests do things like call
    back into main functions, which then call logging setup again, and
    unwind everything we're doing (including debug message testing).

    This patches out oslo_log setup after the fixture runs, thus ignoring
    calls to it in the future.

    This change originally implemented in Nova with commit
    dc2f4f88ac8d7bcf904908b4683522cd6839ba70.

    Closes-bug: #1728640
    Change-Id: I3f95fd3b83d6b5c5801d3a16a3e73623ed49da29

Changed in cinder:
status: In Progress → Fix Released
Eric Harney (eharney) wrote :

The above fix does not seem to have fully resolved this issue, only made it less common.

Matthew Treinish (treinish) wrote :

So we've been working on debugging this some face to face here at the summit and from what I can tell this being caused by something emitted in the logging output from cinder during the unit test execution is causing the subunit parser to error out on the master stestr process as it's processing the results from the test workers. I haven't been able to track down exactly what or where the specific cause is yet though. But, I haven't been able to get the failure mode when I commented out the logging fixtures at: https://github.com/openstack/cinder/blob/master/cinder/test.py#L151-L152

Changed in cinder:
status: Fix Released → In Progress
Matthew Treinish (treinish) wrote :

My other thought here is that we may be exceeding the max attachment size in subunit with the cinder logs. There are a lot of spurious warnings and other stuff being logged (with IMO no real value, like the stevedore warnings) I wonder if we significantly reduce the logging output from the unit tests if it will correct this issue.

But, I've not been able to confirm this is the issue yet.

Sean McGinnis (sean-mcginnis) wrote :

Not sure the total size of the log is the issues, but maybe the size of the output of individual tests. I have a few patches out there getting rid of some of the crap logs, but no silver bullet yet.

Changed in cinder:
importance: Undecided → Critical

Reviewed: https://review.openstack.org/518452
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=ab78a407f28837da2c674f750e91b2ae3def82c9
Submitter: Zuul
Branch: master

commit ab78a407f28837da2c674f750e91b2ae3def82c9
Author: Sean McGinnis <email address hidden>
Date: Tue Nov 7 23:41:28 2017 -0600

    Change default test log level

    Despite comments stating otherwise, we set our root logging level
    to DEBUG. This causes some unwanted things to be logged to the
    console and large test artifacts.

    There is also some suspicion that this is causing test failures
    with some kind of timing change with stestr causing subunit
    parsing failures.

    This changes the default level to be higher and fixes a few test
    cases that had expected the previous default.

    Change-Id: Ib7e45915898347549c12bc8276df556b272bb259
    Closes-bug: #1728640

Changed in cinder:
status: In Progress → Fix Released
Changed in cinder:
status: Fix Released → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/521192

Changed in cinder:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/521192
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=402ded572da14b8f810fed32e35fa715a664cefb
Submitter: Zuul
Branch: master

commit 402ded572da14b8f810fed32e35fa715a664cefb
Author: Sean McGinnis <email address hidden>
Date: Fri Nov 17 14:56:03 2017 -0600

    Remove ExceptionTestCase

    Since the switch to stestr, it appears any tests that output a large
    amount of logging can cause subparser errors. One of the tests that
    seems to occasionally trigger this is in test_exceptions where it
    loads all classes from cinder/exception.py and asserts that attempting
    to raise the class results in it being raised.

    Many of the exceptions require message formatting values passed in.
    The base CinderException handles this, but it also logs when this is
    the case. So looping through all the exceptions results in a lot of
    log messages for these missing format values.

    Since all this does is assert that raising the exception raises the
    exception, there is little value to this test. Arguably the only thing
    it does is enforce that non-exception classes can't be added to
    exception.py as those would raise a TypeError rather than the expected
    exception, but that does not seem like something we should care about
    enforcing. At least not in a unit test.

    Change-Id: I56c37bb2f5fa20356341e5736745117806463644
    Partial-bug: #1728640

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/517712
Reason: Included in I9272f71e0e68268ad9f558ddd1e1183e3ea69806 since all changes appear necessary to get the py35 UTs to pass.

Reviewed: https://review.openstack.org/521555
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d0fab07245fec2629ffe11e81509037f96c98ff1
Submitter: Zuul
Branch: master

commit d0fab07245fec2629ffe11e81509037f96c98ff1
Author: Sean McGinnis <email address hidden>
Date: Mon Nov 20 08:57:40 2017 -0600

    Suppress UT log messages

    Large logger output during unit test run is causing subunit.parser
    failures. Most of this output is unnecessary for unit tests, so we
    should just suppress this logging.

    Also switching oslo.context args from user and tenant to user_id
    and project_id to get rid of deprecation warnings and move to new
    expected attributes. The minimum version of oslo.context has been
    raised to ensure we have a version with the renamed kwargs, so we
    are safe changing this. Originally planned to change as its own
    change in Iecee4cbd57ac319a02f9bfdf24e3742dbbcb3950, but it causes
    enough log output in the unit tests that failures are preventing
    either patch from going through without the other included.

    Change-Id: I9272f71e0e68268ad9f558ddd1e1183e3ea69806
    Partial-bug: #1728640

This issue was fixed in the openstack/cinder 12.0.0.0b2 development milestone.

Sean McGinnis (sean-mcginnis) wrote :

After the various cleanup, we haven't seen this for some time now. I think as far as what we have control over on the Cinder side, we have done as much as we can.

I do think there is still a latent bug here in the stestr code. If anything else is introduced that causes a lot of logging output during a test run, we are going to hit this issue again. Unless or until the root cause is identified and fixed, we will just have to address that logging output if it happens.

Changed in cinder:
status: In Progress → Fix Released
Adam Spiers (adam.spiers) wrote :

See also bug #1813147 which is the same issue with nova.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers