p35 jobs are failing with subunit.parser ... FAILED

Bug #1813147 reported by Balazs Gibizer
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
Medium
Unassigned
OpenStack Compute (nova)
Fix Released
High
Stephen Finucane

Bug Description

Various py35 jobs[1] are failing recently with:

2019-01-22 00:10:44.708744 | ubuntu-xenial | {0} subunit.parser [0.000360s] ... FAILED
2019-01-22 00:10:44.708947 | ubuntu-xenial |
2019-01-22 00:10:44.709000 | ubuntu-xenial | Captured Parser Error:
2019-01-22 00:10:44.709057 | ubuntu-xenial | ~~~~~~~~~~~~~~~~~~~~~~
2019-01-22 00:10:44.709122 | ubuntu-xenial | b'Short read - got 167 bytes, wanted 4263 bytes'

It seems that when a job fails, the log of the test case that triggered the failure is truncated. Also it feels that it is caused by extensive logging.

There was a past cinder bug[2] that was marked fixed after couple of logging reduction.

There was 11 failures in the last 24 hours[3]

[1] http://logs.openstack.org/58/632158/1/check/cross-nova-py35/afe0282/job-output.txt#_2019-01-22_00_10_44_708744
[2] https://bugs.launchpad.net/cinder/+bug/1728640
[3] http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22subunit.parser%5C%22%20AND%20message:%5C%22FAILED%5C%22%20AND%20voting:1&from=864000s

tags: added: testing
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Added the signature to elastic-recheck https://review.openstack.org/632977

Revision history for this message
Matt Riedemann (mriedem) wrote :

I think I first noticed it on this change which merged on Jan 18:

https://review.openstack.org/#/c/568999/

I don't know if they are related.

Changed in nova:
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

In about half of the cases the "Captured Parser Error" is missing from the logs, but other than that the symptoms are the same. Execution is aborted and the output of the test case triggered the error is truncated.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The

    b'Short read - got 167 bytes, wanted 4263 bytes'

error message comes from here [0].

However recent failure[1][2][3] shows a different message:

    Not a text type application/octet-stream

This message comes from [4].

All of this points towards that something strange is printed to the test output that confuses ourt tooling.

I've tried to simply recreate the problem by logging extensively from a test case but that leads to a different error:

     Length too long: 16814020

[0] https://github.com/testing-cabal/subunit/blob/58c3d364266e7550d3e00e34191eed7df2d36d13/python/subunit/v2.py#L418
[1] http://logs.openstack.org/17/573317/48/check/openstack-tox-lower-constraints/08b08c1/job-output.txt
[2] http://logs.openstack.org/28/619528/12/check/openstack-tox-lower-constraints/1d86436/job-output.txt
[3] http://logs.openstack.org/23/630723/3/check/nova-tox-functional-py35/560139d/job-output.txt
[4] https://github.com/testing-cabal/testtools/blob/f865cfefae73e540aedb16ebb19f813cc648f2b6/testtools/content.py#L108

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Cinder hit "Not a text type application/octet-stream" pretty consistently after switching to stestr. The trigger for it seemed to be any time a test had a very large amount of stderr output. Trienish and I tried to track down exactly where that was causing an issue, but unfortunately were not able to find what was causing the issue. It seemed to be something with the py3 handling of the output stream.

We "fixed" it in Cinder by cleaning up the amount of output generated from unit tests. Once I got rid of warning output and unnecessary logging to the point that there wasn't a lot of stdout/stderr generated, we stopped hitting whatever threshold was causing the error to surface.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Yeah that's a good point Sean. We have a lot of warnings in the py35 console output logs:

2019-01-22 00:07:14.688023 | ubuntu-xenial | Captured stderr:
2019-01-22 00:07:14.688079 | ubuntu-xenial | ~~~~~~~~~~~~~~~~
2019-01-22 00:07:14.688266 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/py35/lib/python3.5/site-packages/migrate/versioning/script/py.py:144: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead'
2019-01-22 00:07:14.688383 | ubuntu-xenial | b' if not inspect.getargspec(script_func)[0]:'

2019-01-22 00:08:52.909079 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/py35/lib/python3.5/site-packages/oslo_service/threadgroup.py:193: DeprecationWarning: Calling add_timer() with arguments to the callback function is deprecated. Use add_timer_args() instead.'

2019-01-22 00:09:48.658271 | ubuntu-xenial | Captured stderr:
2019-01-22 00:09:48.658308 | ubuntu-xenial | ~~~~~~~~~~~~~~~~
2019-01-22 00:09:48.658492 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/py35/lib/python3.5/site-packages/rfc3986/uri.py:235: DeprecationWarning: Please use rfc3986.validators.Validator instead. This method will be eventually removed.'
2019-01-22 00:09:48.658577 | ubuntu-xenial | b' DeprecationWarning)'
2019-01-22 00:09:48.658845 | ubuntu-xenial | b"/home/zuul/src/git.openstack.org/openstack/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/evaluator.py:99: SAWarning: Evaluating non-mapped column expression 'id' onto ORM instances; this is a deprecated use case. Please make use of the actual mapped columns in ORM-evaluated UPDATE / DELETE expressions."
2019-01-22 00:09:48.658912 | ubuntu-xenial | b' "UPDATE / DELETE expressions." % clause'
2019-01-22 00:09:48.659175 | ubuntu-xenial | b"/home/zuul/src/git.openstack.org/openstack/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/evaluator.py:99: SAWarning: Evaluating non-mapped column expression 'updated_at' onto ORM instances; this is a deprecated use case. Please make use of the actual mapped columns in ORM-evaluated UPDATE / DELETE expressions."
2019-01-22 00:09:48.659242 | ubuntu-xenial | b' "UPDATE / DELETE expressions." % clause'
2019-01-22 00:09:48.659271 | ubuntu-xenial | b''

That SAWarning is probably the worst at 866 occurrences.

Revision history for this message
Matt Riedemann (mriedem) wrote :

https://review.openstack.org/#/c/625125/ could help glance if someone can update it.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (3.6 KiB)

Here is an example of where that SAWarning comes from:

nova.tests.unit.api.openstack.compute.test_hosts.HostTestCaseV21.test_show_works_correctly
------------------------------------------------------------------------------------------

Captured pythonlogging:
~~~~~~~~~~~~~~~~~~~~~~~
    b'2019-01-31 17:48:15,014 INFO [248_add_expire_reservations_index] Skipped adding reservations_deleted_expire_idx because an equivalent index already exists.'
    b''

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    b'Traceback (most recent call last):'
    b' File "/home/osboxes/git/nova/nova/tests/unit/api/openstack/compute/test_hosts.py", line 410, in test_show_works_correctly'
    b" db.instance_destroy(ctxt, i_ref1['uuid'])"
    b' File "/home/osboxes/git/nova/nova/db/api.py", line 753, in instance_destroy'
    b' return IMPL.instance_destroy(context, instance_uuid, constraint)'
    b' File "/home/osboxes/git/nova/nova/db/sqlalchemy/api.py", line 170, in wrapper'
    b' return f(*args, **kwargs)'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/oslo_db/api.py", line 154, in wrapper'
    b' ectxt.value = e.inner_exc'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/oslo_utils/excutils.py", line 220, in __exit__'
    b' self.force_reraise()'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/oslo_utils/excutils.py", line 196, in force_reraise'
    b' six.reraise(self.type_, self.value, self.tb)'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/six.py", line 693, in reraise'
    b' raise value'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/oslo_db/api.py", line 142, in wrapper'
    b' return f(*args, **kwargs)'
    b' File "/home/osboxes/git/nova/nova/db/sqlalchemy/api.py", line 227, in wrapped'
    b' return f(context, *args, **kwargs)'
    b' File "/home/osboxes/git/nova/nova/db/sqlalchemy/api.py", line 1797, in instance_destroy'
    b' count = query.soft_delete()'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/oslo_db/sqlalchemy/orm.py", line 32, in soft_delete'
    b' synchronize_session=synchronize_session)'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 3486, in update'
    b' update_op.exec_()'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 1334, in exec_'
    b' self._do_pre_synchronize()'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 1405, in _do_pre_synchronize'
    b' self._additional_evaluators(evaluator_compiler)'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/persistence.py", line 1543, in _additional_evaluators'
    b' expression._literal_as_binds(value))'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/evaluator.py", line 39, in process'
    b' return meth(clause)'
    b' File "/home/osboxes/git/nova/.tox/py35/lib/python3.5/site-packages/sqlalchemy/orm/evaluator.py...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

I opened bug 1814182 against oslo.db in case it's something in the oslo.db code, but it could be a problem in how nova's data models are done, I'm not sure, need help from Mike Bayer.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Reported sqlalchemy-migrate bug 1814288 for the inspect.getargspec warnings generated from that library.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/634450

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/634451

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/634450
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1fa2e9c3a0afcbaeecbb9e336a4711c2e9add189
Submitter: Zuul
Branch: master

commit 1fa2e9c3a0afcbaeecbb9e336a4711c2e9add189
Author: Matt Riedemann <email address hidden>
Date: Fri Feb 1 11:34:45 2019 -0500

    Ignore SAWarnings for "Evaluating non-mapped column expression"

    These warnings come from oslo.db code (tracked with bug 1814199)
    so there isn't much nova can do about that right now, outside of
    monkey patching oslo.db which is a bad idea.

    Let's ignore the warning until the bug in oslo.db is fixed to
    avoid blowing up our unit/functional test console output logs
    which in turn is intermittently triggering subunit.parser failures.

    Change-Id: Ifdfeadae8b348d788de2cd665544015366271d66
    Related-Bug: #1813147

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is probably the next one to squash:

2019-02-04 16:58:37.674764 | ubuntu-bionic | Captured stderr:
2019-02-04 16:58:37.674800 | ubuntu-bionic | ~~~~~~~~~~~~~~~~
2019-02-04 16:58:37.675041 | ubuntu-bionic | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/py36/lib/python3.6/site-packages/oslo_service/threadgroup.py:193: DeprecationWarning: Calling add_timer() with arguments to the callback function is deprecated. Use add_timer_args() instead.'
2019-02-04 16:58:37.675100 | ubuntu-bionic | b' DeprecationWarning)'
2019-02-04 16:58:37.675135 | ubuntu-bionic | b''

39 occurrences in a py35 unit test job today.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/634928

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/634928
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cd1823195fac3d18aa58a45c819e6e97151b0f23
Submitter: Zuul
Branch: master

commit cd1823195fac3d18aa58a45c819e6e97151b0f23
Author: Jan Gutter <email address hidden>
Date: Tue Feb 5 14:51:49 2019 +0200

    Ignore some PendingDeprecationWarnings for os-vif

    os-vif got bumped to 1.14.0 in https://review.openstack.org/572082 and
    Nova is touching the VIFPortProfileOVSRepresentor class in a supported
    fashion until at least Train.

    However, that commit contributes to triggering bug 1813147 in check and
    gate. The warning is safe to silence and should be converted to an error
    in Train (once the older path is removed).

    Change-Id: I9d8175c78f068cf9bdad4aeba0941f6ff49a9acc
    Signed-off-by: Jan Gutter <email address hidden>
    Related-Bug: #1813147
    blueprint: generic-os-vif-offloads

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like this is our biggest issue in the functional py35 job now:

2019-02-06 21:04:13.978116 | ubuntu-xenial | b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/lib/python3.5/site-packages/oslo_service/threadgroup.py:193: DeprecationWarning: Calling add_timer() with arguments to the callback function is deprecated. Use add_timer_args() instead.'

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/635516

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/634451
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d916a045c19beb0d058fa04e387bc0c4097d5990
Submitter: Zuul
Branch: master

commit d916a045c19beb0d058fa04e387bc0c4097d5990
Author: Matt Riedemann <email address hidden>
Date: Fri Feb 1 11:45:49 2019 -0500

    Ignore sqla-migrate inspect.getargspec deprecation warnings

    This adds a warnings filter to ignore the "inspect.getargspec"
    warnings from sqlalchemy-migrate until bug 1814288 is fixed
    in that library.

    Change-Id: Id8c02e745742980d3dd0a24e6e2b7a5859f3cb88
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/635516
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c03cc26ee04734b610d13c694cc8b12ad554b4aa
Submitter: Zuul
Branch: master

commit c03cc26ee04734b610d13c694cc8b12ad554b4aa
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 7 09:04:25 2019 -0500

    Fix deprecation warning for threadgroup.add_timer

    This resolves the following deprecation warning:

    b'/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional-py35/
    lib/python3.5/site-packages/oslo_service/threadgroup.py:193:
    DeprecationWarning: Calling add_timer() with arguments to the callback
    function is deprecated. Use add_timer_args() instead.'

    The add_timer_args method was added in 1.34.0:

      Ib2791342263e2b88c045bcc92adc8160f57a0ed6

    So the required version of oslo.service is also updated.

    Change-Id: Id54226dc926839686906d04ecf8d791c0881f82a
    Partial-Bug: #1813147

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (3.2 KiB)

Hit another failure in a unit test job that failed and dumped a ton into the console log:

http://logs.openstack.org/27/619527/16/gate/openstack-tox-lower-constraints/cd2889d/job-output.txt.gz#_2019-02-08_12_32_26_095767

I'm not exactly sure what is causing it, but there is a ton of this in the logs:

2019-02-08 12:32:26.104934 | ubuntu-xenial | b'2019-02-08 12:32:17,445 ERROR [oslo_messaging.rpc.server] Exception during message handling'
2019-02-08 12:32:26.105103 | ubuntu-xenial | b'Traceback (most recent call last):'
2019-02-08 12:32:26.105596 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming'
2019-02-08 12:32:26.105775 | ubuntu-xenial | b' res = self.dispatcher.dispatch(message)'
2019-02-08 12:32:26.106238 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch'
2019-02-08 12:32:26.106450 | ubuntu-xenial | b' return self._do_dispatch(endpoint, method, ctxt, args)'
2019-02-08 12:32:26.106919 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/oslo_messaging/rpc/dispatcher.py", line 189, in _do_dispatch'
2019-02-08 12:32:26.107114 | ubuntu-xenial | b' ctxt = self.serializer.deserialize_context(ctxt)'
2019-02-08 12:32:26.107452 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/fixtures.py", line 392, in deserialize_context'
2019-02-08 12:32:26.107689 | ubuntu-xenial | b' ctxt = super(CheatingSerializer, self).deserialize_context(values)'
2019-02-08 12:32:26.108147 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/fixtures/_fixtures/monkeypatch.py", line 89, in avoid_get'
2019-02-08 12:32:26.108323 | ubuntu-xenial | b' return captured_method(*args, **kwargs)'
2019-02-08 12:32:26.108727 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/nova/tests/unit/conductor/test_conductor.py", line 124, in fake_deserialize_context'
2019-02-08 12:32:26.108959 | ubuntu-xenial | b" self.assertEqual(self.context.user_id, ctxt_dict['user_id'])"
2019-02-08 12:32:26.109406 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/testtools/testcase.py", line 411, in assertEqual'
2019-02-08 12:32:26.121688 | ubuntu-xenial | b' self.assertThat(observed, matcher, message)'
2019-02-08 12:32:26.122010 | ubuntu-xenial | b' File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/lower-constraints/lib/python3.5/site-packages/testtools/testcase.py", line 498, in assertThat'
2019-02-08 12:32:26.122098 | ubuntu-xenial | b' raise mismatch_error'
2019-02-08 12:32:26.122297 | ubuntu-xenial | b"testtools.matchers._impl.MismatchError: '5fae60f5cf4642609ddd31f71748beac' != 'fake_user'"

It seems that those assertions don't actually caus...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

That MismatchError in the fake_deserialize_context just gets swallowed and logged in oslo.messaging:

https://github.com/openstack/oslo.messaging/blob/9.4.0/oslo_messaging/rpc/server.py#L175

So there is not really a point in having that stubbed out just to assert something that doesn't matter to the test anyway.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/635859

Revision history for this message
Matt Riedemann (mriedem) wrote :

Here is another one:

http://logs.openstack.org/27/619527/16/check/openstack-tox-py35/ab8a233/job-output.txt.gz#_2019-02-08_14_57_03_415588

oslo.versionedobjects logs info when comparing objects:

2019-02-08 14:57:03.415588 | ubuntu-xenial | Captured pythonlogging:
2019-02-08 14:57:03.415656 | ubuntu-xenial | ~~~~~~~~~~~~~~~~~~~~~~~
2019-02-08 14:57:03.415873 | ubuntu-xenial | b'2019-02-08 14:56:52,203 INFO [oslo_versionedobjects.fixture] testing obj: MonitorMetricList version: 1.0'
2019-02-08 14:57:03.416090 | ubuntu-xenial | b'2019-02-08 14:56:52,204 INFO [oslo_versionedobjects.fixture] testing obj: MonitorMetricList version: 1.1'
2019-02-08 14:57:03.416301 | ubuntu-xenial | b'2019-02-08 14:56:52,204 INFO [oslo_versionedobjects.fixture] testing obj: AggregateList version: 1.0'
2019-02-08 14:57:03.416511 | ubuntu-xenial | b'2019-02-08 14:56:52,204 INFO [oslo_versionedobjects.fixture] testing obj: AggregateList version: 1.1'

So either we need to change that fixture to log at debug and release it, or change logging levels for oslo.versionedobjects to WARNING in our test runs.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Reported bug 1815223 against oslo.versionedobjects for that logging in the fixture.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Another giant traceback in functional test logs:

http://logs.openstack.org/47/635147/11/check/nova-tox-functional-py35/5c3c2a9/job-output.txt.gz#_2019-02-13_13_33_41_103932

That dumps a ton of traceback from:

b'oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: instances

Since there is some sort of reschedule happening in that test, I'm assuming it's coming from:

{2} nova.tests.functional.regressions.test_bug_1784353.TestRescheduleWithVolumesAttached.test_reschedule_with_volume_attached [] ... inprogress

Which was the first test after that traceback. I'm not sure what the 'no such table' is about, maybe the test is tearing down the DB before the actual service code is done running?

It also doesn't help that we're logging all of the os-traits contents at INFO each time:

b"2019-02-13 13:33:37,217 INFO [placement.objects.resource_provider] Synced traits from os_traits into API DB: {'HW_GPU_RESOLUTION_W7680H4320', 'HW_CPU_X86_SGX', 'HW_GPU_MAX_DISPLAY_HEADS_2', 'COMPUTE_VOLUME_ATTACH', 'HW_GPU_RESOLUTION_W800H600', 'HW_NIC_ACCEL_RSA', 'HW_CPU_X86_BMI', '

Revision history for this message
Matt Riedemann (mriedem) wrote :

We might want to think about defaulting OS_STDERR_CAPTURE to 0.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Although if OS_STDERR_CAPTURE=0 then we wouldn't see deprecation warnings in the logs and we want to know those.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/636957

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/636957
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0840bcccc0ae5700dd5c3d22a3bd6f67d1a8e4bb
Submitter: Zuul
Branch: master

commit 0840bcccc0ae5700dd5c3d22a3bd6f67d1a8e4bb
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 14 08:47:53 2019 -0500

    Change sqlalchemy warnings filter to an error

    The warning should be gone with change
    I192e84ce757d12d33085a209dd58d8ea46fb90fb in
    oslo.db 4.44.0 so this changes the warnings
    filter from ignore to error and bumps the minimum
    required version of oslo.db to include that change.

    Change-Id: If7b1a9613b58476fab8409211512613a8863cdde
    Related-Bug: #1813147
    Related-Bug: #1814199

Changed in nova:
assignee: Matt Riedemann (mriedem) → Balazs Gibizer (balazs-gibizer)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/635859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=102679a1cf80d7a2191acb3ca84c815d4622a3f1
Submitter: Zuul
Branch: master

commit 102679a1cf80d7a2191acb3ca84c815d4622a3f1
Author: Matt Riedemann <email address hidden>
Date: Fri Feb 8 10:16:34 2019 -0500

    Trim fake_deserialize_context in test_conductor

    The assertions on the user_id and project_id in the
    fake_deserialize_context methods do not actually
    cause tests to fail if they raise MismatchError
    because oslo.messaging just swallows the exception
    and logs the traceback. With enough of these getting
    logged it will cause subunit parser failures in the
    console output because the stream is too large.

    This removes the fake method and just changes the
    stub to be a lambda that returns self.context which
    is what fake_deserialize_context was doing minus
    the project_id/user_id assertions.

    Change-Id: I26b201b410aa1d965dc7a6635c11c8b63b457a71
    Partial-Bug: #1813147

Erno Kuvaja (jokke)
Changed in glance:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Thierry Carrez (ttx) wrote :

The issue is back in Glance py37 tests:
https://review.openstack.org/#/c/649395/
https://review.openstack.org/#/c/649596/

It's probably triggered by the "Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working" deprecation warning coming from all those different Python libraries.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/656633

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (12.7 KiB)

We seem to be hitting this in nova again with a notifications functional test:

http://logs.openstack.org/06/641806/16/check/nova-tox-functional-py36/6ffca82/job-output.txt.gz#_2019-05-02_03_23_47_450578

2019-05-02 03:23:47.450578 | ubuntu-bionic | {0} nova.tests.functional.notification_sample_tests.test_instance.TestInstanceNotificationSample.test_instance_action [] ... inprogress

That test runs a bunch of actions on a server which generates a lot of logs, plus tracebacks during volume attach:

2019-05-02 03:23:47.482528 | ubuntu-bionic | b'2019-05-02 03:23:44,276 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/servers/e61aff9b-8ce8-4b02-a726-e90c9c5eaa9c/os-volume_attachments" status: 200 len: 207 microversion: 2.72 time: 0.078540'
2019-05-02 03:23:47.482635 | ubuntu-bionic | b'2019-05-02 03:23:44,314 INFO [nova.compute.manager] Attaching volume 227cc671-f30b-4488-96fd-7d0bf13648d8 to /dev/sdc'
2019-05-02 03:23:47.482747 | ubuntu-bionic | b'2019-05-02 03:23:44,355 ERROR [nova.compute.manager] Failed to attach 227cc671-f30b-4488-96fd-7d0bf13648d8 at /dev/sdc'
2019-05-02 03:23:47.482796 | ubuntu-bionic | b'Traceback (most recent call last):'
2019-05-02 03:23:47.482889 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 5628, in _attach_volume'
2019-05-02 03:23:47.482932 | ubuntu-bionic | b' do_driver_attach=True)'
2019-05-02 03:23:47.483021 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/block_device.py", line 46, in wrapped'
2019-05-02 03:23:47.483081 | ubuntu-bionic | b' ret_val = method(obj, context, *args, **kwargs)'
2019-05-02 03:23:47.483169 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/block_device.py", line 651, in attach'
2019-05-02 03:23:47.483217 | ubuntu-bionic | b' virt_driver, do_driver_attach)'
2019-05-02 03:23:47.483308 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/block_device.py", line 629, in _do_attach'
2019-05-02 03:23:47.483347 | ubuntu-bionic | b' do_driver_attach)'
2019-05-02 03:23:47.483451 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/block_device.py", line 542, in _volume_attach'
2019-05-02 03:23:47.483508 | ubuntu-bionic | b" self['mount_device'])['connection_info']"
2019-05-02 03:23:47.483625 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/mock/mock.py", line 1062, in __call__'
2019-05-02 03:23:47.483684 | ubuntu-bionic | b' return _mock_self._mock_call(*args, **kwargs)'
2019-05-02 03:23:47.483800 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py36/lib/python3.6/site-packages/mock/mock.py", line 1128, in _mock_call'
2019-05-02 03:23:47.483850 | ubuntu-bionic | b' ret_val = effect(*args, **kwargs)'
2019-05-02 03:23:47.483970 | ubuntu-bionic | b' File "/home/zuul/src/opendev.org/openstack/nova/nova/tests/functional/notification_sample_tests/test_instance.py", line 1773, in attach_volume'...

Revision history for this message
Matt Riedemann (mriedem) wrote :

For the issue in comment 35 I wonder if we could flush the stdio buffer between runs of each sub-action in that test.

Revision history for this message
Matt Riedemann (mriedem) wrote :

So one issue is we're getting a giant traceback because these tests are using the CastAsCall fixture and we're not registering an expected_exception so we get a big traceback from oslo.messaging:

b'2019-05-02 03:23:44,434 ERROR [oslo_messaging.rpc.server] Exception during message handling'

We could (1) stub that out for the test run or (2) convert the tests to not use the CastAsCall fixture. (2) is probably better long-term but might not be very easy.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually the test isn't using the CastAsCall fixture so I'm not sure why we're getting the traceback from oslo.messaging since I thought that was only used if you're doing a synchronous RPC call.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/656844

Revision history for this message
Matt Riedemann (mriedem) wrote :

It's the notification listener in oslo.messaging that logs the traceback, that's why it wasn't the CastAsCall thing I was expecting:

https://github.com/openstack/oslo.messaging/blob/558fc5f4e8aa4b8553ee7522d21cc1583e10a849/oslo_messaging/notify/listener.py#L188

So yeah we can't really do much about that since these are notification tests. We could potentially stub out LOG.exception in the specific part of the test that triggers the fault, but that's pretty hacky.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/656633
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=84e64a995eb456772519ba26dc8a98f042a7a44a
Submitter: Zuul
Branch: master

commit 84e64a995eb456772519ba26dc8a98f042a7a44a
Author: Matt Riedemann <email address hidden>
Date: Wed May 1 08:57:39 2019 -0400

    Require at least oslo.versionedobjects>=1.35.0

    This is to pick up change If20663ecad19f18f22172ae489206b42489fd9f6
    for our lower-constraints CI job to avoid blowing up the subunit
    parser with too much log output.

    Change-Id: I3c404bd650521b44faddf97ef0b41953f82c4bd2
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/656844
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=93027713ab614b59ebb66f0617f85d43c956427c
Submitter: Zuul
Branch: master

commit 93027713ab614b59ebb66f0617f85d43c956427c
Author: Balazs Gibizer <email address hidden>
Date: Thu May 2 12:24:19 2019 -0600

    Reset the stored logs at each notification test steps

    This is an atempt to solve a new appearance of bug 1813147 where the
    long test_instance_action notification sample test case logs too much.
    This patch drops the logs of each successful test step to try to avoid
    the overload of the logging system.

    Change-Id: I0e5db9b2f423ab92ff93b6b3aada7d6b79a7abf6
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/662965

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/662965
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0956539edf93736ef23169610b08959a69ec0d92
Submitter: Zuul
Branch: stable/stein

commit 0956539edf93736ef23169610b08959a69ec0d92
Author: Balazs Gibizer <email address hidden>
Date: Thu May 2 12:24:19 2019 -0600

    Reset the stored logs at each notification test steps

    This is an atempt to solve a new appearance of bug 1813147 where the
    long test_instance_action notification sample test case logs too much.
    This patch drops the logs of each successful test step to try to avoid
    the overload of the logging system.

    Change-Id: I0e5db9b2f423ab92ff93b6b3aada7d6b79a7abf6
    Related-Bug: #1813147
    (cherry picked from commit 93027713ab614b59ebb66f0617f85d43c956427c)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/682133

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/682140

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/682133
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2973faa07063d2112ef7459ab07cc07e20d74610
Submitter: Zuul
Branch: master

commit 2973faa07063d2112ef7459ab07cc07e20d74610
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 13 14:18:25 2019 -0400

    libvirt: stub logging of host capabilities

    By default we capture the logging output stream at INFO level
    for unit/functional tests and anything that hits the libvirt
    Host.get_capabilities method logs the giant xml host capabilities
    string that is defined in the fakelibvirt module. This can cause
    subunit parsing to fail because there is too much output in the
    buffer.

    This change simply stubs out that particular logging call in the
    FakeLibvirtFixture class which is used extensively in the more
    detailed libvirt tests like LibvirtConnTestCase. If this turns out
    to not be good enough we can try stubbing at a more global level
    like in the base nova TestCase.

    Change-Id: I38b350bce908005161ed7c3eb737f32076828a37
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/682210

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/682140
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bb2b7cad294bebd6b3bdd6d10043c3c2fb038140
Submitter: Zuul
Branch: master

commit bb2b7cad294bebd6b3bdd6d10043c3c2fb038140
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 13 14:58:49 2019 -0400

    Use SpawnIsSynchronousFixture in reschedule functional tests

    We see that in successful tests that are usually running some
    kind of negative or reschedule scenario that they will dump
    a stacktrace to the subunit output buffer because
    _allocate_network_async will raise an error. Too many of these
    can eventually blow up the subunit output buffer and cause
    tests to fail.

    To guard against this, we use the SpawnIsSynchronousFixture in
    the base class of two known test classes [1][2] that show a
    stacktace with _allocate_network_async in successful runs. As a
    result, one of the tests has to be fixed due to improper stubbing.

    [1] UnsupportedPortResourceRequestBasedSchedulingTest
    [2] PortResourceRequestReSchedulingTest

    Change-Id: Ib18ecaebbeccc9c9cea32ca8949175fcea0414ac
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/682210
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6b5e81df55394a87ec7c3e2fb75618506fffaa6
Submitter: Zuul
Branch: stable/stein

commit d6b5e81df55394a87ec7c3e2fb75618506fffaa6
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 13 14:18:25 2019 -0400

    libvirt: stub logging of host capabilities

    By default we capture the logging output stream at INFO level
    for unit/functional tests and anything that hits the libvirt
    Host.get_capabilities method logs the giant xml host capabilities
    string that is defined in the fakelibvirt module. This can cause
    subunit parsing to fail because there is too much output in the
    buffer.

    This change simply stubs out that particular logging call in the
    FakeLibvirtFixture class which is used extensively in the more
    detailed libvirt tests like LibvirtConnTestCase. If this turns out
    to not be good enough we can try stubbing at a more global level
    like in the base nova TestCase.

    Conflicts:
          nova/virt/libvirt/host.py

    NOTE(mriedem): The conflict is due to not having change
    I06e1f7429c056c4ce8506b10359762e457dbb2a0 in Stein.

    Change-Id: I38b350bce908005161ed7c3eb737f32076828a37
    Related-Bug: #1813147
    (cherry picked from commit 2973faa07063d2112ef7459ab07cc07e20d74610)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/684781

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/684781
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=79ee80d159e8be0d9f8215c2c5e210c545c551a7
Submitter: Zuul
Branch: master

commit 79ee80d159e8be0d9f8215c2c5e210c545c551a7
Author: Stephen Finucane <email address hidden>
Date: Wed Sep 25 16:18:41 2019 +0100

    Ignore sqla-migrate inspect.getargspec deprecation warnings on py36

    The 'message' argument is a regex so you need to escape brackets and the
    likes. Strange that this worked on py27.

    This was coincidentally fixed by Ib9f735216773224f91ac7f49fbe2eee119670872
    in Train but this change is backportable while that one is not.

    Change-Id: Ia1f0fa6a46bdd0a87d40721937355b2cc887f1c9
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/684772
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d0ca30b80a5063d5541587deda33e1a466a10f30
Submitter: Zuul
Branch: master

commit d0ca30b80a5063d5541587deda33e1a466a10f30
Author: Stephen Finucane <email address hidden>
Date: Wed Sep 25 15:59:12 2019 +0100

    Ignore warning from sqlalchemy-migrate

    This has been fixed but not released. Another warning is updated since
    modules don't have .py extensions.

    Change-Id: Iaa93b9743214300597650040c1cd34699f12ab3f
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/685072

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/686836

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/686836
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3c27f3cdc0538c2b8a14cf309f04acaf1599a44b
Submitter: Zuul
Branch: master

commit 3c27f3cdc0538c2b8a14cf309f04acaf1599a44b
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 4 16:48:56 2019 -0400

    Filter out alembic logs below WARNING in tests

    During functional tests we see quite a bit of alembic
    INFO level logging which we don't really need. This
    change sets the alembic logging level to WARNING to
    remove that noise.

    Change-Id: I9550e1c0c93b48e234f429204e27960d0d1cda81
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/685072
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=58ad6aba98afe25a0f596decce3af567d0abe4e0
Submitter: Zuul
Branch: stable/stein

commit 58ad6aba98afe25a0f596decce3af567d0abe4e0
Author: Stephen Finucane <email address hidden>
Date: Wed Sep 25 16:18:41 2019 +0100

    Ignore sqla-migrate inspect.getargspec deprecation warnings on py36

    The 'message' argument is a regex so you need to escape brackets and the
    likes. Strange that this worked on py27.

    This was coincidentally fixed by Ib9f735216773224f91ac7f49fbe2eee119670872
    in Train but this change is backportable while that one is not.

    Conflicts:
          nova/tests/fixtures.py

    NOTE(mriedem): The conflict is due to not having change
    Ib9f735216773224f91ac7f49fbe2eee119670872 in Stein, as
    noted above.

    Change-Id: Ia1f0fa6a46bdd0a87d40721937355b2cc887f1c9
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1813147
    (cherry picked from commit 79ee80d159e8be0d9f8215c2c5e210c545c551a7)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/697301

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/697301
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=945d662d4f0a0b1b14f32ab92aaf3f0f35e0be4b
Submitter: Zuul
Branch: master

commit 945d662d4f0a0b1b14f32ab92aaf3f0f35e0be4b
Author: Matt Riedemann <email address hidden>
Date: Wed Dec 4 09:50:17 2019 -0500

    Handle ServiceNotFound in DbDriver._report_state

    If a service record is gone from the DB the _report_state method
    will log a generic traceback every time the report interval runs,
    which is every 10 seconds by default. This gets pretty noisy and
    the error logged isn't very useful. One could get into this state
    by deleting the service record in the API before stopping the actual
    process that is running.

    This simply handles the ServiceNotFound error and logs a more useful
    error message without the noisy traceback.

    Change-Id: If0336001fbe39922a199756db0803121cbe560af
    Related-Bug: #1813147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/700522

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

I have a potential "fix" for this proposed at [1], with a related change for stestr here [2].

[1] https://github.com/testing-cabal/subunit/pull/40
[2] https://github.com/mtreinish/stestr/pull/285

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/713611

Changed in nova:
assignee: Balazs Gibizer (balazs-gibizer) → Stephen Finucane (stephenfinucane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.opendev.org/700522
Reason: This should be resolved with testtools 1.4.0 and whatever the next version of stestr ends up being

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/713611
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a2f30f9d7cad1c025592c1087ff0cba6dec9a309
Submitter: Zuul
Branch: master

commit a2f30f9d7cad1c025592c1087ff0cba6dec9a309
Author: Stephen Finucane <email address hidden>
Date: Wed Mar 18 10:19:07 2020 +0000

    Bump python-subunit minimum to 1.4.0

    This contains the fix required to avoid the 'subunit parser error'
    errors we've been seeing for some time now.

    Change-Id: I605005e97b58a9dc037f426c7c313e8dfb081e4d
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1813147

Changed in nova:
status: In Progress → Fix Released
Changed in glance:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.