Fatal Python error: Cannot recover from stack overflow. - in py35 unit test job

Bug #1685333 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Matt Riedemann
Pike                      Fix Released  Medium      Matt Riedemann

Bug Description

Seeing this in the py35 job, looks like it's related to an infinite recursion in oslo.config:

Fatal Python error: Cannot recover from stack overflow.

http://logs.openstack.org/34/458834/2/check/gate-nova-python35/c55b003/console.html#_2017-04-21_16_36_11_981505

I'm not entirely sure which test it is, but I suspect this one, which is still in progress when the job dies:

{0} nova.tests.unit.test_rpc.TestRPC.test_add_extra_exmods [] ... inprogress
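For context on the error above, here is a minimal sketch (not taken from nova or oslo.config; `recurse_forever` and `probe` are hypothetical names) of how unbounded recursion normally surfaces in Python 3.5 and later: as a catchable RecursionError once the interpreter's recursion limit is hit. As far as I understand, the fatal "Cannot recover from stack overflow" variant is what CPython 3.5 prints when code keeps recursing while that first overflow is still being handled, which is why it kills the whole test worker rather than failing a single test.

```python
import sys


def recurse_forever(n=0):
    # Hypothetical stand-in for an accidental infinite recursion.
    return recurse_forever(n + 1)


def probe():
    """Trigger the recursion and report what the interpreter raised."""
    try:
        recurse_forever()
    except RecursionError as exc:
        # The normal, recoverable outcome: an exception, not a process abort.
        return type(exc).__name__, sys.getrecursionlimit()


if __name__ == "__main__":
    name, limit = probe()
    print("raised %s at recursion limit %d" % (name, limit))
```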

Balazs Gibizer (balazs-gibizer) wrote :

I saw a very similar failure in https://bugs.launchpad.net/nova/+bug/1706563, but there a different test gets stuck:

nova.tests.unit.test_rpc.TestRPC.test_cleanup_notifier_null [] ... inprogress

Balazs Gibizer (balazs-gibizer) wrote :

We have 7 occurrences so far with the stack overflow signature:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Fatal%20Python%20error%3A%20Cannot%20recover%20from%20stack%20overflow.%5C%22

TestRPC.test_cleanup_notification_transport_null: 3
TestRPC.test_cleanup: 2
TestRPC.test_cleanup_notifier_null: 1
TestRPC.test_clear_extra_exmods: 1

So this is most likely a problem in the TestRPC tests.

Balazs Gibizer (balazs-gibizer) wrote :

There were a couple of new occurrences, all in TestRPC. I still wasn't able to reproduce the problem locally. Can we somehow tell Python not to truncate the stack trace in this case, so we can see where the infinite recursion starts?

Balazs Gibizer (balazs-gibizer) wrote :

Running with 'python3.5 -X faulthandler' could be a way to catch this, but that also only supports dumping up to the last 100 stack frames.
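To illustrate the limitation mentioned above, here is a minimal sketch using the standard library faulthandler module. It recurses deeper than 100 frames and then dumps the current stack to a temporary file, so the truncated output can be inspected. The helper names (`dump_deep_stack`, `capture_dump`) and the depth of 150 are assumptions made for the example and are not part of nova.

```python
import faulthandler
import tempfile


def dump_deep_stack(depth, out):
    """Recurse `depth` frames deep, then dump the current thread's stack."""
    if depth > 0:
        return dump_deep_stack(depth - 1, out)
    # faulthandler writes straight to the file descriptor behind `out`.
    faulthandler.dump_traceback(file=out, all_threads=False)


def capture_dump(depth=150):
    """Return the text faulthandler produced for a stack `depth` frames deep."""
    with tempfile.TemporaryFile() as out:
        dump_deep_stack(depth, out)
        out.seek(0)
        return out.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    text = capture_dump()
    frames = [line for line in text.splitlines() if "dump_deep_stack" in line]
    print("frames shown for dump_deep_stack:", len(frames))
```

Because only the most recent frames are kept, a dump taken deep inside an infinite recursion mostly shows the repeating frames and not the call that started the loop.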

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/507239

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/507239
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0534872abb78738993a35d24a6640c82b711deee
Submitter: Jenkins
Branch: master

commit 0534872abb78738993a35d24a6640c82b711deee
Author: Matt Riedemann <email address hidden>
Date: Mon Sep 25 14:57:43 2017 -0400

    Make TestRPC inherit from the base nova TestCase

    By not inheriting from the base nova test case, we
    lose things like timeouts.

    This makes the TestRPC test class inherit from the
    base test case and still avoid using the RPCFixture.

    Change-Id: Id65e15a57175bbdd9c84851b1b6716e6a1f2cfb8
    Related-Bug: #1685333

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/511842

Matt Riedemann (mriedem)
Changed in nova:
importance: High → Medium
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Based on logstash, there have been no new occurrences of this problem since the related patch https://review.openstack.org/507239 was merged, so I think the related patch actually fixed the problem. This bug can be closed.

Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → Fix Released
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/511842
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d873550d3c357adb6f3544adbc43e8ab9b94cfbd
Submitter: Zuul
Branch: stable/pike

commit d873550d3c357adb6f3544adbc43e8ab9b94cfbd
Author: Matt Riedemann <email address hidden>
Date: Mon Sep 25 14:57:43 2017 -0400

    Make TestRPC inherit from the base nova TestCase

    By not inheriting from the base nova test case, we
    lose things like timeouts.

    This makes the TestRPC test class inherit from the
    base test case and still avoid using the RPCFixture.

    Change-Id: Id65e15a57175bbdd9c84851b1b6716e6a1f2cfb8
    Related-Bug: #1685333
    (cherry picked from commit 0534872abb78738993a35d24a6640c82b711deee)

tags: added: in-stable-pike