Fatal Python error: Cannot recover from stack overflow. - in py35 unit test job

Bug #1685333 reported by Matt Riedemann on 2017-04-21
This bug affects 1 person
Affects                    Importance  Assigned to
OpenStack Compute (nova)   Medium      Matt Riedemann
Pike                       Medium      Matt Riedemann

Bug Description

Seeing this in the py35 job, looks like it's related to an infinite recursion in oslo.config:

Fatal Python error: Cannot recover from stack overflow.

http://logs.openstack.org/34/458834/2/check/gate-nova-python35/c55b003/console.html#_2017-04-21_16_36_11_981505

I'm not entirely sure which test it is, but I suspect this one, which is still in progress when the job dies:

{0} nova.tests.unit.test_rpc.TestRPC.test_add_extra_exmods [] ... inprogress

Balazs Gibizer (balazs-gibizer) wrote :

I saw a very similar failure in https://bugs.launchpad.net/nova/+bug/1706563, but there a different test gets stuck:

nova.tests.unit.test_rpc.TestRPC.test_cleanup_notifier_null [] ... inprogress

Balazs Gibizer (balazs-gibizer) wrote :

We have 7 occurrences so far with the stack overflow signature:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Fatal%20Python%20error%3A%20Cannot%20recover%20from%20stack%20overflow.%5C%22

TestRPC.test_cleanup_notification_transport_null: 3
TestRPC.test_cleanup: 2
TestRPC.test_cleanup_notifier_null: 1
TestRPC.test_clear_extra_exmods: 1

So this is most likely a problem in the TestRPC tests.

Balazs Gibizer (balazs-gibizer) wrote :

There were a couple of new occurrences, all in TestRPC. I still wasn't able to reproduce the problem locally. Can we somehow tell python not to truncate the stack trace in this case, to see where the infinite recursion starts?
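As a rough illustration of finding where a runaway recursion starts: when the recursion surfaces as an ordinary RecursionError (rather than the fatal interpreter abort seen in the job), the traceback attached to the exception lists frames oldest-first, so its head shows the entry point. The helper and the recursing functions below are hypothetical stand-ins, not nova or oslo.config code.

```python
import traceback


def recursion_entry_frames(exc, limit=5):
    """Return the outermost `limit` frames of the traceback attached to exc."""
    # extract_tb lists frames oldest-first, so the head of the list is
    # where the runaway call chain was entered.
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[:limit]


def _recurse(n):
    # Hypothetical stand-in for the real infinite recursion.
    return _recurse(n + 1)


def trigger():
    # Hypothetical entry point that starts the recursion.
    return _recurse(0)


if __name__ == "__main__":
    try:
        trigger()
    except RecursionError as exc:
        for frame in recursion_entry_frames(exc):
            print("%s:%s in %s" % (frame.filename, frame.lineno, frame.name))
```

This only helps if the exception propagates normally; it cannot recover anything once the interpreter has already aborted with the fatal error.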

Balazs Gibizer (balazs-gibizer) wrote :

Running with 'python3.5 -X faulthandler' could be a way to catch this, but that also only dumps up to the most recent 100 stack frames.
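A small sketch of that frame limit, using the standard faulthandler module from inside the process (faulthandler.enable() is the in-process counterpart of the -X faulthandler flag). The function name and the depth of 300 are illustrative assumptions; the dump shows the innermost frames first and stops after a fixed number of them, so the start of a deep recursion is not visible.

```python
import faulthandler
import tempfile


def dump_deep_stack(depth):
    """Recurse `depth` levels, then dump the current thread's stack."""
    if depth > 0:
        return dump_deep_stack(depth - 1)
    # faulthandler writes straight to a file descriptor, so a real
    # temporary file is used instead of an in-memory buffer.
    with tempfile.TemporaryFile(mode="w+") as out:
        faulthandler.dump_traceback(file=out, all_threads=False)
        out.seek(0)
        return out.read()


if __name__ == "__main__":
    text = dump_deep_stack(300)
    shown = [line for line in text.splitlines() if line.startswith("  File")]
    print("frames shown: %d of more than 300" % len(shown))
```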

Reviewed: https://review.openstack.org/507239
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0534872abb78738993a35d24a6640c82b711deee
Submitter: Jenkins
Branch: master

commit 0534872abb78738993a35d24a6640c82b711deee
Author: Matt Riedemann <email address hidden>
Date: Mon Sep 25 14:57:43 2017 -0400

    Make TestRPC inherit from the base nova TestCase

    By not inheriting from the base nova test case, we
    lose things like timeouts.

    This makes the TestRPC test class inherit from the
    base test case and still avoid using the RPCFixture.

    Change-Id: Id65e15a57175bbdd9c84851b1b6716e6a1f2cfb8
    Related-Bug: #1685333
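
To illustrate what "losing things like timeouts" means: a base test class can install a per-test timeout in setUp, and only classes that inherit from it get that protection. The sketch below is not nova's implementation; it uses only the standard library, assumes a POSIX system with tests running in the main thread, and every name in it (TimeoutTestCase, TEST_TIMEOUT, demo) is hypothetical.

```python
import signal
import time
import unittest


class TimeoutTestCase(unittest.TestCase):
    """Hypothetical base class: every test gets a wall-clock timeout."""

    TEST_TIMEOUT = 0.2  # seconds; illustrative value

    def setUp(self):
        super().setUp()

        def _on_timeout(signum, frame):
            raise TimeoutError("test exceeded %.1fs" % self.TEST_TIMEOUT)

        previous = signal.signal(signal.SIGALRM, _on_timeout)
        signal.setitimer(signal.ITIMER_REAL, self.TEST_TIMEOUT)
        # Cleanups run in reverse order: cancel the timer first,
        # then restore the previous signal handler.
        self.addCleanup(signal.signal, signal.SIGALRM, previous)
        self.addCleanup(signal.setitimer, signal.ITIMER_REAL, 0)


def demo():
    """Run one deliberately hanging test and return the TestResult."""

    class Hanging(TimeoutTestCase):
        def test_hangs(self):
            time.sleep(5)  # stands in for a test that never finishes

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Hanging)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = demo()
    print("errors:", len(outcome.errors))
```

With a base class like this, a stuck test is reported as an error after the timeout instead of running until the whole job dies.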

Matt Riedemann (mriedem) on 2017-10-13
Changed in nova:
importance: High → Medium

Balazs Gibizer (balazs-gibizer) wrote :

Based on logstash, there have been no new occurrences of this problem since the related patch https://review.openstack.org/507239 was merged, so I think that patch actually fixed the problem. This can be closed.

Matt Riedemann (mriedem) on 2017-10-25
Changed in nova:
status: Confirmed → Fix Released
assignee: nobody → Matt Riedemann (mriedem)

Reviewed: https://review.openstack.org/511842
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d873550d3c357adb6f3544adbc43e8ab9b94cfbd
Submitter: Zuul
Branch: stable/pike

commit d873550d3c357adb6f3544adbc43e8ab9b94cfbd
Author: Matt Riedemann <email address hidden>
Date: Mon Sep 25 14:57:43 2017 -0400

    Make TestRPC inherit from the base nova TestCase

    By not inheriting from the base nova test case, we
    lose things like timeouts.

    This makes the TestRPC test class inherit from the
    base test case and still avoid using the RPCFixture.

    Change-Id: Id65e15a57175bbdd9c84851b1b6716e6a1f2cfb8
    Related-Bug: #1685333
    (cherry picked from commit 0534872abb78738993a35d24a6640c82b711deee)

tags: added: in-stable-pike