Functional test test_restart_wsgi_on_sighup_multiple_workers is failing sometimes

Bug #1886956 reported by Slawek Kaplonski
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Slawek Kaplonski

Bug Description

Test neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers is failing from time to time.

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22in%20_test_restart_service_on_sighup%5C%22

Failure example: https://f7a63aeb9edd557a2176-4740624f0848c8c3257f704064a4516f.ssl.cf2.rackcdn.com/736026/4/gate/neutron-functional/d7d5c47/testr_results.html

ft1.1: neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workerstesttools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 704, in wait_until_true
    eventlet.sleep(sleep)
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 5 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/test_server.py", line 237, in test_restart_wsgi_on_sighup_multiple_workers
    workers=2)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/test_server.py", line 163, in _test_restart_service_on_sighup
    utils.wait_until_true(condition, timeout=5, sleep=1)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 709, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 5 seconds

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/740283

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/740283
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0c55ab9c78df028e9e321a3f8dea59ede217fef8
Submitter: Zuul
Branch: master

commit 0c55ab9c78df028e9e321a3f8dea59ede217fef8
Author: Slawek Kaplonski <email address hidden>
Date: Thu Jul 9 15:19:03 2020 +0200

    [Functional] Add logging to the check test file function

    In the functional tests module which is testing WSGI Server, there
    is used function to assert that file created by the WSGI service is
    created and has got correct size.
    This function wasn't logging anything so in case of test failure
    there was no way to check why it really failed.

    This patch adds some log messages to the check so it will be clear
    if file wasn't created at all or if size was not as expected.

    Change-Id: Ib610b8513b4ea888a540873b26c3f205ac575b17
    Related-Bug: #1886956

tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/757573

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/757574

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/757573
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b0892ec3f76d4f22364eaad9dc353dd5b3e406dc
Submitter: Zuul
Branch: stable/ussuri

commit b0892ec3f76d4f22364eaad9dc353dd5b3e406dc
Author: Slawek Kaplonski <email address hidden>
Date: Thu Jul 9 15:19:03 2020 +0200

    [Functional] Add logging to the check test file function

    In the functional tests module which is testing WSGI Server, there
    is used function to assert that file created by the WSGI service is
    created and has got correct size.
    This function wasn't logging anything so in case of test failure
    there was no way to check why it really failed.

    This patch adds some log messages to the check so it will be clear
    if file wasn't created at all or if size was not as expected.

    Change-Id: Ib610b8513b4ea888a540873b26c3f205ac575b17
    Related-Bug: #1886956
    (cherry picked from commit 0c55ab9c78df028e9e321a3f8dea59ede217fef8)

tags: added: in-stable-ussuri
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I didn't saw such issues recently so I'm closing this bug (for now at least).

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I just found new occurences of that issue: https://0301ae0bf83f02db0b33-cb59b7799bd327c5e6811146313bf22f.ssl.cf5.rackcdn.com/759697/1/check/neutron-functional-with-uwsgi/53e42f1/controller/logs/dsvm-functional-logs/neutron.tests.functional.test_server.TestRPCServer.test_restart_rpc_on_sighup_multiple_workers.txt

It looks that after we send SIUGHUP to the process, parent process has died:

2020-10-27 17:42:19.249 299384 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2020-10-27 17:42:19.249 299385 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

And because of that it never restarts workers so test failed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/757574
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=86463bd822dc004c812acf79ad95aeed7e43f71e
Submitter: Zuul
Branch: stable/train

commit 86463bd822dc004c812acf79ad95aeed7e43f71e
Author: Slawek Kaplonski <email address hidden>
Date: Thu Jul 9 15:19:03 2020 +0200

    [Functional] Add logging to the check test file function

    In the functional tests module which is testing WSGI Server, there
    is used function to assert that file created by the WSGI service is
    created and has got correct size.
    This function wasn't logging anything so in case of test failure
    there was no way to check why it really failed.

    This patch adds some log messages to the check so it will be clear
    if file wasn't created at all or if size was not as expected.

    Change-Id: Ib610b8513b4ea888a540873b26c3f205ac575b17
    Related-Bug: #1886956
    (cherry picked from commit 0c55ab9c78df028e9e321a3f8dea59ede217fef8)

tags: added: in-stable-train
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

No new occurences in at least last 30 days. I'm closing this bug for now. Let's hope it will not go back :)

Changed in neutron:
status: Confirmed → Incomplete
Revision history for this message
Slawek Kaplonski (slaweq) wrote :
Download full text (8.9 KiB)

I was checking new occurrence of that issue today. And the only thing which I found is that when this issue happend there were logs that:

2021-01-25 09:00:55.777 317432 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2021-01-25 09:00:55.777 317433 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting

just at the beginning of the test. It seems that it was before workers were even restarted and because of that there was missing entries for restart.
The other thing which is there (and which I can't see in any log from the test which passed) is the error from the ovn maintenance worker, like:

2021-01-25 09:00:58.809 61035 ERROR futurist.periodics [req-15aa196f-6c70-4a99-b79f-acaf6c85660d - - - - -] Failed to call periodic 'neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance.HashRingHealthCheckPeriodics.touch_hash_ring_nodes' (it runs every 30.00 seconds): oslo_db.exception.DBNonExistentTable: (sqlite3.OperationalError) no such table: ovn_hash_ring
[SQL: UPDATE ovn_hash_ring SET updated_at=? WHERE ovn_hash_ring.hostname = ? AND ovn_hash_ring.group_name = ?]
[parameters: ('2021-01-25 09:00:58.806126', 'ubuntu-focal-ovh-bhs1-0022700828', 'mechanism_driver')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics Traceback (most recent call last):
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics self.dialect.do_execute(
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 609, in do_execute
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics cursor.execute(statement, parameters)
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics sqlite3.OperationalError: no such table: ovn_hash_ring
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics The above exception was the direct cause of the following exception:
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics Traceback (most recent call last):
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/futurist/periodics.py", line 293, in run
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics work()
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/futurist/periodics.py", line 67, in __call__
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics return self.callback(*self.args, **self.kwargs)
2021-01-25 09:00:58.809 61035 ERROR futurist.periodics File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/futurist/periodics.py", line 181, in deco...

Read more...

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

After https://review.opendev.org/c/openstack/neutron/+/778080 was merged I didn't saw it anymore in CI. So I'm closing that LP now.

Changed in neutron:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.