TestAsyncProcess.test_async_process_respawns fails with TimeoutException

Bug #1477860 reported by Ihar Hrachyshka
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Jakub Libosvar
Kilo
Fix Released
Undecided
Unassigned

Bug Description

Logstash: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiaW4gdGVzdF9hc3luY19wcm9jZXNzX3Jlc3Bhd25zXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0Mzc3MjMxNTU2ODB9

fails for both feature/qos and master:

2015-07-24 06:35:09.394 | 2015-07-24 06:35:07.369 | Captured traceback:
2015-07-24 06:35:09.394 | 2015-07-24 06:35:07.370 | ~~~~~~~~~~~~~~~~~~~
2015-07-24 06:35:09.394 | 2015-07-24 06:35:07.371 | Traceback (most recent call last):
2015-07-24 06:35:09.394 | 2015-07-24 06:35:07.372 | File "neutron/tests/functional/agent/linux/test_async_process.py", line 70, in test_async_process_respawns
2015-07-24 06:35:09.394 | 2015-07-24 06:35:07.373 | proc._kill_process(proc.pid)
2015-07-24 06:35:09.395 | 2015-07-24 06:35:07.375 | File "neutron/agent/linux/async_process.py", line 177, in _kill_process
2015-07-24 06:35:09.395 | 2015-07-24 06:35:07.376 | self._process.wait()
2015-07-24 06:35:09.395 | 2015-07-24 06:35:07.377 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/green/subprocess.py", line 75, in wait
2015-07-24 06:35:09.395 | 2015-07-24 06:35:07.378 | eventlet.sleep(check_interval)
2015-07-24 06:35:09.395 | 2015-07-24 06:35:07.379 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 34, in sleep
2015-07-24 06:35:09.396 | 2015-07-24 06:35:07.380 | hub.switch()
2015-07-24 06:35:09.396 | 2015-07-24 06:35:07.381 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-07-24 06:35:09.396 | 2015-07-24 06:35:07.382 | return self.greenlet.switch()
2015-07-24 06:35:09.396 | 2015-07-24 06:35:07.383 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in run
2015-07-24 06:35:09.397 | 2015-07-24 06:35:07.384 | self.wait(sleep_time)
2015-07-24 06:35:09.397 | 2015-07-24 06:35:07.385 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 85, in wait
2015-07-24 06:35:09.397 | 2015-07-24 06:35:07.387 | presult = self.do_poll(seconds)
2015-07-24 06:35:09.397 | 2015-07-24 06:35:07.388 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 62, in do_poll
2015-07-24 06:35:09.398 | 2015-07-24 06:35:07.389 | return self.poll.poll(seconds)
2015-07-24 06:35:09.398 | 2015-07-24 06:35:07.390 | File "/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
2015-07-24 06:35:09.398 | 2015-07-24 06:35:07.391 | raise TimeoutException()
2015-07-24 06:35:09.398 | 2015-07-24 06:35:07.392 | fixtures._fixtures.timeout.TimeoutException

Example: http://logs.openstack.org/64/199164/2/check/gate-neutron-dsvm-functional/9b43ead/console.html

Assaf Muller (amuller)
tags: added: functional-tests
Assaf Muller (amuller)
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
tags: added: gate-failure
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

3 hits last 7 days.

Changed in neutron:
importance: High → Medium
Changed in neutron:
milestone: none → liberty-rc1
Changed in neutron:
importance: Medium → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am gonna triage this a bit.

Changed in neutron:
assignee: nobody → Armando Migliaccio (armando-migliaccio)
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

argh...I got harassed and I couldn't do this...we'll get back to it if I can.

Changed in neutron:
assignee: Armando Migliaccio (armando-migliaccio) → nobody
Revision history for this message
Jakub Libosvar (libosvar) wrote :

The culrpit appears to be race between respawn and wait. We kill the tail process, execute() of kill command yields in eventlet and if in the meantime AsyncProcess is respawned then new _process attribute is created and later then, wait() is called on newly spawned 'tail' process.

I reproduced locally with debugger attached:
ipdb> print pid
15447 <-- pid we killed

ipdb> print self._process.pid
15450 <--- pid which we call wait() on

$ ps -ef | grep tail
vagrant 15450 15428 0 13:08 pts/6 00:00:00 tail -f /tmp/tmpA54wnp/tmp_7vrjM/test_async_process.tmp
                ^^^^^ process is still running

Changed in neutron:
assignee: nobody → Jakub Libosvar (libosvar)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/223160

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/223160
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f4a76a7a26c0902d61f4fe61091e7fe556923592
Submitter: Jenkins
Branch: master

commit f4a76a7a26c0902d61f4fe61091e7fe556923592
Author: Jakub Libosvar <email address hidden>
Date: Mon Sep 14 14:54:34 2015 +0000

    func: Don't use private method of AsyncProcess

    In functional test we simulate crash of AsyncProcess by calling
    _kill_process(). This method is a private method and such usage
    introduced a race where process was respawned prior to calling wait() of
    killed process, leading to infinite wait on newly spawned process.

    This patch adds manual send of kill and then active waiting for process
    to be respawned, similarly like done with recent keepalived patch [1].

    [1] https://review.openstack.org/#/c/222460/7/neutron/tests/functional/agent/linux/test_keepalived.py

    Closes-Bug: #1477860
    Change-Id: I1c91393304d65a0695311416ecc5b64fd549b192

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224334

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/224357

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)
Download full text (73.6 KiB)

Reviewed: https://review.openstack.org/224357
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fdc3431ccd219accf6a795079d9b67b8656eed8e
Submitter: Jenkins
Branch: feature/pecan

commit fe236bdaadb949661a0bfb9b62ddbe432b4cf5f1
Author: Miguel Angel Ajo <email address hidden>
Date: Thu Sep 3 15:40:12 2015 +0200

    No network devices on network attached qos policies

    Network devices, like internal router legs, or dhcp ports
    should not be affected by bandwidth limiting rules.

    This patch disables application of network attached policies
    to network/neutron owned ports.

    Closes-bug: #1486039
    DocImpact

    Change-Id: I75d80227f1e6c4b3f5fa7762b8dc3b0c0f1abd46

commit db4a06f7caa20a4c7879b58b20e95b223ed8eeaf
Author: Ken'ichi Ohmichi <email address hidden>
Date: Wed Sep 16 10:04:32 2015 +0000

    Use tempest-lib's token_client

    Now tempest-lib provides token_client modules as library and the
    interface is stable. So neutron repogitory doesn't need to contain
    these modules.
    This patch makes neutron use tempest-lib's token_client and removes
    the own modules for the maintenance.

    Change-Id: Ieff7eb003f6e8257d83368dbc80e332aa66a156c

commit 78aed58edbe6eb8a71339c7add491fe9de9a0546
Author: Jakub Libosvar <email address hidden>
Date: Thu Aug 13 09:08:20 2015 +0000

    Fix establishing UDP connection

    Previously, in establish_connection() for UDP protocol data were sent
    but never read on peer socket. That lead to successful read on peer side
    if this connection was filtered. Having constant testing string masked
    this issue as we can't distinguish to which test of connectivity data
    belong.

    This patch makes unique data string per test_connectivity() and
    also makes establish_connection() to create an ASSURED entry in
    conntrack table. Finally, in last test after firewall filter was
    removed, connection is re-established in order to avoid troubles with
    terminated processes or TCP continuing sending packets which weren't
    successfully delivered.

    Closes-Bug: 1478847
    Change-Id: I2920d587d8df8d96dc1c752c28f48ba495f3cf0f

commit e6292fcdd6262434a7b713ad8802db6bc8a6d3dc
Author: YAMAMOTO Takashi <email address hidden>
Date: Wed Sep 16 13:20:51 2015 +0900

    ovsdb: Fix a few docstring

    Change-Id: I53e1e21655b28fe5da60e58aeeb7cbbd103ae014

commit c22949a4449d96a67caa616290cf76b67b182917
Author: fumihiko kakuma <email address hidden>
Date: Wed Sep 16 11:52:59 2015 +0900

    Remove requirements.txt for the ofagent mechanism driver

    It is no longer used.

    Related-Blueprint: core-vendor-decomposition
    https://blueprints.launchpad.net/neutron/+spec/core-vendor-decomposition

    Change-Id: Ib31fb3febf8968e50d86dd66e1e6e1ea2313f8ac

commit d1d4de19d85f961d388c91e70f31b3bafec418c5
Author: Kevin Benton <email address hidden>
Date: Thu Sep 3 20:25:57 2015 -0700

    Always return iterables in L3 get_candidates

    The caller of this function expects iterables.

    Closes-Bug: #1494996
    Change-Id: I3d103e63f4e127a77268502415c0ddb0d804b54a

commit 1ad6ac448067306...

tags: added: in-feature-pecan
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Doug Wiegley (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/224334

Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/231576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/231576
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1c87d728893d3177030e369bb7827484f854fc42
Submitter: Jenkins
Branch: stable/kilo

commit 1c87d728893d3177030e369bb7827484f854fc42
Author: Jakub Libosvar <email address hidden>
Date: Mon Sep 14 14:54:34 2015 +0000

    func: Don't use private method of AsyncProcess

    In functional test we simulate crash of AsyncProcess by calling
    _kill_process(). This method is a private method and such usage
    introduced a race where process was respawned prior to calling wait() of
    killed process, leading to infinite wait on newly spawned process.

    This patch adds manual send of kill and then active waiting for process
    to be respawned, similarly like done with recent keepalived patch [1].

    [1] https://review.openstack.org/#/c/222460/7/neutron/tests/functional/agent/linux/test_keepalived.py

    Closes-Bug: #1477860
    Change-Id: I1c91393304d65a0695311416ecc5b64fd549b192
    (cherry picked from commit f4a76a7a26c0902d61f4fe61091e7fe556923592)

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-rc1 → 7.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.