test_router_rescheduling failed with unexpected FIP status after rescheduling

Bug #1644937 reported by Ihar Hrachyshka
Affects   Status         Importance   Assigned to       Milestone
tempest   Fix Released   High         Ihar Hrachyshka

Bug Description

http://logs.openstack.org/48/395748/15/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/366ecda/logs/testr_results.html.gz

Traceback (most recent call last):
  File "tempest/test.py", line 119, in wrapper
    return func(*func_args, **func_kwargs)
  File "tempest/test.py", line 100, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "tempest/scenario/test_network_basic_ops.py", line 768, in test_router_rescheduling
    msg='After router rescheduling')
  File "tempest/scenario/test_network_basic_ops.py", line 206, in check_public_network_connectivity
    self.check_floating_ip_status(floating_ip, floatingip_status)
  File "tempest/scenario/manager.py", line 907, in check_floating_ip_status
    st=status))
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': FloatingIP: {u'id': u'41cbfe8e-066c-414b-9b06-f818cc845ff9', u'status': u'DOWN', u'description': u'', u'router_id': u'fa20a117-a4f4-42db-9232-92ed1a3fd211', u'tenant_id': u'2d4bc4eb27984f1f9466af083e6c0929', u'revision_number': 1, u'port_id': u'1204588c-9084-4846-92d8-c0ed0719b16d', u'fixed_ip_address': u'10.1.0.3', u'floating_ip_address': u'172.24.5.17', u'updated_at': u'2016-11-25T16:17:42Z', u'created_at': u'2016-11-25T16:17:42Z', u'floating_network_id': u'3c4a550a-3dd9-4501-9bc8-0719853048d1', u'project_id': u'2d4bc4eb27984f1f9466af083e6c0929'} is at status: DOWN. failed to reach status: ACTIVE

In the tempest log, we see that the router was rescheduled to the same agent; the test then checked that the agent is indeed in the list of scheduled agents, and then checked that the FIP status is ACTIVE. The last check fails: the status is initially ACTIVE, but it flips back to DOWN just before the final check.

Looking into l3 agent logs, it seems like the agent flipped the status to DOWN while processing the previous unscheduling event.

It seems like we should wait for the FIP status to flip to DOWN after unscheduling, to make sure the agent has finished processing the unscheduling event. Then it is safe to proceed with rescheduling, without the risk of earlier update events still sitting in the router update queue.
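
The wait proposed above can be sketched as a small polling helper. This is only an illustration under stated assumptions, not the tempest implementation: `wait_for_status` and its `fetch_status` argument are hypothetical names, and the fake status source stands in for a real floating IP lookup. It is written for Python 3 (`TimeoutError` is a builtin there), unlike the Python 2.7 job in the traceback.

```python
import time


def wait_for_status(fetch_status, expected, timeout=60.0, interval=1.0):
    """Poll fetch_status() until it returns `expected`.

    `fetch_status` is a hypothetical zero-argument callable that returns
    the current floating IP status string (for example "ACTIVE" or "DOWN").
    Returns the number of polls made. Raises TimeoutError, reporting the
    last observed status, if `expected` is not seen before `timeout`
    seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status == expected:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "status is %r, failed to reach %r within %.1fs"
                % (status, expected, timeout))
        time.sleep(interval)


# Fake status source: the agent is still processing the unscheduling
# event for the first two polls, then the status flips to DOWN.
fake_statuses = iter(["ACTIVE", "ACTIVE", "DOWN"])
polls = wait_for_status(lambda: next(fake_statuses), "DOWN",
                        timeout=5.0, interval=0.0)
```

In the test flow, the idea would be to call such a helper with "DOWN" right after unscheduling the router, and only then reschedule it and wait for "ACTIVE".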

Tags: gate-failure
Changed in neutron:
status: New → Confirmed
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
importance: Undecided → High
tags: added: gate-failure
affects: neutron → tempest
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/403289

Changed in tempest:
status: Confirmed → In Progress
Attila Fazekas (afazekas) wrote :

Generally, tempest cannot and should not assume anything about what happens during agent scheduling, because it is undocumented `internal` behavior. In the long run, these tests need to be moved to another repo (neutron tests?), or the behavior has to be exactly specified by the API documentation.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Attila, agreed. Though we still should fix the test, and tempest network test cleanup is a separate task.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/403289
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=3a0a0f7288153030c2b104336df416766593e38d
Submitter: Jenkins
Branch: master

commit 3a0a0f7288153030c2b104336df416766593e38d
Author: Ihar Hrachyshka <email address hidden>
Date: Sun Nov 13 11:54:54 2016 +0000

    Wait for FIP status to get to DOWN in test_router_rescheduling

    If we don't wait until the port actually gets to DOWN in database, we
    may proceed to scheduling the router again and check floating IP status,
    expecting it to be ACTIVE, but then catch DOWN still in database because
    the router was slow to process the unscheduling event.

    Closes-Bug: #1644937
    Change-Id: I0806ead789953a4ff879ef9e6e77e1c66e658316

Changed in tempest:
status: In Progress → Fix Released