tempest

test_router_rescheduling failed with unexpected FIP status after rescheduling

Bug #1644937 reported by Ihar Hrachyshka on 2016-11-25

Affects    Status          Importance    Assigned to        Milestone
tempest    Fix Released    High          Ihar Hrachyshka

Bug Description

http://logs.openstack.org/48/395748/15/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/366ecda/logs/testr_results.html.gz

Traceback (most recent call last):
  File "tempest/test.py", line 119, in wrapper
    return func(*func_args, **func_kwargs)
  File "tempest/test.py", line 100, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "tempest/scenario/test_network_basic_ops.py", line 768, in test_router_rescheduling
    msg='After router rescheduling')
  File "tempest/scenario/test_network_basic_ops.py", line 206, in check_public_network_connectivity
    self.check_floating_ip_status(floating_ip, floatingip_status)
  File "tempest/scenario/manager.py", line 907, in check_floating_ip_status
    st=status))
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': FloatingIP: {u'id': u'41cbfe8e-066c-414b-9b06-f818cc845ff9', u'status': u'DOWN', u'description': u'', u'router_id': u'fa20a117-a4f4-42db-9232-92ed1a3fd211', u'tenant_id': u'2d4bc4eb27984f1f9466af083e6c0929', u'revision_number': 1, u'port_id': u'1204588c-9084-4846-92d8-c0ed0719b16d', u'fixed_ip_address': u'10.1.0.3', u'floating_ip_address': u'172.24.5.17', u'updated_at': u'2016-11-25T16:17:42Z', u'created_at': u'2016-11-25T16:17:42Z', u'floating_network_id': u'3c4a550a-3dd9-4501-9bc8-0719853048d1', u'project_id': u'2d4bc4eb27984f1f9466af083e6c0929'} is at status: DOWN. failed to reach status: ACTIVE

In the tempest log, we see that the router was rescheduled to the same agent, that the test then checked the agent is indeed in the list of scheduled agents, and that it finally checked for the FIP status to be ACTIVE. The last check fails: the status is initially ACTIVE, but just before the final check it flips back to DOWN.

Looking at the L3 agent logs, it seems the agent flipped the status to DOWN while it was still processing the previous unscheduling event.

It seems we should wait for the FIP status to flip to DOWN after unscheduling, to make sure the agent has finished processing the unscheduling event. Then it is safe to proceed with rescheduling without the risk of earlier update events still sitting in the router update queue.
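
A rough sketch of the kind of wait proposed above, not the actual tempest change. The FakeFloatingIPClient class, the wait_for_fip_status helper, and the timeout and interval values are all hypothetical names made up for illustration; the fake client just replays a scripted list of statuses so the polling loop can be run standalone.

```python
import time


class FakeFloatingIPClient:
    """Hypothetical stand-in for a network client; replays scripted statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def show_floatingip(self, fip_id):
        # Return the next scripted status; the last one repeats forever.
        if len(self._statuses) > 1:
            status = self._statuses.pop(0)
        else:
            status = self._statuses[0]
        return {"floatingip": {"id": fip_id, "status": status}}


def wait_for_fip_status(client, fip_id, expected, timeout=5.0, interval=0.01):
    """Poll until the floating IP reports `expected`; raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.show_floatingip(fip_id)["floatingip"]["status"]
        if status == expected:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "floating IP %s is %s, expected %s" % (fip_id, status, expected))
        time.sleep(interval)


# After unscheduling, the agent may still report ACTIVE for a while.
client = FakeFloatingIPClient(["ACTIVE", "ACTIVE", "DOWN"])
wait_for_fip_status(client, "fip-1", "DOWN")
# Only at this point would the test go on to reschedule the router and
# wait for ACTIVE again.
```

The idea is that the DOWN wait acts as a barrier: once DOWN is observed, the unscheduling event has been processed, so a later ACTIVE should not be overwritten by a stale update.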

Ihar Hrachyshka (ihar-hrachyshka) on 2016-11-25

Changed in neutron:
status:	New → Confirmed
assignee:	nobody → Ihar Hrachyshka (ihar-hrachyshka)
importance:	Undecided → High
tags:	added: gate-failure
affects:	neutron → tempest

OpenStack Infra (hudson-openstack) wrote on 2016-11-25: Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/403289

Changed in tempest:
status:	Confirmed → In Progress

Attila Fazekas (afazekas) wrote on 2016-11-29:

Generally, tempest cannot and should not assume anything about what happens during agent scheduling, because it is undocumented `internal` behavior. In the long run these tests need to be moved to another repo (neutron tests?), or the behavior has to be specified exactly by the API documentation.

Ihar Hrachyshka (ihar-hrachyshka) wrote on 2016-11-29:

Attila, agreed. Though we still should fix the test, and tempest network test cleanup is a separate task.

OpenStack Infra (hudson-openstack) wrote on 2016-11-30: Fix merged to tempest (master)

Reviewed: https://review.openstack.org/403289
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=3a0a0f7288153030c2b104336df416766593e38d
Submitter: Jenkins
Branch: master

commit 3a0a0f7288153030c2b104336df416766593e38d
Author: Ihar Hrachyshka <email address hidden>
Date: Sun Nov 13 11:54:54 2016 +0000

Wait for FIP status to get to DOWN in test_router_rescheduling

    If we don't wait until the port actually gets to DOWN in database, we
    may proceed to scheduling the router again and check floating IP status,
    expecting it to be ACTIVE, but then catch DOWN still in database because
    the router was slow to process the unscheduling event.

Closes-Bug: #1644937
Change-Id: I0806ead789953a4ff879ef9e6e77e1c66e658316

Changed in tempest:
status:	In Progress → Fix Released
