test_create_list_show_delete_interfaces race fail due to port_state mismatch

Bug #1407190 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Kevin Benton

Bug Description

http://logs.openstack.org/12/142412/3/gate/gate-tempest-dsvm-neutron-full/9990c84/console.html#_2015-01-02_16_07_40_306

2015-01-02 16:07:40.306 | Traceback (most recent call last):
2015-01-02 16:07:40.306 | File "tempest/test.py", line 112, in wrapper
2015-01-02 16:07:40.306 | return f(self, *func_args, **func_kwargs)
2015-01-02 16:07:40.306 | File "tempest/api/compute/servers/test_attach_interfaces.py", line 128, in test_create_list_show_delete_interfaces
2015-01-02 16:07:40.306 | self._test_show_interface(server, ifs)
2015-01-02 16:07:40.306 | File "tempest/api/compute/servers/test_attach_interfaces.py", line 81, in _test_show_interface
2015-01-02 16:07:40.306 | self.assertEqual(iface, _iface)
2015-01-02 16:07:40.306 | File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 348, in assertEqual
2015-01-02 16:07:40.306 | self.assertThat(observed, matcher, message)
2015-01-02 16:07:40.306 | File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 433, in assertThat
2015-01-02 16:07:40.307 | raise mismatch_error
2015-01-02 16:07:40.307 | MismatchError: !=:
2015-01-02 16:07:40.307 | reference = {u'fixed_ips': [{u'ip_address': u'10.100.0.4',
2015-01-02 16:07:40.307 | u'subnet_id': u'e6252f5f-b117-4156-a320-365d387f9385'}],
2015-01-02 16:07:40.307 | u'mac_addr': u'fa:16:3e:59:46:57',
2015-01-02 16:07:40.307 | u'net_id': u'1fa8b2a3-1d9c-4fdd-ac18-a6b631cbc1f6',
2015-01-02 16:07:40.307 | u'port_id': u'a56e7bba-1f6d-4382-8a72-db430235c147',
2015-01-02 16:07:40.307 | u'port_state': u'ACTIVE'}
2015-01-02 16:07:40.307 | actual = {u'fixed_ips': [{u'ip_address': u'10.100.0.4',
2015-01-02 16:07:40.307 | u'subnet_id': u'e6252f5f-b117-4156-a320-365d387f9385'}],
2015-01-02 16:07:40.308 | u'mac_addr': u'fa:16:3e:59:46:57',
2015-01-02 16:07:40.308 | u'net_id': u'1fa8b2a3-1d9c-4fdd-ac18-a6b631cbc1f6',
2015-01-02 16:07:40.308 | u'port_id': u'a56e7bba-1f6d-4382-8a72-db430235c147',
2015-01-02 16:07:40.308 | u'port_state': u'BUILD'}

Revision history for this message
Matt Riedemann (mriedem) wrote :

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwidSdwb3J0X3N0YXRlJzogdSdCVUlMRCd9XCIgQU5EIHRhZ3M6XCJjb25zb2xlXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MjAyNTQ4MDE3Mjh9

348 hits in 7 days, looks like it really started on 12/29, check and gate, all failures. We need a better fingerprint for the failure message though so I can push a related patch to Tempest for that.

Revision history for this message
Matt Riedemann (mriedem) wrote :

There were no nova commits merged on 12/29 but there are a few neutron changes merged on 12/29 related to dealing with ports/interfaces so those might be suspect.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

The tempest code looks OK, it creates a server and gets the interface, then adds two more interfaces, waiting each time for the port_state to be 'ACTIVE', then at the end it gets the port from the server by port_id and checks to see if it's equal to the original interface created (after waiting for it to be active) and compares it to the latest, which is equal except for the port_state went back to 'BUILD' for some reason.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Here is the nova vif plugged event response information on the neutron side:

http://logs.openstack.org/12/142412/3/gate/gate-tempest-dsvm-neutron-full/9990c84/logs/screen-q-svc.txt.gz#_2015-01-02_15_44_23_938

That's shortly after a message that the port state went from DOWN to BUILD.

--

Here is where the port state changes from ACTIVE to BUILD:

http://logs.openstack.org/12/142412/3/gate/gate-tempest-dsvm-neutron-full/9990c84/logs/screen-q-svc.txt.gz#_2015-01-02_15_44_42_497

I'm not familiar enough with what is happening in the neutron nova notifiers code in neutron, but something is definitely telling the port status to change from ACTIVE to BUILD, and I'm not sure how normal that is, can ports be 'rebuilt'? If not, we should get some warning logging into neutron to help pinpoint this a bit more for debug.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/144827

Matt Riedemann (mriedem)
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
status: Confirmed → Triaged
Revision history for this message
Matt Riedemann (mriedem) wrote :

Apparently a port status going from ACTIVE to BUILD isn't uncommon, it happened 148 times in the first patch set to add more verbose logging when this case happens:

http://goo.gl/jOi3wD

Scanning the neutron code for 'BUILD' I'm not really sure what's making that port status change though.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
Revision history for this message
Kevin Benton (kevinbenton) wrote :

This bug is a result of https://github.com/openstack/neutron/commit/3f0bf6cfac2e151d5a4a7f076062b3365bdbf457 as you had suspected.

Passing a list of interfaces to 'ovs-vsctl list interfaces' that contains an interface name that doesn't exist will cause it to return an error. This can happen if the interface is deleted between the time the ports are listed to construct the command and the time that the list command is actually executed.

Previously, having non-existent interfaces didn't matter since we were just using the name list as a membership requirement to continue the port processing. It didn't matter if the list had bonus material that didn't exist.

The fix here is relatively simple, we just add the '--if-exists' flag to the list command so it ignores interfaces which no longer exist.
That patch is up for review under change-id I8f359981386d13fb455281a72b8bb245395c0909.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/144872

Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/144827
Reason: https://review.openstack.org/#/c/144872/ looks like the fix so this isn't needed anymore.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/144872
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5de1d2ed67e00a010f23c5f9055556745a55957a
Submitter: Jenkins
Branch: master

commit 5de1d2ed67e00a010f23c5f9055556745a55957a
Author: Kevin Benton <email address hidden>
Date: Sun Jan 4 01:47:01 2015 -0800

    Ignore non-existent ports during OVS intf list

    A recent commit[1] to pass the list of port names directly to
    ovs-vsctl during a list operation introduced a new possible
    failure condition where one of the names might refer to a port
    which no longer exists. By default this causes ovs-vsctl to quit
    in a fit of rage[2].

    Previously, all interfaces were retrieved and the ones that were a
    subset of the name list were processed. The name list could contain
    extra non-existent names (e.g. recently deleted interfaces).

    This patch just passes the '--if-exists' flag to the 'list Interface'
    command to match the same previous behavior.

    1. 3f0bf6cfac2e151d5a4a7f076062b3365bdbf457
    2. Example: Stderr: 'ovs-vsctl: no row "tap80664420-ea" in table Interface\n'

    Closes-Bug: #1407190
    Change-Id: I8f359981386d13fb455281a72b8bb245395c0909

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.