DVR Tempest Job check-tempest-dsvm-neutron-dvr not stable when compared to the neutron job

Bug #1415522 reported by Swaminathan Vasudevan
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Swaminathan Vasudevan

Bug Description

The DVR Tempest job check-tempest-dsvm-neutron-dvr is unstable compared to the legacy router job.
Stabilizing it is critical to making the DVR job gating.
We therefore need to identify the specific subtest that is causing the failures.

tags: added: l3-dvr-backlog
Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I would like to know whether any members of the infra team can help identify the most vulnerable test, the one that most often causes this deviation.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/150998

Revision history for this message
Rajeev Grover (rajeev-grover) wrote :

It is too early to conclude anything, but an analysis of the failing job http://logs.openstack.org/22/149122/3/check/check-tempest-dsvm-neutron-dvr/35e420c/logs/testr_results.html.gz turned up a few things:

a) Most of the failing tests appear towards the end of the test suite, which consists of 1667+ tests.

b) The test cases appear to time out while the L-3 agent is still in the middle of configuring iptables for the floating IP. Below are a couple of timing correlations I found in this particular test job.

c) While we were looking at these, Mike pointed out that a large number of namespaces accumulate towards the tail end of the test suite. These accumulated namespaces could slow down the L-3 agent significantly.
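
One rough way to check point (c) on a test node is to count the namespaces by type from the output of `ip netns list`. The following is only a sketch under assumptions: the helper name, the prefix grouping, and the sample input are illustrative (only the qrouter ID is taken from the agent log quoted in this comment; the other IDs are placeholders), and it was not part of the original analysis.

```python
from collections import Counter

# Prefixes Neutron uses for the namespaces it creates. Grouping by these
# prefixes is an illustrative assumption, not part of the original analysis.
PREFIXES = ("qrouter-", "snat-", "fip-", "qdhcp-")


def count_namespaces(netns_output):
    """Count namespaces per prefix from `ip netns list` style output."""
    counts = Counter()
    for line in netns_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Newer iproute2 versions append "(id: N)" after the name.
        name = fields[0]
        for prefix in PREFIXES:
            if name.startswith(prefix):
                counts[prefix.rstrip("-")] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Placeholder sample; on a real node, feed in the output of `ip netns list`.
sample = """\
qrouter-190c5c9d-4c95-4a77-8efd-cc205e776caf
fip-00000000-0000-0000-0000-000000000000 (id: 1)
qdhcp-11111111-1111-1111-1111-111111111111
"""
print(count_namespaces(sample))
```

Sampling this count at intervals during a run would show whether the number of namespaces really grows towards the tail end of the suite.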

http://logs.openstack.org/22/149122/3/check/check-tempest-dsvm-neutron-dvr/35e420c/logs/testr_results.html.gz

--> test_network_basic_ops[compute,gate,network,smoke]

2015-01-26 03:33:00,337 8115 DEBUG [tempest.scenario.manager] checking network connections to IP 172.24.4.119 with user: cirros
2015-01-26 03:35:00,550 8115 ERROR [tempest.scenario.manager] Public network connectivity check failed
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager Traceback (most recent call last):
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager File "tempest/scenario/manager.py", line 501, in check_public_network_connectivity
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager should_connect=should_connect)
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager File "tempest/scenario/manager.py", line 485, in check_vm_connectivity
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager msg=msg)
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager File "/usr/local/lib/python2.7/dist-packages/unittest2/case.py", line 678, in assertTrue
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager raise self.failureException(msg)
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager AssertionError: False is not true : Timed out waiting for 172.24.4.119 to become reachable
2015-01-26 03:35:00.550 8115 TRACE tempest.scenario.manager

---> From L-3 agent log:

2015-01-26 03:35:03.970 DEBUG neutron.agent.linux.utils [req-8ba4a420-10d9-4581-86a0-8fbcd866a745 TestNetworkBasicOps-82148639 TestNetworkBasicOps-1630875080]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-190c5c9d-4c95-4a77-8efd-cc205e776caf', 'iptables-save', '-c']
Exit code: 0
Stdout: # Generated by iptables-save v1.4.21 on Mon Jan 26 03:35:03 2015
*raw
:PREROUTING ACCEPT [3:714]
:OUTPUT ACCEPT [1:84]
:neutron-vpn-agen-OUTPUT - [0:0]
:neutron-vpn-agen-PREROUTING - [0:0]
[3:714] -A PREROUTING -j neutron-vpn-agen-PREROUTING
[1:84] -A OUTPUT -j neutron-vpn-agen-OUTPUT
COMMIT
# Completed on Mon Jan 26 03:35:03 2015
# Generated by iptables-save v1.4.21 on Mon Jan 26 03:35:03 2015
*nat
:PREROUTING ACCEPT [2:393]
:INPUT ACCEPT [2:393]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:neutron-postrouting-bottom - [0:0]
:neutron-vpn-agen-OUTPUT - [0:0]
:neutron-vpn-agen-POSTROUTING - [0:0]
:neutron-vpn-agen-PREROUTING - [0:0]
:neutron-vpn-agen-float-snat - [0:0]
...

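
The timing correlation in the excerpt above can be worked out from the quoted timestamps. This is a minimal sketch: the helper and variable names are made up for illustration, and it only compares the three timestamps visible in the excerpt, so it suggests rather than proves that the agent was still busy when the test gave up.

```python
from datetime import datetime


def parse_ts(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm' or 'YYYY-MM-DD HH:MM:SS.mmm'."""
    return datetime.strptime(stamp.replace(",", "."), "%Y-%m-%d %H:%M:%S.%f")


# Timestamps copied from the tempest and L-3 agent log excerpts.
check_started = parse_ts("2015-01-26 03:33:00,337")  # connectivity check begins
check_failed = parse_ts("2015-01-26 03:35:00,550")   # tempest reports the failure
agent_active = parse_ts("2015-01-26 03:35:03.970")   # agent runs iptables-save

waited = (check_failed - check_started).total_seconds()
agent_lag = (agent_active - check_failed).total_seconds()

print("tempest waited %.3f s before failing" % waited)
print("agent ran iptables-save %.3f s after the failure" % agent_lag)
```

In other words, the connectivity check ran for about two minutes, and the agent was still issuing iptables commands in the router namespace a few seconds after the test had already failed.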

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/151758

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/153077
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ea4abf0199bebc5f67ae5437076bdc638615e56b
Submitter: Jenkins
Branch: master

commit ea4abf0199bebc5f67ae5437076bdc638615e56b
Author: rajeev <email address hidden>
Date: Wed Feb 4 17:48:16 2015 -0500

    Log entry when no Floating IP interface present

    Having a log entry here in process_router_floating_ip_addresses
    would make it easier to understand why the status of floating ips
    wasn't updated.

    Change-Id: If7ff3d8951010ed2a4e802acdb948cfdfcb5dda6
    Related-bug: #1415522
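
The idea behind this commit can be illustrated with a small sketch. This is not the neutron code: the function below borrows only the general intent from the commit message, and its name, arguments, and return value are hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


def process_floating_ip_addresses(interface_name, floating_ips):
    """Return a {fip_id: status} map, or None when there is no interface.

    Hypothetical stand-in for the agent method named in the commit message.
    """
    if interface_name is None:
        # Without this entry, a skipped status update leaves no trace in the log.
        LOG.debug("No floating IP interface present; skipping %d floating IP(s)",
                  len(floating_ips))
        return None
    return {fip["id"]: "ACTIVE" for fip in floating_ips}
```

With the log entry in place, a run where floating IP statuses were never updated can be traced back to the missing interface instead of being guessed at.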

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/153729

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/153735

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/154757

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Swaminathan Vasudevan (<email address hidden>) on branch: master
Review: https://review.openstack.org/154757
Reason: Abandoning this patch for now, since we need to maintain the RPCs for backward compatibility if they are used.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/153729
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c9db24738161aef465489b320e6f54a94b4cac7
Submitter: Jenkins
Branch: master

commit 9c9db24738161aef465489b320e6f54a94b4cac7
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Feb 6 15:07:40 2015 -0800

    Remove RPC dependency to create FIP agent gw port

    The Floatingip Agent Gateway port was initially
    created when the agent requests one through an
    RPC exchange.

    We are seeing more failures in this area of the
    code where there is delay in getting the agent
    gateway port from the plugin through RPC.

    Change-Id: Ieaa79c8bf2b1e03bc352f9252ce22286703e3715
    Related-bug: #1415522

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/153735
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=707890ef22c203389f61ddfe8025f1b0e2afe819
Submitter: Jenkins
Branch: master

commit 707890ef22c203389f61ddfe8025f1b0e2afe819
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Feb 6 15:59:06 2015 -0800

    Get rid of rpc to fetch fip agent port on agent.

    This patch is dependent on the plugin side patch
    Change-Id: Ieaa79c8bf2b1e03bc352f9252ce22286703e3715
    for retrieving the fip agent port from the
    router_update message.

    This would reduce the wait time substantially.

    Change-Id: I47bc43bab4bff59d14e2cdbce9f8b47826d392d9
    Related-Bug: #1415522
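
The approach described in these two commits, reading the FIP agent gateway port from the router_update message rather than requesting it through a separate RPC, can be sketched as follows. This is an illustration under assumptions, not the merged code: the dictionary key, the function name, and the optional RPC fallback are all hypothetical.

```python
def get_fip_agent_gw_port(router, rpc_fallback=None):
    """Prefer the port embedded in the router update; optionally fall back to RPC.

    `router` is a dict standing in for the router_update payload.
    "fip_agent_gw_ports" is a hypothetical key name used for illustration.
    """
    ports = router.get("fip_agent_gw_ports") or []
    if ports:
        # No extra round trip to the plugin is needed.
        return ports[0]
    if rpc_fallback is not None:
        # Hypothetical fallback for payloads that do not carry the port.
        return rpc_fallback(router["id"])
    return None


# Usage: the payload already carries the port, so the fallback is never called.
router = {"id": "r1", "fip_agent_gw_ports": [{"id": "port-1"}]}
print(get_fip_agent_gw_port(router))
```

Avoiding the extra round trip is what the commit message refers to when it says the wait time would be reduced.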

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Michael Smith (<email address hidden>) on branch: master
Review: https://review.openstack.org/150998
Reason: patch wasn't helpful for debug

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/159230

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/159317

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/159332

Changed in neutron:
status: New → Confirmed
milestone: none → kilo-3
importance: Undecided → Medium
Kyle Mestery (mestery)
Changed in neutron:
milestone: kilo-3 → kilo-rc1
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This was more of a sentinel bug. I'd close it for now, as DVR seems to track centralized routing pretty well.

http://goo.gl/jF5zBP

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-rc1 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/159317
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/153422
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/151758
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
