DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job

Bug #1714802 reported by venkata anil
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: venkata anil
Milestone: (none listed)

Bug Description

Jakub has already created an etherpad for the migration test failures: https://etherpad.openstack.org/p/neutron-dvr-multinode-scenario-gate-failures

My analysis is this:
The DVR and HA migration tempest scenario tests fail (or pass) intermittently. In the existing tests, we try ssh connectivity immediately after the port update API returns, without checking that the dependent resources (listed below) have been created or updated properly:
1) new interfaces are created
2) existing interfaces are updated
3) interfaces are bound to agents
4) interface status is updated
5) agents create namespaces, etc.

For example, during DVR to HA migration, as soon as the router update API returns, the ssh test might try to use the old data plane created with the DVR router, because the agents might not yet have synced with the server (removed namespaces, OVS flows and IP routes). If the ssh reply packets arrive back before the old data plane is removed, ssh can succeed. If this data path is reconstructed (because of the migration) before the packets arrive, ssh can fail. Though ssh can retry, it may use the existing connection tracking entry and try to follow the same old data path (just my assumption).

When I updated the tests to check for the dependent resources before trying ssh, the tests passed reliably. So we can add these checks before we try ssh connectivity.
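
A rough sketch of the kind of check described above, not the actual patch: poll the router's ports until the pre-migration interfaces are gone and the new ones report ACTIVE, and only then attempt ssh. The helper names (wait_until_true, router_ports_migrated) and the list_ports callable are hypothetical stand-ins for whatever port-listing client the test uses. The device_owner strings are assumed to match the ones Neutron uses for DVR and HA router interfaces.

```python
import time

# device_owner values assumed to match Neutron's router interface types.
DEVICE_OWNER_DVR_INTERFACE = "network:router_interface_distributed"
DEVICE_OWNER_HA_REPLICATED_INT = "network:ha_router_replicated_interface"


def wait_until_true(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


def router_ports_migrated(list_ports, router_id, old_owner, new_owner):
    """Return True once old-type interfaces are gone and new ones are ACTIVE.

    list_ports is a hypothetical callable that accepts a device_id filter
    and returns a list of port dicts with 'device_owner' and 'status' keys.
    """
    ports = list_ports(device_id=router_id)
    old = [p for p in ports if p["device_owner"] == old_owner]
    new = [p for p in ports if p["device_owner"] == new_owner]
    # Port status is used as a proxy for "the agent has wired the port",
    # since namespaces on other nodes cannot be inspected from the test.
    return not old and bool(new) and all(p["status"] == "ACTIVE" for p in new)
```

For a DVR to HA migration, the test would call wait_until_true with a lambda wrapping router_ports_migrated(list_ports, router_id, DEVICE_OWNER_DVR_INTERFACE, DEVICE_OWNER_HA_REPLICATED_INT) right after the router update, and run the ssh check only once it returns.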

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
tags: added: l3-dvr-backlog l3-ha tempest
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/500384

Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: venkata anil (anil-venkata) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → venkata anil (anil-venkata)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/510090

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/500384
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f5718972257cf229c8a9db0a5fc4349acbaade12
Submitter: Jenkins
Branch: master

commit f5718972257cf229c8a9db0a5fc4349acbaade12
Author: venkata anil <email address hidden>
Date: Tue Sep 19 07:41:19 2017 +0000

    tempest: check router interface exists before ssh

    As explained in the bug, tempest DVR and HA migration scenario
    tests are failing intermittently, as we are not checking if the
    new router interfaces are ready after migration and might try
    to use the old dataplane if the pre-migration router resources
    (like interfaces, namespaces, etc) still exist and are not yet
    destroyed.

    We need to check that the pre-migration router interfaces are
    deleted and the new interfaces are created and active (as we
    can't check namespace existence on other nodes, we rely on port
    status set by L2 agent after wiring the port) before
    attempting ssh connectivity.

    Closes-Bug: 1714802
    Change-Id: I2a933d4cdd6de4e5ff31c8e3f97477819ba27afa

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by venkata anil (<email address hidden>) on branch: master
Review: https://review.openstack.org/510090
Reason: Yes. Thanks Ihar.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b1

This issue was fixed in the openstack/neutron 12.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/526638

tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/526638
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=10c57888c2686dd14eef3917262f35df958f22b3
Submitter: Zuul
Branch: stable/pike

commit 10c57888c2686dd14eef3917262f35df958f22b3
Author: venkata anil <email address hidden>
Date: Tue Sep 19 07:41:19 2017 +0000

    tempest: check router interface exists before ssh

    As explained in the bug, tempest DVR and HA migration scenario
    tests are failing intermittently, as we are not checking if the
    new router interfaces are ready after migration and might try
    to use the old dataplane if the pre-migration router resources
    (like interfaces, namespaces, etc) still exist and are not yet
    destroyed.

    We need to check that the pre-migration router interfaces are
    deleted and the new interfaces are created and active (as we
    can't check namespace existence on other nodes, we rely on port
    status set by L2 agent after wiring the port) before
    attempting ssh connectivity.

    Conflicts:
     neutron/tests/tempest/api/base.py

    Closes-Bug: 1714802
    Change-Id: I2a933d4cdd6de4e5ff31c8e3f97477819ba27afa
    (cherry picked from commit f5718972257cf229c8a9db0a5fc4349acbaade12)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.3

This issue was fixed in the openstack/neutron 11.0.3 release.
