Tempest NetworkBasicOps tests cannot ping from within instances

Bug #1425255 reported by Nolan Brubaker
This bug affects 1 person
Affects            Status        Importance  Assigned to     Milestone
OpenStack-Ansible  Fix Released  Undecided   Unassigned
Icehouse           Fix Released  Undecided   Unassigned
Juno               Fix Released  Low         Kevin Carter
Trunk              Won't Fix     Wishlist    Evan Callicoat

Bug Description

The _check_server_connectivity function is failing to reach IP addresses from within an instance. Attaching log for full test output.

The invocation was as follows:

root@aio1_utility_container-419a9703:/opt/tempest_3# cat test_list2
tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops[compute,gate,network,smoke]
root@aio1_utility_container-419a9703:/opt/tempest_3# ./run_tempest.sh --no-virtual-env -s -- --load-list=test_list2

Nolan Brubaker (nolan-brubaker) wrote :
description: updated

Nolan Brubaker (nolan-brubaker) wrote :

This has 2 causes:

1 - the VMs can't reach the gateway on AIOs; we can fix this by adding the appropriate address to the br-vlan. This will be addressed in a separate bug.
2 - Packet checksums are not present on packets sent from containers on the AIO, since there is no physical NIC. The utility container will need to have some iptables rules set to mangle the checksums, but I'm not sure if we want to limit this to just tempest runs, or on all utility containers.
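To make the second cause concrete, here is a minimal sketch of the kind of rule that comment has in mind. It assumes the iptables CHECKSUM target (mangle table, `--checksum-fill`) is available in the utility container. The helper name, the chain, and the protocols are illustrative assumptions, not taken from the proposed fix, and the script only prints the commands rather than applying them.

```python
def checksum_fill_rule(protocol="tcp", chain="POSTROUTING"):
    """Build the argv for an iptables rule that fills in missing checksums.

    Hypothetical helper: the chain and protocol defaults are assumptions.
    The CHECKSUM target is only valid in the mangle table.
    """
    return [
        "iptables", "-t", "mangle", "-A", chain,
        "-p", protocol,
        "-j", "CHECKSUM", "--checksum-fill",
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them as root.
    for proto in ("tcp", "udp"):
        print(" ".join(checksum_fill_rule(proto)))
```

Whether such rules belong only in tempest runs or in every utility container is the open question raised above; the sketch does not settle it.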

Evan Callicoat (diopter) wrote :

The first cause mentioned above is being dealt with here: https://bugs.launchpad.net/openstack-ansible/+bug/1425717

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-ansible-deployment (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/159276

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/159276
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=0737a6062c2e1c96706bd0289c652dc3802802c2
Submitter: Jenkins
Branch: master

commit 0737a6062c2e1c96706bd0289c652dc3802802c2
Author: Evan Callicoat <email address hidden>
Date: Wed Feb 25 16:25:24 2015 -0600

    Add flat network gateway to br-vlan for AIO builds

    When an AIO is built using the included scripts and networking config,
    the public network has a gateway IP of 172.29.248.1 from the
    172.29.248.0/22 network, which is not expected or configured to exist
    anywhere. This can cause issues when using floats or in general if
    some communication is desired which uses the public side gateway of
    the neutron-routed network.

    A simple solution is to simply drop 172.29.248.1/22 on br-vlan via the
    interface config, which allows the traffic to do the needful, at least
    as far as tempest's requirements are concerned. In modern
    Debian/Ubuntu, this can be accomplished with another "iface" stanza
    with its own "address" directive to add the additional address.

    Change-Id: I79897bc4e4d7eb7d55ad3c12f55a339dfef869e1
    Closes-Bug: #1425717
    Related-Bug: #1425255
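As a rough illustration of the commit message above, the sketch below checks that 172.29.248.1 really falls inside 172.29.248.0/22 and renders an extra "iface" stanza for br-vlan. The stanza layout (static method, address plus netmask) and the helper name are assumptions for illustration; the actual contents of aio_interfaces.cfg are not shown in this report.

```python
import ipaddress


def extra_address_stanza(bridge, cidr):
    """Render an additional ifupdown stanza adding `cidr` to `bridge`.

    Hypothetical helper: the exact directives used by the real fix may differ.
    """
    iface = ipaddress.ip_interface(cidr)
    return "\n".join([
        f"iface {bridge} inet static",
        f"    address {iface.ip}",
        f"    netmask {iface.netmask}",
    ])


# The gateway address must belong to the public network named in the commit.
gateway = ipaddress.ip_interface("172.29.248.1/22")
assert gateway.ip in ipaddress.ip_network("172.29.248.0/22")

if __name__ == "__main__":
    print(extra_address_stanza("br-vlan", "172.29.248.1/22"))
```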

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-ansible-deployment (juno)

Related fix proposed to branch: juno
Review: https://review.openstack.org/161777

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-ansible-deployment (icehouse)

Related fix proposed to branch: icehouse
Review: https://review.openstack.org/161779

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: icehouse
Review: https://review.openstack.org/161824

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-ansible-deployment (juno)

Related fix proposed to branch: juno
Review: https://review.openstack.org/161827

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-ansible-deployment (icehouse)

Change abandoned by Nolan Brubaker (<email address hidden>) on branch: icehouse
Review: https://review.openstack.org/161779
Reason: Abandoned in favor of a new review that used git-review -X

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-ansible-deployment (juno)

Change abandoned by Nolan Brubaker (<email address hidden>) on branch: juno
Review: https://review.openstack.org/161777
Reason: Abandoned in favor of a new review that uses git-review -X

Changed in openstack-ansible:
assignee: nobody → Nolan Brubaker (nolan-brubaker)
status: New → In Progress
importance: Undecided → Medium
milestone: none → 11.0.0

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (master)

Fix proposed to branch: master
Review: https://review.openstack.org/161893

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/161827
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=745aa4bdfb6a5c3fe5c2a13f647f95fee4ba79d4
Submitter: Jenkins
Branch: juno

commit 745aa4bdfb6a5c3fe5c2a13f647f95fee4ba79d4
Author: Evan Callicoat <email address hidden>
Date: Wed Feb 25 16:25:24 2015 -0600

    Add flat network gateway to br-vlan for AIO builds

    When an AIO is built using the included scripts and networking config,
    the public network has a gateway IP of 172.29.248.1 from the
    172.29.248.0/22 network, which is not expected or configured to exist
    anywhere. This can cause issues when using floats or in general if
    some communication is desired which uses the public side gateway of
    the neutron-routed network.

    A simple solution is to simply drop 172.29.248.1/22 on br-vlan via the
    interface config, which allows the traffic to do the needful, at least
    as far as tempest's requirements are concerned. In modern
    Debian/Ubuntu, this can be accomplished with another "iface" stanza
    with its own "address" directive to add the additional address.

    Conflicts:
     etc/network/interfaces.d/aio_interfaces.cfg

    Change-Id: I79897bc4e4d7eb7d55ad3c12f55a339dfef869e1
    Closes-Bug: #1425717
    Related-Bug: #1425255
    (cherry picked from commit 0737a6062c2e1c96706bd0289c652dc3802802c2)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-ansible-deployment (icehouse)

Reviewed: https://review.openstack.org/161824
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=7fe90a849ee10d1254ad11eb875bf9b1cfe21c6f
Submitter: Jenkins
Branch: icehouse

commit 7fe90a849ee10d1254ad11eb875bf9b1cfe21c6f
Author: Evan Callicoat <email address hidden>
Date: Wed Feb 25 16:25:24 2015 -0600

    Add flat network gateway to br-vlan for AIO builds

    When an AIO is built using the included scripts and networking config,
    the public network has a gateway IP of 172.29.248.1 from the
    172.29.248.0/22 network, which is not expected or configured to exist
    anywhere. This can cause issues when using floats or in general if
    some communication is desired which uses the public side gateway of
    the neutron-routed network.

    A simple solution is to simply drop 172.29.248.1/22 on br-vlan via the
    interface config, which allows the traffic to do the needful, at least
    as far as tempest's requirements are concerned. In modern
    Debian/Ubuntu, this can be accomplished with another "iface" stanza
    with its own "address" directive to add the additional address.

    Conflicts:
     etc/network/interfaces.d/aio_interfaces.cfg

    Change-Id: I79897bc4e4d7eb7d55ad3c12f55a339dfef869e1
    Closes-Bug: #1425717
    Related-Bug: #1425255
    (cherry picked from commit 0737a6062c2e1c96706bd0289c652dc3802802c2)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (master)

Fix proposed to branch: master
Review: https://review.openstack.org/163544

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/163544
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=cb5736e711d83336944487703b743a6156acf99c
Submitter: Jenkins
Branch: master

commit cb5736e711d83336944487703b743a6156acf99c
Author: Nolan Brubaker <email address hidden>
Date: Wed Mar 11 13:43:47 2015 -0400

    Enable network basic ops tests

    Change-Id: Icea46b5d10cba1b9a2fe78817f4000767e4fe2d4
    Closes-Bug: #1430937
    Closes-Bug: #1425255

Changed in openstack-ansible:
status: In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-ansible-deployment (master)

Change abandoned by Nolan Brubaker (<email address hidden>) on branch: master
Review: https://review.openstack.org/161893
Reason: I'm abandoning this change because it actually isn't necessary for getting the network tests to pass.

Nolan Brubaker (nolan-brubaker) wrote :
Changed in openstack-ansible:
status: Fix Committed → In Progress

Nolan Brubaker (nolan-brubaker) wrote :

After investigation, we've isolated this to a problem with the rpc_workers variable in the neutron config. It is currently set to 2, yet in neutron master it is labeled experimental and advised to be 0.
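For reference, a small sketch of how one might check this setting. It assumes rpc_workers lives in the [DEFAULT] section of an INI-style neutron.conf; the sample text and helper names are hypothetical, and treating a missing option as 0 is an assumption rather than documented neutron behaviour.

```python
import configparser

# Hypothetical excerpt mirroring the value described in this comment.
SAMPLE_NEUTRON_CONF = """\
[DEFAULT]
rpc_workers = 2
"""


def rpc_workers(conf_text, fallback=0):
    """Return rpc_workers from [DEFAULT]; `fallback` is an assumed default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser["DEFAULT"].getint("rpc_workers", fallback=fallback)


def needs_change(conf_text, advised=0):
    """True when the configured value differs from the advised one."""
    return rpc_workers(conf_text) != advised


if __name__ == "__main__":
    print(needs_change(SAMPLE_NEUTRON_CONF))
```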

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (master)

Fix proposed to branch: master
Review: https://review.openstack.org/166095

Nolan Brubaker (nolan-brubaker) wrote :

I have changed the importance of this issue to Critical since it is now causing transient failures in gating jobs.

Changed in openstack-ansible:
importance: Medium → Critical

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/166095
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=caea2022232703f2f9f1501a16860baf1239d18b
Submitter: Jenkins
Branch: master

commit caea2022232703f2f9f1501a16860baf1239d18b
Author: Nolan Brubaker <email address hidden>
Date: Thu Mar 19 23:24:32 2015 -0500

    Set the neutron default workers

    As stated in neutron's default conf file:
      # This feature is experimental until issues are addressed and testing
      # has been
      # enabled for various plugins for compatibility.

    This change has been shown to be reliable in manual testing of the gate
    jobs. These jobs had been seeing transient failures possibly leading
    back to this value. The changes here have been set within the
    `bootstrap-aio.sh` script such that gating is using a consistent
    environment even if these options change their values in the
    future.

    Thanks to Evan Callicoat for doing the work to isolate the failures.

    Change-Id: I98116eef94a7240addfdf449d116ec1c24260c59
    Co-Authored-By: Evan Callicoat <email address hidden>
    Closes-Bug: #1425255

Changed in openstack-ansible:
status: In Progress → Fix Committed

Nolan Brubaker (nolan-brubaker) wrote :

I've assigned this to Evan Callicoat, as he's been doing more extensive investigation into the problem.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-ansible-deployment (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/171247

Nolan Brubaker (nolan-brubaker) wrote :

This behavior is not being seen in upstream Neutron as they are not gating on LinuxBridge; their full Tempest runs only use Open vSwitch.

This appears to be related to the kernel's asynchronous operation when modifying bridges: changes are happening outside of Neutron's knowledge.

The bug is almost certainly upstream, in Neutron, Neutron's LinuxBridge Agent, or the kernel. We've not identified exactly where as of yet.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/171247
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=019cf56b8322fcb8e9624c4485d122e77037f657
Submitter: Jenkins
Branch: master

commit 019cf56b8322fcb8e9624c4485d122e77037f657
Author: Jesse Pretorius <email address hidden>
Date: Tue Apr 7 16:17:40 2015 +0100

    Remove Tempest NetworkBasicOps from commit tests

    The NetworkBasicOps tests are failing most of the time, causing severe gate
    check blockage and therefore hindering development progress.

    The tests are being removed from the commit check in order to unblock the
    gate, allowing development to continue while the root cause of this issue is
    determined.

    Change-Id: Iec35a592d04f1e2345207a0f8754c33ab1d4830e
    Related-Bug: #1425255

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-ansible-deployment (master)

Change abandoned by Kevin Carter (<email address hidden>) on branch: master
Review: https://review.openstack.org/161893
Reason: This PR has been around for a while without much traction. At this point I'm abandoning it; however, feel free to re-open it if needed.

Kevin Carter (kevin-carter) wrote :

The basic network ops tests have been removed because they're unstable. This can be re-opened at a later date if we so choose; however, this is not something we're going to fix now.

Changed in openstack-ansible:
status: In Progress → Won't Fix
importance: Critical → Wishlist
milestone: 11.0.0 → next
milestone: next → none
no longer affects: openstack-ansible

Nolan Brubaker (nolan-brubaker) wrote :

Removed Icehouse, Juno, and Kilo series since re-enabling this will likely not include backporting. Kilo's possible, but again, not likely.

no longer affects: openstack-ansible/icehouse
no longer affects: openstack-ansible/juno
no longer affects: openstack-ansible/kilo

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (juno)

Fix proposed to branch: juno
Review: https://review.openstack.org/179973

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (icehouse)

Fix proposed to branch: icehouse
Review: https://review.openstack.org/180020

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/179973
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=03f24fdb1b5c311f14f3241d4e174ff8ec53908f
Submitter: Jenkins
Branch: juno

commit 03f24fdb1b5c311f14f3241d4e174ff8ec53908f
Author: Kevin Carter <email address hidden>
Date: Mon May 4 18:58:47 2015 -0500

    Set the neutron default workers
    As stated in neutron's default conf file:
      # This feature is experimental until issues are addressed and testing
      # has been
      # enabled for various plugins for compatibility.

    This change has been shown to be reliable in manual testing of the gate
    jobs. These jobs had been seeing transient failures possibly leading
    back to this value. The changes here have been set within the
    `bootstrap-aio.sh` script such that gating is using a consistent
    environment even if these options change their values in the
    future.

    Thanks to Evan Callicoat for doing the work to isolate the failures.

    Change-Id: I98116eef94a7240addfdf449d116ec1c24260c59
    Co-Authored-By: Evan Callicoat <email address hidden>
    Closes-Bug: #1425255
    (cherry picked from commit caea2022232703f2f9f1501a16860baf1239d18b)

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (icehouse)

Reviewed: https://review.openstack.org/180020
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=865fe4f89e770554b18e0cfff07354077eed7804
Submitter: Jenkins
Branch: icehouse

commit 865fe4f89e770554b18e0cfff07354077eed7804
Author: Kevin Carter <email address hidden>
Date: Mon May 4 18:58:47 2015 -0500

    Set the neutron default workers
    As stated in neutron's default conf file:
      # This feature is experimental until issues are addressed and testing
      # has been
      # enabled for various plugins for compatibility.

    This change has been shown to be reliable in manual testing of the gate
    jobs. These jobs had been seeing transient failures possibly leading
    back to this value. The changes here have been set within the
    `bootstrap-aio.sh` script such that gating is using a consistent
    environment even if these options change their values in the
    future.

    Thanks to Evan Callicoat for doing the work to isolate the failures.

    Change-Id: I98116eef94a7240addfdf449d116ec1c24260c59
    Co-Authored-By: Evan Callicoat <email address hidden>
    Closes-Bug: #1425255
    (cherry picked from commit caea2022232703f2f9f1501a16860baf1239d18b)

Changed in openstack-ansible:
status: New → Fix Released