Instance suddenly lost connectivity on tenant net, DVR with centralised snat

Bug #1561509 reported by Fabrizio Soppelsa
This bug affects 1 person
Affects              Status         Importance   Assigned to     Milestone
Mirantis OpenStack   (status tracked in 10.0.x)
  10.0.x             Invalid        High         Kevin Benton
  7.0.x              Fix Released   Critical     Sergii Rizvan
  8.0.x              Invalid        High         Sergii Rizvan
  9.x                Fix Released   High         Kevin Benton

Bug Description

Detailed bug description: On a tenant network, instances lost connectivity to external networks. They could still ping the router's internal interfaces, but could no longer reach the external gateway. We were unable to trace exactly where in the flow the traffic stopped.
Steps to reproduce: -
Reproducibility: Observed only once, so far
Expected results: VM not to lose connectivity
Actual result: VM unable to reach external networks
Workaround: Restart of ovs-agent with removal of namespaces didn't help
Impact: Instances lose connectivity
Description of the environment:
- Operating system: Ubuntu
- Versions of components: MOS 7.0
- Reference architecture: Neutron DVR with centralised snat
- Network model: VxLAN
- Related projects installed: Sahara, Ceilometer
Additional information: -

Upstream related issue in Neutron project:
 https://bugs.launchpad.net/neutron/+bug/1562467

Changed in mos:
importance: Undecided → High
Roman Rufanov (rrufanov)
tags: added: support
Fabrizio Soppelsa (fsoppelsa) wrote :

I have ovs-agent.log and l2-agent.log available upon request (Slack or mail)

description: updated
Eugene Nikanorov (enikanorov) wrote :

Setting to High, since the workaround is to restart ovs-agent on the affected node.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/19866
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: ed1ca7dcfcc89fc19384dbbf2174ae0f6795289b
Author: Jenkins <email address hidden>
Date: Wed Apr 20 08:40:40 2016

Merge the tip of origin/stable/mitaka into origin/9.0/mitaka

643b443 Imported Translations from Zanata
1ffea42 Updated from global requirements
b970ed5 Clear DVR MAC on last agent deletion from host
eee9e58 Add an option for WSGI pool size
93795a4 Fix deprecation warning for external_network_bridge
36305c0 Add ALLOCATING state to routers
07fa372 ADDRESS_SCOPE_MARK_IDS should not be global for L3 agent
9c58ae6 Wrap all update/delete l3_rpc handlers with retries
ece192b Use new DB context when checking if agent is online during rescheduling
2e2d75c ovsfw: Load vlan tag from other_config
5853af9 Iptables firewall prevent IP spoofed DHCP requests
9679285 Return oslo_config Opts to config generator
e2676ae DVR: rebind port if ofport changes

Closes-Bug: #1566689
Closes-Bug: #1496723
Closes-Bug: #1523479
Closes-Bug: #1561509

Change-Id: Id18fd3ba2fa15369748828c462e8e888ccecc0de

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Kevin Benton <email address hidden>
Review: https://review.fuel-infra.org/20712

Sergii Rizvan (srizvan) wrote :

In 8.0 the bug should be fixed by merging the tip of origin/stable/liberty into origin/openstack-ci/fuel-8.0/liberty.

tags: added: on-verification
Kristina Berezovskaia (kkuznetsova) wrote :

Verified on
[root@fuel ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 389
cat /etc/fuel_build_number:
 389
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6346.noarch
 fuel-bootstrap-cli-9.0.0-1.mos282.noarch
 fuel-migrate-9.0.0-1.mos8378.noarch
 rubygem-astute-9.0.0-1.mos745.noarch
 fuel-misc-9.0.0-1.mos8378.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-mirror-9.0.0-1.mos136.noarch
 fuel-openstack-metadata-9.0.0-1.mos8693.noarch
 fuel-notify-9.0.0-1.mos8378.noarch
 nailgun-mcagents-9.0.0-1.mos745.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8693.noarch
 python-fuelclient-9.0.0-1.mos315.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6346.noarch
 fuel-utils-9.0.0-1.mos8378.noarch
 fuel-setup-9.0.0-1.mos6346.noarch
 fuel-library9.0-9.0.0-1.mos8378.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-agent-9.0.0-1.mos282.noarch
 fuel-ui-9.0.0-1.mos2696.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 python-packetary-9.0.0-1.mos136.noarch
 fuel-nailgun-9.0.0-1.mos8693.noarch
(neutron + vxlan + dvr)

Reproduced on 9.0 iso 134 (2016-03-30)

Steps to reproduce:
1) Create 10 nets and 10 routers, then create 10 VMs on 1 compute node
2) Choose 1 VM and ping 8.8.8.8 from it
3) Restart the l3 agent on the compute node hosting the VMs several times (23 restarts are enough to reproduce, so for verifying I used 60)
4) Ping to 8.8.8.8 still succeeds
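
The restart-and-ping loop in steps 3 and 4 can be scripted. The following is only a rough sketch, not the script used for this verification: the function name, the delay parameter and both commands are placeholders I made up for illustration. The ping has to run inside the VM, so ping_cmd is assumed to be whatever command reaches it in a given environment.

```python
import subprocess
import time


def run_restart_check(restarts, restart_cmd, ping_cmd,
                      run=subprocess.call, delay=0.0):
    """Restart the agent `restarts` times and ping after each restart.

    Returns the list of iteration numbers (1-based) where the ping failed.
    `restart_cmd` and `ping_cmd` are argument lists supplied by the caller;
    nothing about them is taken from the bug report itself.
    """
    failures = []
    for i in range(1, restarts + 1):
        run(restart_cmd)          # e.g. restart the l3 agent on the compute node
        if delay:
            time.sleep(delay)     # optional settle time before checking
        if run(ping_cmd) != 0:    # non-zero exit code means the ping failed
            failures.append(i)
    return failures
```

A call might look like run_restart_check(60, ["service", "neutron-l3-agent", "restart"], ping_cmd), assuming that service name matches the node and ping_cmd is filled in for the VM under test. An empty result means connectivity survived every restart.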

tags: removed: on-verification
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/20712
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 7992cfe6e8f005d078a5d8cf6773fdbf1a6e2e85
Author: Kevin Benton <email address hidden>
Date: Wed May 18 15:39:17 2016

DVR: rebind port if ofport changes

When binding is called in DVR, check to see if the port was
previously wired under a different ofport. If it was, first
unbind the old port and then bind the new one.

Conflicts:
        neutron/tests/unit/plugins/openvswitch/agent/test_ovs_neutron_agent.py

Change-Id: I372158c4a6986295e396d849a2c9c5372b271e08
Closes-Bug: #1561509
(cherry picked from upstream commit 834ea6f8e6cc2427538213535c1494de6c11e5e4)
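
For readers unfamiliar with the change, the behaviour described in the commit message above can be illustrated with a toy model. This is not the Neutron code: the class, its attributes and the action log are hypothetical, and treating a bind with an unchanged ofport as a no-op is my own assumption.

```python
class DvrPortTracker:
    """Toy model of 'rebind port if ofport changes' (not Neutron code)."""

    def __init__(self):
        # port id -> ofport the port was last wired under
        self.wired = {}
        # ordered log of actions, standing in for real flow programming
        self.actions = []

    def _unbind(self, port_id, ofport):
        self.actions.append(("unbind", port_id, ofport))
        del self.wired[port_id]

    def bind(self, port_id, ofport):
        old = self.wired.get(port_id)
        if old == ofport:
            # Assumption: already wired under this ofport, nothing to do.
            return
        if old is not None:
            # Previously wired under a different ofport: unbind that first.
            self._unbind(port_id, old)
        self.wired[port_id] = ofport
        self.actions.append(("bind", port_id, ofport))
```

Binding the same port under ofport 5 and then under ofport 7 records a bind, an unbind of the old ofport and a second bind, so no state for the stale ofport is left behind.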

tags: added: on-automation
tags: added: on-verification
Alexander Gromov (agromov) wrote :

Couldn't reproduce the bug on MOS 7.0 mu4.

The patch was applied according to the changes in upstream.

After applying the patch the problem also couldn't be reproduced.

tags: removed: on-verification
Alexander Gromov (agromov) wrote :
tags: added: covered-automated-test
removed: on-automation
Sergii Rizvan (srizvan) wrote :

For 8.0 the issue was fixed by merging the tip of origin/stable/liberty into the origin/openstack-ci/fuel-8.0/liberty branch: https://review.fuel-infra.org/#/c/21205/

That's why setting status 'Invalid' for 8.0.

Alexander Ignatov (aignatov) wrote :

This bug is fixed in Newton; no additional steps are required to fix it in MOS 10.0 (Newton).

Alexander Ignatov (aignatov) wrote :

Moving to Invalid.
