[OVS] Restart of ovs agent leads to erroneous deletion of flows on tunnel bridge

Bug #1513530 reported by Kristina Berezovskaia
This bug affects 1 person
Affects             Status        Importance  Assigned to       Milestone
Mirantis OpenStack  Fix Released  High        Oleg Bondarev
  7.0.x             Won't Fix     High        Eugene Nikanorov
  8.0.x             Fix Released  High        Oleg Bondarev
  9.x               Invalid       High        MOS Neutron

Bug Description

On a compute node, the flow between br-mesh and br-tun was deleted, and connectivity to and from the VM was lost.

Steps on env:
1) create 2 private nets, router between them
2) Boot vm1 and vm2 on different computes in these nets
3) restart ovs on all computes and controllers (connectivity between the VMs still works)
4) destroy and start the primary controller
5) reset the compute nodes
After that, connectivity to the VMs in one net was lost. In Horizon we can see that the VM doesn't have an IP address.
The problem was resolved after restarting ovs and running on the VM the command 'cirros

1) Flows on the compute node before restarting ovs:
ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x840466a3bdec53f0, duration=5239.632s, table=0, n_packets=363, n_bytes=15838, idle_age=0, priority=0 actions=drop
 cookie=0x840466a3bdec53f0, duration=5238.621s, table=0, n_packets=356, n_bytes=107065, idle_age=9, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0x840466a3bdec53f0, duration=5238.489s, table=1, n_packets=356, n_bytes=107065, idle_age=9, priority=0 actions=resubmit(,2)
 cookie=0x840466a3bdec53f0, duration=5239.545s, table=2, n_packets=0, n_bytes=0, idle_age=5239, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x840466a3bdec53f0, duration=5239.468s, table=2, n_packets=356, n_bytes=107065, idle_age=9, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x840466a3bdec53f0, duration=5239.396s, table=3, n_packets=0, n_bytes=0, idle_age=5239, priority=0 actions=drop
 cookie=0x840466a3bdec53f0, duration=5239.312s, table=4, n_packets=0, n_bytes=0, idle_age=5239, priority=0 actions=drop
 cookie=0x840466a3bdec53f0, duration=4956.274s, table=4, n_packets=0, n_bytes=0, idle_age=4956, priority=1,tun_id=0x48 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0x840466a3bdec53f0, duration=5238.555s, table=9, n_packets=0, n_bytes=0, idle_age=5238, priority=0 actions=resubmit(,10)
 cookie=0x840466a3bdec53f0, duration=5238.166s, table=9, n_packets=0, n_bytes=0, idle_age=5238, priority=1,dl_src=fa:16:3f:6f:bb:98 actions=output:1
 cookie=0x840466a3bdec53f0, duration=5238.321s, table=9, n_packets=0, n_bytes=0, idle_age=5238, priority=1,dl_src=fa:16:3f:61:64:fb actions=output:1
 cookie=0x840466a3bdec53f0, duration=5237.862s, table=9, n_packets=0, n_bytes=0, idle_age=5237, priority=1,dl_src=fa:16:3f:c7:50:e2 actions=output:1
 cookie=0x840466a3bdec53f0, duration=5238.005s, table=9, n_packets=0, n_bytes=0, idle_age=5238, priority=1,dl_src=fa:16:3f:ae:5e:99 actions=output:1
 cookie=0x840466a3bdec53f0, duration=5239.219s, table=10, n_packets=0, n_bytes=0, idle_age=5239, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0x840466a3bdec53f0,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x840466a3bdec53f0, duration=5239.126s, table=20, n_packets=0, n_bytes=0, idle_age=5239, priority=0 actions=resubmit(,22)
 cookie=0x840466a3bdec53f0, duration=5239.046s, table=22, n_packets=356, n_bytes=107065, idle_age=9, priority=0 actions=drop

2) Flows after restarting ovs on the compute node:
ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0xbfb19561333f23f2, duration=2664.349s, table=0, n_packets=10, n_bytes=420, idle_age=2654, priority=0 actions=drop
 cookie=0xbfb19561333f23f2, duration=2654.047s, table=0, n_packets=2809, n_bytes=133911, idle_age=0, priority=1,in_port=3 actions=resubmit(,4)
 cookie=0xbfb19561333f23f2, duration=2663.401s, table=0, n_packets=154, n_bytes=26381, idle_age=7, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0xbfb19561333f23f2, duration=2653.576s, table=0, n_packets=72, n_bytes=13836, idle_age=7, priority=1,in_port=4 actions=resubmit(,4)
 cookie=0xbfb19561333f23f2, duration=2663.237s, table=1, n_packets=154, n_bytes=26381, idle_age=7, priority=0 actions=resubmit(,2)
 cookie=0xbfb19561333f23f2, duration=2664.264s, table=2, n_packets=128, n_bytes=21319, idle_age=7, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0xbfb19561333f23f2, duration=2664.181s, table=2, n_packets=26, n_bytes=5062, idle_age=347, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0xbfb19561333f23f2, duration=2664.081s, table=3, n_packets=0, n_bytes=0, idle_age=8298, priority=0 actions=drop
 cookie=0xbfb19561333f23f2, duration=2664.008s, table=4, n_packets=0, n_bytes=0, idle_age=8298, priority=0 actions=drop
 cookie=0xbfb19561333f23f2, duration=2661.121s, table=4, n_packets=2881, n_bytes=147747, idle_age=0, priority=1,tun_id=0x48 actions=mod_vlan_vid:1,resubmit(,9)
 cookie=0xbfb19561333f23f2, duration=2663.327s, table=9, n_packets=2881, n_bytes=147747, idle_age=0, priority=0 actions=resubmit(,10)
 cookie=0xbfb19561333f23f2, duration=2662.732s, table=9, n_packets=0, n_bytes=0, idle_age=8297, priority=1,dl_src=fa:16:3f:6f:bb:98 actions=output:1
 cookie=0xbfb19561333f23f2, duration=2662.961s, table=9, n_packets=0, n_bytes=0, idle_age=8297, priority=1,dl_src=fa:16:3f:61:64:fb actions=output:1
 cookie=0xbfb19561333f23f2, duration=2662.331s, table=9, n_packets=0, n_bytes=0, idle_age=8297, priority=1,dl_src=fa:16:3f:c7:50:e2 actions=output:1
 cookie=0xbfb19561333f23f2, duration=2662.526s, table=9, n_packets=0, n_bytes=0, idle_age=8297, priority=1,dl_src=fa:16:3f:ae:5e:99 actions=output:1
 cookie=0xbfb19561333f23f2, duration=2663.939s, table=10, n_packets=2881, n_bytes=147747, idle_age=0, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xbfb19561333f23f2,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0xbfb19561333f23f2, duration=2663.862s, table=20, n_packets=0, n_bytes=0, idle_age=8298, priority=0 actions=resubmit(,22)
 cookie=0xbfb19561333f23f2, duration=2639.067s, table=20, n_packets=0, n_bytes=0, hard_timeout=300, idle_age=2639, hard_age=7, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:db:64:43 actions=load:0->NXM_OF_VLAN_TCI[],load:0x48->NXM_NX_TUN_ID[],output:4
 cookie=0xbfb19561333f23f2, duration=2653.792s, table=20, n_packets=0, n_bytes=0, hard_timeout=300, idle_age=2653, hard_age=0, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:c1:ef:01 actions=load:0->NXM_OF_VLAN_TCI[],load:0x48->NXM_NX_TUN_ID[],output:3
 cookie=0xbfb19561333f23f2, duration=2653.341s, table=20, n_packets=79, n_bytes=12772, idle_age=42, priority=2,dl_vlan=1,dl_dst=fa:16:3e:c1:ef:01 actions=strip_vlan,set_tunnel:0x48,output:3
 cookie=0xbfb19561333f23f2, duration=2652.790s, table=20, n_packets=49, n_bytes=8547, idle_age=7, priority=2,dl_vlan=1,dl_dst=fa:16:3e:db:64:43 actions=strip_vlan,set_tunnel:0x48,output:4
 cookie=0xbfb19561333f23f2, duration=2652.945s, table=20, n_packets=0, n_bytes=0, idle_age=2653, priority=2,dl_vlan=1,dl_dst=fa:16:3e:bb:d5:86 actions=strip_vlan,set_tunnel:0x48,output:4
 cookie=0xbfb19561333f23f2, duration=2663.789s, table=22, n_packets=1, n_bytes=322, idle_age=2662, priority=0 actions=drop
 cookie=0xbfb19561333f23f2, duration=2653.952s, table=22, n_packets=25, n_bytes=4740, idle_age=347, dl_vlan=1 actions=strip_vlan,set_tunnel:0x48,output:3,output:4
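
Comparing the two dumps by eye is error-prone because cookies, durations and counters differ on every line. The following is a minimal Python sketch, assuming the `ovs-ofctl dump-flows` text format shown above; it normalizes each flow by dropping those volatile fields (including the agent cookie embedded in the `learn()` action) and lists flows that exist after the restart but are missing from the first dump. The helper names `normalize_flow`, `parse_dump` and `missing_flows` are hypothetical and are not part of OVS or Neutron.

```python
import re

# Fields that change between dumps (or on every agent restart) and must not
# affect the comparison. hard_timeout is kept because it is part of the flow.
VOLATILE_KEYS = {"cookie", "duration", "n_packets", "n_bytes", "idle_age", "hard_age"}
COOKIE_RE = re.compile(r"cookie=0x[0-9a-fA-F]+")


def normalize_flow(line):
    """Return (match, actions) for one dump-flows line, or None for headers."""
    line = line.strip()
    if " actions=" not in line:
        return None  # e.g. "NXST_FLOW reply (xid=0x4):" or a blank line
    match_part, actions = line.split(" actions=", 1)
    fields = [f.strip() for f in match_part.split(",") if f.strip()]
    kept = [f for f in fields if f.split("=", 1)[0] not in VOLATILE_KEYS]
    # learn() actions embed the agent cookie, which changes on every restart.
    return ",".join(kept), COOKIE_RE.sub("cookie=*", actions)


def parse_dump(text):
    """Collect the normalized flows of a whole dump-flows output as a set."""
    flows = set()
    for line in text.splitlines():
        flow = normalize_flow(line)
        if flow is not None:
            flows.add(flow)
    return flows


def missing_flows(broken_dump, healthy_dump):
    """Flows present in the healthy dump but absent from the broken one."""
    return sorted(parse_dump(healthy_dump) - parse_dump(broken_dump))
```

Fed the two br-tun dumps above, the result includes the table 0 `in_port=3` and `in_port=4` flows that send tunnel traffic to table 4, plus the table 20 unicast and table 22 flood flows for VLAN 1, which is consistent with the flows toward the tunnel ports having been deleted.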

On ISO:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

with updates
vxlan+dvr+l2pop

Kristina Berezovskaia (kkuznetsova) wrote :
summary: - [OVS] Deleting ovs flow from compute
+ [OVS] Restart of ovs agent leads to erroneous deletion of flows on tunnel bridge
Kristina Berezovskaia (kkuznetsova) wrote :
Changed in mos:
importance: Undecided → High
Alexander Ignatov (aignatov) wrote :

Need one more repro in 8.0.

Kristina Berezovskaia (kkuznetsova) wrote :

We see a similar situation: connectivity to the VMs was lost.
Steps:
1) Create net1 with a subnet
2) Create net2 with a subnet
3) Create a DVR router, set its gateway and add interfaces to both nets
4) Boot VMs in net1 and net2 on different compute nodes
5) Destroy the controller
6) Wait some time
7) Start the controller
8) Wait some time
9) Reset both compute nodes

Current result: connectivity to the VMs was lost.
After creating new VMs on both compute nodes, we couldn't reach them via SSH.
After restarting ovs, connectivity to all VMs was restored.

Logs are attached.

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "361"
  build_id: "361"
  fuel-nailgun_sha: "53c72a9600158bea873eec2af1322a716e079ea0"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "7463551bc74841d1049869aaee777634fb0e5149"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "ba8063d34ff6419bddf2a82b1de1f37108d96082"
  fuel-ostf_sha: "889ddb0f1a4fa5f839fd4ea0c0017a3c181aa0c1"
  fuel-mirror_sha: "8adb10618bb72bb36bb018386d329b494b036573"
  fuelmenu_sha: "824f6d3ebdc10daf2f7195c82a8ca66da5abee99"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "07d5f1c3e1b352cb713852a3a96022ddb8fe2676"
(neutron+dvr+vlan, neutron+dvr+vxlan, 3 controllers, 2 computes)

Roman Podoliaka (rpodolyaka) wrote :

MOS Neutron, is this really a Medium bug? Or should we raise the importance to High and fix it in 8.0?

tags: added: area-neutron
removed: neutron
Kristina Berezovskaia (kkuznetsova) wrote :
Oleg Bondarev (obondarev) wrote :

Raising to High because of recent occurrences.

Oleg Bondarev (obondarev) wrote :

One of possible root causes was filed upstream: https://bugs.launchpad.net/neutron/+bug/1536110

Oleg Bondarev (obondarev) wrote :

https://review.openstack.org/#/c/271755/ is a backport for Liberty. We're going to close the bug once it is merged and synced to MOS 8.0.

tags: added: hit-hcf
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/16615

Oleg Bondarev (obondarev) wrote :

Marking as Invalid for 9.0 since the fix is already in Mitaka.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/16615
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: 582e6c5b793499af6559e715bf298558b5d6a80d
Author: Oleg Bondarev <email address hidden>
Date: Tue Feb 2 12:02:15 2016

OVS agent should fail if it can't get DVR mac address

Currently agent will fall back to non-dvr mode in case it can't.
However neutron server does not check dvr mode of ovs agents when
scheduling routers. So in a DVR enabled cluster all ovs agents
should run in DVR mode. Otherwise it will lead to undefined
behavior which is hard to debug.

Closes-Bug: #1513530
Closes-Bug: #1536110
Change-Id: I6c31aabf1852c688e9c27fc1859d3fdd830caa68
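
The behavioural change described in the commit message can be illustrated with a minimal Python sketch. The names `init_dvr`, `get_dvr_mac` and `DvrMacUnavailable` are hypothetical and do not reproduce the actual Neutron OVS agent code; the sketch only contrasts the old fall-back-to-non-DVR behaviour with the new fail-fast behaviour.

```python
import sys


class DvrMacUnavailable(Exception):
    """Hypothetical error: the server could not return this host's DVR MAC."""


def init_dvr(get_dvr_mac, enable_distributed_routing, fail_fast=True):
    """Return the DVR MAC the agent should use, or None if DVR is off.

    get_dvr_mac is a hypothetical callable standing in for the agent's
    request to the Neutron server.
    """
    if not enable_distributed_routing:
        return None
    try:
        return get_dvr_mac()
    except DvrMacUnavailable as exc:
        if fail_fast:
            # Behaviour after the fix: refuse to run in a mode the server does
            # not know about, since router scheduling assumes DVR on all agents.
            sys.exit("Unable to get DVR MAC address, exiting: %s" % exc)
        # Behaviour before the fix: silently continue in non-DVR mode.
        return None
```

With `fail_fast=False` a failed lookup returns `None` and the agent would keep running without DVR, which is the silent mismatch the commit message calls hard to debug; with the default `fail_fast=True` the same failure raises `SystemExit` instead.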

tags: added: on-verification
Kristina Berezovskaia (kkuznetsova) wrote :

Verified on:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"
(vxlan+l2+dvr, vlan+dvr)

Many rounds of destroying controllers, resetting computes, and different combinations of these steps were performed; the bug was not reproduced.

tags: removed: on-verification
tags: added: 8.0 release-notes-done
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS5.1, MOS6.0, MOS6.1
We deliver only Critical/Security fixes to MOS7.0, MOS8.0.
We deliver only High/Critical/Security fixes to MOS9.2.
