Wrong ovs flow after destroying controller

Bug #1536970 reported by Kristina Berezovskaia
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Won't Fix
Importance: Undecided
Assigned to: Oleg Bondarev
Milestone: (none)

Bug Description

After destroying a controller, connectivity to the external network was lost.

Steps (a CLI sketch of these steps follows below):
1) Create net1, subnet1
2) Create a distributed router, set its gateway and add an interface to subnet1
3) Boot vm1 in net1
4) Find the controller hosting SNAT for the router
5) Destroy that controller
6) Wait some time
7) Check ping to 8.8.8.8 from vm1
Expected result: ping to 8.8.8.8 works
Actual result: ping fails
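
For reference, the steps above map roughly to the Liberty-era CLI calls below. All names, the CIDR, the external network, image and flavor are placeholders rather than values from the original environment:

# 1-2) network, subnet, distributed router with gateway and interface
neutron net-create net1
neutron subnet-create net1 192.168.111.0/24 --name subnet1
neutron router-create router1 --distributed True
neutron router-gateway-set router1 <external-net>
neutron router-interface-add router1 subnet1
# 3) boot a VM on net1
nova boot --flavor m1.small --image <image-id> --nic net-id=<net1-id> vm1
# 4) find the L3 agent (controller) hosting centralized SNAT for the router
neutron l3-agent-list-hosting-router router1
# 5-7) destroy that controller, wait for rescheduling, then from vm1:
ping -c 4 8.8.8.8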

This issue was seen only once

Found on:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "429"
  build_id: "429"
  fuel-nailgun_sha: "12b15b2351e250af41cc0b10d63a50c198fe77d8"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "df16d41cd7a9445cf82ad9fd8f0d53824711fcd8"
  fuel-nailgun-agent_sha: "92ebd5ade6fab60897761bfa084aefc320bff246"
  astute_sha: "c7ca63a49216744e0bfdfff5cb527556aad2e2a5"
  fuel-library_sha: "3eaf4f4a9b88b287a10cc19e9ce6a62298cc4013"
  fuel-ostf_sha: "214e794835acc7aa0c1c5de936e93696a90bb57a"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "85de57080a18fda18e5325f06eaf654b1b931592"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "e8e36cff332644576d7853c80b8a53d5b955420a"

Tags: neutron
Revision history for this message
Oleg Bondarev (obondarev) wrote:

We found that some OVS flows on br-int had a wrong output port, flows for SNAT ports in particular. Further investigation (log inspection) showed that this happened due to RPC connectivity problems (inspection commands are sketched after the list):
 - the router was constantly rescheduled from one blinking agent to another
 - this led to the SNAT port being plugged and unplugged repeatedly
 - one such unplug went unnoticed by the agent; issues in the OVS agent resync mechanism made this possible
 - when the SNAT port was plugged back, the agent still had it in its cache and did not update the OVS flows for the port
 - as a result, the OVS flows kept outdated output ports
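
A rough way to look for such a stale flow on the node hosting SNAT (the sg- prefix below is the usual naming of SNAT-namespace ports plugged by the L3 agent; the exact interface name comes from the port UUID and should be verified in the deployment):

# dump the integration bridge flows
ovs-ofctl dump-flows br-int
# list the ports on br-int and get the current OpenFlow port number
# of the SNAT port's interface
ovs-vsctl list-ports br-int
ovs-vsctl get Interface <sg-interface-name> ofport
# a flow whose output action references an ofport that no longer matches
# the interface above is the stale entry described in this comment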

Revision history for this message
Oleg Bondarev (obondarev) wrote:

There is a patch in Mitaka which improves the OVS resync mechanism and fixes the problem on the neutron side: https://review.openstack.org/245105/.

However, it is quite big, so I would not backport it to 8.0 this late in the cycle. This is a corner case and there is a simple workaround: restart the OVS agent on the node hosting SNAT for the router. Marking as Won't Fix for 8.0. In 9.0 the fix will come with the upstream code.
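
A sketch of the workaround (the agent service name is an assumption and varies by distribution; on Ubuntu-based MOS 8.0 nodes it may be neutron-plugin-openvswitch-agent):

# find the node hosting centralized SNAT for the affected router
neutron l3-agent-list-hosting-router <router-id>
# on that node, restart the OVS agent so it performs a full resync
# and reprograms the br-int flows
service neutron-plugin-openvswitch-agent restart
# then re-check external connectivity from the VM
ping -c 4 8.8.8.8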

Changed in mos:
status: New → Won't Fix