OVS drops RARP packets sent by QEMU upon live migration; VM temporarily disconnected

Bug #1489198 reported by Fabrizio Soppelsa
This bug affects 1 person
Affects               Status                      Importance  Assigned to
Mirantis OpenStack    Status tracked in 10.0.x
  10.0.x              Confirmed                   High        Oleg Bondarev
  6.1.x               Fix Released                High        Alexey Stupnikov
  7.0.x               Fix Released                High        Alexey Stupnikov
  8.0.x               Fix Released                High        Oleg Bondarev
  9.x                 Fix Released                High        Oleg Bondarev

Bug Description

Changed in mos:
milestone: none → 6.1-updates
status: New → Confirmed
importance: Undecided → High
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/11051

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/11051
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 250e71c75caca8407b5c3cefd05f8132b3b36048
Author: Oleg Bondarev <email address hidden>
Date: Wed Sep 2 10:20:55 2015

Live Migration: add sleep on pre live migration at destination

A sleep is needed to let the neutron OVS agent handle new ports
and assign the proper tags, so that RARP packets sent by QEMU right
after migration are not blocked. The only side effect is that live
migration takes longer; the benefit is less packet loss.

The correct fix would be to leverage the external event framework
and wait for a neutron event about VIF plugging. This would require
changes on the neutron side as well, so it was decided to go with a
simple, safe fix for now.

Closes-Bug: #1489198
Change-Id: Id3200a675eaba30f086c91d050fb0e0d03b03027
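To illustrate the trade-off this commit message describes (fixed sleep now, event-based wait later), here is a minimal Python sketch using only the standard library. It is not nova or neutron code: `start_fake_agent`, `wait_with_sleep`, `wait_for_event`, and all delays are hypothetical stand-ins, with a timer playing the role of the OVS agent finishing its work on the new port.

```python
import threading
import time


def start_fake_agent(delay: float) -> threading.Event:
    """Hypothetical stand-in for the OVS agent: `delay` seconds after the
    port is plugged, it has assigned the proper tag and signals readiness."""
    ready = threading.Event()
    timer = threading.Timer(delay, ready.set)
    timer.daemon = True  # do not keep the interpreter alive for the demo
    timer.start()
    return ready


def wait_with_sleep(agent_delay: float, fixed_sleep: float) -> bool:
    """Workaround style: always sleep `fixed_sleep`, then proceed.
    Returns whether the port happened to be ready by then."""
    ready = start_fake_agent(agent_delay)
    time.sleep(fixed_sleep)
    return ready.is_set()


def wait_for_event(agent_delay: float, timeout: float) -> bool:
    """Event style: return True as soon as the agent signals, or False
    after `timeout` seconds if it never does."""
    ready = start_fake_agent(agent_delay)
    return ready.wait(timeout)


if __name__ == "__main__":
    # Agent slower than the fixed sleep: migration would proceed too early.
    print(wait_with_sleep(agent_delay=0.3, fixed_sleep=0.05))  # False
    # Agent fast: the event wait returns after ~0.05 s, not after the timeout.
    print(wait_for_event(agent_delay=0.05, timeout=1.0))       # True
```

With a fixed sleep, the delay has to be tuned for the slowest expected agent and is paid on every migration; the event wait costs only as long as the agent actually takes, and the timeout matters only when something goes wrong.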

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/11072

Denis Meltsaykin (dmeltsaykin) wrote :

Per conversation with Timofey Durakov, this fix won't be included in 6.1 maintenance update #3 as is. We will wait for this to be fixed in neutron.

Roman Rufanov (rrufanov)
tags: added: support
Anna Babich (ababich)
tags: added: on-verification
Anna Babich (ababich) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "288"
  build_id: "288"
  nailgun_sha: "93477f9b42c5a5e0506248659f40bebc9ac23943"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "a717657232721a7fafc67ff5e1c696c9dbeb0b95"
  fuel-library_sha: "121016a09b0e889994118aa3ea42fa67eabb8f25"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"

Verified on cluster: neutron+vxlan, 3 controllers, 2 computes

Steps to reproduce
1. Create net01: net01__subnet, 192.168.1.0/24, attach it to router04
2. Create vm1 and vm2 in net01 on different computes
3. Go to vm1's console and send pings to vm2
4. Go to vm2's console and send pings to 8.8.8.8
5. Go to the console of the compute node that vm2 will be live-migrated to, and run tcpdump:
tcpdump -i br-mesh > rarp.log
6. Go to a controller and initiate live migration of vm2:
nova live-migration --block-migrate vm2 node-2.domain.tld
7. Check that vm2 is now hosted on the new compute node, and grep rarp.log for reverse ARP (RARP) packets:
cat rarp.log | grep 'ARP, Reverse'
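Step 7 can also be automated. The Python sketch below does the same check as the grep and additionally extracts the MAC being announced; an empty result corresponds to the failing case reported on ISO #259. The regex is our own and assumes tcpdump's default text output for RARP, as in the sample line captured later in this report; `rarp_macs` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Our own regex for tcpdump's text rendering of a RARP request, e.g.
# "... ARP, Reverse Request who-is fa:16:3e:56:c4:dc (oui Unknown) tell ..."
RARP_RE = re.compile(
    r"ARP, Reverse Request who-is ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")


def rarp_macs(lines: Iterable[str]) -> List[str]:
    """Return the announced MAC from every RARP request line, in order."""
    macs = []
    for line in lines:
        match = RARP_RE.search(line)
        if match:
            macs.append(match.group(1))
    return macs


if __name__ == "__main__":
    # RARP line as captured in this report, plus a non-matching line.
    sample = [
        "12:17:47.155585 ARP, Reverse Request who-is fa:16:3e:56:c4:dc "
        "(oui Unknown) tell fa:16:3e:56:c4:dc (oui Unknown), length 46",
        "not a RARP line",
    ]
    print(rarp_macs(sample))  # ['fa:16:3e:56:c4:dc']
```

On a real capture, pass the open file directly, e.g. `rarp_macs(open("rarp.log"))`, and check that vm2's MAC appears in the result.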

The bug was reproduced on ISO #259: about 3 packets were lost during migration and there were no RARP packets in rarp.log
It was rechecked on the current environment; the result is: http://paste.openstack.org/show/467698/

tags: removed: on-verification
Changed in mos:
status: Fix Committed → Fix Released
Vitaly Sedelnik (vsedelnik) wrote :

Per conversation with Oleg Bondarev, I am targeting this to 8.0 and reopening it for 7.0-updates. This issue could be fixed better by using the neutron-to-nova notification mechanism rather than a sleep.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/13287

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-6.1/2014.2)

Change abandoned by Oleg Bondarev <email address hidden> on branch: openstack-ci/fuel-6.1/2014.2
Review: https://review.fuel-infra.org/11072
Reason: Not needed for 6.1

Oleg Bondarev (obondarev) wrote :

Here are the upstream patches: https://review.openstack.org/#/c/246910/ https://review.openstack.org/#/c/246898/
I am going to backport them to MOS once they are merged upstream.

Denis Meltsaykin (dmeltsaykin) wrote :

Nice to hear, Oleg! Please backport to 6.1 too, if this is possible.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/16355

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/16356

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/16355
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: e1f47946b7e344d65a66f1bf2af3b9b957f595d8
Author: Oleg Bondarev <email address hidden>
Date: Mon Jan 25 09:52:08 2016

Notify nova with network-vif-plugged in case of live migration

 - during live migration, at the pre-migration step, nova plugs the instance's
   VIF device on the destination compute node;
 - the L2 agent on the destination host detects the new device and requests
   device info from the server;
 - the server does not change the port status, since the port is bound to
   another host (the source host);
 - the L2 agent processes the device and sends update_device_up to the server;
 - again, the server does not update the status, as the port is bound to
   another host.

Nova notifications are sent only when the port status changes, so in this
case no notifications are sent.
I don't think the server should change the port status in this case, as the
port actually remains active on the source host.

The fix is to explicitly notify nova if an agent reports the device up from
a host other than the port's current host.

This is the neutron side of the fix; the actual fix for the bug is on the nova side:
change-id Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98

upstream review: https://review.openstack.org/246898

Closes-Bug: #1489198
Closes-Bug: #1414559
Change-Id: Ifa919a9076a3cc2696688af3feadf8d7fa9e6fc2
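The server-side rule in this commit message can be modelled as a small pure function. This is a sketch, not neutron's code: `Port`, `handle_update_device_up`, the `notify_other_host` switch, and the event strings are hypothetical. The only behaviour taken from the commit is: when the reporting host is the port's host, a status change triggers the nova event; when it is another host (live migration), the status is left alone but nova is notified explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    """Hypothetical, minimal port model (not neutron's Port object)."""
    id: str
    host: str              # host the port is currently bound to
    status: str = "DOWN"


def handle_update_device_up(port: Port, reporting_host: str,
                            notify_other_host: bool = True) -> List[str]:
    """Return the nova events the server would emit for update_device_up.

    notify_other_host=False models the behaviour before the fix."""
    events: List[str] = []
    if reporting_host == port.host:
        if port.status != "ACTIVE":
            port.status = "ACTIVE"
            # Normally it is the status change that triggers the nova event.
            events.append(f"network-vif-plugged:{port.id}")
    elif notify_other_host:
        # Live migration: the port is still bound to (and active on) the
        # source host, so its status is left alone, but nova is waiting on
        # the destination and must be told explicitly.
        events.append(f"network-vif-plugged:{port.id}")
    return events


if __name__ == "__main__":
    port = Port(id="p1", host="node-1", status="ACTIVE")
    print(handle_update_device_up(port, "node-2", notify_other_host=False))  # []
    print(handle_update_device_up(port, "node-2"))  # ['network-vif-plugged:p1']
    print(port.status)  # ACTIVE
```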

Alexander Ignatov (aignatov) wrote :

Patches are on review; we expect this to be fixed before HCF.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/16356
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: 5109fe4d09d6fcac5c47fc12f60bff2998a37106
Author: Oleg Bondarev <email address hidden>
Date: Fri Jan 29 17:12:13 2016

Live migration: wait for vif-plugged event on pre live migration

Just as when spawning a new VM, we need to wait for neutron to properly
handle the new VIF device during the pre-migration step.

By default nova dispatches the event to the instance's host, whereas during
live migration we are waiting on the destination host. The solution is to
check whether the instance is migrating and, if so, send the event to the
destination compute as well.

Depends-On: Ifa919a9076a3cc2696688af3feadf8d7fa9e6fc2

upstream review: https://review.openstack.org/246910

Closes-Bug: #1489198
Closes-Bug: #1414559
Change-Id: Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98
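The dispatch rule described in this commit message can be sketched as follows. `Instance`, its fields, and `hosts_for_event` are hypothetical names chosen for illustration, not nova's API; the sketch only encodes "send to the instance's host, and also to the destination host while the instance is migrating".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical instance model; field names are illustrative only."""
    uuid: str
    host: str                             # current (source) compute host
    task_state: Optional[str] = None      # "migrating" during live migration
    migration_dest: Optional[str] = None  # destination host, if migrating


def hosts_for_event(instance: Instance) -> List[str]:
    """Compute hosts that should receive a network-vif-plugged event."""
    hosts = [instance.host]  # default: only the instance's own host
    if (instance.task_state == "migrating" and instance.migration_dest
            and instance.migration_dest not in hosts):
        # pre-live-migration is waiting for the event on the destination.
        hosts.append(instance.migration_dest)
    return hosts


if __name__ == "__main__":
    print(hosts_for_event(Instance("vm2", "node-1")))  # ['node-1']
    print(hosts_for_event(Instance("vm2", "node-1", "migrating", "node-2")))
    # ['node-1', 'node-2']
```

Sending to both hosts, rather than only to the destination, keeps the default path unchanged for anything on the source host that may also be waiting.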

Roman Podoliaka (rpodolyaka) wrote :

Fix merged ^

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-8.0/liberty)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13287

Kristina Berezovskaia (kkuznetsova) wrote :

Verified on
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "543"
  build_id: "543"
  fuel-nailgun_sha: "baec8643ca624e52b37873f2dbd511c135d236d9"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "e2d79330d5d708796330fac67722c21f85569b87"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "87dfb6bc25d4650264f09c338ed77c21a3d6fe87"
(neutron+vxlan)

Repeated the steps from the 7.0 verification: RARP packets are present in rarp.log, and ping on vm2 lost only 1-2 packets.

tags: added: 8.0 release-notes-done
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/18345

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/18417

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/18666

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/neutron (stable/mitaka)

Change abandoned by Sergey Belous <email address hidden> on branch: stable/mitaka
Review: https://review.fuel-infra.org/18666
Reason: Oops, wrong branch

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change restored on openstack/nova (openstack-ci/fuel-6.1/2014.2)

Change restored by Alexey Stupnikov <email address hidden> on branch: openstack-ci/fuel-6.1/2014.2
Review: https://review.fuel-infra.org/11072
Reason: Patch should work for 6.1

Alexey Stupnikov (astupnikov) wrote :

To collect the dump on MOS 6.1, run the 'tcpdump -i br-fw-admin > arp.log' command on the compute nodes.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/11072
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 95fd30d477bec4a5f9ea1ffa21c3cd73ffc01003
Author: Oleg Bondarev <email address hidden>
Date: Mon Apr 25 17:33:05 2016

Live Migration: add sleep on pre live migration at destination

A sleep is needed to let the neutron OVS agent handle new ports
and assign the proper tags, so that RARP packets sent by QEMU right
after migration are not blocked. The only side effect is that live
migration takes longer; the benefit is less packet loss.

The correct fix would be to leverage the external event framework
and wait for a neutron event about VIF plugging. This would require
changes on the neutron side as well, so it was decided to go with a
simple, safe fix for now.

Closes-Bug: #1489198
Change-Id: Id3200a675eaba30f086c91d050fb0e0d03b03027

Alexey Stupnikov (astupnikov) wrote :

I don't think we should backport the cleaner fix from upstream master to MOS 6.1, for the following reasons:
  - there are differences in the synced functions;
  - the current workaround (WA) works well.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/18417
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 93179921769abd9b55c6914c070bd6ba61c10759
Author: Oleg Bondarev <email address hidden>
Date: Mon Apr 18 21:07:17 2016

Notify nova with network-vif-plugged in case of live migration

 - during live migration, at the pre-migration step, nova plugs the instance's
   VIF device on the destination compute node;
 - the L2 agent on the destination host detects the new device and requests
   device info from the server;
 - the server does not change the port status, since the port is bound to
   another host (the source host);
 - the L2 agent processes the device and sends update_device_up to the server;
 - again, the server does not update the status, as the port is bound to
   another host.

Nova notifications are sent only when the port status changes, so in this
case no notifications are sent.

The fix is to explicitly notify nova if an agent reports the device up from
a host other than the port's current host.

This is the neutron side of the fix; the actual fix for the bug is on the nova side:
change-id Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98
upstream review: https://review.openstack.org/246898
Closes-Bug: #1489198
Closes-Bug: #1414559
Change-Id: Ifa919a9076a3cc2696688af3feadf8d7fa9e6fc2

tags: added: on-automation
tags: added: on-verification
Ekaterina Shutova (eshutova) wrote :

Bug verified on MOS 6.1 + mu6 updates.

Used the steps to reproduce above. Captured packets on the br-aux interface; RARP packets were caught:
12:17:47.155585 ARP, Reverse Request who-is fa:16:3e:56:c4:dc (oui Unknown) tell fa:16:3e:56:c4:dc (oui Unknown), length 46
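The captured line shows the same MAC after both "who-is" and "tell" and reports "length 46". The Python sketch below builds a RARP self-announcement frame assuming the layout QEMU is generally understood to use after migration: broadcast destination, EtherType 0x8035, opcode 3 (reverse request), the guest MAC as both sender and target hardware address, zeroed IPv4 fields, padded to the 60-byte Ethernet minimum. It is an illustration under that assumption, not QEMU's source; `build_rarp_announce` is a hypothetical name. The 60-byte frame minus the 14-byte Ethernet header gives 46, which is consistent with the length tcpdump prints.

```python
import struct

ETH_P_RARP = 0x8035
ARP_OP_REVERSE_REQUEST = 3
ETH_MIN_FRAME = 60  # minimum Ethernet frame size, excluding the 4-byte FCS
ETH_HEADER_LEN = 14


def build_rarp_announce(mac: str) -> bytes:
    """Build a RARP 'who-is <mac> tell <mac>' broadcast frame (assumed layout)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    eth_header = b"\xff" * 6 + hw + struct.pack("!H", ETH_P_RARP)
    rarp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                   # hardware type: Ethernet
        0x0800,              # protocol type: IPv4
        6,                   # hardware address length
        4,                   # protocol address length
        ARP_OP_REVERSE_REQUEST,
        hw, b"\x00" * 4,     # sender MAC / sender IP (unknown)
        hw, b"\x00" * 4,     # target MAC / target IP (unknown)
    )
    # 14 + 28 = 42 bytes so far; pad with zeros up to the Ethernet minimum.
    return (eth_header + rarp).ljust(ETH_MIN_FRAME, b"\x00")


if __name__ == "__main__":
    frame = build_rarp_announce("fa:16:3e:56:c4:dc")
    print(len(frame))                   # 60
    print(len(frame) - ETH_HEADER_LEN)  # 46
    print(frame[12:14].hex())           # 8035
```

Because the announcement is a broadcast from the VM's own MAC, switches along the path learn the VM's new location from it; if OVS drops it on the destination (as in this bug), that learning is delayed until the guest sends traffic itself.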

tags: removed: on-verification
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/18345
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: a7c889cd8480bcb4230173174313b20749e398dd
Author: Oleg Bondarev <email address hidden>
Date: Tue May 24 10:06:19 2016

Live migration: wait for vif-plugged event on pre live migration

Just as when spawning a new VM, we need to wait for neutron to properly
handle the new VIF device during the pre-migration step.

By default nova dispatches the event to the instance's host, whereas during
live migration we are waiting on the destination host. The solution is to
check whether the instance is migrating and, if so, send the event to the
destination compute as well.

Depends-On: Ifa919a9076a3cc2696688af3feadf8d7fa9e6fc2

upstream review: https://review.openstack.org/246910

Closes-Bug: #1489198
Closes-Bug: #1414559
Change-Id: Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98

tags: added: on-verification
Kristina Berezovskaia (kkuznetsova) wrote :

Verified on:
cat /etc/fuel_build_id:
 461
cat /etc/fuel_build_number:
 461
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8451.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8451.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-notify-9.0.0-1.mos8451.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8741.noarch
 fuel-library9.0-9.0.0-1.mos8451.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos935.noarch
 fuelmenu-9.0.0-1.mos273.noarch
 fuel-nailgun-9.0.0-1.mos8741.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 fuel-openstack-metadata-9.0.0-1.mos8741.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 fuel-utils-9.0.0-1.mos8451.noarch
(neutron+vxlan, 3 controllers, 2 computes)

Repeated the steps from the previous verification. RARP packets are present in the tcpdump output; 1 packet was lost on vm1 and 5 packets were lost on vm2.

tags: removed: on-verification
Alexander Ignatov (aignatov) wrote :

This bug is fixed only downstream in MOS 9.0 and, for various reasons, is not going to be fixed in Newton. Let's keep it as is in 10.0. Once we have our downstream branch for MOS 10.0 (Newton), we can forward-port the fix from 9.0 to 10.0.

Vitaly Sedelnik (vsedelnik) wrote :

Targeted to 7.0-mu-6. Alexey, please check whether the issue is completely fixed by your patch.

Alexey Stupnikov (astupnikov) wrote :

Moving to Won't Fix for 7.0-updates, since patch https://review.fuel-infra.org/#/c/11051/ is already merged. The bug is fixed, just not optimally; what remains is an optimization, and that will not go into the stable product.

Alexey Stupnikov (astupnikov) wrote :

Sorry, I think Fix Released is more suitable here.

Ekaterina Shutova (eshutova) wrote :
tags: added: covered-automated-test
removed: on-automation
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (10.0/newton)

Fix proposed to branch: 10.0/newton
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/30250

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (mcp/newton)

Fix proposed to branch: mcp/newton
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/33658

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (11.0/ocata)

Fix proposed to branch: 11.0/ocata
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/34451

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (mcp/ocata)

Fix proposed to branch: mcp/ocata
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/34846

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (11.0/ocata)

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/34451
Reason: 11.0/ocata is deprecated in favor of mcp/ocata

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (mcp/ocata)

Reviewed: https://review.fuel-infra.org/34846
Submitter: Pkgs Jenkins <email address hidden>
Branch: mcp/ocata

Commit: e916e8eec1fdc4c023fb63a9f6362f923345e0fe
Author: Oleg Bondarev <email address hidden>
Date: Wed May 3 16:09:46 2017

Live migration: wait for vif-plugged event on pre live migration

Just as when spawning a new VM, we need to wait for neutron to properly
handle the new VIF device during the pre-migration step.

By default nova dispatches the event to the instance's host, whereas during
live migration we are waiting on the destination host. The solution is to
check whether the instance is migrating and, if so, send the event to the
destination compute as well.

Upstream review: https://review.openstack.org/246910

Closes-Bug: #1489198

Change-Id: Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (mcp/newton)

Reviewed: https://review.fuel-infra.org/33658
Submitter: Pkgs Jenkins <email address hidden>
Branch: mcp/newton

Commit: b77f3d3249968420532b22b855cdbe0eedcfd325
Author: Oleg Bondarev <email address hidden>
Date: Fri May 5 09:41:02 2017

Live migration: wait for vif-plugged event on pre live migration

Just as when spawning a new VM, we need to wait for neutron to properly
handle the new VIF device during the pre-migration step.

By default nova dispatches the event to the instance's host, whereas during
live migration we are waiting on the destination host. The solution is to
check whether the instance is migrating and, if so, send the event to the
destination compute as well.

Upstream review: https://review.openstack.org/246910

Closes-Bug: #1489198

Change-Id: Ib1cb9c2f6eb2f5ce6280c685ae44a691665b4e98
