neutron

l2 population failed when bulk live migrate VMs

Bug #1483601 reported by shihanzhang on 2015-08-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	neutron	Fix Released	Undecided	shihanzhang

Bug Description

When we bulk live-migrate VMs, l2 population may (not always) fail on the destination compute node. When nova migrates a VM to the destination compute node, it only updates the port's binding:host; the port's status stays active. From neutron's perspective, the port status then progresses as: active -> build -> active.
In the case below, l2 population fails:
1. nova successfully live-migrates VM A and VM B from compute A to compute B.
2. port A and port B both have status active, and their binding:host is compute B.
3. The l2 agent scans these two ports, then handles them one by one.
4. neutron-server handles port A first; its status becomes build (remember that port B's status is still active), and it runs the check below.
In the l2 population code, this check fails:

def _update_port_up(self, context):
    ......
    if agent_active_ports == 1 or (self.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time):
        # First port activated on current agent in this network,
        # we have to provide it with the whole list of fdb entries
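
The scenario above can be reproduced with a small toy model. This is only a sketch, not Neutron code: the port dictionaries, `needs_full_fdb_sync`, `simulate_bulk_migration`, and the host name `compute-b` are hypothetical names made up for illustration, and the `agent_boot_time` value of 180 seconds is an assumed setting. Only the shape of the condition is taken from the snippet quoted above.

```python
# Toy model of the l2pop "first active port" check. NOT Neutron code:
# all names and values here are hypothetical and for illustration only.
ACTIVE, BUILD = "ACTIVE", "BUILD"


def needs_full_fdb_sync(ports, agent_host, agent_uptime, agent_boot_time=180):
    """Mimic the quoted condition for a single agent and network."""
    agent_active_ports = sum(
        1 for p in ports if p["host"] == agent_host and p["status"] == ACTIVE
    )
    return agent_active_ports == 1 or agent_uptime < agent_boot_time


def simulate_bulk_migration():
    """Two ports arrive on compute-b already ACTIVE, as after live migration."""
    ports = [
        {"name": "A", "host": "compute-b", "status": ACTIVE},
        {"name": "B", "host": "compute-b", "status": ACTIVE},
    ]
    results = []
    for port in ports:
        port["status"] = BUILD   # the server starts handling this port
        port["status"] = ACTIVE  # the agent reports the device as up
        # The other port is still ACTIVE, so the count is 2, never 1.
        results.append(
            needs_full_fdb_sync(ports, "compute-b", agent_uptime=10_000)
        )
    return results


if __name__ == "__main__":
    print(simulate_bulk_migration())
```

In this model neither port triggers the full fdb sync, because the agent has been up for longer than `agent_boot_time` and the active-port count on the destination host is 2 each time the check runs.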

shihanzhang (shihanzhang) on 2015-08-11

description:	updated

yalei wang (yalei-wang) on 2015-08-14

Changed in neutron:
assignee:	nobody → yalei wang (yalei-wang)

shihanzhang (shihanzhang) wrote on 2015-08-14:

@yalei wang, I have an idea to fix this bug; do you want to fix it?

yalei wang (yalei-wang) wrote on 2015-08-14:

hi hanzhang, pls assign to yourself.

Changed in neutron:
assignee:	yalei wang (yalei-wang) → nobody

shihanzhang (shihanzhang) wrote on 2015-08-14:

@yalei wang, I will submit the patch as soon as possible, please help to review :)

Changed in neutron:
assignee:	nobody → shihanzhang (shihanzhang)

Kevin Benton (kevinbenton) wrote on 2015-08-20:

I think the right thing to do here is to adjust ML2 to set the port status to DOWN on any ports that have the host_id updated.

Kevin Benton (kevinbenton) wrote on 2015-08-20:

Here is a patch that looks like it should work:

diff --git a/neutron/plugins/ml2/plugin.py b/neutron/plugins/ml2/plugin.py
index 904abe9..816e762 100644
--- a/neutron/plugins/ml2/plugin.py
+++ b/neutron/plugins/ml2/plugin.py
@@ -1153,8 +1153,12 @@ class Ml2Plugin(db_base_plugin_v2.NeutronDbPluginV2,
                 original_port=original_port)
             new_host_port = self._get_host_port_if_changed(
                 mech_context, attrs)
-            need_port_update_notify |= self._process_port_binding(
+            binding_changed = self._process_port_binding(
                 mech_context, attrs)
+            if binding_changed:
+                need_port_update_notify = True
+                self.update_port_status(context, id, const.PORT_STATUS_DOWN)
+                updated_port['status'] = const.PORT_STATUS_DOWN
             # For DVR router interface ports we need to retrieve the
             # DVRPortbinding context instead of the normal port context.
             # The normal Portbinding context does not have the status

shihanzhang (shihanzhang) wrote on 2015-08-20:

hi kevin, thx for your nice patch!

OpenStack Infra (hudson-openstack) wrote on 2015-08-21: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/215467

Changed in neutron:
status:	New → In Progress

Mathieu Rohon (mathieu-rohon) on 2015-09-14

tags:

added: l2-pop

Mathieu Rohon (mathieu-rohon) wrote on 2015-09-15:

I would not go for a change to DOWN state when the port is migrated. If the migration fails, the port wouldn't be available during the live migration process. I'm not even sure that the port would move back to UP after a failure. It depends on what nova does with port attachment during live migration.

A new state such as "MIGRATING" seems consistent with the nova state during live migration.

Mathieu Rohon (mathieu-rohon) wrote on 2015-09-15:

Here is a call flow that I made a few months ago that might help:

http://paste.openstack.org/show/198298/

shihanzhang (shihanzhang) wrote on 2015-09-15:

#10

@Mathieu Rohon, your suggestion is very good, but currently nova only updates the port's host_id when it finishes the live migration (as far as I know, nova updates the port's host_id in its post-migration steps), so I don't think the problems you describe would occur.

YAMAMOTO Takashi (yamamoto) wrote on 2015-09-18:

#11

Mathieu,

i really think we should have a similar diagram in devref!

OpenStack Infra (hudson-openstack) wrote on 2016-02-05: Fix merged to neutron (master)

#12

Reviewed: https://review.openstack.org/215467
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c5fa665de3173f3ad82cc3e7624b5968bc52c08d
Submitter: Jenkins
Branch: master

commit c5fa665de3173f3ad82cc3e7624b5968bc52c08d
Author: shihanzhang <email address hidden>
Date: Fri Aug 21 09:51:59 2015 +0800

ML2: update port's status to DOWN if its binding info has changed

    This fixes the problem that when two or more ports in a network
    are migrated to a host that did not previously have any ports in
    the same network, the new host is sometimes not told about the
    IP/MAC addresses of all the other ports in the network. In other
    words, initial L2population does not work, for the new host.

    This is because the l2pop mechanism driver only sends catch-up
    information to the host when it thinks it is dealing with the first
    active port on that host; and currently, when multiple ports are
    migrated to a new host, there is always more than one active port so
    the condition above is never triggered.

    The fix is for the ML2 plugin to set a port's status to DOWN when
    its binding info changes.

    This patch also fixes the bug when nova thinks it should not wait
    for any events from neutron because all ports are already active.

    Closes-bug: #1483601
    Closes-bug: #1443421
    Closes-Bug: #1522824
    Related-Bug: #1450604

    Change-Id: I342ad910360b21085316c25df2154854fd1001b2
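
The last point of the commit message, nova not waiting for events because all ports are already active, can be illustrated with a rough sketch. This is an assumption about the general pattern, not nova's actual code: `events_to_wait_for` and the dict-based `network_info` entries are hypothetical, and only the event name `network-vif-plugged` is taken from nova's external events.

```python
# Rough illustration only. events_to_wait_for and the vif dicts are
# hypothetical; this is not copied from nova.
def events_to_wait_for(network_info):
    """Return (event_name, port_id) pairs for ports not yet active."""
    return [
        ("network-vif-plugged", vif["id"])
        for vif in network_info
        if vif.get("active", True) is False
    ]


if __name__ == "__main__":
    # Before the fix: the migrated port is still reported as active,
    # so there is nothing to wait for.
    print(events_to_wait_for([{"id": "port-a", "active": True}]))
    # After the fix: the port was reset to DOWN, so one event is expected.
    print(events_to_wait_for([{"id": "port-a", "active": False}]))
```

Under this model, resetting the port to DOWN when its binding changes is what makes the compute side expect a plug event for it again.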

Changed in neutron:
status:	In Progress → Fix Released

Assaf Muller (amuller) on 2016-02-05

tags:	added: liberty-backport-potential
tags:	added: kilo-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2016-04-01: Fix proposed to neutron (stable/liberty)

#13

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/300539

OpenStack Infra (hudson-openstack) wrote on 2016-04-01: Fix proposed to neutron (stable/kilo)

#14

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/300559

OpenStack Infra (hudson-openstack) wrote on 2016-04-14: Fix merged to neutron (stable/liberty)

#15

Reviewed: https://review.openstack.org/300539
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a38cb93dde1633005e9e66e6b7ecec9e726304bb
Submitter: Jenkins
Branch: stable/liberty

commit a38cb93dde1633005e9e66e6b7ecec9e726304bb
Author: venkata anil <email address hidden>
Date: Fri Apr 1 14:52:01 2016 +0000

ML2: update port's status to DOWN if its binding info has changed

    The fix is for the ML2 plugin to set a port's status to DOWN when
    its binding info changes.

    This patch also fixes the bug when nova thinks it should not wait
    for any events from neutron because all ports are already active.

    Closes-bug: #1483601
    Closes-bug: #1443421
    Closes-Bug: #1522824
    Related-Bug: #1450604
    (cherry picked from commit c5fa665de3173f3ad82cc3e7624b5968bc52c08d)

    Conflicts: neutron/plugins/ml2/drivers/l2pop/mech_driver.py

    Change-Id: I342ad910360b21085316c25df2154854fd1001b2

tags:

added: in-stable-liberty

OpenStack Infra (hudson-openstack) wrote on 2016-04-15: Fix proposed to neutron (stable/kilo)

#16

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/306300

OpenStack Infra (hudson-openstack) wrote on 2016-04-15: Change abandoned on neutron (stable/kilo)

#17

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/300559

OpenStack Infra (hudson-openstack) wrote on 2016-05-09:

#18

Change abandoned by Dave Walker (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/306300
Reason:
stable/kilo closed for 2015.1.4

This release is now pending its final release and no freeze exception has
been seen for this changeset. Therefore, I am now abandoning this change.

If this is not correct, please urgently raise a thread on openstack-dev.

More details at: https://wiki.openstack.org/wiki/StableBranch

Thierry Carrez (ttx) wrote on 2016-06-01: Fix included in openstack/neutron 7.1.0

#19

This issue was fixed in the openstack/neutron 7.1.0 release.

Ihar Hrachyshka (ihar-hrachyshka) on 2016-10-07

tags:

removed: kilo-backport-potential liberty-backport-potential
