Macvtap driver/agent migrates instances on an invalid physical network

Bug #1550400 reported by Andreas Scheuring on 2016-02-26
Affects: neutron
Importance: Medium
Assigned to: Andreas Scheuring

Bug Description

Scenario1 - Migration on wrong physical network - High Prio
===========================================================
Host1 has physical_interface_mappings: physnet1:eth0, physnet2:eth2
Host2 has physical_interface_mappings: physnet1:eth1, physnet2:eth0

Now a live migration of an instance hosted on host1 (connected to physnet1) to host2 succeeds. Libvirt just migrates the whole server with its domain.xml, and the macvtap is plugged into the target's eth0.
Now the instance no longer has access to its network, but instead has access to another physical network. The behavior is documented, however this needs to be fixed!

Scenario2 - Migration fails - Low Prio
======================================
Host1 has physical_interface_mappings: physnet1:eth0
Host2 has physical_interface_mappings: physnet1:eth1

Let's assume a vlan setup and a migration from host1 to host2. Host2 does NOT have an interface eth0. The migration will fail and the instance will remain active on the source, as the nova plug on host2 failed to create a vlan device on eth0.

If you have a flat network, the definition of the libvirt xml will fail on host2.
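For illustration, here is a minimal sketch (a simplified stand-in for neutron's mapping parser, not the agent's actual code) of how the two hosts in scenario 1 resolve the same physnet to different source devices:

```python
def parse_mappings(mapping_str):
    """Parse 'physnet1:eth0,physnet2:eth2' into a {physnet: device} dict."""
    mappings = {}
    for pair in mapping_str.split(","):
        physnet, device = pair.split(":", 1)
        mappings[physnet.strip()] = device.strip()
    return mappings

host1 = parse_mappings("physnet1:eth0,physnet2:eth2")
host2 = parse_mappings("physnet1:eth1,physnet2:eth0")
# The same physnet resolves to different source devices on the two hosts,
# so a migrated macvtap ends up on the wrong NIC (or on none at all):
print(host1["physnet1"], host2["physnet1"])
```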

Two approaches are conceivable:
* Solve the problem (scenarios 1+2)
* Just prevent such an invalid migration (let scenario 1 fail the way scenario 2 fails today)

Solve the problem
=================

#1 Solve it in Nova pre live migration
--------------------------------------

This would allow migration although the physical_interface_mappings are different.

a) On pre live migration, nova should change the binding:host_id to the migration target. This will trigger the portbinding and the mech driver, which will update the vif_details with the right macvtap source device information. Libvirt can then adapt the migration xml to reflect the changes.

Currently the update of the binding is done in post migration, after the migration succeeded. Can we already do it in pre_live_migration and, on failure, undo it in the rollback?
- There's no issue for the reference implementations - See the prototype: https://review.openstack.org/297100
- But there might be some mechanisms in external SDN controllers that shut down ports on the source host as soon as the host_id is updated. On the other hand, if controllers rely on this mechanism, they bring the port up a little too late today, as the host_id update is sent only after live migration succeeded.
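At the API level, option a) boils down to a port update that retriggers port binding. A hedged sketch (the helper names are illustrative, not Nova's actual code; the body is what a PUT to /v2.0/ports/<id> would carry):

```python
def build_binding_update(target_host):
    # Request body that moves the port binding to the migration target
    # and thereby retriggers port binding in the ML2 plugin.
    return {"port": {"binding:host_id": target_host}}

def build_binding_rollback(source_host):
    # On a failed migration, the same update with the source host
    # would undo the change in the rollback path.
    return build_binding_update(source_host)
```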

b) The alternative would be to allow a port to be bound to multiple hosts simultaneously. So in pre_live_migration, nova would add a binding for the target host, and in post_live_migration it would remove the original binding.
This would require:
- simultaneous port binding. This will be achieved by https://bugs.launchpad.net/neutron/+bug/1367391
- allow such a binding for compute ports as well
- Update APIs to reflect multiple port_bindings
  - Create / Update / Show Port
  - host_id is not reflected for DVR ports today [1]

#2 Moved to Prevent section
---------------------------

#3 Device renaming in the macvtap agent
---------------------------------------
This would allow migration although the physical_interface_mappings are different.

Instead of
     physical_interface_mapping = physnet1:eth0
use a
     physical_interface_mac_mapping = physnet1:00:11:22:33:44:55 # where 00:11:22:33:44:55 is the mac address of the interface to use

On agent startup, the agent could rename the associated device to "physnet1" (or to some other generic value) that is consistent across all hosts!

We would need to document that this interface should not be used by any other application that relies on the interface name.
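A sketch of how the agent could resolve such a MAC-based mapping (illustrative helper only; the real agent would then look up the device by MAC and rename it, e.g. via `ip link set dev eth0 name physnet1`):

```python
def parse_mac_mapping(mapping_str):
    # 'physnet1:00:11:22:33:44:55' -> {'physnet1': '00:11:22:33:44:55'}
    # split(':', 1) is needed because the MAC itself contains colons.
    mappings = {}
    for pair in mapping_str.split(","):
        physnet, mac = pair.split(":", 1)
        mappings[physnet.strip()] = mac.strip().lower()
    return mappings
```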

#4 Use generic vlan device names
--------------------------------

This solves the problem only for vlan networks! For flat networks it would still exist.

Today, the agent generates vlan device names like this: for eth1 it creates eth1.<vlan-id>. We could get rid of this pattern and use <network-uuid>.<vlan-id> instead, where network-uuid is the first 10 chars of the network id.

But this would not solve the issue for flat networks. Therefore the device renaming proposed in #3 would be required.
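The proposed naming scheme could look like this (sketch; 10 uuid chars plus a dot plus up to 4 vlan digits stays within the kernel's 15-character interface name limit):

```python
def vlan_device_name(network_uuid, vlan_id):
    # Host-independent name: first 10 chars of the network uuid plus
    # the vlan id, instead of the host-specific 'eth1.<vlan-id>'.
    return "%s.%s" % (network_uuid[:10], vlan_id)
```

For example, `vlan_device_name("2be9f80a-e3ac-42e6-9249-5ccca241ad85", 100)` yields the same device name on every host, regardless of the local interface mapping.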

Prevent invalid migration
=========================

#1 Let Port binding fail
------------------------

The idea is to detect an invalid migration in the mechanism driver and let port binding fail.

This approach has two problems:
a) Portbinding happens AFTER the migration happened. In post live migration, nova requests updating the binding:host-id to the target. But when doing so, the instance is already running on the target host. The binding will fail, but the migration happened anyway.
--> But at least the instance would be in error state and the user is aware of that! In addition, we might drop all traffic related to this instance.

b) Detecting a migration in the mech driver is difficult. The idea is to use PortContext.original. This works if we add the original port to the context somewhere in the ml2 plugin (see proposed patch). The problem is that this is not a reliable indicator for a live migration: the original port will also be present if scheduling on another node failed before and the current node is now picked. In that case there was no live migration, but PortContext.original is set to another host. Maybe this can be solved, but it's worth mentioning here.
--> In the worst case, use the profile information added with https://review.openstack.org/#/c/275073/
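The detection idea from b) amounts to roughly this check (hypothetical sketch; note it cannot distinguish a live migration from a reschedule after a failed spawn, as described above):

```python
def host_changed(original_port, current_host):
    # True when the port was bound to a different host before - which
    # covers live migration, but also a reschedule after a failed spawn.
    if original_port is None:
        return False  # first binding, nothing to compare
    original_host = original_port.get("binding:host_id")
    return bool(original_host) and original_host != current_host
```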

see patch https://review.openstack.org/293404

#2 Let agent detect invalid migration (not working)
---------------------------------------------------
An invalid migration could be detected in the agent, which would then refuse to set the device status to up.

But this is too late: the agent only detects the device after the migration has already started. There is no way to stop it again.

see patch https://review.openstack.org/293403

#3 Solve it in nova post live migration
---------------------------------------
The idea is that nova starts the migration and then listens for the vif-plugged event that is emitted by neutron after the agent reported the device as up. Nova also waits for the portbinding to occur. If either runs into a timeout or fails, the migration should be rolled back (if still possible) or the instance should be set into error state and the network locked down (which is the default for ovs - not sure about others right now).
There are some patchsets out there that try to achieve something similar, but for the ovs-hybrid plug only. For others it's much more complicated, as the agent will only report the device up after it occurred on the target (after migration already started): https://review.openstack.org/246898
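The "wait for the neutron event" part could be sketched as a simple poll with timeout (illustrative only; the real implementation would listen for the vif-plugged external event instead of polling):

```python
import time

def wait_for_port_active(get_port_status, timeout=60, interval=2):
    # Poll the port status until it is ACTIVE or the timeout expires;
    # on timeout the caller has to roll back the migration or put the
    # instance into error state and lock down its network.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_port_status() == "ACTIVE":
            return True
        time.sleep(interval)
    return False
```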

#4 Prohibit agent start with invalid mapping
--------------------------------------------
Do not allow different mappings at all.

How to trigger the validation?
* Have an RPC call from the agent to the Neutron plugin at agent start.
--> Less resource consumption, but an extra rpc call

* Use the regular agent status reports.
--> Checking on every status report consumes a lot of resources (db query and potential mech_driver calls)

What to compare?
* Have a master interface mappings config option configured on the plugin/mech_driver. All agent mappings must match that mapping
--> If the server changes the master mapping, there's no way to notify the agents (or it must get implemented)
--> Config option duplication

* Query the master mapping from the server and compare on the agent side, or ignore the local mapping entirely if a master mapping has been configured.

* Compare against the existing mappings in the database. The first agent that sends its mapping via status reports defines the valid mapping.
--> We need explicit table locking (locking rows is not sufficient) to avoid races, especially for the cases where the first agents get added.

Where to do the validation/gather data for validation?
* In the mech driver
--> Most natural way, but requires a new mech_driver interface
--> Also a new plugin interface is required

* In the rpc callbacks class
--> As the validation depends on the mech_driver, we would have mech_driver specific code there. But we would get around new interfaces

* In the agent

Proposal:
* Agent has a new config option "safe_migration_mode" = True|False (default False, to stay backward compatible)
* If it is set, the server's master mapping is queried by the agent on agent start via RPC.
* If it does not match the local mapping, the agent terminates.
* The RPC call goes to the plugin, which then triggers all mechanism drivers. Those have a generic method like 'get_plugin_configuration()' or similar.
* If this method is not present, the code will just continue (to not break other drivers).
* The plugin returns a dict mech_driver:configuration to the agent. If the mech_driver did not provide any configuration (not required, or method not implemented), it will not be part of the returned dict.
* If the master mapping on the server got changed but agents haven't been restarted, the local mapping will not be validated against the new master mapping again (that would require an agent restart).
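The proposal above could be sketched like this on the agent side (the option name and the mech_driver:configuration dict shape are taken from the proposal; the helper itself and its key names are illustrative):

```python
def validate_local_mapping(local_mapping, server_configs,
                           driver="macvtap", key="interface_mappings"):
    # server_configs is the mech_driver:configuration dict returned by
    # the new RPC call; if the driver reported no master mapping, the
    # check is skipped to not break other drivers.
    master = server_configs.get(driver, {}).get(key)
    if master is None:
        return True
    return local_mapping == master

# With safe_migration_mode enabled, the agent would call this on
# startup and terminate if it returns False.
```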

References
==========

[1] curl -g -i -X GET http://192.168.122.30:9696/v2.0/ports/b780af01-a3c6-4279-a355-fa3289bc1ec3.json
{"port": {"status": "ACTIVE", "binding:host_id": "", "description": "", "allowed_address_pairs": [], "extra_dhcp_opts": [], "updated_at": "2016-04-08T08:59:08", "device_owner": "network:router_interface_distributed", "port_security_enabled": false, "binding:profile": {}, "fixed_ips": [{"subnet_id": "7c57f270-46a9-4cde-a2a7-82e66da1d084", "ip_address": "10.0.0.1"}], "id": "b780af01-a3c6-4279-a355-fa3289bc1ec3", "security_groups": [], "device_id": "48120dd2-2523-4b11-9f67-8ea944d11012", "name": "", "admin_state_up": true, "network_id": "2be9f80a-e3ac-42e6-9249-5ccca241ad85", "dns_name": null, "binding:vif_details": {}, "binding:vnic_type": "normal", "binding:vif_type": "distributed", "tenant_id": "1c9d0fc21afc40a2959d3d3d4acca528", "mac_address": "fa:16:3e:a7:d8:80", "created_at": "2016-04-07T15:05:57"}}

[2] https://review.openstack.org/342872

Marking this Incomplete until there are more details.

summary: - Macvtap driver/agent migrates instances on an invalid pyhsical network
+ Macvtap driver/agent migrates instances on an invalid physical network
Changed in neutron:
status: New → Incomplete
description: updated
Changed in neutron:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
Changed in neutron:
status: Incomplete → In Progress
importance: Undecided → High
description: updated
Changed in neutron:
milestone: none → newton-1

Fix proposed to branch: master
Review: https://review.openstack.org/293404

Change abandoned by Andreas Scheuring (<email address hidden>) on branch: master
Review: https://review.openstack.org/293403
Reason: Fix does not work - See comment on https://review.openstack.org/#/c/293404/

Change abandoned by Andreas Scheuring (<email address hidden>) on branch: master
Review: https://review.openstack.org/293404
Reason: This patch series does not achieve the desired behavior.

#1 Port binding happens after the live migration has already succeeded. If portbinding fails, the port will stay in Build state, but the instance has migrated before. It will not be rolled back.

#2 If the scheduler reschedules an instance to another host (as the first host could not start the instance due to some error), the portbinding also contains the .original port. So just having the .original port present in the port context is not sufficient to detect a migration.

description: updated
Rossella Sblendido (rossella-o) wrote :

This one seems doable: "#1 Solve it in Nova pre live migration" ...we should reach a general consensus; as you mention, it might not work properly with all the mech drivers.
#3 and #4 solve only some specific use cases, so I'd avoid them.

Regarding preventing invalid migration, #1 could work if we handle the migration the same way we handle port creation. Nova should wait for some Neutron event before declaring the migration successful. If it fails, the migration should be reverted somehow, for example by blocking all the traffic for the instance and setting it into error state.
Anyway, I think we should move this conversation to the ML so that we can involve as many people as possible. Thanks for the nice write-up!

description: updated
description: updated
Changed in neutron:
milestone: newton-1 → newton-2
Changed in neutron:
milestone: newton-2 → newton-3
description: updated
description: updated

Fix proposed to branch: master
Review: https://review.openstack.org/343737

Fix proposed to branch: master
Review: https://review.openstack.org/343741

description: updated

Andreas, tbh #4 seems pretty heavy for a workaround. It's not so trivial to implement and will change the behaviour of the agent quite a bit. I am wondering if we should just provide a sanity check tool... I know it doesn't solve the problem, but at least it can detect it. Let me think about it more; feel free to ping me on irc, we can maybe brainstorm a little more.

Prevent invalid migration - #1 Let Port binding fail - looks promising as a workaround. The real fix requires https://review.openstack.org/309416

description: updated

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/343741
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/343737
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/342872
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: In Progress → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/361301

Changed in neutron:
status: Confirmed → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/362180

Changed in neutron:
milestone: newton-3 → newton-rc1

Reviewed: https://review.openstack.org/361301
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6865f4d9f22d5daa2f07ff9651c2280aed489c8c
Submitter: Jenkins
Branch: master

commit 6865f4d9f22d5daa2f07ff9651c2280aed489c8c
Author: Andreas Scheuring <email address hidden>
Date: Mon Aug 29 17:08:34 2016 +0200

    macvtap: Mech driver detects invalid migration

    As of today, Macvtap agent allows invalid live migrations
    from a host with interface mapping 'physnet1:eth1' to a host
    with mapping 'physnet1:eth2'. Migration places the macvtap on
    the target on 'eth2' instead of 'eth1'. This fix detects
    such a migration in the mech driver and lets the portbinding fail.

    Nova will put the instance in error state. The instance might
    still be able to access the wrong network it migrated to, as
    today there is no communication to the agent after a failed binding
    after a migration. But at least the user is now aware that
    something went wrong.

    Change-Id: I92cd6411fe88483b921c92f219afa8e846cc0137
    Depends-On: Ib1cc44bf9d01baf4d1f1d26c2a368a5ca7c6ab68
    Partial-Bug: #1550400

What's needed to mark this complete?

Changed in neutron:
milestone: newton-rc1 → ocata-1

The patch that merged only added an indication for the user that something went wrong. The user is still able to access a potentially wrong network. Something needs to be done in the agent so that the macvtap device blocks all traffic in such a case. Let me try to come up with a proposal.

Changed in neutron:
assignee: Andreas Scheuring (andreas-scheuring) → Gary Kotton (garyk)
Changed in neutron:
assignee: Gary Kotton (garyk) → Andreas Scheuring (andreas-scheuring)
Changed in neutron:
milestone: ocata-1 → ocata-2

Setting this down to Medium priority, as the macvtap agent is pretty new and therefore probably not widely used in the field.

Changed in neutron:
importance: High → Medium
Changed in neutron:
assignee: Andreas Scheuring (andreas-scheuring) → Gary Kotton (garyk)

Started prototyping how to prevent traffic flow in such a case a few weeks ago - but didn't have the chance to continue since then: https://review.openstack.org/#/c/293403/

Reviewed: https://review.openstack.org/362180
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=69d73630301a9c2d55781920c47b3c29a2cd0fc4
Submitter: Jenkins
Branch: master

commit 69d73630301a9c2d55781920c47b3c29a2cd0fc4
Author: Andreas Scheuring <email address hidden>
Date: Mon Aug 29 13:31:17 2016 +0200

    ml2: Add original port to context on _bind_port

    During portbinding a new PortContext with the binding information
    for the new host gets created. This patch adds the original
    port to the newly created context. This has the following effects

    - When the port was not bound before, context.original will be set
      set to None on portbinding.

    - When the port was bound before (also if binding failed before)
      context.original is set to this port dict on portbinding. This
      happens mostly in the following scenarios:

      - Nova triggers a reschedule of an instance
        Nova reschedules an instance to another host, after spawning an
        instance on the first host failed for some reason. In this case,
        context.original is set to the port with the binding information
        as it was on the failed host.

      - Live Migration
        The port is now available on another host. context.original is
        set to the port and binding information of the source host.

    Context.original can now be used in mechanism drivers that need to be
    migration aware. This is the case for the macvtap mechanism driver.
    The corresponding functionality will be added with patch [1].

    [1] https://review.openstack.org/#/c/361301

    Partial-Bug: #1550400

    Change-Id: I2b60243366505f6e6e4c716d627255a56a4ba486

Changed in neutron:
milestone: ocata-2 → ocata-3
Changed in neutron:
milestone: ocata-3 → ocata-rc1
Changed in neutron:
assignee: Gary Kotton (garyk) → Andreas Scheuring (andreas-scheuring)
milestone: ocata-rc1 → pike-1
Changed in neutron:
milestone: pike-1 → pike-2