[RFE] Adding macvtap ml2 driver and agent

Bug #1480979 reported by Andreas Scheuring on 2015-08-03
This bug affects 1 person
Affects: neutron
Importance: Wishlist
Assigned to: Andreas Scheuring

Bug Description

Adding macvtap ml2 driver and agent to neutron.

Problem Statement
=================
Today, Neutron has no macvtap support that can be used on all types of interfaces. The sriov-nic driver/agent offers some support, but that support cannot be reused, as it
- requires PCI SRIOV Ethernet Adapters
- is tightly coupled with Nova PCI Manager
- does not support vnic_type 'normal'
- only allows macvtaps in passthrough mode

Proposal
========
This new ml2 mech driver and agent support macvtap attachments for instances
- independent of the ethernet adapter used
- in a configurable macvtap mode (default 'bridge')
- with vnic_type 'normal'

Benefits
--------
Macvtap offers several advantages:
- performance (compared to ovs) in terms of cpu consumption, throughput and latency
- availability in each linux kernel
- support for adapters that are not in promisc mode

Scope
-----
The scope of this proposal is the compute node only: attaching instances to the network via macvtap. However, support for ports used by the network node could be added in the future using macvlan devices.

Restrictions in the first stage (Mitaka)
----------------------------------------
- Only flat and vlan networks are supported. The restriction on local and tunneled networks can easily be lifted in a later stage.
- No Security Groups, as macvtap is a direct connection (like sriov passthrough and sriov macvtap today)
- Live Migration requires the same physnet:interface mapping on each host. physnet1:eth1 on host_a and physnet1:eth2 on host_b will not work (macvtap sits directly on that interface; there's no abstraction layer like a bridge available). This will be documented in the first stage. An enforcement mechanism or device renaming is planned for a later stage.
- Anti-spoofing rules on IP level are not supported. However, there is some prevention on mac level: an attacker in the guest cannot receive packets destined for a mac address != its own. On the other hand, outgoing traffic is not filtered by src mac.
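
The live migration restriction above can be checked ahead of time. The following is a minimal sketch, not part of the proposal: the function name find_mapping_conflicts and the per-host dictionary layout are assumptions made for illustration. It reports every physnet that is mapped to different interfaces on different hosts.

```python
def find_mapping_conflicts(host_mappings):
    """Return {physnet: {host: interface}} for physnets whose interface
    differs between hosts (hypothetical helper, for illustration only)."""
    by_physnet = {}
    for host, mapping in host_mappings.items():
        for physnet, interface in mapping.items():
            by_physnet.setdefault(physnet, {})[host] = interface
    # A physnet is in conflict if more than one distinct interface is used.
    return {
        physnet: hosts
        for physnet, hosts in by_physnet.items()
        if len(set(hosts.values())) > 1
    }


# The example from the restriction list: physnet1 sits on eth1 on host_a
# but on eth2 on host_b, so live migration between them would not work.
mappings = {
    "host_a": {"physnet1": "eth1"},
    "host_b": {"physnet1": "eth2"},
}
conflicts = find_mapping_conflicts(mappings)
```

An empty result means all hosts agree on the interface for every physnet they share.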

High level architecture
-----------------------
An ml2 mechanism driver and an l2 agent are required.

Macvtap will be used much like a virtual switch. An interface_mapping describes the mapping between an OpenStack physical network and the eth interface (or bond) that should be used. One macvtap will be instantiated on such an eth interface for each guest port. As the macvtaps are in bridge mode, macvtaps on the same source device can talk to each other directly, without going out on the wire. For the vlan case, a vlan device will be set up on the eth interface and the macvtaps will be connected to it instead. This ensures tenant isolation. In the first iteration there is no support for vxlan and gre.
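
To make the flat and vlan cases concrete, here is a minimal sketch of how the source device for a macvtap could be derived from the interface mapping. Everything in it is an assumption for illustration: the comma-separated "physnet:interface" string format, the function names, and the "<interface>.<vlan_id>" naming of the vlan device. The commands are only built as argument lists for iproute2 (ip link add ... type vlan / type macvtap); nothing is executed.

```python
def parse_interface_mappings(raw):
    """Parse 'physnet1:eth1,physnet2:bond0' into a dict (assumed format)."""
    mappings = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        physnet, sep, interface = entry.partition(":")
        if not sep or not physnet.strip() or not interface.strip():
            raise ValueError("invalid mapping entry: %r" % entry)
        mappings[physnet.strip()] = interface.strip()
    return mappings


def source_device(mappings, physnet, vlan_id=None):
    """Flat network: the mapped interface itself.
    Vlan network: a vlan device on top of it (naming is an assumption)."""
    interface = mappings[physnet]
    if vlan_id is None:
        return interface
    return "%s.%d" % (interface, vlan_id)


def macvtap_commands(mappings, physnet, macvtap_name, vlan_id=None,
                     mode="bridge"):
    """Build the iproute2 argument lists for one guest port."""
    interface = mappings[physnet]
    src = source_device(mappings, physnet, vlan_id)
    commands = []
    if vlan_id is not None:
        # The vlan device provides the tenant isolation described above.
        commands.append(["ip", "link", "add", "link", interface, "name", src,
                         "type", "vlan", "id", str(vlan_id)])
    commands.append(["ip", "link", "add", "link", src, "name", macvtap_name,
                     "type", "macvtap", "mode", mode])
    return commands


mappings = parse_interface_mappings("physnet1:eth1, physnet2:bond0")
flat = macvtap_commands(mappings, "physnet1", "macvtap0")
vlan = macvtap_commands(mappings, "physnet1", "macvtap1", vlan_id=100)
```

In a real agent the vlan device would be created once and shared by all macvtaps of that network; the sketch emits the command every time to stay short.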

Neutron Integration
-------------------
The home for this agent was discussed in [2]. Kyle decided that this agent should live somewhere in the neutron tree.

A new ml2 mechanism driver will be required (currently around 80 lines of code).

For the agent, the current idea is to integrate it into the linuxbridge agent. To do so, the lb agent needs to be refactored to separate the common agent code from the linuxbridge-specific code via a clear interface. Macvtap can then plug in through this interface as well. Two approaches for this plugin mechanism are on the table:

#1 Keep using the linuxbridge agent binary (main method + entry point) and turn it into a plugin framework that uses stevedore to allow linuxbridge and macvtap to plug in.
#2 Each agent type keeps its own binary (main method + entry point), but both instantiate the common agent and pass in their driver as an object.
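
Approach #2 could look roughly like the following. This is a minimal sketch under stated assumptions, not the refactored neutron code: the class names CommonAgentLoop and MacvtapManager and the methods get_all_devices, plug_interface and remove_interface are hypothetical stand-ins for whatever interface the refactoring ends up defining.

```python
class CommonAgentLoop(object):
    """Hypothetical shared agent loop. It knows nothing about linuxbridge
    or macvtap; the driver-specific manager object is passed in."""

    def __init__(self, manager):
        self.manager = manager
        self.known_devices = set()

    def run_once(self):
        """One iteration: detect added/removed devices and delegate."""
        current = set(self.manager.get_all_devices())
        added = current - self.known_devices
        removed = self.known_devices - current
        for device in sorted(added):
            self.manager.plug_interface(device)
        for device in sorted(removed):
            self.manager.remove_interface(device)
        self.known_devices = current
        return added, removed


class MacvtapManager(object):
    """Hypothetical macvtap-specific driver; a linuxbridge manager would
    implement the same three methods."""

    def __init__(self, devices):
        self.devices = devices
        self.plugged = []

    def get_all_devices(self):
        return list(self.devices)

    def plug_interface(self, device):
        self.plugged.append(device)

    def remove_interface(self, device):
        self.plugged.remove(device)


# What the macvtap binary's main method would do under approach #2:
# instantiate the common agent and hand it the macvtap driver object.
manager = MacvtapManager(["fa:16:3e:00:00:01", "fa:16:3e:00:00:02"])
agent = CommonAgentLoop(manager)
first = agent.run_once()
```

The design point is that the loop depends only on the small manager interface, so each agent binary stays a few lines long.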

A first prototype [4] for the agent showed that macvtap integration along this approach requires only around 50-150 lines of macvtap production code (depending on how it is integrated into lb). Around 500 lines are shared between both agents and 720 are specific to linuxbridge. A new prototype including the lb refactoring is currently under development [5].

Nova Integration
----------------
Macvtap vif_type is already in nova [3]

Links
-----

[2] https://review.openstack.org/#/c/195907/
[3] https://review.openstack.org/#/c/182283/
[4] https://github.com/scheuran/networking-macvtap/commit/36a068cf3d3d6930ab9330efb099cd95a84ca785
[5] https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:lb_common_agent_experiment,n,z

Assaf Muller (amuller) wrote :

Please add more information. What is the problem statement, what is the high level architecture of the proposed solution?

Fix proposed to branch: master
Review: https://review.openstack.org/209538

Changed in neutron:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
status: New → In Progress
description: updated

This sounds relatively limited and isolated in scope. As part of the effort I would love to see documented, sorta like a buying guide for users, how to choose one type of driver over another.

tags: added: rfe-approved
removed: rfe

Armando, thanks for approving this. I totally agree with all of the points you mentioned! I just went over my description again and I got the feeling that I need to make a few statements more precise:

* Lines of code added for macvtap: The roughly 50 lines referred to the agent only. There will probably be a few more lines due to imports, copyright, method definitions and blank lines. It also depends on how we share code between the agents. Another 80 lines are required for the mech driver. But both are very self-contained and simple code! In the end it all depends on getting the linuxbridge agent refactored.

* Initial release restrictions:
- Only flat and vlan networks are supported. local and tunneling could be added in a later stage
- No Security Groups, as macvtap is a direct connection (like sriov and sriov macvtap today)
- Live Migration requires the same physnet:interface mapping on each host. physnet1:eth1 on host_a and physnet1:eth2 on host_b will not work. This will be documented in the first stage. An enforcement mechanism is planned for a later stage.

I will update the description accordingly. Please feel free to reassess!

description: updated
Changed in neutron:
importance: Undecided → Wishlist
Changed in neutron:
milestone: none → mitaka-1
Changed in neutron:
milestone: mitaka-1 → mitaka-2
summary: - Adding macvtap ml2 driver and agent
+ [RFE] Adding macvtap ml2 driver and agent

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/235952
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Change abandoned by Doug Wiegley (<email address hidden>) on branch: master
Review: https://review.openstack.org/248138
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Two patches were abandoned. Unless we pull this together, it will be pushed out of mitaka.

Changed in neutron:
status: In Progress → Incomplete
assignee: Andreas Scheuring (andreas-scheuring) → nobody
milestone: mitaka-2 → none

This needs to wait until the modular l2 agent has merged (https://bugs.launchpad.net/neutron/+bug/1468803).

I will prepare the patchsets to (hopefully) get them in quickly once the modular l2 agent has landed.

Changed in neutron:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
status: Incomplete → In Progress
Alan Jenkins (aj504) wrote :

The "Security Impact" in the spec looks wrong to me. (I hope I'm posting on the right channel).

"macvtap prevents MAC spoofing already today" - I believe people need to stop making this assumption :).

In v4.1-rc4: "macvtap add missing ioctls... SIOCGIFHWADDR and SIOCSIFHWADDR ioctls to get and set the mac address." https://github.com/torvalds/linux/commit/b5082083392224eca4c46abde908ab0e4210510c

I can't see where the code was *ever* supposed to prevent MAC spoofing. Just look at how macvlan_start_xmit() calls straight through to dev_queue_xmit_accel(). The limitations are more around RX of different MACs. I think that's the true reason the KernelNewbies page says "Configuring the mac address of the endpoint is important."

---

In a similar time frame there was also "macvlan: Propagate promiscuity setting to lower devices.". https://github.com/torvalds/linux/commit/efdbd2b30caa65dd9e687853afa4d7ce8b39447e AFAICT you can't set `promisc` from the macvtap character device with this, not yet. You'd have to deliberately set it on the macvlan network interface which is associated. Or create the macvtap in "passthru" mode, which is automatically promiscuous... It is somewhat messy, but to be on the safe side I would also assume that some bright spark will eventually wire up promisc properly, through tap -> qemu -> virtio.

[There's MACVLAN_FLAG_NOPROMISC you can set somehow, but I can't see a single way to restrict TX :).]

Hi Alan, thanks for your great input! I must admit I'm not a kernel expert. I will investigate to understand your concerns and update this bug accordingly (there's no spec required for this, but I will update this bug).

Thank you!

Seems like I got some clarification. It would be great if you could assess my assumptions:

Libvirt offers an option trustGuestRxFilters [1]

- Having it set to no (default)
The guest can set another mac address on its interface and send out packets with this mac. But a reply will never come back to the guest, as the macvtap does not consider this mac in its mac filter (or however this is implemented). So we could say that there is some very lightweight mac spoofing prevention. However, the attacker in the guest can confuse the switches outside with this potentially duplicated mac, which is of course problematic.

And the problem with this approach is that the guest cannot use multicast (which would require access to the Rx filters to figure out the multicast mac addresses)!

- Having it set to yes
All guest mac addresses will be added to the macvtap's mac filter (including multicast macs). But the door is totally open for arp spoofing.

So I agree, we should state that there is only this very restricted security feature.

Does that make sense?

Regarding promisc. mode, the argument is that with macvtap you do not need promisc mode at all, as the macs get registered.

Thank you!

[1] http://libvirt.org/formatdomain.html#elementsNICS
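
For reference, the option is an attribute of the libvirt interface element. The following is a minimal sketch that generates such an element for a macvtap ("direct") attachment. The helper name build_direct_interface and the sample device and mac values are hypothetical; the element and attribute names (interface type='direct', trustGuestRxFilters, source dev/mode, model type) are taken from the libvirt domain XML format described in [1].

```python
import xml.etree.ElementTree as ET


def build_direct_interface(source_dev, mac, trust_guest_rx_filters=False,
                           mode="bridge"):
    """Return the XML for a macvtap-attached guest interface
    (hypothetical helper, for illustration only)."""
    iface = ET.Element("interface", {
        "type": "direct",
        # 'no' is the libvirt default discussed above.
        "trustGuestRxFilters": "yes" if trust_guest_rx_filters else "no",
    })
    ET.SubElement(iface, "mac", {"address": mac})
    ET.SubElement(iface, "source", {"dev": source_dev, "mode": mode})
    ET.SubElement(iface, "model", {"type": "virtio"})
    return ET.tostring(iface, encoding="unicode")


xml_default = build_direct_interface("eth1", "fa:16:3e:00:00:01")
xml_trusted = build_direct_interface("eth1", "fa:16:3e:00:00:01",
                                     trust_guest_rx_filters=True)
```

With the default, the guest cannot influence the macvtap's receive filter; with 'yes', changes made inside the guest are propagated to the host-side device.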

Alan Jenkins (aj504) wrote :

Thanks for looking at this in such detail. Apologies for silence, I forgot to subscribe for email notification. I hope belated feedback is still helpful.

I didn't know about trustGuestRxFilters. That's a great point & thanks for telling me about it. I like your writeup of the implications. I hadn't thought about multicast (just assumed it would always work). I have a clarity nitpick & a correction on promisc.

Clarity nitpick: I wasn't quite sure what was meant by "receive filters", particularly in case it referred to libvirt network filters. I guess it's really the fault of the libvirt documentation. The libvirt commit made it clear to me though. I don't know if you want to steal some of their words:

https://www.redhat.com/archives/libvir-list/2014-September/msg01439.html

When promisc. support is present, trustGuestRxFilters has a natural extension to it, which is great. Promisc. mode would be necessary e.g. to run a bridge in the VM. Ability to run a bridge was given as justification for the quick hack that was "passthru" mode macv{lan,tap}, which enables promisc. mode by default.

...oh look, they already implemented it :).

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_driver.c;h=abcdbe6a0b1e4e03e8017a5a118bbcd442d10e04;hb=HEAD#l4446

So "the door is totally open for arp spoofing" (I think you meant MAC spoofing again though?) could be followed with "and packet sniffing".

Alan Jenkins (aj504) wrote :

What's more annoying is that blocking multicast blocks IPv6 (Neighbour Discovery uses multicast). Libvirt doesn't explicitly document blocking multicast, and definitely doesn't document breaking IPv6.

https://bugzilla.redhat.com/show_bug.cgi?id=1035253#c15

I hope that can be fixed (just allow multicast reception), but in the mean time you might have to document that as well.

Reviewed: https://review.openstack.org/246318
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6e29cdd6b654874e4003e31891228d3abc107700
Submitter: Jenkins
Branch: master

commit 6e29cdd6b654874e4003e31891228d3abc107700
Author: Andreas Scheuring <email address hidden>
Date: Tue Oct 13 13:21:32 2015 +0200

    lb: ml2-agt: Separate AgentLoop from LinuxBridge specific impl

    The goal is to extract the common agent code from the linuxbridge agent
    to share this code with other agents (e.g. sriov and new macvtap [1]).
    This is a first step into the direction of a so called modular l2
    agent.

    Therefore all linuxbridge implementation specifics are moved into the
    LinuxBridgeManager class. The manager class will be passed as argument
    into the common agent loop instead of instantiating it in its
    constructor. In addition the network_maps and the updated_devices map
    has been moved into the rpc class.

    A clear manager interface has been defined for the communication
    between the common agent loop and the impl specific manager class.

    In a follow up patchset, the common agent loop will be moved into a
    new file. This has not yet happened to simplify tracking the code
    changes during review.

    [1] https://bugs.launchpad.net/neutron/+bug/1480979

    Change-Id: Ia71f5a403b7029f8cc591f83df91ab2d3916f3f8
    Partial-Bug: #1468803
    Partial-Bug: #1480979


Alan, thanks for this great discussion. I took the time to discuss this with the libvirt guy that implemented parts of the trustGuestRxFilter stuff. In addition I did some testing.

This is how I think it works with trustGuestRxFilters set to yes: Libvirt receives an event if there was a change from within the guest. It then queries qemu for the actual state (for more details see [1]). Libvirt then does the following:
- adds the guest's multicast macs to the macvtap's multicast rx filter list
- if one or more vlans are enabled in the guest, it sets the macvtap into promisc mode
- synchronises the following flags: promisc, multicast, allmulti
- changes the macvtap's mac to the primary mac

What it does NOT do is anything with the unicast mac list (as far as I understood). This one is ignored.

What I could NOT verify is the use case of running a bridge in a macvtap-attached vm. I mean, it works as long as the bridge uses the same mac as the vm's eth nic. But as soon as you change that mac, you won't get any inbound traffic for this mac anymore. I was using bridge mode and tested vm-vm traffic as well as outside-vm traffic.

If that's right, the big problems I see are the following:
# Having trustGuest set to NO
- no multicast
- no arbitrary ipv6 addresses (which require special multicast groups for neighbour discovery)
- mac spoofing (and therefore packet sniffing) is not possible; however, you can send out faked packets and confuse the switches in the datacenter

# Having trustGuest set to YES
- Guest can change the mac and therefore the mac of the macvtap (this is a big problem, as the mac address will be used for identifying a device in this proposal)
- Guest can mac spoof unicast and therefore do packet sniffing.
But this works only for one mac address at a time, as only one can be set (I know this doesn't make things better...). Spoofing does NOT work if the target mac is already used on the same host by another libvirt-managed guest (if there is already a libvirt-configured macvtap with this mac, libvirt does not change the mac of the macvtap device; however, if it is not libvirt-managed, it will result in a duplicated mac on the host; if a new instance appears on the same host that wants to use this mac, libvirt returns an error)

[1] Show rx-filter information.

Returns a json-array of rx-filter information for all NICs (or for the
given NIC), returning an error if the given NIC doesn't exist, or
given NIC doesn't support rx-filter querying, or given net client
isn't a NIC.

The query will clear the event notification flag of each NIC, then qemu
will start to emit event to QMP monitor.

Each array entry contains the following:

- "name": net client name (json-string)
- "promiscuous": promiscuous mode is enabled (json-bool)
- "multicast": multicast receive state (one of 'normal', 'none', 'all')
- "unicast": unicast receive state (one of 'normal', 'none', 'all')
- "vlan": vlan receive state (one of 'normal', 'none', 'all') (Since 2.0)
- "broadcast-allowed": allow to receive broadcast (json-bool)
- "multicast-overflow": multicast table is overflowed (json-bool)
- "unicast-overflow": unicast table is overflowed (json-bool)
- "main-mac"...


Alan Jenkins (aj504) wrote :

Hi Andreas. Testing sounds good :), that's partly what I was trolling for. To be honest all I'd checked is that IPv6 wasn't working with DHCPv6-assigned addresses. Talking with the libvirt guy sounds even better.

My assumption with promisc. mode on a macvlan was that it let you receive all packets regardless of MAC address. So you could add them to a linux bridge interface without any limitations.

However I get the same result as you when I tried it (just plugging together with `ip link`, veth interfaces, and namespaces; no libvirt). I can get bridging to work with one of the weird "passthrough mode" macvlan interfaces, but I don't expect you'll be using them. The normal "bridge mode" macvlans didn't co-operate with bridging, as you describe.

It's as if there's no difference between the effects of promisc. and allmulti. mode. I've come to think this is justified (see end). You're absolutely correct. VMs can only receive on one MAC address (for unicast). It's not what I would mean to call "packet sniffing". If you have to allow MAC changes in order to support multicast, it's not ideal, but significantly less dangerous than I was thinking. Apologies if I've been dumping my own confusions on you.

## Disagreements in wording

When you say "sending faked packets", I would call that "MAC spoofing", because it matches what we mean by IP address spoofing. That's what I picked up on to start with. You don't need to say that macvlan allows MAC spoofing in those exact words; it might be counter-productive because it's such a narrow concern. AFAICS macvlan allows fun with ARP spoofing too :), it's just how Ethernet works by default. My point is you should *not* describe macvlan as preventing MAC spoofing.

"No arbitrary IPv6 addresses" confuses me. I assumed assigning routable IPv6 addresses doesn't work, so IPv6 is dead. I'm just guessing, but maybe you're telling me SLAAC addresses will have the same interface identifier (last 64 bits) as link-local addresses and that means they will work too.

## promisc. notes

After much consideration :) I think the behaviour of promisc. mode is "correct". I thought it would make sense if you could create two macvlans, and enslave them to two separate bridges. I was wrong, because the notional bridge inside macvlan doesn't implement MAC learning. It would have to inefficiently flood unknown unicast traffic into *both* bridges in order to reach the destination. So I won't worry that someone will implement it in future :).

It was confusing that a promisc mode macvlan seems to enable promisc mode on the lowerdev, even if the macvlan is not passthrough mode. It looks like it only needed "allmulticast".

"bridge mode" macvlan code in the kernel also doesn't support extra secondary MAC addresses AFAICS. I can't work out the reason for _that_, and you can create "stacked" macvlans which are equivalent. But if you're a VM attached to a macvtap... that's not a concern.

As requested, here an update on this work:

Prerequisite: the main patchset for modular l2 has landed; there's just one minor patchset outstanding that moves this code into a separate file [1]

Macvtap agent patches are up here (need to be rebased) [2]. Those patches contain the following code: the ML2 driver, the agent, a common functions class + unit tests for everything
What's still missing: a release note and functional tests. I think I can add those early next week.

So from my point of view those patches can make M3. Of course it depends on how reviews go...

A quick update on the spoofing discussion above: I found a way to enable multicast without turning the trustGuestRxFilters switch on, so an attacker in the guest is NOT able to sniff any foreign unicast packets based on the mac address. We're currently evaluating whether there's a way to block outgoing packets with a src mac != the mac of the guest as well. However, reading incoming packets with a dst mac != the guest's mac is NOT possible!

[1] https://review.openstack.org/#/c/273448/
[2] https://review.openstack.org/#/q/topic:macvtap_agent

description: updated

Coding complete!

Please see https://review.openstack.org/#/q/topic:macvtap_agent

I'm currently not sure about functional tests. It seems like most of the interactions with the system are already covered by the existing ip_lib tests. So any input is welcome.

What's also an open point is documentation. I added a docimpact flag to the relevant patches - anything else to do?

Alan, I updated the list of restrictions in this bug with the anti spoofing stuff we discussed (this bug description is the spec).

To allow multicast, you simply have to enable the ALLMULTICAST flag of the macvtap device in the hypervisor. This is done by my agent code. In parallel we're evaluating if something could be done in the kernel/libvirt to get the same functionality without Openstack.
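
As a rough illustration of that step, the flag can be set with iproute2 (ip link set dev <device> allmulticast on). The sketch below only builds the argument list; the function name allmulticast_command and the device name macvtap0 are hypothetical, and this is not the actual agent code.

```python
def allmulticast_command(device, enable=True):
    """Build the iproute2 argument list that toggles the ALLMULTICAST
    flag on a device (hypothetical helper, for illustration only)."""
    state = "on" if enable else "off"
    return ["ip", "link", "set", "dev", device, "allmulticast", state]


command = allmulticast_command("macvtap0")
# On a hypervisor this could be run with root privileges, e.g.:
#   import subprocess
#   subprocess.check_call(command)
```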

Andreas: DocImpact will trigger a workflow action where a bug is filed against list [1]. Unless you think these should be addressed separately, it's most likely that one would be a duplicate of the other, so my suggestion would be to have only one of the two carry the DocImpact flag.

Are you going to work with Sam-I-Am about docs? Do you have an idea of the content?

[1] https://bugs.launchpad.net/neutron/+bugs?field.tag=doc
[2] https://review.openstack.org/#/c/275306/
[3] https://review.openstack.org/#/c/209538/

Talked to Sam-I-Am regarding documentation.

We will update the ML2 section here: http://docs.openstack.org/liberty/networking-guide/config-ml2-plug-in.html
And add a new scenario guide similar to this one: http://docs.openstack.org/developer/neutron/devref/linuxbridge_agent.html

Reviewed: https://review.openstack.org/280275
Committed: https://git.openstack.org/cgit/openstack/neutron-lib/commit/?id=d1b85ca3510226931c8c2478d82de414b4ca0fd5
Submitter: Jenkins
Branch: master

commit d1b85ca3510226931c8c2478d82de414b4ca0fd5
Author: Andreas Scheuring <email address hidden>
Date: Mon Feb 15 15:59:58 2016 +0100

    Add constants for macvtap agent

    Change-Id: Ic3dfe17499c0ef23c0a053c6320af56195374ce8
    Partial-Bug: #1480979

Reviewed: https://review.openstack.org/275305
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bfcad8eec717fbed99a3ece23fb5b62634d414ae
Submitter: Jenkins
Branch: master

commit bfcad8eec717fbed99a3ece23fb5b62634d414ae
Author: Andreas Scheuring <email address hidden>
Date: Thu Jan 28 15:17:42 2016 +0100

    macvtap: Common functions and constants

    Functions and constants that are shared between the macvtap ml2 driver
    and the macvtap agent.

    The review is submitted in three parts:
     - Part 1 (this part)
        Common functions that are used by the ml2 driver and the agent
     - Part 2
         The Mechanism Driver to support port binding for macvtap attachments
     - Part 3
        The Macvtap L2 Agent.

    Partial-Bug: #1480979

    Change-Id: I63a095e6f592b94372ff018f2e73373ad9414d99

Changed in neutron:
milestone: none → mitaka-3

Reviewed: https://review.openstack.org/209538
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eb9bda12d25a565e117a518d93abd75bfa50730a
Submitter: Jenkins
Branch: master

commit eb9bda12d25a565e117a518d93abd75bfa50730a
Author: Andreas Scheuring <email address hidden>
Date: Mon Aug 3 15:11:20 2015 +0200

    macvtap: ML2 mech driver for macvtap network attachments

    This driver uses the vif_type 'macvtap'. It enriches the vif_details
    with the corresponding attributes required by nova [1] to support
    macvtap attachments for libvirt qemu/kvm guests.

    The review is submitted in three parts:
     - Part 1
        Common functions that are used by the ml2 driver and the agent
     - Part 2 (this part)
         The Mechanism Driver to support port binding for macvtap attachments
     - Part 3
        The Macvtap L2 Agent.

    [1] https://review.openstack.org/#/c/182283

    Change-Id: I206f58a21c36e55de957d8a23993aa9bc26d1595
    Partial-Bug: #1480979

Reviewed: https://review.openstack.org/275306
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2e7eb09271912e9db1948b15ab3f8e184d4c324a
Submitter: Jenkins
Branch: master

commit 2e7eb09271912e9db1948b15ab3f8e184d4c324a
Author: Andreas Scheuring <email address hidden>
Date: Tue Feb 2 16:34:59 2016 +0100

    macvtap: Macvtap L2 Agent

    This agent is required by the macvtap ml2 driver to support
    macvtap attachments for libvirt qemu/kvm instances. It introduces
    a new configuration option MACVTAP.physical_interface_mappings.

    The review is submitted in three parts:
     - Part 1
        Common functions that are used by the ml2 driver and the agent
     - Part 2
         The Mechanism Driver to support port binding for macvtap attachments
     - Part 3 (this part)
        The Macvtap L2 Agent.

    DocImpact
    New ML2 mech driver + l2 agent
    New config option "macvtap.physical_interface_mappings"

    Change-Id: I219d80b4c704ac2f41edd3501f4b2198925778d6
    Closes-Bug: #1480979

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.
