nova security groups issues with quantum-v2-api integration

Bug #1039400 reported by dan wendlandt
This bug affects 1 person
Affects                   Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released  Critical    dan wendlandt
neutron                   Fix Released  Critical    dan wendlandt

Bug Description

The original plan for Folsom was that Quantum would manage security groups itself. However, this work did not get done, so for Folsom we are going to use Nova's iptables-based security groups.

We will likely end up breaking these issues into multiple reviews, so we may want to split these issues off into separate bugs, but I'm using this to track support for quantum + nova security groups as a whole.

There are several issues:

- In Folsom, quantum implements its own version of the nova-network API. This means that it must also make the security group RPC calls to compute when a VM fixed IP is allocated/deallocated. Note: in this case, this essentially amounts to when a VM is created or destroyed, since the nova + quantum v2 integration does not implement any mechanism for allocating new fixed IPs after boot.

- We need to populate the DHCP-server attribute of a network object; otherwise, running with security groups enabled will drop DHCP traffic from the VM. We've made the change in Quantum to set the device owner of the dhcp port to 'network:dhcp', so it should be fairly easy to query for the DHCP IP of a given network. (Note: this requires that the DHCP port is always created before any VMs are created on a network. I believe that is already the case, but we should confirm.) Note: the nova data model will limit us to one v4 subnet per network.

- Multiple NICs per VM. Nova's security groups are per-instance, so there may be some issues with how they apply to a VM with multiple NICs. We'll need to investigate.

- Open vSwitch by default is not compatible with iptables filtering on VIF devices. We think there's a pretty straightforward workaround for this, where we have a new type of vif-plugging in nova that actually plugs each vif into its own instance of the linux bridge, which would then be "uplinked" to an OVS bridge that does the tunneling.
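
To make the last item concrete, here is a rough sketch of the wiring it describes: one Linux bridge per VIF, joined by a veth pair to the OVS bridge. It only builds the command lists and does not run them. The function name, the `qbr`/`qvb`/`qvo` device prefixes, and the `br-int` default are assumptions for illustration, not something this report specifies.

```python
def hybrid_plug_commands(vif_id, ovs_bridge="br-int"):
    """Return the commands that would wire one VIF through its own Linux bridge.

    All device names are illustrative assumptions. The VIF id is truncated
    so each name stays within the 15-character Linux interface name limit.
    """
    suffix = vif_id[:11]
    linux_bridge = "qbr" + suffix     # per-VIF Linux bridge, where iptables can filter
    veth_bridge_end = "qvb" + suffix  # veth end attached to the Linux bridge
    veth_ovs_end = "qvo" + suffix     # veth end attached to the OVS bridge

    return [
        # Create the per-VIF Linux bridge (the VM's tap device would join it).
        ["brctl", "addbr", linux_bridge],
        # Create the veth pair that "uplinks" the Linux bridge to OVS.
        ["ip", "link", "add", veth_bridge_end, "type", "veth",
         "peer", "name", veth_ovs_end],
        ["brctl", "addif", linux_bridge, veth_bridge_end],
        ["ovs-vsctl", "add-port", ovs_bridge, veth_ovs_end],
        # Bring everything up.
        ["ip", "link", "set", linux_bridge, "up"],
        ["ip", "link", "set", veth_bridge_end, "up"],
        ["ip", "link", "set", veth_ovs_end, "up"],
    ]


if __name__ == "__main__":
    for command in hybrid_plug_commands("0123456789abcdef"):
        print(" ".join(command))
```

The point of the extra hop is that the VM's traffic crosses an ordinary Linux bridge, where iptables rules apply, before it reaches OVS.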

Tags: quantum
dan wendlandt (danwent) wrote :

Note: a recent change to quantum made sure that ports created by DHCP have a particular device_owner, which should let us query them given a subnet-id. However, there may be a race condition here if we need the dhcp_server to be populated right after the port is created. If it's the first port on a subnet, the dhcp agent may not have created the corresponding port yet. This could be an issue, as the DHCP server IP may be needed at VM boot time.
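
As a minimal sketch of that lookup, assuming ports are plain dicts shaped like Quantum v2 port resources (`network_id`, `device_owner`, `fixed_ips`): the helper name and the sample data are hypothetical, and a real implementation would fetch the ports from Quantum rather than from a list. Returning `None` models the race described above, where the DHCP port does not exist yet.

```python
DHCP_DEVICE_OWNER = "network:dhcp"


def find_dhcp_server_ip(ports, network_id, subnet_id=None):
    """Return the IP of the DHCP port on a network, or None if there is none yet.

    `ports` is an iterable of port dicts. If `subnet_id` is given, only a
    fixed IP on that subnet is accepted.
    """
    for port in ports:
        if port.get("network_id") != network_id:
            continue
        if port.get("device_owner") != DHCP_DEVICE_OWNER:
            continue
        for fixed_ip in port.get("fixed_ips", []):
            if subnet_id is None or fixed_ip.get("subnet_id") == subnet_id:
                return fixed_ip["ip_address"]
    return None


# Hypothetical sample data: one VM port and one DHCP port on the same network.
SAMPLE_PORTS = [
    {"network_id": "net-1", "device_owner": "compute:nova",
     "fixed_ips": [{"subnet_id": "sub-1", "ip_address": "10.0.0.5"}]},
    {"network_id": "net-1", "device_owner": DHCP_DEVICE_OWNER,
     "fixed_ips": [{"subnet_id": "sub-1", "ip_address": "10.0.0.2"}]},
]

if __name__ == "__main__":
    print(find_dhcp_server_ip(SAMPLE_PORTS, "net-1"))
    print(find_dhcp_server_ip(SAMPLE_PORTS, "net-2"))
```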

Changed in quantum:
milestone: none → folsom-rc1
importance: Undecided → High
dan wendlandt (danwent)
summary: - nova quantum-api must populate dhcp_server attribute of network
+ nova security groups issues with quantum-v2-api integration
dan wendlandt (danwent)
description: updated
dan wendlandt (danwent)
Changed in quantum:
status: New → Confirmed
dan wendlandt (danwent)
Changed in quantum:
importance: High → Critical
Salvatore Orlando (salvatore-orlando) wrote :

Exploring trade-offs between different approaches to the problem

Changed in quantum:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
Mark McLoughlin (markmc)
Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → folsom-rc1
dan wendlandt (danwent) wrote :

For issue #1, I think this basically just amounts to replicating, in nova/network/quantumv2/api.py, the code from _do_trigger_security_group_members_refresh_for_instance() in nova/network/manager.py.
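
A rough sketch of what that replication could look like, using stand-in classes rather than real nova code. The class names, the shape of the `instance` dict, and the `trigger_members_refresh` signature are all assumptions made for illustration.

```python
class RecordingSecurityGroupAPI(object):
    """Hypothetical stand-in for the security group API; it only records calls."""

    def __init__(self):
        self.refreshed = []

    def trigger_members_refresh(self, context, group_ids):
        self.refreshed.append(list(group_ids))


class QuantumAPISketch(object):
    """Hypothetical outline of a network API that refreshes security groups."""

    def __init__(self, security_group_api):
        self.security_group_api = security_group_api

    def _refresh_security_group_members(self, context, instance):
        # Every group the instance belongs to must learn that the
        # instance's fixed IPs appeared or went away.
        group_ids = [group["id"] for group in instance["security_groups"]]
        self.security_group_api.trigger_members_refresh(context, group_ids)

    def allocate_for_instance(self, context, instance):
        # Port creation in Quantum would happen here (omitted in this sketch).
        self._refresh_security_group_members(context, instance)

    def deallocate_for_instance(self, context, instance):
        # Port deletion in Quantum would happen here (omitted in this sketch).
        self._refresh_security_group_members(context, instance)


if __name__ == "__main__":
    sg_api = RecordingSecurityGroupAPI()
    api = QuantumAPISketch(sg_api)
    vm = {"security_groups": [{"id": 1}, {"id": 2}]}
    api.allocate_for_instance(None, vm)
    api.deallocate_for_instance(None, vm)
    print(sg_api.refreshed)
```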

I think this should be straightforward.

dan wendlandt (danwent) wrote :

For issue #2: we're making sure there's a simple API call that can be made from nova/network/quantumv2/api.py to quantum to fetch the DHCP IP for a given network.

This will not be able to handle more advanced DHCP use cases, such as when the DHCP IP address is changed after the VM boots.

dan wendlandt (danwent) wrote :

#3 is a hybrid of two existing vif drivers, so there shouldn't really be much in terms of net new code. This code would only be enabled optionally. I will try to whip this out today, as it should be very simple if it ends up working. Otherwise, we'll drop it.

Salvatore Orlando (salvatore-orlando) wrote :

On Issue #1 I agree with Dan. The remaining hooks for setting up security group rules are already in place in the virt driver and there's no need to add more hooks or modify the existing ones for quantum.
We need to ensure network_info contains all the required information - and this is the goal of #2.
On issue #2, I agree we should not spend too much time trying to make it work with multiple dhcp agents.

As regards #3:
Dan, is this comment related to multi-NIC? If you can flesh it out today, it would be great. In the meantime, I will work on fixing #1 and #2 in nova.network.quantumv2.api and quantum itself.

Changed in nova:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
Salvatore Orlando (salvatore-orlando) wrote :

After some "hands on" experience, it looks like the potential race condition is much less of an issue than previously thought.
The dhcp agent indeed runs enable_dhcp_helper and refresh_dhcp_helper every time a network or a subnet is created, updated, or deleted. These two methods will call the 'enable' method on the dhcp driver.
The enable method, if the dhcp server is not yet active, will invoke get_dhcp_port using the RPC interface. The latter method is responsible for querying or retrieving the port used by the dhcp server.

The race condition is therefore limited to the cases in which a script creates a subnet (and possibly a network) and then immediately spawns an instance.
To address this, we can either:
1) add a sleep/retry mechanism in allocate_for_instance when retrieving the dhcp port (trivial but not extremely efficient)
2) create an 'unbound' dhcp port when a subnet is created. This port will have the owner field set to 'network:dhcp' but no device_id. This will allow nova queries to work in any case. get_dhcp_port can be tweaked to look for such an unbound port, and then update the device_id on the port itself with the device_id of a specific agent. This solution is not trivial, albeit not difficult at all, but will avoid having sleeps and loops in the nova/quantum integration code.
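
The two options above can be sketched roughly as follows. Both helpers are hypothetical and work on plain callables and dicts: `wait_for_dhcp_ip` is the sleep/retry loop from option 1, and `claim_dhcp_port` is the "bind a pre-created unbound port to an agent" step from option 2. Neither name nor signature comes from nova or quantum.

```python
import time


def wait_for_dhcp_ip(lookup, attempts=5, delay=1.0, sleep=time.sleep):
    """Option 1: call `lookup()` until it returns an IP or attempts run out.

    `lookup` is any zero-argument callable returning an IP string or None.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(attempts):
        ip_address = lookup()
        if ip_address is not None:
            return ip_address
        if attempt < attempts - 1:
            sleep(delay)
    return None


def claim_dhcp_port(ports, network_id, agent_device_id):
    """Option 2: bind the first unbound DHCP port on a network to an agent.

    A port counts as unbound when it is owned by 'network:dhcp' and has an
    empty or missing device_id. Returns the claimed port, or None.
    """
    for port in ports:
        if (port["network_id"] == network_id
                and port["device_owner"] == "network:dhcp"
                and not port.get("device_id")):
            port["device_id"] = agent_device_id
            return port
    return None


if __name__ == "__main__":
    answers = iter([None, None, "10.0.0.2"])
    print(wait_for_dhcp_ip(lambda: next(answers), delay=0.01))
```

With option 2, nova never needs the retry loop, because the port and its IP exist as soon as the subnet does.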

Salvatore Orlando (salvatore-orlando) wrote :

It seems we won't be able to support multiple DHCP agents, as only a single dhcp_server entry per network is supported.
Supporting multiple dhcp_server entries would require us to tweak _do_dhcp_rules in the firewall_driver classes, and I don't think this is the right time in the dev cycle for doing something like this.
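
For illustration only, here is roughly what per-server DHCP accept rules could look like if the firewall layer took a list of servers. The helper name is hypothetical and the rule string is an assumption about the general shape of such an iptables rule, not a copy of nova's _do_dhcp_rules.

```python
def dhcp_accept_rules(dhcp_servers):
    """Build one iptables rule fragment per DHCP server address.

    Each rule accepts DHCP replies (UDP, source port 67, destination
    port 68) only when they come from the given server IP.
    """
    return [
        "-s %s -p udp --sport 67 --dport 68 -j ACCEPT" % server
        for server in dhcp_servers
    ]


if __name__ == "__main__":
    for rule in dhcp_accept_rules(["10.0.0.2", "10.0.0.3"]):
        print(rule)
```

Because a rule of this shape matches on the source address, a dhcp_server value that points at the wrong IP would leave replies from the real DHCP server unmatched.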

Salvatore Orlando (salvatore-orlando) wrote :

On another note, when multiple subnets are defined for a vif, the firewall driver considers only the first subnet.
This means that VIFs with multiple IPs won't work when security groups are enabled. I guess this is fine and should be the expected behavior, as the same already happens with nova.

dan wendlandt (danwent) wrote :

I have the three issues resolved. I just need to run tests and deal with any resulting issues, then I should be able to push for a review.

Changed in nova:
assignee: Salvatore Orlando (salvatore-orlando) → dan wendlandt (danwent)
Changed in quantum:
assignee: Salvatore Orlando (salvatore-orlando) → dan wendlandt (danwent)
Changed in nova:
status: Confirmed → In Progress
Changed in quantum:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12173

Mark McLoughlin (markmc)
tags: added: quantum
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12173
Committed: http://github.com/openstack/nova/commit/5a470f89b6a508d578b89a1687d327efbc834346
Submitter: Jenkins
Branch: master

commit 5a470f89b6a508d578b89a1687d327efbc834346
Author: Dan Wendlandt <email address hidden>
Date: Thu Aug 30 22:21:51 2012 -0700

    fix issues with Nova security groups and Quantum

    bug #1039400

    - make quantumv2/api.py fetch actual DHCP server address, which
    is needed by firewall layer (otherwise, the gateway IP is
    incorrectly used and all DHCP traffic is dropped).
    - add missing call from quantumv2/api.py to the security
    groups API when a VM is allocated/deallocated.

    - Add a vif-driver that is a hybrid of the existing Open vswitch +
    linux bridge drivers, which allows OVS quantum plugins to
    be compatible with iptables based filtering, in particular, nova
    security groups.

    - Also clean-up some docstrings in virt/libvirt/vif.py

    Change-Id: I7cf5cf09583202a12785b616d18db3ee4bbffee0

Changed in nova:
status: In Progress → Fix Committed
dan wendlandt (danwent)
Changed in quantum:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in quantum:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in quantum:
milestone: folsom-rc1 → 2012.2
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2