NIC ordering not respected in network_config metadata

Bug #1156844 reported by Boris Deschenes
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Mathieu Mitchell

Bug Description

When booting a VM with multiple NICs (in different quantum networks) and using:

1. quantum linuxbridge plugin
2. network injection
3. config drive

The NIC ordering is not respected in the network_config metadata file located on the config drive. Sometimes the first quantum network will be eth0 and the second network eth1, sometimes the other way around (50% of the time, it works all the time!)

Basically, here are my observations:

1. quantum respects the ordering from the nova boot command (--nic net-id=<net1> --nic net-id=<net2>), meaning that the first port is in net1, second port in net2.
2. nova respects the ordering because it passes both networks in order to libvirt
3. libvirt respects the ordering because it presents the first interface to the VM in the first network with the right MAC and the second interface in the second network with the second MAC.
4. the config drive metadata DOES NOT RESPECT THE ORDERING: half of the time eth0 will contain the information for the first network, and half of the time it will contain the information for the second network.

Basically, if I have this simple "RHEL-style" interfaces template:

#for $ifc in $interfaces
[${ifc.name}]
IPADDR=${ifc.address}
NETMASK=${ifc.netmask}
GATEWAY=${ifc.gateway}
#end for

and instantiate 6 identical VMs with "--nic net-id=<net1> --nic net-id=<net2>", then:

3 will have an injected network config with eth0 containing net1 information and eth1 containing net2 information
3 will have an injected network config with eth0 containing net2 information and eth1 containing net1 information
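
To illustrate why the order of the list handed to the template matters, here is a minimal sketch in plain Python that mimics the template loop above. It does not use the real template engine; the `render_interfaces` helper and the sample addresses are hypothetical, and it assumes (as the symptoms suggest) that interface names are assigned by position in the list.

```python
def render_interfaces(networks):
    """Render one RHEL-style block per network, naming interfaces by list position."""
    blocks = []
    for index, net in enumerate(networks):
        # Assumption: the first entry becomes eth0, the second eth1, and so on.
        blocks.append(
            "[eth%d]\nIPADDR=%s\nNETMASK=%s\nGATEWAY=%s"
            % (index, net["address"], net["netmask"], net["gateway"])
        )
    return "\n".join(blocks)


# Hypothetical sample data standing in for net1 and net2.
net1 = {"address": "10.0.1.5", "netmask": "255.255.255.0", "gateway": "10.0.1.1"}
net2 = {"address": "10.0.2.5", "netmask": "255.255.255.0", "gateway": "10.0.2.1"}

in_order = render_interfaces([net1, net2])  # eth0 carries net1
swapped = render_interfaces([net2, net1])   # eth0 carries net2
```

The template itself is the same in both cases; only the order of the input list decides which network ends up under eth0.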

Since I'm using a config drive, the injected network information is located in /config-2/openstack/content/0000, but it is basically the same network information that would be injected into the root FS if you were not using a config drive.

My guess is that the get_network_injected_template function of nova/virt/netutils.py does not get the network information in an ordered fashion, although parts of quantum and nova take care to store it in a particular order (the order used on the command line, in the case of the CLI).

thank you very much for your help

Boris

Mathieu Gagné (mgagne) wrote :

This problem can happen in multi-NIC scenarios when using an injected network template.

In nova.api.metadata.base.InstanceMetadata, when fetching network information with network.API().get_instance_nw_info, the initial network order is lost as the list of network ports returned by Quantum is sorted by port_id.

The injected network template is then erroneous and the instance will not be able to correctly configure its network interfaces.

get_instance_nw_info accepts a networks parameter which can be used to "ensure ports are in preferred network order". Unfortunately, it seems nova.api.metadata.base.InstanceMetadata does not have such information at that time.
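
A minimal sketch of the ordering problem, in plain Python with hypothetical names (`order_ports_by_network`, the port dictionaries, and the IDs are illustrative, not Nova or Quantum code). It shows how a list sorted by port ID can disagree with the requested network order, and how the order could be restored if the list of requested networks were available.

```python
def order_ports_by_network(ports, requested_net_ids):
    """Return ports sorted to match the order of requested_net_ids."""
    rank = {net_id: index for index, net_id in enumerate(requested_net_ids)}
    # Ports on networks that were not requested sort last; sorted() is stable.
    return sorted(ports, key=lambda port: rank.get(port["network_id"], len(rank)))


# Hypothetical ports: the port on net1 happens to have the larger ID.
ports = [
    {"id": "port-b", "network_id": "net1"},
    {"id": "port-a", "network_id": "net2"},
]

# Sorting by port ID puts the net2 port first, losing the requested order.
by_port_id = sorted(ports, key=lambda port: port["id"])

# With the requested network list at hand, the order can be restored.
by_request = order_ports_by_network(by_port_id, ["net1", "net2"])
```

The reordering step only works when the requested network list is known, which is exactly the information missing at that point.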

Should anyone have an idea on how to fix it, it would be much appreciated.

Mathieu Gagné (mgagne)
Changed in nova:
status: New → Confirmed
Rohit Karajgi (rohitk)
Changed in nova:
assignee: nobody → Rohit Karajgi (rohitkarajgi)
Jay Bryant (jsbryant) wrote :

It appears that this problem may be circumvented by specifying the ordering with '--nic net-id=<uuid1> --nic net-id=<uuid2>'. I think we were seeing the same problem on our systems and then it went away. I think we started specifying the ordering with --nic, which would allow this fix to resolve the issue: https://review.openstack.org/#/c/15087/

Mathieu Gagné (mgagne) wrote :

I'm already specifying --nic in my use case. As stated in my comment, the initial Nova network order is lost at some point.

The list of network ports returned by Quantum is sorted by port_id, which obviously doesn't always match the order requested in Nova.

Even with the mentioned patch, "ensure ports are in preferred network order" can't sort the list of ports accordingly because it doesn't have access to the list of requested networks. The information is lost.

Mathieu Mitchell (mat128) wrote :

This issue only affects setups using ConfigDrive. When using a ConfigDrive instead of injection, the network information is fetched by API and the original network order is lost, as Mathieu Gagné mentioned.

Here is a patch to pass the original network_info to InstanceMetadata for use when creating the metadata object that will later be used by the ConfigDrive code to generate the network template.

This patch only supports libvirt and from what I can tell, Hyper-V and XenAPI are also affected. The fix would be the same; simply pass network_info to InstanceMetadata to be later used when preparing the metadata object. I unfortunately do not have any setup to test this.

This fix is similar to how content is already passed to InstanceMetadata: user-provided content destined for the virtual machine is not stored and is simply passed to InstanceMetadata.
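
A minimal sketch of the idea, assuming a simplified stand-in class (`InstanceMetadataSketch` and `fetch_sorted_by_port_id` are hypothetical and are not the actual Nova code or signatures): use the caller's network_info when it is supplied, and fall back to fetching it only when it is not.

```python
class InstanceMetadataSketch:
    """Hypothetical stand-in for a metadata object built at boot time."""

    def __init__(self, instance_id, fetch_nw_info, network_info=None):
        if network_info is None:
            # Fallback path: the fetched list may not be in the requested order.
            network_info = fetch_nw_info(instance_id)
        # Keep whatever order we were given; no re-sorting here.
        self.network_info = list(network_info)


def fetch_sorted_by_port_id(instance_id):
    """Hypothetical fetch that returns networks in port ID order."""
    return ["net2", "net1"]


requested = ["net1", "net2"]

# Passing the original list preserves the requested order.
with_fix = InstanceMetadataSketch("vm-1", fetch_sorted_by_port_id, network_info=requested)

# Without it, the order depends on what the fetch returns.
without_fix = InstanceMetadataSketch("vm-1", fetch_sorted_by_port_id)
```

Keeping the parameter optional means callers that do not pass network_info keep their current behavior.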

Changed in nova:
assignee: Rohit Karajgi (rohitkarajgi) → Mathieu Mitchell (mat128)
status: Confirmed → In Progress
Mathieu Mitchell (mat128) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/33682

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/33682
Committed: http://github.com/openstack/nova/commit/b0da1ab2310316f55735827aa68ef0167274ce27
Submitter: Jenkins
Branch: master

commit b0da1ab2310316f55735827aa68ef0167274ce27
Author: Mathieu Mitchell <email address hidden>
Date: Tue Jun 18 10:15:45 2013 -0400

    Preserve network order when using ConfigDrive

    Pass network_info to be used by InstanceMetadata instead of fetching
    it by API and losing originally requested network order. This is
    similar to what has been done for "content" where the original data
    is lost after the initial call.

    This issue only affects the ConfigDrive code path as the original
    network_info is used when using file injection.

    HyperV and XenApi are still probably affected by the bug but
    unaffected by this fix. The fix is the same for these drivers. Simply
    pass a network_info and it will be used instead of fetching the
    network info by API.

    Change-Id: Ie673b725cb47bf491009db99f6cb1258d46b0a69
    Fixes: bug #1156844

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2