Nova messes up interface order when the instance is restarted

Bug #1501430 reported by Andrey Grebennikov
This bug affects 4 people

Affects               Status        Importance  Assigned to
Mirantis OpenStack    Invalid       High        MOS Nova
  6.1.x               Fix Released  High        Alex Ermolov
  7.0.x               Invalid       High        MOS Maintenance
  8.0.x               Invalid       High        MOS Nova

Bug Description

Fuel 6.1 Ubuntu

Same problem as described here:
https://bugs.launchpad.net/nova/+bug/1405271

Here’s the initial XML file… (notice the timestamp)…

root@compute5:/var/lib/nova/instances/419f417e-b434-4af6-896a-87b602756145# ls -al libvirt.xml
-rw-r--r-- 1 nova nova 3489 Sep 18 12:22 libvirt.xml

Here’s my initial XML config for the 3 interfaces…

<interface type="bridge">
  <mac address="fa:16:3e:9d:0e:c9"/>
  <model type="virtio"/>
  <source bridge="qbrfcddbc83-bb"/>
  <target dev="tapfcddbc83-bb"/>
</interface>
<interface type="bridge">
  <mac address="fa:16:3e:24:fb:5f"/>
  <model type="virtio"/>
  <source bridge="qbr541af055-f6"/>
  <target dev="tap541af055-f6"/>
</interface>
<interface type="bridge">
  <mac address="fa:16:3e:c5:7e:ef"/>
  <model type="virtio"/>
  <source bridge="qbr0e5b1dbc-ad"/>
  <target dev="tap0e5b1dbc-ad"/>
</interface>
Here are the interfaces as seen in the VM (notice how the MACs line up with the XML file)…

{ifconfig output omitted: eth0, eth1 and eth2 have their MAC addresses in the same order as in the XML}

Here’s the XML file after a power-cycle of the VM via Horizon… (notice the timestamp)…
root@compute5:/var/lib/nova/instances/419f417e-b434-4af6-896a-87b602756145# ls -al libvirt.xml
-rw-r--r-- 1 nova nova 3489 Sep 18 12:48 libvirt.xml

Here’s my XML config for the 3 interfaces after a power-cycle from Horizon (notice that the MAC order has changed)…

<interface type="bridge">
  <mac address="fa:16:3e:c5:7e:ef"/>
  <model type="virtio"/>
  <source bridge="qbr0e5b1dbc-ad"/>
  <target dev="tap0e5b1dbc-ad"/>
</interface>
<interface type="bridge">
  <mac address="fa:16:3e:24:fb:5f"/>
  <model type="virtio"/>
  <source bridge="qbr541af055-f6"/>
  <target dev="tap541af055-f6"/>
</interface>
<interface type="bridge">
  <mac address="fa:16:3e:9d:0e:c9"/>
  <model type="virtio"/>
  <source bridge="qbrfcddbc83-bb"/>
  <target dev="tapfcddbc83-bb"/>
</interface>
Here are the interfaces as seen in the VM (notice how the MACs line up with the XML file, and the IP addresses are now assigned to the wrong interfaces)…
{ifconfig output omitted: eth0, eth1 and eth2 have their MAC addresses in the same order as in the new XML, which differs from the original order}
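To compare the interface order in libvirt.xml before and after a reboot without eyeballing the XML, a small sketch like the one below can extract the (MAC, tap device) pairs in document order. It uses only the standard library's xml.etree.ElementTree. The two sample strings are abridged copies of the XML shown above (the model and source elements are dropped) wrapped in a <devices> element so that each parses as one document; on a compute node you would pass the contents of the instance's libvirt.xml instead.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def interface_order(domain_xml: str) -> List[Tuple[str, str]]:
    """Return (MAC address, tap device) pairs in the order the XML lists them."""
    root = ET.fromstring(domain_xml)
    pairs = []
    # iter() walks the whole tree, so this also works on a full <domain> document.
    for iface in root.iter("interface"):
        mac = iface.find("mac").get("address")
        dev = iface.find("target").get("dev")
        pairs.append((mac, dev))
    return pairs


# Abridged from the report: the original libvirt.xml (12:22) ...
BEFORE = """<devices>
  <interface type="bridge"><mac address="fa:16:3e:9d:0e:c9"/><target dev="tapfcddbc83-bb"/></interface>
  <interface type="bridge"><mac address="fa:16:3e:24:fb:5f"/><target dev="tap541af055-f6"/></interface>
  <interface type="bridge"><mac address="fa:16:3e:c5:7e:ef"/><target dev="tap0e5b1dbc-ad"/></interface>
</devices>"""

# ... and the file rewritten by the power-cycle from Horizon (12:48).
AFTER = """<devices>
  <interface type="bridge"><mac address="fa:16:3e:c5:7e:ef"/><target dev="tap0e5b1dbc-ad"/></interface>
  <interface type="bridge"><mac address="fa:16:3e:24:fb:5f"/><target dev="tap541af055-f6"/></interface>
  <interface type="bridge"><mac address="fa:16:3e:9d:0e:c9"/><target dev="tapfcddbc83-bb"/></interface>
</devices>"""

before = [mac for mac, _ in interface_order(BEFORE)]
after = [mac for mac, _ in interface_order(AFTER)]
print(before == after)        # False: the order changed
print(after == before[::-1])  # True: in this case it is exactly reversed
```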

-----------------------
Step 2:
The patch from Kilo has been backported and applied. Now the situation is a bit different: on the original boot, the instance gets its interfaces in the order given to the "nova boot" command, but after a hard reboot the ports are sorted alphabetically.
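The reordering captured in the first power-cycle above already looks like such an alphabetical sort by Neutron port ID. The check below is a sketch that assumes Nova names each tap device "tap" plus the first 11 characters of the port ID, so the suffixes from the report stand in for the full port IDs, which the report does not show.

```python
# Tap-device suffixes from the report, standing in for the full Neutron port IDs
# (assumption: tap name = "tap" + first 11 characters of the port ID).
original_boot = ["fcddbc83-bb", "541af055-f6", "0e5b1dbc-ad"]      # libvirt.xml at 12:22
after_power_cycle = ["0e5b1dbc-ad", "541af055-f6", "fcddbc83-bb"]  # libvirt.xml at 12:48

# Sorting the boot-time order as plain strings reproduces the post-reboot order.
print(sorted(original_boot) == after_power_cycle)  # True
```

Because these IDs already differ in their first character, comparing the 11-character prefixes gives the same order as comparing the full IDs would.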

Jay Pipes (jaypipes)
Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
Changed in mos:
importance: Undecided → High
milestone: none → 7.0-mu-1
status: New → Confirmed
Changed in mos:
milestone: 7.0-mu-1 → 6.1-updates
no longer affects: mos/7.0.x
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Vishvananda Ishaya <email address hidden>
Review: https://review.fuel-infra.org/12391

Roman Podoliaka (rpodolyaka) wrote :

The fix for https://bugs.launchpad.net/nova/+bug/1405271 turned out to be only about nova-network and does not fix the problem if Neutron is used for networking. We'll be working on a new fix.

Roman Podoliaka (rpodolyaka) wrote :

Another point is that, even if we make the order consistent across hard reboots (and the original boot), it might not help with consistent naming of interfaces within the VM, as the guest OS may initialise PCI devices asynchronously, in any order. So if the original problem was about the naming of interfaces in the VM, it's worth taking a look at the udev rules supplied in the guest.
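Since a Neutron port keeps its MAC address across hard reboots, one guest-side option is to pin interface names to MAC addresses with udev, in the style of the traditional persistent-net rules (conventionally /etc/udev/rules.d/70-persistent-net.rules on distributions of that era). The sketch below only renders such rules from a list of MACs in the desired order; the function name and the chosen match keys are illustrative assumptions. On some distributions, renaming into the kernel's own eth* namespace can race with kernel naming, so a distinct prefix may be safer.

```python
from typing import List


def persistent_net_rules(macs_in_order: List[str], prefix: str = "eth") -> str:
    """Render one udev rule per MAC, naming them <prefix>0, <prefix>1, ... in list order."""
    rules = []
    for index, mac in enumerate(macs_in_order):
        # sysfs reports MAC addresses in lowercase, so normalise before matching.
        rules.append(
            f'SUBSYSTEM=="net", ACTION=="add", ATTR{{address}}=="{mac.lower()}", '
            f'NAME="{prefix}{index}"'
        )
    return "\n".join(rules) + "\n"


# MACs in the order of the original libvirt.xml from the report.
print(persistent_net_rules(["fa:16:3e:9d:0e:c9", "fa:16:3e:24:fb:5f", "fa:16:3e:c5:7e:ef"]), end="")
```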

Vitaly Sedelnik (vsedelnik) wrote :

Invalid for 7.0 and 8.0 as Kilo already contains the fix

Andrey Grebennikov (agrebennikov) wrote :

This is not invalid: the original fix only applies to the nova-network case. Please set it back to Confirmed.

Roman Podoliaka (rpodolyaka) wrote :

Andrey, I understand they are being a pain in the ass and insisting on weird things, but let's clarify here how we would like to see this fixed:

1) do we want the order of NICs in the libvirt XML to be consistent across the original boot of a VM and all subsequent hard reboots?

or

2) do we want 1), plus the order to correspond to the order of NICs supplied to the "nova boot" command when booting a VM?

AFAIU, they want 2), but that seems to be a whole lot of trouble, as we need to make sure the order is preserved across:

1) python-novaclient (parses the command and sends the data to nova-api)
2) nova-api (parses the data and calls nova-compute)
3) nova-compute (passes the data to Neutron when allocating ports, gets information about ports based on device_id == instance_id passed to Neutron)
4) neutron (creates ports and gets the info from the DB upon nova-compute request)
5) nova-compute again (uses the data retrieved from neutron to create libvirt XML)

I'm not sure we preserve the same order of interfaces at all of these stages. If it turns out we don't, fixing it (or, more accurately, changing the behaviour) will be complicated, as it will most likely mean API changes, which we can't do downstream (we can't change the API in released versions).
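Stage 3 is where the order is most easily lost: if nova-compute lists the instance's ports from Neutron by device_id, nothing forces the response to follow the order requested at boot. A minimal sketch of a restoring step, assuming nova-compute still knows the requested port IDs (for example from the info cache) and that ports are dicts carrying Neutron's "id" field; the function is hypothetical and not Nova's actual code.

```python
from typing import Dict, List


def order_like_request(requested_ids: List[str], ports: List[Dict]) -> List[Dict]:
    """Return ports in the order they were requested at boot time.

    Ports that were not part of the request (e.g. attached later) go last,
    sorted by ID so that their position is at least stable across reboots.
    """
    position = {port_id: i for i, port_id in enumerate(requested_ids)}
    known = sorted((p for p in ports if p["id"] in position), key=lambda p: position[p["id"]])
    extra = sorted((p for p in ports if p["id"] not in position), key=lambda p: p["id"])
    return known + extra


# Neutron answered in alphabetical order; the boot request had a different one.
requested = ["fcddbc83-bb", "541af055-f6", "0e5b1dbc-ad"]
from_neutron = [{"id": "0e5b1dbc-ad"}, {"id": "541af055-f6"}, {"id": "fcddbc83-bb"},
                {"id": "aaaa0000-00"}]  # hypothetical port attached after boot
print([p["id"] for p in order_like_request(requested, from_neutron)])
# ['fcddbc83-bb', '541af055-f6', '0e5b1dbc-ad', 'aaaa0000-00']
```

Sorting the leftover ports by ID only gives option 1) (stable across reboots) for them; option 2) is honoured only for ports that were part of the original request.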

We'll investigate this on devstack.

Roman Podoliaka (rpodolyaka) wrote :

IMO, this looks like a whole lot of trouble for no good reason. And we are not sure this makes sense at all, as the order of interfaces in the libvirt XML does not guarantee consistent naming of interfaces within the guest OS...

Roman Podoliaka (rpodolyaka) wrote :

Kilo and Liberty are not affected

Fuel Devops McRobotson (fuel-devops-robot) wrote :

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Mark Goddard <email address hidden>
Review: https://review.fuel-infra.org/12657

Alexander Bozhenko (alexbozhenko) wrote :
tags: added: support
Alexander Bozhenko (alexbozhenko) wrote :

@Roman, so if the guest OS does not have udev rules and relies on the order of networks/ports supplied at instance boot, is it completely wrong?
What would you suggest to fix it? I see these options:
1) Use udev rules inside the VM
2) Make application not rely on order
3) Use "Consistent Network Device Naming"
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/appe-Consistent_Network_Device_Naming.html
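For option 2), an application inside the guest can look up interfaces by MAC address, which stays with the Neutron port, instead of assuming that eth0 is the first NIC passed to "nova boot". A minimal Linux-only sketch, assuming the standard sysfs layout where /sys/class/net/<name>/address holds each interface's MAC; the function names are illustrative.

```python
import os
from typing import Dict, Optional


def read_macs(sysfs_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map interface name -> lowercase MAC address, as reported by sysfs."""
    macs = {}
    for name in sorted(os.listdir(sysfs_net)):
        try:
            with open(os.path.join(sysfs_net, name, "address")) as f:
                macs[name] = f.read().strip().lower()
        except OSError:
            continue  # interface vanished or has no address file
    return macs


def iface_for_mac(mac: str, macs: Dict[str, str]) -> Optional[str]:
    """Return the interface carrying `mac`, or None if no interface has it."""
    wanted = mac.lower()
    for name, address in macs.items():
        if address == wanted:
            return name
    return None


# Snapshot matching the report after the power-cycle (guest names follow the new XML order).
snapshot = {"eth0": "fa:16:3e:c5:7e:ef", "eth1": "fa:16:3e:24:fb:5f", "eth2": "fa:16:3e:9d:0e:c9"}
print(iface_for_mac("fa:16:3e:9d:0e:c9", snapshot))  # eth2
```

Inside a real guest you would call iface_for_mac(mac, read_macs()); the NIC that was first at boot is found no matter which ethN name it ends up with.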

But what if a customer uses a custom OS and knows for sure that PCI devices are initialized not asynchronously, but exactly in the order they sit on the bus? In that case he would expect to see variant 2) that you described.

>>I'm not sure we preserve the same order of interfaces at all of these stages.
Can you confirm it is not preserved?

>>the order of interfaces in the libvirt XML does not guarantee consistent naming of interfaces within the guest OS
But does the order in the libvirt XML guarantee the same order on the bus?

Roman Podoliaka (rpodolyaka) wrote :

Nova makes a best effort and propagates the passed networks (preserving the order) from the python-novaclient CLI via nova-api up to the libvirt domain XML on the compute node (with the fixes mentioned in this LP bug applied).

I'm not entirely sure how libvirt / qemu work here, but it looks like interfaces appear on the PCI bus in the same order as they are listed in the domain XML. This is what we saw when we tested it. I'm not sure this is guaranteed by libvirt / qemu, though.
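One way to check the XML-to-bus mapping from inside a Linux guest is to resolve /sys/class/net/<name>/device and read the PCI address out of the resulting path. For virtio NICs that path typically ends in a virtioN node beneath the PCI device directory (e.g. /sys/devices/pci0000:00/0000:00:03.0/virtio0), so the sketch takes the last path component that looks like a PCI address. This is an illustrative helper, not part of Nova or libvirt.

```python
import os
import re
from typing import Dict, Optional

# PCI address as it appears in sysfs: domain:bus:slot.function, e.g. 0000:00:03.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_address_from_path(device_path: str) -> Optional[str]:
    """Return the innermost PCI address in a resolved sysfs device path."""
    for part in reversed(device_path.split("/")):
        if PCI_ADDR.match(part):
            return part
    return None


def nic_pci_map(sysfs_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map each interface backed by a PCI device to its PCI address."""
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        device = os.path.join(sysfs_net, name, "device")
        if not os.path.exists(device):
            continue  # virtual interfaces such as lo have no backing device
        address = pci_address_from_path(os.path.realpath(device))
        if address:
            result[name] = address
    return result
```

Inside the guest, sorting nic_pci_map().items() by address lists the interfaces in bus order, which can then be compared with the MAC order in the instance's libvirt.xml.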

That said, it's up to the guest OS how it initializes those interfaces. It might as well do that asynchronously, and then all the work on preserving the interface order in Nova / libvirt XML would be wasted.

IMO, the best way to deal with this is in the guest OS, where you have full control over the naming of your interfaces by means of udev rules.

Roman Podoliaka (rpodolyaka) wrote :

>>I'm not sure we preserve the same order of interfaces at all of these stages. Can you confirm it is not preserved?

We do now, with the patches mentioned in this thread applied.

>>the order of interfaces in the libvirt XML does not guarantee consistent naming of interfaces within the guest OS. But does order in libvirt xml guarantee same order on the bus?

It looks like it, but I'm not sure it's guaranteed.

>>> But what if customer use some custom os, and he knows for sure that pci devices are initialized not asynchronously, but exactly in the order how they sit on the bus? In that case he would expect to see variant number 2) that you described.

The most we can do is get the interfaces into the same order in the libvirt domain XML (that's what the patches are about).

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/12657
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: e4894aadcb2e10aae2fd780b5c2700fae0d04f8a
Author: Mark Goddard <email address hidden>
Date: Mon Feb 29 16:21:02 2016

Refresh instance info cache within lock

Fix interface attachment bug where multiple concurrent attachment
requests can cause corruption of the nova instance info cache. This
change refreshes the info cache object from the database whilst
holding the refresh-cache lock, ensuring that changes are
synchronised.

Conflicts:
 nova/network/base_api.py
 nova/network/neutronv2/api.py
 nova/tests/network/test_neutronv2.py

Closes-Bug: #1501430
Closes-Bug: #1541838

Change-Id: I6ea2eda8a61f418b0c32f13a7ed6904352712857
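The locking pattern described in the commit message can be illustrated with a toy model using plain Python threads. The InfoCacheRow class and both attach functions are hypothetical stand-ins, not Nova's code: the first function writes back a copy of the cache loaded before the lock was taken, so concurrent requests overwrite each other; the second re-reads the cache while holding the lock, as the commit describes.

```python
import threading


class InfoCacheRow:
    """Hypothetical stand-in for an instance's network info cache record."""

    def __init__(self):
        self.vifs = []


def attach_with_stale_copy(row, lock, loaded_earlier, port_id):
    # Buggy pattern: the cache was loaded before taking the lock.
    with lock:
        row.vifs = loaded_earlier + [port_id]  # may drop another request's VIF


def attach_with_refresh(row, lock, loaded_earlier, port_id):
    # Fixed pattern: refresh from the "database" while holding the lock.
    with lock:
        current = list(row.vifs)
        row.vifs = current + [port_id]


def run_concurrent_attaches(attach, port_ids):
    row = InfoCacheRow()
    lock = threading.Lock()
    loaded_earlier = list(row.vifs)  # every request loaded the cache up front
    threads = [
        threading.Thread(target=attach, args=(row, lock, loaded_earlier, p))
        for p in port_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row.vifs


print(len(run_concurrent_attaches(attach_with_stale_copy, ["port-a", "port-b"])))  # 1
print(sorted(run_concurrent_attaches(attach_with_refresh, ["port-a", "port-b"])))  # ['port-a', 'port-b']
```

In this toy model the lost update is deterministic because both requests share one pre-loaded snapshot; in a real service whether an update is lost depends on timing, which is why such corruption tends to appear only under concurrent attachments.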

tags: added: on-verification
Vadim Rovachev (vrovachev) wrote :

verified on 6.1 Ubuntu w/ nova packages with version 1:2014.2.2-1~u14.04+mos46

tags: removed: on-verification