Rescheduled instance with pre-existing port fails with PortInUse exception

Bug #1749838 reported by iain MacDonnell
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

When creating an instance that uses a pre-existing neutron port, if the build fails on the first compute node and gets rescheduled to another compute node, the rescheduled attempt fails with a PortInUse exception. In case it matters, I'm using neutron ML2 with linuxbridge, and the port is on a VLAN provider network.

Steps to reproduce (starting with an AZ/aggregate with two functional compute nodes up and running; an openstacksdk equivalent is sketched after the list):

1. Create a neutron port, and make a note of its ID (openstack port create --network XXX myport)
2. Inject a failure on the first node - e.g. by renaming the qemu binary
3. Create an instance, using the port created earlier (openstack server create --nic port-id=XXX --image cirros --flavor m1.tiny myvm)
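
For illustration, here is a rough openstacksdk equivalent of the steps above (the cloud name and the network/image/flavor IDs are placeholders, not values from this deployment):

# Illustrative openstacksdk sketch of the reproduction; all IDs are placeholders.
import openstack

conn = openstack.connect(cloud='mycloud')  # clouds.yaml entry name is a placeholder

# Step 1: create the port up front so nova treats it as a pre-existing port.
port = conn.network.create_port(network_id='NETWORK_UUID', name='myport')

# Step 2 (injecting a failure on the first compute node) is done out of band.

# Step 3: boot the instance using the pre-existing port.
conn.compute.create_server(
    name='myvm',
    image_id='CIRROS_IMAGE_UUID',
    flavor_id='M1_TINY_FLAVOR_UUID',
    networks=[{'port': port.id}],
)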

The instance will fail on the first node, and get rescheduled on the second, where it will fail with:

2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager Traceback (most recent call last):
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1415, in _allocate_network_async
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager bind_host_id=bind_host_id)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 855, in allocate_for_instance
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager context, instance, neutron, requested_networks)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 573, in _validate_requested_port_ids
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager raise exception.PortInUse(port_id=request.port_id)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager PortInUse: Port 9fd24371-e906-4af2-898a-eaef223abca9 is still in use.

I've reproduced this on both Ocata and Pike. It does not seem to happen if the port is created by nova (i.e. openstack server create --nic net-id=XXX ...)

This looks a bit like https://bugs.launchpad.net/nova/+bug/1308405 , but that's supposed to have been fixed long ago.

Matt Riedemann (mriedem)
tags: added: compute neutron
Revision history for this message
iain MacDonnell (imacdonn) wrote :
summary: - Rescheduled instace with pre-existing port fails with PortInUse
+ Rescheduled instance with pre-existing port fails with PortInUse
exception
Revision history for this message
Matt Riedemann (mriedem) wrote :

It's failing here because the port already has a device_id set (an instance id):

https://github.com/openstack/nova/blob/stable/pike/nova/network/neutronv2/api.py#L572

But we should unset that when cleaning up and unbinding the port on the first host before rescheduling:

https://github.com/openstack/nova/blob/stable/pike/nova/network/neutronv2/api.py#L511

Do you see this error in the logs on the first host?

LOG.exception(_LE("Unable to clear device ID "
"for port '%s'"), port_id)

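Roughly, the cleanup on the first host boils down to a neutron port update like the following (an illustrative python-neutronclient sketch with placeholder session and port variables, not the exact nova code path):

# Illustrative sketch: clear the port's device/binding info before reschedule.
# This approximates the cleanup described above; it is not the exact nova code path.
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(session=keystone_session)  # placeholder keystone session
# port_id: UUID of the pre-existing port

neutron.update_port(port_id, {'port': {'device_id': '',
                                       'device_owner': '',
                                       'binding:host_id': None}})

If the "Unable to clear device ID" error does appear on the first host, that update never took effect, which would explain the PortInUse on the second host.
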
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is likely the same issue as is being fixed here:

https://review.openstack.org/#/c/520248/

Revision history for this message
Matt Riedemann (mriedem) wrote :

(5:18:06 PM) mriedem: in the case of nova creating a port,
(5:18:17 PM) mriedem: it doesn't fail because nova orphans the port created from the first host, and creates a new port when going through the 2nd host
(5:18:29 PM) mriedem: so you end up with 2 ports for the instance that nova created even though you're only using 1
(5:18:47 PM) mriedem: in the case that you bring a port, nova doesn't unbind it before rescheduling, and that's why we fail to use it on the 2nd host
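
For reference, the check that trips on the second host amounts to something like this (paraphrasing the _validate_requested_port_ids code linked above, not a verbatim copy of the nova source):

# Paraphrase of the device_id check in _validate_requested_port_ids
# (see the api.py link above); not a verbatim copy.
if port.get('device_id'):
    # The port still carries the device_id set by the first (failed) host,
    # so the reschedule treats it as belonging to another instance.
    raise exception.PortInUse(port_id=request.port_id)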
