network not always cleaned up when spawning VMs

Bug #1597596 reported by Aihua Edward Li on 2016-06-30
This bug affects 11 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       Matt Riedemann
Ocata                      Medium       Unassigned
Pike                       Medium       Unassigned

Bug Description

Here is the scenario:
1) Nova scheduler/conductor selects nova-compute A to spawn a VM.
2) Nova-compute A tries to spawn the VM, but the process fails and raises a RescheduledException.
3) In the reschedule exception handler, the network resource is properly cleaned up only when retry is None. When retry is not None, the network is not cleaned up and the port information stays with the VM.
4) The Nova conductor is notified of the failure and selects nova-compute B to spawn the VM.
5) Nova-compute B spawns the VM successfully. However, in instance_info_cache, network_info shows two ports allocated for the VM: one from the original network A associated with nova-compute A, and one from network B associated with nova-compute B.
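The asymmetry in step 3 can be sketched in simplified form. This is a hypothetical model of the reschedule handler's control flow, not the actual nova code; FakeDriver and the helper names are illustrative stand-ins:

```python
# Simplified sketch of the reschedule handler's cleanup asymmetry.
# FakeDriver and handle_reschedule are illustrative stand-ins, not
# the real nova API.

class FakeDriver:
    def deallocate_networks_on_reschedule(self, instance):
        # nova's base virt driver returns False here by default
        return False

def handle_reschedule(driver, filter_properties, cleanup_networks):
    retry = filter_properties.get('retry')
    if not retry:
        # No retry info: not rescheduling, so networks get cleaned up.
        cleanup_networks()
        return 'failed'
    # Rescheduling: networks survive unless the driver opts in.
    if driver.deallocate_networks_on_reschedule(None):
        cleanup_networks()
    return 'rescheduled'

calls = []
handle_reschedule(FakeDriver(), {}, lambda: calls.append('cleanup'))
handle_reschedule(FakeDriver(), {'retry': {'num_attempts': 1}},
                  lambda: calls.append('cleanup'))
# Only the no-retry path cleaned up, so the rescheduled instance
# keeps the port allocated on the first host.
```

With the default driver, only the terminal-failure path releases the port, which is the behavior the bug describes.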

To simulate the case, raise a fake exception in _do_build_and_run_instance in nova-compute A:

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index ac6d92c..8ce8409 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1746,6 +1746,7 @@ class ComputeManager(manager.Manager):
                         filter_properties)
             LOG.info(_LI('Took %0.2f seconds to build instance.'),
                      timer.elapsed(), instance=instance)
+            raise exception.RescheduledException(instance_uuid=instance.uuid, reason="simulated-fault")
             return build_results.ACTIVE
         except exception.RescheduledException as e:
             retry = filter_properties.get('retry')

environments:
*) nova master branch
*) ubuntu 12.04
*) kvm
*) bridged network.

summary: - network not alwasy cleaned up when spawning VMs
+ network not always cleaned up when spawning VMs
Changed in nova:
assignee: nobody → Aihua Edward Li (aihuaedwardli)
Changed in nova:
status: New → In Progress
status: In Progress → Fix Committed

The status is 'Fix Committed', but the patch has not been merged yet.

Has the issue already been fixed?

Aihua Edward Li (aihuaedwardli) wrote :

The patch is in review state. I am waiting for core reviewer's +2. Any help would be appreciated.

Cleanup of inconsistency: Bug reports which have a change for review in Gerrit should have the status "In Progress". When the change gets merged the status changes automatically to "Fix Released". We don't use "Fix Committed" anymore [1].

References:
[1] "[openstack-dev] [release][all] bugs will now close automatically
    when patches merge"; Doug Hellmann; 2015-12-07;
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/081612.html

Changed in nova:
status: Fix Committed → In Progress
Dr. Jens Harbott (j-harbott) wrote :

So I'm wondering whether the proper fix for this bug is indeed to delete the first port. If I understand the code correctly, the intention of not deleting the port when rescheduling is that it could be reused on the second compute node. But that reuse does not seem to happen; instead nova allocates another port, leaving the first one pending. I've added some sample output at http://paste.openstack.org/show/526875/

Dr. Jens Harbott (j-harbott) wrote :

Also note that cleanup for the case where rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979. So when I have permanent rescheduling failures, I will get an instance with three ports assigned, but after a short interval these will be cleaned up and the failed instance will have no network assigned anymore.

The real issue happens if the instance can be created successfully after one or two reschedules, then only the last address will be active on the instance, while e.g. associating a floatingip will by default bind it to the first address, leading to broken connectivity.

Dr. Jens Harbott (j-harbott) wrote :

Looking at nova/compute/manager.py in _allocate_network_async() there is code and comment

                instance.system_metadata['network_allocated'] = 'True'
                # NOTE(JoshNang) do not save the instance here, as it can cause
                # races. The caller shares a reference to instance and waits
                # for this async greenthread to finish before calling
                # instance.save().

But that doesn't seem to be true; the corresponding code in

                    # NOTE(JoshNang) This also saves the changes to the
                    # instance from _allocate_network_async, as they aren't
                    # saved in that function to prevent races.
                    instance.save(expected_task_state=
                            task_states.BLOCK_DEVICE_MAPPING)

gets executed earlier, as in the log I can see the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info.
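The ordering problem described above can be illustrated with a small self-contained sketch. Plain threads stand in for nova's eventlet greenthreads, and all names here are illustrative, not nova code:

```python
import threading
import time

# Illustrative stand-in for an instance whose async network allocation
# sets system_metadata while the caller saves the instance without
# waiting for that thread to finish.

class FakeInstance:
    def __init__(self):
        self.system_metadata = {}
        self.saved_snapshot = None

    def save(self):
        # Persist whatever metadata exists right now.
        self.saved_snapshot = dict(self.system_metadata)

def allocate_network_async(instance):
    time.sleep(0.05)  # network allocation takes a while
    instance.system_metadata['network_allocated'] = 'True'

instance = FakeInstance()
t = threading.Thread(target=allocate_network_async, args=(instance,))
t.start()
instance.save()   # runs before the async allocation completes
t.join()

# The saved snapshot is missing 'network_allocated', matching the
# observation that instance.save() (and driver.spawn()) ran before
# the async allocation logged the assigned network info.
```

If the save really happens first, the 'network_allocated' flag set by the async path never reaches the database, contrary to what the NOTE in the code implies.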

Matt Riedemann (mriedem) on 2016-07-07
Changed in nova:
importance: Undecided → Medium
Aihua Edward Li (aihuaedwardli) wrote :

In response to Dr. Rosenboom's comments:
1. "the intention of not deleting the port when rescheduling is that it could be reused on the second compute node."
In our use case we are using bridged network mode; the network_info allocated initially on the first compute is useless and causes adverse side effects. We need to clean up the network resources associated with the first compute.
2. "Also note that cleanup if the rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979."
This is a different issue from what we encounter. In our case the VM was retried and spawned on a second compute; there was no "failure" visible to the nova code, so there was no chance for the network to be cleaned up.
3. "the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info."
This is also a separate issue. We want to address the fact that network info is not cleaned up on some code paths.

Dr. Jens Harbott (j-harbott) wrote :

> In our use case, we are using bridged network mode, the network_info allocated initially on
> first compute is useless and cause adverse side effect. We need to clean up
> the network_resource associated with the first compute.

If that is indeed the case, then your driver should return True instead of False from deallocate_networks_on_reschedule() and your issue would be fixed.

We use the standard nova/virt/driver.py; the current implementation is

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False
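For a deployment where ports from the first host are unusable, a driver could opt in to cleanup by overriding this hook. A hypothetical sketch (the subclass name and base stub are illustrative, not nova code):

```python
# Illustrative sketch: a virt driver opting in to network cleanup on
# reschedule. BaseDriver stands in for nova.virt.driver.ComputeDriver.

class BaseDriver:
    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False

class BridgedNetworkDriver(BaseDriver):
    def deallocate_networks_on_reschedule(self, instance):
        # In this deployment ports are tied to the original host, so
        # ask the compute manager to release them before rescheduling.
        return True
```

With this override, the compute manager's reschedule path would call its network cleanup before handing the instance back to the conductor.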

Jesse Keating (jesse-keating) wrote :

This looks like a similar issue that we're seeing, and we're using the libvirt driver, which doesn't override deallocate_networks_on_reschedule.

In our scenario, we're seeing the instance get built with 2 or 3 fixed IPs instead of just the one. It seems that on retry a whole new port is allocated, even though there was already one allocated from the first attempt. Since our networks are more portable, they succeed in building with multiple ports, which is slightly different than what the proposed patch is trying to address.

Hi, Jesse,

Thanks for the info. Yes, we also see multiple ports allocated after retry. The root cause is the same: the network resources are not cleaned up. The proposed fix should resolve your issue as well.

Aihua

Dr. Jens Harbott (j-harbott) wrote :

@Jesse: When you say that building with multiple ports succeeds, do you see all of the ports connected to your instance or only the final one?

Jesse Keating (jesse-keating) wrote :

@Aihua Looking at the patch, it seems to only clean up the network resources if the ports are not portable, in your case the ports only work on specific HVs. This isn't the case for us, so I don't think your change would impact our scenario.

@Jens, I will research this and report back.

Jesse Keating (jesse-keating) wrote :

@Jens It appears that only the final address exists in the VM itself, and the other ports are just assigned to the VM in the databases.

Dr. Jens Harbott (j-harbott) wrote :

@Jesse Ok, that matches what I am seeing, thanks for confirming. If you still have some of these instances online, you could also check in the nova DB whether any of them have entries for 'network_allocated' in the system_metadata table (cf. https://bugs.launchpad.net/nova/+bug/1597596/comments/7). Assuming there are none, that would undermine my assumption that these async processes are not executed in the way described in the code.

Jesse Keating (jesse-keating) wrote :

@jens, I'm attaching photos to show the instance in question and the system metadata associated with it.

Aihua Edward Li (aihuaedwardli) wrote :

@Jesse, thanks for sharing. This matches what we see.
Are you planning to propose a fix (in the nova scheduler)? That might fix my problem also.

Jesse Keating (jesse-keating) wrote :

I have no plans to propose a fix. That's a bit beyond what I'm capable of at this time.

Dr. Jens Harbott (j-harbott) wrote :

I've still not found a proper way to fix this, but I would like to share the attached patch, which makes it easy to reproduce the issue on a simple devstack node. It consists of three parts:

1. Patch nova/virt/libvirt/driver.py to insert a single error when spawning an instance. This will trigger a Reschedule and let spawning succeed on the second attempt.
2. Patch nova/scheduler/utils.py so that the Reschedule may happen on the same host again. The default behaviour excludes the hosts where the instance was first scheduled, which would imply that one needs a multi-node setup in order to reproduce this issue.
3. Add some debugging output to nova/compute/manager.py

The first instance booted after applying this patch (and restarting n-cond & n-cpu) will be running fine but with two addresses allocated.
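Part 1 of the reproducer can be sketched as a fail-once toggle. This is an illustrative model of the fault injection, not the actual patch contents:

```python
# Illustrative fail-once fault injection, modeled on part 1 of the
# reproducer: the first spawn attempt raises, later attempts succeed.

class RescheduledException(Exception):
    pass

_failed_once = False

def spawn(instance_name):
    """Stand-in for the patched libvirt driver's spawn()."""
    global _failed_once
    if not _failed_once:
        _failed_once = True
        raise RescheduledException('simulated-fault for %s' % instance_name)
    return 'ACTIVE'

# First attempt triggers a reschedule; the second attempt succeeds,
# so the instance ends up ACTIVE but the port from the first attempt
# is never cleaned up.
```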

It looks to me like the first address is generated properly in the n-cpu manager, but is then not handled correctly in the conductor during rescheduling.

Dr. Jens Harbott (j-harbott) wrote :

@Aihua: Are you still working on this? https://review.openstack.org/335788 is in merge conflict and hasn't been updated for some time. If not, please unassign and let someone else take over.

Changed in nova:
assignee: Aihua Edward Li (aihuaedwardli) → nobody
Changed in nova:
status: In Progress → Confirmed
tags: added: network
Changed in nova:
assignee: nobody → srividyaketharaju (srividya)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/335788
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: srividyaketharaju (srividya) → nobody
MarginHu (margin2017) wrote :

Regarding comment #13:

In my environment I experienced both scenarios:
1. An extra port exists in the DB, but only the last port is attached to the VM.
2. Two ports are attached to the VM (see https://bugs.launchpad.net/nova/+bug/1722559).

Matt Riedemann (mriedem) on 2017-11-15
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)

Fix proposed to branch: master
Review: https://review.openstack.org/520248

Changed in nova:
status: Confirmed → In Progress