network not always cleaned up when spawning VMs

Bug #1597596 reported by Aihua Edward Li
This bug affects 12 people
Affects                    Status         Importance  Assigned to
OpenStack Compute (nova)   Fix Released   Medium      Matt Riedemann
  Ocata                    Confirmed      Medium      Unassigned
  Pike                     Fix Committed  Medium      Matt Riedemann
  Queens                   Fix Committed  Medium      Matt Riedemann

Bug Description

Here is the scenario:
1) Nova scheduler/conductor selects nova-compute A to spawn a VM.
2) Nova-compute A tries to spawn the VM, but the process fails and raises a RescheduledException.
3) In the RescheduledException handler, network resources are properly cleaned up only when retry is None; when retry is not None, the network is not cleaned up and the port information stays with the VM (see the handler sketch below).
4) Nova conductor is notified of the failure and selects nova-compute B to spawn the VM.
5) Nova-compute B spawns the VM successfully. However, the network_info in instance_info_cache shows two ports allocated for the VM: one from the original network A associated with nova-compute A, and one from network B associated with nova-compute B.
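
(For reference, the handler that step 3 refers to lives in ComputeManager._do_build_and_run_instance in nova/compute/manager.py; the pre-fix behaviour is roughly the simplified sketch below. This is a paraphrase for illustration, not a verbatim copy, and the surrounding logging and claim handling are omitted.)

        except exception.RescheduledException:
            retry = filter_properties.get('retry')
            if not retry:
                # No retry info: the build will not be rescheduled, so the
                # allocated networking is cleaned up here.
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)
                return build_results.FAILED
            # Retry info present: networking is cleaned up only if the driver
            # opts in via deallocate_networks_on_reschedule(), so ports
            # created on this host stay attached to the instance.
            if self.driver.deallocate_networks_on_reschedule(instance):
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)
            return build_results.RESCHEDULED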

To simulate the case, raise a fake exception in _do_build_and_run_instance in nova-compute A:

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index ac6d92c..8ce8409 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1746,6 +1746,7 @@ class ComputeManager(manager.Manager):
                         filter_properties)
             LOG.info(_LI('Took %0.2f seconds to build instance.'),
                      timer.elapsed(), instance=instance)
+            raise exception.RescheduledException(instance_uuid=instance.uuid, reason="simulated-fault")
             return build_results.ACTIVE
         except exception.RescheduledException as e:
             retry = filter_properties.get('retry')

environments:
*) nova master branch
*) ubuntu 12.04
*) kvm
*) bridged network.

Tags: network
summary: - network not alwasy cleaned up when spawning VMs
+ network not always cleaned up when spawning VMs
Changed in nova:
assignee: nobody → Aihua Edward Li (aihuaedwardli)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335788

Changed in nova:
status: New → In Progress
status: In Progress → Fix Committed
Revision history for this message
Takashi Natsume (natsume-takashi) wrote :

The status is 'Fix Committed'.
But the patch has not been merged yet.

Has the issue already been fixed?

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

The patch is in review; I am waiting for a core reviewer's +2. Any help would be appreciated.

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

Cleanup of inconsistency: Bug reports which have a change for review in Gerrit should have the status "In Progress". When the change gets merged the status changes automatically to "Fix Released". We don't use "Fix Committed" anymore [1].

References:
[1] "[openstack-dev] [release][all] bugs will now close automatically
    when patches merge"; Doug Hellmann; 2015-12-07;
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/081612.html

Changed in nova:
status: Fix Committed → In Progress
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

So I'm wondering whether the proper fix for this bug is indeed to delete the first port. If I understand the code correctly, the intention of not deleting the port when rescheduling is that it could be reused on the second compute node. But that reuse does not seem to happen; instead nova allocates another port, leaving the first one pending. I've added some sample output at http://paste.openstack.org/show/526875/

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Also note that cleanup when rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979. So when I have permanent rescheduling failures, I get an instance with three ports assigned, but after a short interval these are cleaned up and the failed instance has no network assigned anymore.

The real issue happens if the instance can be created successfully after one or two reschedules: then only the last address is active on the instance, while e.g. associating a floating IP will by default bind it to the first address, leading to broken connectivity.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Looking at nova/compute/manager.py in _allocate_network_async() there is code and comment

                instance.system_metadata['network_allocated'] = 'True'
                # NOTE(JoshNang) do not save the instance here, as it can cause
                # races. The caller shares a reference to instance and waits
                # for this async greenthread to finish before calling
                # instance.save().

But that doesn't seem to be true; the corresponding code in

                    # NOTE(JoshNang) This also saves the changes to the
                    # instance from _allocate_network_async, as they aren't
                    # saved in that function to prevent races.
                    instance.save(expected_task_state=
                            task_states.BLOCK_DEVICE_MAPPING)

gets executed earlier, as in the log I can see the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

In response to Dr. Rosenboom's comments:
1. "the intention of not deleting the port when rescheduling is that it could be reused on the second compute node."
In our use case we are using bridged network mode, so the network_info allocated initially on the first compute is useless and causes adverse side effects. We need to clean up the network resources associated with the first compute.
2. "Also note that cleanup if the rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979."
This is a different issue from what we encountered: in our case the VM was retried and spawned on a second compute, so there is no "failure" visible to the nova code and no chance for the network to get cleaned up.
3. "the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info."
This is also a separate issue. We would like to address the issue that network info is not cleaned up on some code paths.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

> In our use case we are using bridged network mode, so the network_info
> allocated initially on the first compute is useless and causes adverse side
> effects. We need to clean up the network resources associated with the
> first compute.

If that is indeed the case, then your driver should return True instead of False from self.driver.deallocate_networks_on_reschedule() and your issue would be fixed.

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

We use the standard nova/virt/driver.py; the current implementation is:

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False
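
(For comparison, a driver that does want its ports torn down before a retry overrides this hook to return True; if I recall correctly, the Ironic driver does exactly that. Sketch, not a verbatim copy of nova/virt/ironic/driver.py:)

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return True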

Revision history for this message
Jesse Keating (jesse-keating) wrote :

This looks like a similar issue to the one we're seeing, and we're using the libvirt driver, which doesn't override deallocate_networks_on_reschedule.

In our scenario, we're seeing the instance get built with 2 or 3 fixed IPs instead of just one. It seems that on retry a whole new port is allocated, even though one was already allocated by the first attempt. Since our networks are more portable, the builds succeed with multiple ports, which is slightly different from what the proposed patch is trying to address.

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

Hi, Jesse,

Thanks for the info. Yes, we also see multiple ports allocated after a retry. The root cause is the same: the network resources are not cleaned up. The proposed fix should resolve your issue as well.

Aihua

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Jesse: When you say that building with multiple ports succeeds, do you see all of the ports connected to your instance or only the final one?

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@Aihua Looking at the patch, it seems to only clean up the network resources if the ports are not portable, i.e. in your case the ports only work on specific hypervisors. This isn't the case for us, so I don't think your change would affect our scenario.

@Jens, I will research this and report back.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@Jens It appears that only the final address exists in the VM itself, and the other ports are just assigned to the VM in the database.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Jesse Ok, that matches what I am seeing, thanks for confirming. If you still have some of these instances online, you could also check in the nova DB whether any of them have entries for 'network_allocated' in the system_metadata table (cf. https://bugs.launchpad.net/nova/+bug/1597596/comments/7). Assuming there are none, that would undermine my assumption that these async processes are not executed in the way described in the code.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@jens, I'm attaching photos to show the instance in question and the system metadata associated with it.

Revision history for this message
Jesse Keating (jesse-keating) wrote :
Revision history for this message
Jesse Keating (jesse-keating) wrote :
Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

@Jesse, thanks for sharing. This matches what we see.
Are you planning to propose a fix (in the nova scheduler)? That might fix my problem as well.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

I have no plans to propose a fix. That's a bit beyond what I'm capable of at this time.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

I've still not found a proper way to fix this, but I would like to share the attached patch, which makes it easy to reproduce the issue on a single devstack node. It consists of three parts:

1. Patch nova/virt/libvirt/driver.py to insert a single error when spawning an instance. This triggers a reschedule and lets spawning succeed on the second attempt (a hypothetical sketch of this part follows after this comment).
2. Patch nova/scheduler/utils.py so that the reschedule may happen on the same host again. The default behaviour excludes the host where the instance was scheduled first, which would otherwise require a multi-node setup to reproduce this issue.
3. Add some debugging output to nova/compute/manager.py.

The first instance booted after applying this patch (and restarting n-cond & n-cpu) will be running fine, but with two addresses allocated.

It looks to me like the first address is allocated properly by the n-cpu manager, but is then not handled correctly by the conductor during rescheduling.
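
(Illustrative only, since the attachment itself isn't reproduced here: part 1 could look like the hypothetical snippet below, which makes the first spawn attempt fail once so conductor reschedules, and lets the retry succeed. The module-level flag and its placement are assumptions for illustration, not the actual attached patch.)

    # nova/virt/libvirt/driver.py -- hypothetical reproduction hack
    _FAIL_FIRST_SPAWN = [True]

    # ...and at the top of LibvirtDriver.spawn():
    if _FAIL_FIRST_SPAWN[0]:
        _FAIL_FIRST_SPAWN[0] = False
        # An unexpected exception from spawn() is turned into a
        # RescheduledException by the compute manager, which then asks
        # conductor to retry the build on another (or here, the same) host.
        raise RuntimeError('simulated spawn failure to force a reschedule')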

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Aihua: Are you still working on this? https://review.openstack.org/335788 is in merge conflict and hasn't been updated for some time. If not, please unassign and let someone else take over.

Changed in nova:
assignee: Aihua Edward Li (aihuaedwardli) → nobody
Changed in nova:
status: In Progress → Confirmed
tags: added: network
Changed in nova:
assignee: nobody → srividyaketharaju (srividya)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/335788
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: srividyaketharaju (srividya) → nobody
Revision history for this message
MarginHu (margin2017) wrote :

Regarding comment #13:

In my environment I experienced both scenarios:
1. One port exists in the DB, but only the last port is attached to the VM.
2. Two ports are attached to the VM (see https://bugs.launchpad.net/nova/+bug/1722559).

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/520248

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/555418

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/520248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Submitter: Zuul
Branch: master

commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
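
(Paraphrased essence of the change in ComputeManager's reschedule path, not the verbatim diff: the cleanup now also runs whenever Neutron is in use, rather than only when the driver opts in via deallocate_networks_on_reschedule().)

            if (self.driver.deallocate_networks_on_reschedule(instance) or
                    utils.is_neutron()):
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)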

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/555907

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/555418
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9203326f84cd35243e9e6a73cd5fac62af27aaf5
Submitter: Zuul
Branch: stable/queens

commit 9203326f84cd35243e9e6a73cd5fac62af27aaf5
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
    (cherry picked from commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/555907
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad20a87028523f0a1fdf2e9319fac4537c9fbbf3
Submitter: Zuul
Branch: stable/pike

commit ad20a87028523f0a1fdf2e9319fac4537c9fbbf3
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    NOTE(mriedem): There are a couple of changes to the unit test for code
    that didn't exist in Pike, due to the change for alternate hosts
    Iae904afb6cb4fcea8bb27741d774ffbe986a5fb4 and the change to pass the
    request spec to conductor Ie5233bd481013413f12e55201588d37a9688ae78.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
    (cherry picked from commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f)
    (cherry picked from commit 9203326f84cd35243e9e6a73cd5fac62af27aaf5)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.2

This issue was fixed in the openstack/nova 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
