network not always cleaned up when spawning VMs

Bug #1597596 reported by Aihua Edward Li
This bug affects 12 people
Affects                    Status         Importance  Assigned to
OpenStack Compute (nova)   Fix Released   Medium      Matt Riedemann
  Ocata                    Confirmed      Medium      Unassigned
  Pike                     Fix Committed  Medium      Matt Riedemann
  Queens                   Fix Committed  Medium      Matt Riedemann

Bug Description

Here is the scenario:
1) Nova scheduler/conductor selects nova-compute A to spawn a VM.
2) Nova-compute A tries to spawn the VM, but the process fails and raises a RescheduledException.
3) In the RescheduledException handler, network resources are properly cleaned up only when retry is None; when retry is not None, the network is not cleaned up and the port information stays with the VM (see the handler sketch below).
4) Nova conductor is notified of the failure and selects nova-compute B to spawn the VM.
5) Nova-compute B spawns the VM successfully. However, the network_info in instance_info_cache shows two ports allocated for the VM: one from the original network A associated with nova-compute A, and one from network B associated with nova-compute B.
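
(For reference, the handler that step 3 refers to lives in ComputeManager._do_build_and_run_instance in nova/compute/manager.py; the pre-fix behaviour is roughly the simplified sketch below. This is a paraphrase for illustration, not a verbatim copy, and the surrounding logging and claim handling are omitted.)

        except exception.RescheduledException:
            retry = filter_properties.get('retry')
            if not retry:
                # No retry info: the build will not be rescheduled, so the
                # allocated networking is cleaned up here.
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)
                return build_results.FAILED
            # Retry info present: networking is cleaned up only if the driver
            # opts in via deallocate_networks_on_reschedule(), so ports
            # created on this host stay attached to the instance.
            if self.driver.deallocate_networks_on_reschedule(instance):
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)
            return build_results.RESCHEDULED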

To simulate the case, raise a fake exception in _do_build_and_run_instance in nova-compute A:

diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index ac6d92c..8ce8409 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1746,6 +1746,7 @@ class ComputeManager(manager.Manager):
                         filter_properties)
             LOG.info(_LI('Took %0.2f seconds to build instance.'),
                      timer.elapsed(), instance=instance)
+            raise exception.RescheduledException(instance_uuid=instance.uuid, reason="simulated-fault")
             return build_results.ACTIVE
         except exception.RescheduledException as e:
             retry = filter_properties.get('retry')

environments:
*) nova master branch
*) ubuntu 12.04
*) kvm
*) bridged network.

Tags: network
summary: - network not alwasy cleaned up when spawning VMs
+ network not always cleaned up when spawning VMs
Changed in nova:
assignee: nobody → Aihua Edward Li (aihuaedwardli)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335788

Changed in nova:
status: New → In Progress
status: In Progress → Fix Committed
Revision history for this message
Takashi Natsume (natsume-takashi) wrote :

The status is 'Fix Committed'.
But the patch has not been merged yet.

Has the issue already been fixed?

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

The patch is in review; I am waiting for a core reviewer's +2. Any help would be appreciated.

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

Cleanup of inconsistency: Bug reports which have a change for review in Gerrit should have the status "In Progress". When the change gets merged the status changes automatically to "Fix Released". We don't use "Fix Committed" anymore [1].

References:
[1] "[openstack-dev] [release][all] bugs will now close automatically
    when patches merge"; Doug Hellmann; 2015-12-07;
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/081612.html

Changed in nova:
status: Fix Committed → In Progress
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

So I'm wondering whether the proper fix for this bug is indeed to delete the first port. If I understand the code correctly, the intention of not deleting the port when rescheduling is that it could be reused on the second compute node. But that reuse does not seem to happen; instead nova allocates another port, leaving the first one pending. I've added some sample output at http://paste.openstack.org/show/526875/

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Also note that cleanup when rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979. So when I have permanent rescheduling failures, I get an instance with three ports assigned, but after a short interval these are cleaned up and the failed instance has no network assigned anymore.

The real issue happens if the instance can be created successfully after one or two reschedules: then only the last address is active on the instance, while e.g. associating a floating IP will by default bind it to the first address, leading to broken connectivity.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Looking at nova/compute/manager.py in _allocate_network_async() there is code and comment

                instance.system_metadata['network_allocated'] = 'True'
                # NOTE(JoshNang) do not save the instance here, as it can cause
                # races. The caller shares a reference to instance and waits
                # for this async greenthread to finish before calling
                # instance.save().

But that doesn't seem to be true; the corresponding code in

                    # NOTE(JoshNang) This also saves the changes to the
                    # instance from _allocate_network_async, as they aren't
                    # saved in that function to prevent races.
                    instance.save(expected_task_state=
                            task_states.BLOCK_DEVICE_MAPPING)

gets executed earlier, as in the log I can see the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

In response to Dr. Rosenboom's comments:
1. "the intention of not deleting the port when rescheduling is that it could be reused on the second compute node."
In our use case we are using bridged network mode, so the network_info allocated initially on the first compute is useless and causes adverse side effects. We need to clean up the network resources associated with the first compute.
2. "Also note that cleanup if the rescheduling fails three times seems to have been implemented in https://bugs.launchpad.net/nova/+bug/1510979."
This is a different issue from what we encountered: in our case the VM was retried and spawned on a second compute, so there is no "failure" visible to the nova code and no chance for the network to get cleaned up.
3. "the self.driver.spawn() call below this code being executed before _allocate_network_async logs the assigned network info."
This is also a separate issue. We would like to address the issue that network info is not cleaned up on some code paths.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

> In our use case we are using bridged network mode, so the network_info
> allocated initially on the first compute is useless and causes adverse side
> effects. We need to clean up the network resources associated with the
> first compute.

If that is indeed the case, then your driver should return True instead of False from self.driver.deallocate_networks_on_reschedule() and your issue would be fixed.

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

We use the standard nova/virt/driver.py; the current implementation is:

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False
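
(For comparison, a driver that does want its ports torn down before a retry overrides this hook to return True; if I recall correctly, the Ironic driver does exactly that. Sketch, not a verbatim copy of nova/virt/ironic/driver.py:)

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return True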

Revision history for this message
Jesse Keating (jesse-keating) wrote :

This looks like a similar issue to the one we're seeing, and we're using the libvirt driver, which doesn't override deallocate_networks_on_reschedule.

In our scenario, we're seeing the instance get built with 2 or 3 fixed IPs instead of just one. It seems that on retry a whole new port is allocated, even though one was already allocated by the first attempt. Since our networks are more portable, the builds succeed with multiple ports, which is slightly different from what the proposed patch is trying to address.

Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

Hi, Jesse,

Thanks for the info. Yes, we also see multiple ports allocated after a retry. The root cause is the same: the network resources are not cleaned up. The proposed fix should resolve your issue as well.

Aihua

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Jesse: When you say that building with multiple ports succeeds, do you see all of the ports connected to your instance or only the final one?

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@Aihua Looking at the patch, it seems to only clean up the network resources if the ports are not portable, i.e. in your case the ports only work on specific hypervisors. This isn't the case for us, so I don't think your change would affect our scenario.

@Jens, I will research this and report back.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@Jens It appears that only the final address exists in the VM itself, and the other ports are just assigned to the VM in the database.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Jesse Ok, that matches what I am seeing, thanks for confirming. If you still have some of these instances online, you could also check in the nova DB whether any of them have entries for 'network_allocated' in the system_metadata table (cf. https://bugs.launchpad.net/nova/+bug/1597596/comments/7). Assuming there are none, that would undermine my assumption that these async processes are not executed in the way described in the code.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

@jens, I'm attaching photos to show the instance in question and the system metadata associated with it.

Revision history for this message
Jesse Keating (jesse-keating) wrote :
Revision history for this message
Jesse Keating (jesse-keating) wrote :
Revision history for this message
Aihua Edward Li (aihuaedwardli) wrote :

@Jesse, thanks for sharing. This matches what we see.
Are you planning to propose a fix (in the nova scheduler)? That might fix my problem as well.

Revision history for this message
Jesse Keating (jesse-keating) wrote :

I have no plans to propose a fix. That's a bit beyond what I'm capable of at this time.

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

I've still not found a proper way to fix this, but I would like to share the attached patch, which makes it easy to reproduce the issue on a single devstack node. It consists of three parts:

1. Patch nova/virt/libvirt/driver.py to insert a single error when spawning an instance. This triggers a reschedule and lets spawning succeed on the second attempt (a hypothetical sketch of this part follows after this comment).
2. Patch nova/scheduler/utils.py so that the reschedule may happen on the same host again. The default behaviour excludes the host where the instance was scheduled first, which would otherwise require a multi-node setup to reproduce this issue.
3. Add some debugging output to nova/compute/manager.py.

The first instance booted after applying this patch (and restarting n-cond & n-cpu) will be running fine, but with two addresses allocated.

It looks to me like the first address is allocated properly by the n-cpu manager, but is then not handled correctly by the conductor during rescheduling.
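
(Illustrative only, since the attachment itself isn't reproduced here: part 1 could look like the hypothetical snippet below, which makes the first spawn attempt fail once so conductor reschedules, and lets the retry succeed. The module-level flag and its placement are assumptions for illustration, not the actual attached patch.)

    # nova/virt/libvirt/driver.py -- hypothetical reproduction hack
    _FAIL_FIRST_SPAWN = [True]

    # ...and at the top of LibvirtDriver.spawn():
    if _FAIL_FIRST_SPAWN[0]:
        _FAIL_FIRST_SPAWN[0] = False
        # An unexpected exception from spawn() is turned into a
        # RescheduledException by the compute manager, which then asks
        # conductor to retry the build on another (or here, the same) host.
        raise RuntimeError('simulated spawn failure to force a reschedule')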

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

@Aihua: Are you still working on this? https://review.openstack.org/335788 is in merge conflict and hasn't been updated for some time. If not, please unassign and let someone else take over.

Changed in nova:
assignee: Aihua Edward Li (aihuaedwardli) → nobody
Changed in nova:
status: In Progress → Confirmed
tags: added: network
Changed in nova:
assignee: nobody → srividyaketharaju (srividya)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/335788
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: srividyaketharaju (srividya) → nobody
Revision history for this message
MarginHu (margin2017) wrote :

Regarding comment #13:

In my environment I experienced both scenarios:
1. One port exists in the DB, but only the last port is attached to the VM.
2. Two ports are attached to the VM (see https://bugs.launchpad.net/nova/+bug/1722559).

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/520248

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/555418

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/520248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Submitter: Zuul
Branch: master

commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
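
(Paraphrased essence of the change in ComputeManager's reschedule path, not the verbatim diff: the cleanup now also runs whenever Neutron is in use, rather than only when the driver opts in via deallocate_networks_on_reschedule().)

            if (self.driver.deallocate_networks_on_reschedule(instance) or
                    utils.is_neutron()):
                self._cleanup_allocated_networks(context, instance,
                                                 requested_networks)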

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/555907

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/555418
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9203326f84cd35243e9e6a73cd5fac62af27aaf5
Submitter: Zuul
Branch: stable/queens

commit 9203326f84cd35243e9e6a73cd5fac62af27aaf5
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
    (cherry picked from commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/555907
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad20a87028523f0a1fdf2e9319fac4537c9fbbf3
Submitter: Zuul
Branch: stable/pike

commit ad20a87028523f0a1fdf2e9319fac4537c9fbbf3
Author: Matt Riedemann <email address hidden>
Date: Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron

    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.

    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.

    When an instance is deleted, it's networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.

    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.

    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always cleanup neutron ports before rescheduling.

    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.

    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.

    NOTE(mriedem): There are a couple of changes to the unit test for code
    that didn't exist in Pike, due to the change for alternate hosts
    Iae904afb6cb4fcea8bb27741d774ffbe986a5fb4 and the change to pass the
    request spec to conductor Ie5233bd481013413f12e55201588d37a9688ae78.

    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
    (cherry picked from commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f)
    (cherry picked from commit 9203326f84cd35243e9e6a73cd5fac62af27aaf5)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.2

This issue was fixed in the openstack/nova 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
