instance.host not updated on evacuation

Bug #1535918 reported by Kyle L. Henderson on 2016-01-19
This bug affects 6 people
Affects (importance / assigned to):
 - OpenStack Compute (nova): Undecided / Artom Lifshitz
 - Ubuntu Cloud Archive: Undecided / Unassigned
   - Mitaka: Undecided / Unassigned
 - nova-powervm: Undecided / Drew Thorstensen
 - nova (Ubuntu): status tracked in Artful
   - Xenial: Undecided / Seyeong Kim
   - Zesty: Undecided / Unassigned
   - Artful: Undecided / Unassigned

Bug Description

[Impact]

I created several VM instances and confirmed they were all in the ACTIVE state.
Right after checking them, I shut down nova-compute on their host (to simulate a failure in this case).
I then tried to evacuate them to the other host, but the evacuation failed and left the instances in the ERROR state.
After some testing and analysis, I found that the two commits below are related (please refer to the [Others] section).
In this context, migration_context is a DB field used to pass information during a migration or evacuation.

For [1]: this gets the host info from the migration_context. If the migration_context is missing or empty, the migration fails. With only this patch applied, the migration_context is still empty, so [2] is also needed. In the backport I adjusted the self.client.prepare part in rpcapi.py relative to the original patch, which targets a newer RPC version; because that is tied to newer functionality, I kept Mitaka's function call for this issue.

For [2]: this moves the recreation check into the earlier if condition and calls rebuild_claim to create the migration_context when the migration is in the recreate state, not only when it is scheduled. I also adjusted the test code that surfaced during the backport and appeared to be needed. Anyone wanting to backport or cherry-pick related code will find it is already present.
As the tests showed, neither patch alone fixes this issue.
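The interplay between the two patches can be sketched as a toy model (all names below are illustrative, not nova's actual API): patch [1] makes event dispatch consult the migration record, and patch [2] makes the evacuation path actually create that record, which is why neither helps alone.

```python
def target_hosts(instance, migration):
    """Patch [1] (sketch): send events to every relevant host, not just instance.host."""
    hosts = {instance["host"]}  # during a rebuild this is still the down source host
    if migration is not None:
        hosts.add(migration["source_compute"])
        hosts.add(migration["dest_compute"])
    return hosts

def evacuate_event_targets(instance, rebuild_claim_creates_record):
    """Patch [2] (sketch): the rebuild claim must create the migration record."""
    migration = None
    if rebuild_claim_creates_record:
        migration = {"source_compute": instance["host"], "dest_compute": "hostB"}
    return target_hosts(instance, migration)

inst = {"host": "hostA"}  # hostA is down
# With only patch [1], no migration record exists: events still go only to hostA.
assert evacuate_event_targets(inst, rebuild_claim_creates_record=False) == {"hostA"}
# With both patches, the destination host also receives the event.
assert evacuate_event_targets(inst, rebuild_claim_creates_record=True) == {"hostA", "hostB"}
```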

[Test case]

In below env,

http://pastebin.ubuntu.com/25337153/

The network configuration is important in this case; with a different configuration I could not reproduce the issue.
Reproduction test script (based on Juju):

http://pastebin.ubuntu.com/25360805/

[Regression Potential]

Existing ACTIVE instances and newly created instances are not affected by this change, because the patched code paths are only exercised during a migration or evacuation. If a host has both ACTIVE instances and instances left in the ERROR state by this issue, upgrading to a package with this fix will not affect any existing instances. After upgrading and retrying the evacuation of a problematic instance, its state should change from ERROR to ACTIVE. I tested this scenario in a simple environment; the possibility of regressions in a complex, crowded environment still needs to be considered.

[Others]

For testing, I had to apply two commits, one of them from
https://bugs.launchpad.net/nova/+bug/1686041

Related Patches.
[1] https://github.com/openstack/nova/commit/a5b920a197c70d2ae08a1e1335d979857f923b4f
[2] https://github.com/openstack/nova/commit/0f2d87416eff1e96c0fbf0f4b08bf6b6b22246d5 (backported to Newton from the original below)
 - https://github.com/openstack/nova/commit/a2b0824aca5cb4a2ae579f625327c51ed0414d35 (original)

[Original description]

I'm working on the nova-powervm driver for Mitaka and trying to add support for evacuation.

The problem I'm hitting is that instance.host is not updated when the compute driver is called to spawn the instance on the destination host. It is still set to the source host. It's not until after the spawn completes that the compute manager updates instance.host to reflect the destination host.

The nova-powervm driver uses instance events callback mechanism during plug VIF to determine when Neutron has finished provisioning the network. The instance events code sends the event to instance.host and hence is sending the event to the source host (which is down). This causes the spawn to fail and also causes weirdness when the source host gets the events when it's powered back up.

To temporarily work around the problem, I hacked in setting instance.host = CONF.host; instance.save() in the compute driver but that's not a good solution.

Kyle L. Henderson (kyleh) wrote :

To point out the issue a little more.

The compute manager's virtapi allows the compute driver to wait for external events via wait_for_instance_event() method. The common use case is for a compute driver to wait for the vifs to be plugged by neutron before proceeding through the spawn. The pattern is also present in the libvirt driver. See libvirt driver.py -> _create_domain_and_network(). In there you'll see the use of the wait_for_instance_event context manager.

The flow for the events to come into Nova is through nova/api/openstack/compute/server_external_events.py. which eventually calls compute_api.external_instance_event() to dispatch the events. In external_instance_event() you'll see it's using instance.host to call compute_rpcapi.external_instance_event(). So the RPC message will go to whatever host is currently set. In the case of evacuate, at that point in time (while the new host is spawning the recreated VM) it's set to the original host. Which is down. So the compute driver that initiated the action and is waiting for the event will never get it.
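The host selection described above can be reduced to a toy sketch (hypothetical names; the real logic lives in compute_api.external_instance_event): the event goes only to whatever host instance.host names, which during an evacuation is the down source host.

```python
def dispatch_external_event(instance, reachable_hosts, event):
    """Return the host that received the event, or None if it was lost (sketch)."""
    target = instance["host"]  # the only host consulted before the fix
    return target if target in reachable_hosts else None

inst = {"host": "hostA"}   # source host, currently down
up = {"hostB"}             # destination host performing the rebuild
# The waiting compute driver on hostB never sees the event and times out.
assert dispatch_external_event(inst, up, "network-vif-plugged") is None
```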

The question was raised why libvirt doesn't suffer the same fate. I can't answer that authoritatively, but libvirt has a lot of conditions that have to be met before it'll wait for the event. Here's what it's currently checking before waiting for a plug vif event:

        timeout = CONF.vif_plugging_timeout
        if (self._conn_supports_start_paused and
            utils.is_neutron() and not
            vifs_already_plugged and power_on and timeout):
            events = self._get_neutron_events(network_info)
        else:
            events = []

But it does seem (from reading the code) that if all those conditions are met and the operation is an evacuate, it too would fail. Though I have not tried it.

Changed in nova:
status: New → Confirmed
tags: added: libvirt
tags: added: compute
Wenzhi Yu (yuywz) on 2016-01-27
Changed in nova:
assignee: nobody → Wen Zhi Yu (yuywz)
status: Confirmed → In Progress
Drew Thorstensen (thorst) wrote :

We discussed this issue at the mid-cycle. The PowerVM team was asked to re-evaluate, since this works in libvirt, and determine what is different in PowerVM's implementation.

I believe both drivers have the same semantic for rebuild/evacuate. The instance is destroyed on the source system and then the spawn is run on the target host. This is the compute manager's default implementation.

The next question was what was different about our criteria to determine if the vif plug time out should be adhered to.

PowerVM's implementation is pretty simple:

        if (utils.is_neutron() and CONF.vif_plugging_timeout):
            return [('network-vif-plugged', vif['id'])
                    for vif in self.network_info
                    if vif.get('active', True) is False]
        else:
            return []

Libvirt's is:
        timeout = CONF.vif_plugging_timeout
        if (self._conn_supports_start_paused and
            utils.is_neutron() and not
            vifs_already_plugged and power_on and timeout):
            events = self._get_neutron_events(network_info)
        else:
            events = []

In a rebuild scenario, libvirt should hit this:
 - self._conn_supports_start_paused: True if KVM or QEMU
 - utils.is_neutron(): assumed to be true.
 - vifs_already_plugged: False (in that this is a rebuild)
 - power_on: True (in that this is a rebuild)
 - timeout: Assumed to be set to some number.

I guess I'm wondering if libvirt could be affected by this bug? It could be hitting this, but then passing a rebuild test case if CONF.vif_plugging_is_fatal is set to False.
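That masking effect is easy to sketch (a toy model, not nova code): with a non-fatal timeout, the spawn simply continues after the wait expires, so a rebuild test can pass even though the event was lost.

```python
import queue

def wait_for_vif_plugged(events, timeout, fatal):
    """Wait for a network-vif-plugged event; behavior on timeout depends on `fatal`."""
    try:
        events.get(timeout=timeout)  # the event never arrives: it went to the down host
        return "plugged"
    except queue.Empty:
        if fatal:
            raise RuntimeError("VIF plugging timed out")
        return "timed out, continuing anyway"  # non-fatal: spawn proceeds regardless

lost = queue.Queue()  # the network-vif-plugged event was sent elsewhere
assert wait_for_vif_plugged(lost, timeout=0.01, fatal=False).startswith("timed out")
```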

Another reason that libvirt may not be impacted is perhaps they are doing an instance.save elsewhere in the flow, thus inadvertently updating to the right host. But I don't believe this to be the case...it looks like the only places instance.save is called are in cleanup and in the _live_migration_monitor. Also, nothing in the driver is updating the host; that is done solely in the manager (as one would expect).

Kyle - did I mis-interpret the issue?

Kyle L. Henderson (kyleh) wrote :

You documented the issue correctly, Drew.

One correction on the evacuate semantics: the instance is not destroyed on the source system (since it's down and confirmed to be down by the compute API) until the source host is available again (if ever). This would happen after the rebuild (recreate=True) is completed on the destination host.

Drew Thorstensen (thorst) wrote :

The issue with the PowerVM driver is actually in neutron. I set up a libvirt environment, and the difference is that the PowerVM VIF is for some reason in a BUILD state, whereas it is ACTIVE in libvirt.

If the PowerVM VIF was in an ACTIVE state, this wouldn't occur, and no neutron events would need to be waited for.

I'll investigate what's going on with the port state for networking-powervm. The 'up' state is being sent...so this requires some verification.

It is true that the nova instance.host isn't updated until after the spawn in nova. That could be investigated...but this is the root reason why PowerVM is seeing different behavior than Libvirt.

affects: nova → networking-powervm
Changed in networking-powervm:
assignee: Wen Zhi Yu (yuywz) → Drew Thorstensen (thorst)
Drew Thorstensen (thorst) wrote :

I see the issue. The agent does periodic 'get_device_details' calls. It turns out that, within Neutron, getting the device details reverts the port state to BUILD. Neutron then expects an immediate 'UP' request back; the agent doesn't send one.

Will need to add some logic.
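The ping-pong described above can be modeled as a tiny state machine (a hypothetical simplification of neutron's port-status handling): fetching device details flips the port back to BUILD, and only an explicit device-up report returns it to ACTIVE.

```python
class Port:
    """Toy model of a neutron port's status transitions."""
    def __init__(self):
        self.status = "ACTIVE"

    def get_device_details(self):
        self.status = "BUILD"  # neutron expects an immediate 'UP' report to follow
        return {"status": self.status}

    def update_device_up(self):
        self.status = "ACTIVE"

p = Port()
p.get_device_details()       # periodic heal poll
assert p.status == "BUILD"   # stuck here if the agent never reports the device up
p.update_device_up()         # the fix: heal issues a full provision / up report
assert p.status == "ACTIVE"
```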

Reviewed: https://review.openstack.org/273728
Committed: https://git.openstack.org/cgit/openstack/networking-powervm/commit/?id=65f53ab2412f1865f50d8dba701420350a7f68ec
Submitter: Jenkins
Branch: master

commit 65f53ab2412f1865f50d8dba701420350a7f68ec
Author: Drew Thorstensen <email address hidden>
Date: Thu Jan 28 19:59:45 2016 +0000

    Update heal code to ensure device up

    The heal code within the networking-powervm project would ensure that
    the VLAN and client device was routed out to the network. However, due
    to it calling 'get_device_details', the neutron code was changing the
    state back to BUILD.

    Given this behavior, it became apparent that the best path forward was
    to have the heal code call a full provision request for the client
    device. This actually will no-op very quickly if the VLAN is already on
    the client device, but tells Neutron that it is not in fact in a build
    state...but rather is now ACTIVE.

    This allows for a more robust provisioning scheme and allows the neutron
    state to reflect reality. It also updates any existing ports in the
    field that may be affected by this with the next 'heal' cycle.

    Change-Id: I02f2c4cd1d63b7a712e50c273e043e6a7ea5a5e1
    Closes-Bug: 1535918

Changed in networking-powervm:
status: In Progress → Fix Released
Kyle L. Henderson (kyleh) wrote :

I pulled the latest code on my systems with devstack. Removed the work around for the issue from the nova-powervm code base (which was to force the update of the instance.host to the target host) and ran an evacuation. I hit the same problem as seen before. While recreating the instance on the target host, the instance.host is pointing to the old source host and the event that is expected to be received by the target host's compute manager is sent to the source host (which is down.)

Changed in networking-powervm:
status: Fix Released → In Progress
Drew Thorstensen (thorst) wrote :

I looked at Kyle's box. The port is going back to a build state for some reason. Need to figure out why...

Reviewed: https://review.openstack.org/281469
Committed: https://git.openstack.org/cgit/openstack/networking-powervm/commit/?id=9f29aa1ef982a1dd421f55bbb5784c5c36b257e0
Submitter: Jenkins
Branch: master

commit 9f29aa1ef982a1dd421f55bbb5784c5c36b257e0
Author: Drew Thorstensen <email address hidden>
Date: Wed Feb 17 14:15:51 2016 -0500

    Fix the heal code to invoke with the rpc_device

    This resolves a bug in the heal code to correctly pass in the right
    parameter to the _get_nb_and_vlan method.

    Change-Id: Ibda1d3581b56a7a4a1fd163b406d28d32f9dd82c
    Closes-Bug: 1535918

Changed in networking-powervm:
status: In Progress → Fix Released
Taylor Peoples (tpeoples) wrote :

I am able to reproduce this same issue on a multinode devstack running libvirt.

On the source host, the last call to nova/network/base_api.py::update_instance_cache_with_nw_info for a specific instance before the source host crashes has the nw_info passed in as a VIF object with the "active" attribute set to False. This is because the VM has just been deployed and the network was just created. In other words, the last time the instance's InstanceInfoCache's network_info attribute was updated before the source host went down, the VIF was considered not active. In some environments, especially when doing concurrent deploys, it may take a while for the InstanceInfoCache to update the network_info to show as active.

What this boils down to is that Nova's InstanceInfoCache can potentially have a stale network_info active state. This causes the rebuild flow (which is the same as the spawn flow) to potentially end up waiting for the network-vif-plugged event, which will never come because it was sent to the source host instead of the destination. This results in the rebuild to fail because the VIF plugging times out.
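The race can be sketched with the same event-selection pattern the drivers use (a simplified model, not the actual driver code): whether the rebuild waits for network-vif-plugged depends entirely on a possibly stale "active" flag in the cached network info.

```python
def events_to_wait_for(cached_network_info):
    """Only wait for VIFs the cache still believes are inactive (sketch)."""
    return [("network-vif-plugged", vif["id"])
            for vif in cached_network_info
            if vif.get("active", True) is False]

fresh = [{"id": "port-1", "active": True}]   # cache updated before the host died
stale = [{"id": "port-1", "active": False}]  # host died right after the deploy
assert events_to_wait_for(fresh) == []  # no wait: evacuation succeeds
assert events_to_wait_for(stale) == [("network-vif-plugged", "port-1")]  # waits, then times out
```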

Steps:

1) Deploy VM(s) to host A
2) Take host A down (e.g., kill its nova-api and nova-compute processes) once the VM(s) from (1) have finished deploying
3) Try to evacuate VM(s) from host A to host B
4) The evacuation will potentially time out, per the explanation above. It is much easier to reproduce if you do step (2) as soon as possible after the VM(s) finish deploying

stack@controller:~$ glance image-list
+--------------------------------------+---------------------------------+
| ID | Name |
+--------------------------------------+---------------------------------+
| f91197db-16b5-44b2-beb4-72a9e57041c2 | cirros-0.3.4-x86_64-uec |
| 1348de9b-501d-426c-8cb5-e65381208085 | cirros-0.3.4-x86_64-uec-kernel |
| 790ebadb-bc5b-48be-b1f0-95a9214a11ae | cirros-0.3.4-x86_64-uec-ramdisk |
+--------------------------------------+---------------------------------+
stack@controller:~$
stack@controller:~$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+---------+----------------------------------------------------------+
| 4ba74a3e-e7a8-4ca4-9de5-8a1d9e1042b8 | public | c9210289-4895-481b-946a-b406ba5889b4 2001:db8::/64 |
| | | 9a044095-ab4d-4767-817e-02d81cbe90ef 172.24.4.0/24 |
| d7faf346-1a26-41a0-bb62-b08808f6ba13 | private | f45ab890-a0d6-48c1-906e-9c8f81659d65 fdfd:f0f5:a83a::/64 |
| | | 0e85f797-0270-49e9-9600-6f21b9cf47d0 10.254.1.0/24 |
+--------------------------------------+---------+----------------------------------------------------------+
stack@controller:~$
stack@controller:~$ nova boot tdp-test-vm --flavor 1 --availability-zone nova:hostA --block-device id=f91197db-16b5-44b2-beb4-72a9e57041c2,source=image,dest=volume,size=1,bootind...

affects: networking-powervm → nova
Changed in nova:
assignee: Drew Thorstensen (thorst) → nobody
affects: nova → nova-powervm
Changed in nova-powervm:
assignee: nobody → Drew Thorstensen (thorst)
Changed in nova:
assignee: nobody → Taylor Peoples (tpeoples)

Change abandoned by Drew Thorstensen (<email address hidden>) on branch: master
Review: https://review.openstack.org/315874
Reason: Superseded by https://review.openstack.org/#/c/316417/

Changed in nova:
assignee: Taylor Peoples (tpeoples) → nobody
Sridhar Venkat (svenkat) on 2016-06-18
Changed in nova:
assignee: nobody → Sridhar Venkat (svenkat)
Sridhar Venkat (svenkat) wrote :

The problem is reproducible when more than one evacuation is attempted simultaneously (4 in my devstack environment). If evacuations are attempted one at a time, the problem does not occur.

Sridhar Venkat (svenkat) wrote :

My previous statement needs correction: the problem is reproducible even with one VM. To reproduce, deploy a VM on the source host and shut down the source host before the corresponding VIF is activated. Examine the nova-compute log; searching for "vif_type=" should reveal the VIF's active state. If it is 'false', evacuating such a VM results in the error reported in this bug.

If you wait until the VIF is activated before shutting down the source host, the VM can be evacuated successfully.

Changed in nova:
status: New → In Progress
Changed in nova:
assignee: Sridhar Venkat (svenkat) → Artom Lifshitz (notartom)
Artom Lifshitz (notartom) wrote :

Since the bot doesn't seem to have picked it up:
Fix proposed to nova (master):
https://review.openstack.org/#/c/371048/

Fix proposed to branch: master
Review: https://review.openstack.org/385086

Reviewed: https://review.openstack.org/371048
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a5b920a197c70d2ae08a1e1335d979857f923b4f
Submitter: Jenkins
Branch: master

commit a5b920a197c70d2ae08a1e1335d979857f923b4f
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 5 14:37:03 2016 -0400

    Send events to all relevant hosts if migrating

    Previously, external events were sent to the instance object's host
    field. This patch fixes the external event dispatching to check for
    migration. If an instance is being migrated, the source and
    destination compute are added to the set of hosts to which the event
    is sent.

    Change-Id: If00736ab36df4a5a3be4f02b0a550e4bcae77b1b
    Closes-bug: 1535918
    Closes-bug: 1624052

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/392219
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5de902a3163c9c079fab22754388bd4e02981298
Submitter: Jenkins
Branch: stable/newton

commit 5de902a3163c9c079fab22754388bd4e02981298
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 5 14:37:03 2016 -0400

    Send events to all relevant hosts if migrating

    Previously, external events were sent to the instance object's host
    field. This patch fixes the external event dispatching to check for
    migration. If an instance is being migrated, the source and
    destination compute are added to the set of hosts to which the event
    is sent.

    Change-Id: If00736ab36df4a5a3be4f02b0a550e4bcae77b1b
    Closes-bug: 1535918
    Closes-bug: 1624052
    (cherry picked from commit a5b920a197c70d2ae08a1e1335d979857f923b4f)

tags: added: in-stable-newton

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/331707
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

This issue was fixed in the openstack/nova 14.0.3 release.

Change abandoned by Artom Lifshitz (<email address hidden>) on branch: master
Review: https://review.openstack.org/385086
Reason: Nothing is technically broken anymore, since the patch that actually fixes the bug has merged. The race is still present I believe, but it doesn't actually affect anything now that event dispatching is fixed.

I am hitting this issue.

1) After a nova evacuate on a two-compute setup, the instance's host parameter is updated to its new host.

2) While adding storage to the instance, it fails, because the RPC call to the compute service (the old host) hits a timeout exception.

A minor correction to comment #26:
1) After a nova evacuate on a two-compute setup, the instance's host parameter is *not* updated to its new host.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/461678
Reason: mitaka is basically end of life

Seyeong Kim (xtrusia) on 2017-08-18
description: updated
Seyeong Kim (xtrusia) on 2017-08-18
description: updated
Seyeong Kim (xtrusia) wrote :
description: updated
tags: added: sts-sru-needed
Seyeong Kim (xtrusia) wrote :
Seyeong Kim (xtrusia) on 2017-08-21
description: updated
Changed in cloud-archive:
status: New → Fix Released
Seyeong Kim (xtrusia) on 2017-08-21
description: updated
Seyeong Kim (xtrusia) on 2017-08-21
description: updated
Eric Desrochers (slashd) on 2017-08-21
Changed in nova (Ubuntu Xenial):
assignee: nobody → Seyeong Kim (xtrusia)
Eric Desrochers (slashd) on 2017-08-21
Changed in nova (Ubuntu Artful):
status: New → Fix Released
Changed in nova (Ubuntu Zesty):
status: New → Fix Released
Changed in nova (Ubuntu Xenial):
status: New → In Progress
Seyeong Kim (xtrusia) on 2017-08-22
description: updated
Eric Desrochers (slashd) wrote :

Uploaded in Xenial upload queue.

Hello Kyle, or anyone else affected,

Accepted nova into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:13.1.4-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Seyeong Kim (xtrusia) wrote :

Hello

I tested the -proposed package and it is working fine.

rc nova-api 2:13.1.4-0ubuntu3 all OpenStack Compute - API frontend
ii nova-api-os-compute 2:13.1.4-0ubuntu3 all OpenStack Compute - OpenStack Compute API frontend
ii nova-cert 2:13.1.4-0ubuntu3 all OpenStack Compute - certificate management
ii nova-common 2:13.1.4-0ubuntu3 all OpenStack Compute - common files
ii nova-conductor 2:13.1.4-0ubuntu3 all OpenStack Compute - conductor service
ii nova-consoleauth 2:13.1.4-0ubuntu3 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:13.1.4-0ubuntu3 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 2:13.1.4-0ubuntu3 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:13.1.4-0ubuntu3 all OpenStack Compute Python libraries

tags: added: verification-done-xenial
removed: verification-needed-xenial
Seyeong Kim (xtrusia) wrote :

I deployed an OpenStack environment with my script from the [Test case] section of the description and reproduced the error.

I then upgraded nova-cloud-controller and nova-compute.

When I evacuated those ERROR-state VMs again, they went ACTIVE.

James Page (james-page) wrote :

Hello Kyle, or anyone else affected,

Accepted nova into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:13.1.4-0ubuntu3

---------------
nova (2:13.1.4-0ubuntu3) xenial; urgency=medium

  * Fix evacuation error when nova-compute is down just
    after VM is started.

    - d/p/make-sure-to-rebuild-claim-on-recreate.patch
      (backported from newton 0f2d874, upstream a2b0824)

    - d/p/Send-events-to-all-relevant-hosts-if-migrating.patch (LP: #1535918)
      (backported from a5b920)

 -- Seyeong Kim <email address hidden> Fri, 04 Aug 2017 04:46:40 +0900

Changed in nova (Ubuntu Xenial):
status: Fix Committed → Fix Released
Seyeong Kim (xtrusia) wrote :

I ran the same test as on Xenial (this is for the Mitaka UCA), and verification is done.

ii nova-api-os-compute 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - OpenStack Compute API frontend
ii nova-cert 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - certificate management
ii nova-common 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - common files
ii nova-conductor 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:13.1.4-0ubuntu3~cloud0 all OpenStack Compute Python libraries

tags: added: verification-mitaka-done
removed: verification-mitaka-needed