Cannot rebuild a VM created from a Cinder volume backed by NetApp

Bug #1400881 reported by Julian Montez
This bug affects 1 person
Affects             Status        Importance   Assigned to       Milestone
OpenStack-Ansible   Fix Released  Medium       David             -
  Juno              Fix Released  Medium       Jesse Pretorius   -
  Kilo              Fix Released  Medium       Jesse Pretorius   -
  Trunk             Fix Released  Medium       David             -

Bug Description

After successfully creating a VM from a Cinder volume backed by NetApp, issuing a rebuild against that VM fails. The error returned is: `Failed to terminate process 9233 with SIGKILL: Device or resource busy`. This is occurring within SAT6-LAB02.

Looking through the Compute error logs, the libvirt domain destroy call on the VM returns an error code, which puts the server's status into ERROR. As the error log also states, I'm unable to inspect the failing process because it can no longer be found on the Compute node. Before the rebuild is issued, the Nova CLI shows the VM as ACTIVE.

```
root@node89_utility_container-35d99d12:~# nova boot --flavor 2 --block-device source=image,id=b9e8c879-9cdf-4a97-bdd0-7eab3920919f,dest=volume,size=10,shutdown=remove,bootindex=0 --nic net-id=84d11930-043e-4e76-81fe-fa9beb45cd3c --poll test
+--------------------------------------+-------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-0000033f |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | 3KedpFDJ3453 |
| config_drive | |
| created | 2014-12-09T19:53:31Z |
| flavor | m1.small (2) |
| hostId | |
| id | 676b0cf4-07eb-4145-b6de-5767d8468f66 |
| image | Attempt to boot from volume - no image supplied |
| key_name | - |
| metadata | {} |
| name | test |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | a79197cfca65452298be0ff9801885b2 |
| updated | 2014-12-09T19:53:31Z |
| user_id | 2912a11dd3b64810aaa9c59ead0fefc0 |
+--------------------------------------+-------------------------------------------------+

Server building... 100% complete
Finished

---

root@node89_utility_container-35d99d12:~# nova image-list
+--------------------------------------+--------------+--------+--------------------------------------+
| ID | Name | Status | Server |
+--------------------------------------+--------------+--------+--------------------------------------+
| b9e8c879-9cdf-4a97-bdd0-7eab3920919f | cirros | ACTIVE | |
| 5f4aac88-168e-4878-98a0-09c493e56dda | cirros-0.3.2 | ACTIVE | |
| c2f864f8-153d-4175-946e-b44becfb6bdc | cirros-0.3.3 | ACTIVE | |
| 1920c7f2-7e47-415e-bbbf-c86144863f5d | image237842 | ACTIVE | c0d6190b-e791-4151-bbb3-31202a903e21 |
| b7dd98cc-22e2-46c0-873d-2f5052c8d6b3 | image873715 | ACTIVE | cbbc6d45-8cde-439e-bebc-8f4bb7b1d94e |
+--------------------------------------+--------------+--------+--------------------------------------+

---

root@node89_utility_container-35d99d12:~# nova rebuild --poll 676b0cf4-07eb-4145-b6de-5767d8468f66 c2f864f8-153d-4175-946e-b44becfb6bdc
+-------------------+----------------------------------------------------------+
| Property | Value |
+-------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | LBqu8H94d7TP |
| created | 2014-12-09T19:53:31Z |
| flavor | m1.small (2) |
| hostId | fa32297f7a7cfbe1f2cbd61bd78932f0e5980cdcb89b9447254b75d4 |
| id | 676b0cf4-07eb-4145-b6de-5767d8468f66 |
| image | cirros-0.3.3 (c2f864f8-153d-4175-946e-b44becfb6bdc) |
| metadata | {} |
| name | test |
| private network | 172.31.0.126 |
| progress | 0 |
| status | REBUILD |
| tenant_id | a79197cfca65452298be0ff9801885b2 |
| updated | 2014-12-09T19:54:50Z |
| user_id | 2912a11dd3b64810aaa9c59ead0fefc0 |
+-------------------+----------------------------------------------------------+

Server rebuilding... 0% complete
Error rebuilding server
ERROR (InstanceInErrorState): Failed to terminate process 9233 with SIGKILL: Device or resource busy

---

# /var/log/nova/nova-compute.log - node92
2014-12-09 20:03:03.613 3235 AUDIT nova.compute.manager [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a None] [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] Rebuilding instance
2014-12-09 20:03:19.369 3235 ERROR nova.virt.libvirt.driver [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a None] [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 22831 with SIGKILL: Device or resource busy
2014-12-09 20:03:19.370 3235 ERROR nova.compute.manager [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a None] [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] Setting instance vm_state to ERROR
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] Traceback (most recent call last):
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/compute/manager.py", line 6097, in _error_out_instance_on_exception
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] yield
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/compute/manager.py", line 2825, in rebuild_instance
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] self._rebuild_default_impl(**kwargs)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/compute/manager.py", line 2666, in _rebuild_default_impl
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] block_device_info=block_device_info)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1054, in destroy
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] self._destroy(instance)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1010, in _destroy
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] instance=instance)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 82, in __exit__
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] six.reraise(self.type_, self.value, self.tb)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 979, in _destroy
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] virt_dom.destroy()
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] result = proxy_call(self._autowrap, f, *args, **kwargs)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] rv = execute(f, *args, **kwargs)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] six.reraise(c, e, tb)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] rv = meth(*args, **kwargs)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 918, in destroy
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69] libvirtError: Failed to terminate process 22831 with SIGKILL: Device or resource busy
2014-12-09 20:03:19.370 3235 TRACE nova.compute.manager [instance: 4a57af5e-0472-4080-b5d4-8721fbb29e69]
2014-12-09 20:03:19.482 3235 INFO nova.scheduler.client.report [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a None] Compute_service record updated for ('node92.sat6.lab', 'node92.sat6.lab')
2014-12-09 20:03:19.639 3235 INFO nova.scheduler.client.report [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a None] Compute_service record updated for ('node92.sat6.lab', 'node92.sat6.lab')
2014-12-09 20:03:19.642 3235 ERROR oslo.messaging.rpc.dispatcher [req-d8a82313-8ed8-40a8-9c80-3ef17ed55c2a ] Exception during message handling: Failed to terminate process 22831 with SIGKILL: Device or resource busy
```
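
For context, libvirt raises this error when the qemu process is still alive after SIGKILL has been delivered, which on NFS/iSCSI-backed volumes usually means the process is stuck in uninterruptible sleep (state `D`) waiting on storage I/O. A minimal diagnostic sketch (assumptions: a Linux compute node and the PID taken from the libvirt error message) for confirming that condition:

```
# Minimal diagnostic sketch: check whether a qemu process is stuck in
# uninterruptible sleep ('D'), the usual reason libvirt's SIGKILL attempt
# ends with "Device or resource busy".
# Assumptions: Linux /proc layout; the PID comes from the libvirt error message.
import sys


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the
    process no longer exists."""
    try:
        with open('/proc/%d/stat' % pid) as stat_file:
            # Format: "pid (comm) state ppid ..."; the state follows "(comm)".
            return stat_file.read().split(') ', 1)[1].split()[0]
    except IOError:
        return None


if __name__ == '__main__':
    pid = int(sys.argv[1])  # e.g. 22831 from the nova-compute log above
    state = process_state(pid)
    if state is None:
        print('process %d no longer exists' % pid)
    elif state == 'D':
        print('process %d is in uninterruptible sleep (blocked on I/O)' % pid)
    else:
        print('process %d state: %s' % (pid, state))
```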

Tags: upstream-bug

Changed in openstack-ansible:
status: New → Triaged
importance: Undecided → Critical
importance: Critical → Medium
milestone: none → next
Revision history for this message
Matt Thompson (mattt416) wrote :

I was able to replicate this on cloud servers (using standard LVM backend). I'm still digging into it, but this seems to be the same issue:

https://bugs.launchpad.net/nova/+bug/1353939

Revision history for this message
Matt Thompson (mattt416) wrote :

Tried replicating this on stable/icehouse but was not able to. Updated https://bugs.launchpad.net/nova/+bug/1353939 to indicate we're also hitting that bug.

git-harry (git-harry)
tags: added: upstream
tags: added: upstream-bug
removed: upstream
Changed in openstack-ansible:
status: Triaged → Confirmed
Changed in openstack-ansible:
milestone: next → 11.0.4
milestone: 11.0.4 → none
Revision history for this message
David (david-alfano) wrote :

It seems that the nova bug was fixed as of 10:30 today.

https://bugs.launchpad.net/nova/+bug/1353939

Revision history for this message
Darren Birkett (darren-birkett) wrote :

The upstream fix was committed to master (liberty) and stable/kilo. No backport has yet been proposed to stable/juno.

Revision history for this message
David (david-alfano) wrote :

I have been trying to verify whether the patch merged into nova for https://bugs.launchpad.net/nova/+bug/1353939 fixes this problem on the master branch. Unfortunately, that patch runs into https://bugs.launchpad.net/nova/+bug/1440762, which does not yet have a fix.

History: http://paste.openstack.org/show/356278/

nova-compute.log: http://paste.openstack.org/show/356276/

It seems that we'll have to wait for an upstream fix for bug #1440762.

Revision history for this message
David (david-alfano) wrote :

The nova bug that I linked to previously has been fixed and merged.

https://review.openstack.org/#/c/176891/

Would I be correct in thinking that this bug could be closed?
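
For reference, the merged nova change works around this by retrying the domain destroy when libvirt reports the transient EBUSY failure, instead of immediately putting the instance into ERROR. A simplified sketch of that approach (illustrative only, not the exact upstream code; the real change lives in nova/virt/libvirt/driver.py):

```
# Simplified sketch of the retry approach: if libvirt reports that SIGKILL
# failed with EBUSY, wait briefly and retry the destroy a few times before
# giving up. Illustrative only; not the exact upstream implementation.
import errno
import time

import libvirt

MAX_DESTROY_ATTEMPTS = 3  # illustrative retry limit


def destroy_with_retry(virt_dom, attempt=1):
    try:
        virt_dom.destroy()
    except libvirt.libvirtError as e:
        busy = (e.get_error_code() == libvirt.VIR_ERR_SYSTEM_ERROR and
                e.get_int1() == errno.EBUSY)
        if busy and attempt < MAX_DESTROY_ATTEMPTS:
            # The qemu process is probably blocked on storage I/O; give it
            # a moment and try again.
            time.sleep(1)
            destroy_with_retry(virt_dom, attempt + 1)
        else:
            raise
```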

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

Until this is fixed upstream, this will just have to be noted as a known issue. I'm going to mark this as "Won't Fix" - if the upstream bug fix is merged then it'll automatically be fixed for us.

Revision history for this message
Darren Birkett (darren-birkett) wrote :

The patch has merged into upstream master, and is pending on the kilo and juno backports. This bug can be fixed in each of our branches by a SHA bump as the patches merge upstream. I'd argue that it's worth keeping this bug open and 'releasing' the fixes as we consume them via SHA bumps.
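
When consuming the fix via a SHA bump, it can be worth verifying that the newly pinned nova SHA actually contains the backported commit before releasing. A small hypothetical helper (assumes a local clone of the nova repository; both SHAs below are placeholders) using git:

```
# Hypothetical helper: confirm that a pinned nova SHA already contains the
# commit that fixes this bug, using a local clone of the nova repository.
import subprocess


def sha_contains_fix(repo_path, pinned_sha, fix_sha):
    """Return True if fix_sha is an ancestor of pinned_sha in repo_path."""
    ret = subprocess.call(
        ['git', '-C', repo_path, 'merge-base', '--is-ancestor',
         fix_sha, pinned_sha])
    return ret == 0


if __name__ == '__main__':
    # Placeholders: substitute the pinned SHA from the deployment's repo SHA
    # variables and the merged fix commit SHA.
    print(sha_contains_fix('/opt/nova', '<pinned-sha>', '<fix-commit-sha>'))
```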

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

stable/kilo fix backport review: https://review.openstack.org/203236
stable/juno fix backport review: https://review.openstack.org/203253

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

The stable/kilo patch has merged. This will be incorporated into the next SHA bump.

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

The stable/kilo upstream fix is incorporated into os-ansible-deployment's https://review.openstack.org/211265

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

The stable/juno upstream fix has now merged.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (juno)

Fix proposed to branch: juno
Review: https://review.openstack.org/217098

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/217098
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=5bb99c8fbac479e74c93f954b48d316e7c1755d4
Submitter: Jenkins
Branch: juno

commit 5bb99c8fbac479e74c93f954b48d316e7c1755d4
Author: Jesse Pretorius <email address hidden>
Date: Wed Aug 26 12:23:41 2015 +0100

    Updated juno to include fix for CVE-2015-3241 - 26 Aug 2015

    Updates all repo SHAs to include:
     - [OSSA 2015-015] Nova instance migration process does not stop when
       instance is deleted (CVE-2015-3241)
     - Cannot rebuild a VM created from a Cinder volume backed by NetApp
       https://review.openstack.org/203253

    This patch removes all SHA specifications for the OpenStack clients
    so that any client requirement versions are determined solely from
    the requirements of the OpenStack service and Global requirements
    files.

    This resolves issues where the client versions we carry are
    incompatible with the services, resulting in unexpected failures.

    Tempest requirements are specifically ignored as the tempest role
    downloads its required clients directly from pip.

    Finally, due to the change in the openstack-client version used the
    heat domain/role/user tasks fail as they require the CLI to specify
    the user and project domain name/id. This patch includes the CLI
    changes to ensure that the deployment completes properly.

    Change-Id: I88e705d063575f55c32350ead25bc7cc0bcdec08
    Closes-Bug: #1489947
    Closes-Bug: #1488315
    Closes-Bug: #1400881
