Instance goes to ERROR status during deletion with "Failed to terminate process <id> with SIGKILL: Device or resource busy"

Bug #1584029 reported by Georgy Dyuldin
Affects: Mirantis OpenStack (status tracked in 10.0.x)

  Series    Status     Importance    Assigned to
  10.0.x    Invalid    High          Georgy Dyuldin
  9.x       Invalid    High          Georgy Dyuldin

Bug Description

Scenario:

1. Deploy OpenStack with MOS 9.0 iso #363 on one server (with fuel-qa)
2. Create an image from the Ubuntu Trusty cloud image
3. Boot an instance from the created image
4. Delete the instance (see the CLI sketch after this list)
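
For reference, a rough CLI sketch of steps 2-4 (a sketch only; the image name, flavor, and instance name are illustrative, not taken from the original run):

    wget https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
    glance image-create --name trusty --disk-format qcow2 --container-format bare \
        --file trusty-server-cloudimg-amd64-disk1.img
    nova boot --image trusty --flavor m1.small test-instance
    nova delete test-instance
    nova list    # expected: instance gone; actual: instance stuck in ERROR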

Expected result:

Instance disappears from the instances list

Actual result:

Instance reaches ERROR state

Instance fault details:

Failed to terminate process 10380 with SIGKILL: Device or resource busy
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 375, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2522, in terminate_instance
    do_terminate_instance(instance, bdms)
  File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2520, in do_terminate_instance
    self._set_instance_obj_error_state(context, instance)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2510, in do_terminate_instance
    self._delete_instance(context, instance, bdms, quotas)
  File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 154, in inner
    rv = f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2473, in _delete_instance
    quotas.rollback()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2437, in _delete_instance
    self._shutdown_instance(context, instance, bdms)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2346, in _shutdown_instance
    requested_networks)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2333, in _shutdown_instance
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1029, in destroy
    self._destroy(instance)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 973, in _destroy
    self._destroy(instance, attempt + 1)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 973, in _destroy
    self._destroy(instance, attempt + 1)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 974, in _destroy
    return
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 915, in _destroy
    guest.poweroff()
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 146, in poweroff
    self._domain.destroy()
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1055, in destroy
    if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)

Tags: area-nova
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Roman Podoliaka (rpodolyaka) wrote :

Georgy, could you please provide a diagnostic snapshot? (specifically, we need a full nova-compute log and a libvirtd log from the affected compute node).

Changed in mos:
status: New → Incomplete
assignee: nobody → Georgy Dyuldin (g-dyuldin)
Georgy Dyuldin (g-dyuldin) wrote :

I've caught it again.

Steps:
1. Deploy MOS 9.0 #495
2. Update quotas to allow creating many networks:
   neutron quota-update --network 50 --subnet 50 --router 50 --port 250
3. Create the maximum number of networks and subnets, and launch an instance on each
4. When there is no more room on the computes to create new instances, wait until all created instances reach ACTIVE status.
5. Delete the created instances (a shell sketch of these steps follows the list)
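
A rough shell sketch of steps 2-5 (illustrative only: TestVM and m1.micro are the MOS defaults mentioned elsewhere in this report, while the CIDRs and loop bounds are assumptions):

    neutron quota-update --network 50 --subnet 50 --router 50 --port 250
    for i in $(seq 1 50); do
        neutron net-create "net-$i"
        neutron subnet-create "net-$i" "192.168.$i.0/24" --name "subnet-$i"
        NET_ID=$(neutron net-show "net-$i" | awk '/ id /{print $4}')
        # Boots are async: immediate API rejections stop the loop here, while
        # capacity exhaustion on the computes shows up later as ERROR instances.
        nova boot --image TestVM --flavor m1.micro --nic "net-id=$NET_ID" "vm-$i" || break
    done
    # Once every instance is ACTIVE, delete them all; one may land in ERROR.
    for id in $(nova list --minimal | awk '/vm-/{print $2}'); do nova delete "$id"; done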

Actual result:

One of the instances reaches ERROR status

Trace (from instance):

http://paste.openstack.org/show/520593/

Diagnostic snapshot attached.

Changed in mos:
status: Incomplete → New
assignee: Georgy Dyuldin (g-dyuldin) → nobody
tags: removed: need-info
Timur Nurlygayanov (tnurlygayanov) wrote :

Hi Nova team, could you please take a look?

Thank you!

Timofey Durakov (tdurakov) wrote :

Hi @Timur, @Georgy, could you please provide the following info:
- how often is this bug reproduced?
- the cloud image used (a download link is OK),
- is this bug valid only for Ubuntu cloud images?
- any additional info that could help improve reproducibility would be good too.

Georgy Dyuldin (g-dyuldin) wrote :

Hi @Timofey.

We don't have many tests that delete instances, so I think it reproduces in 1-3 cases out of 10.
About 2-3 months ago it appeared almost always with Ubuntu Trusty (the tests download the latest image from https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img, so, unfortunately, I can't say the exact version). The snapshot was made after deleting the predefined CirrOS image "TestVM" from MOS 9.0 #495.

It seems this bug appears more often while deleting instances under network load (e.g. running iperf in the background).

I'll try to reproduce it, pause test execution, and give you access to the environment.

Andrey Volkov (avolkov) wrote :

A workaround for this situation is already in master and in the stable branches:
http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py#n766
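
For clarity, a simplified Python sketch of that retry logic (not the actual Nova code; the LibvirtError stand-in and function names are illustrative): on libvirt error code 38 (VIR_ERR_SYSTEM_ERROR) with "Failed to terminate process" in the message, the destroy is retried a limited number of times before the error is re-raised:

    import time

    VIR_ERR_SYSTEM_ERROR = 38     # libvirt error code seen in the warnings below
    MAX_DESTROY_ATTEMPTS = 3      # cf. the "attempt N of 3" warnings

    class LibvirtError(Exception):
        """Illustrative stand-in for libvirt.libvirtError, carrying an error code."""
        def __init__(self, message, code):
            super(LibvirtError, self).__init__(message)
            self.code = code

    def destroy_with_retry(poweroff, attempt=1):
        """Call poweroff() and retry on EBUSY-style system errors."""
        try:
            poweroff()
        except LibvirtError as exc:
            busy = (exc.code == VIR_ERR_SYSTEM_ERROR and
                    'Failed to terminate process' in str(exc))
            if busy and attempt < MAX_DESTROY_ATTEMPTS:
                print('Error from libvirt during destroy; '
                      'attempt %d of %d' % (attempt, MAX_DESTROY_ATTEMPTS))
                time.sleep(1)     # give the kernel time to release the process
                destroy_with_retry(poweroff, attempt + 1)
            else:
                raise             # retries exhausted: the instance goes to ERROR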

I see three attempts to destroy this instance in the nova-compute log:
2016-06-20T02:47:37.588209+00:00 warning: 2016-06-20 02:47:37.595 7810 WARNING nova.virt.libvirt.driver [req-ba3487c0-8de2-420b-b886-43d56fbc67af 50fc999997d746ed8a6f9a2ca3b982a7 46fbc6960ebb4b11a96d5e1cf9dc3183 - - -] [instance: da3222fb-618d-4153-a33f-26957b380075] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 29053 with SIGKILL: Device or resource busy; attempt 1 of 3

2016-06-20T02:48:04.902519+00:00 warning: 2016-06-20 02:48:04.909 7810 WARNING nova.virt.libvirt.driver [req-ba3487c0-8de2-420b-b886-43d56fbc67af 50fc999997d746ed8a6f9a2ca3b982a7 46fbc6960ebb4b11a96d5e1cf9dc3183 - - -] [instance: da3222fb-618d-4153-a33f-26957b380075] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 29053 with SIGKILL: Device or resource busy; attempt 2 of 3

2016-06-20T02:48:29.106934+00:00 warning: 2016-06-20 02:48:29.111 7810 WARNING nova.virt.libvirt.driver [req-ba3487c0-8de2-420b-b886-43d56fbc67af 50fc999997d746ed8a6f9a2ca3b982a7 46fbc6960ebb4b11a96d5e1cf9dc3183 - - -] [instance: da3222fb-618d-4153-a33f-26957b380075] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 29053 with SIGKILL: Device or resource busy; attempt 3 of 3

And according to the qemu log the instance wasn't actually shut down (or the log was truncated); instance-00000028 shows only:
  qemu: terminating on signal 15 from pid 4552

It would be great to have libvirtd debug logs:
echo -e 'log_level = 1\nlog_outputs = "1:file:/var/log/libvirt/libvirtd.log"' >> /etc/libvirt/libvirtd.conf && service libvirtd restart

Timofey Durakov (tdurakov) wrote :

Moving to Incomplete, as libvirtd logs are required.

Timur Nurlygayanov (tnurlygayanov) wrote :

It looks like we can't reproduce the issue right now; it probably does not reproduce in 100% of cases.

I'm going to move the bug to Invalid status. Please feel free to change the status to Confirmed and attach the libvirt logs if the issue is reproduced again.
