Ceph volumes attached to local deleted instance could not be correctly handled

Bug #1672624 reported by Zhenyu Zheng
This bug affects 3 people
Affects                    Status   Importance  Assigned to  Milestone
Cinder                     New      Undecided   Unassigned
OpenStack Compute (nova)   Invalid  Undecided   Unassigned

Bug Description

How to reproduce:
1. Launch an instance.
2. Create a volume with the Ceph backend.
3. Attach the volume created in step 2 to the instance.
4. Kill nova-compute.
5. Delete the instance; this will go through local_delete.
6. Check the volume status with "cinder list"; the volume is in "available" status.
7. Try to delete the volume; it fails:
2017-03-14 11:40:41.050 DEBUG oslo_messaging._drivers.amqpdriver received message with unique_id: 061b4f9b52aa425d97811c066133b170 from (pid=480) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:215
2017-03-14 11:40:41.056 DEBUG cinder.coordination req-774b4680-d861-4e16-bad4-a032ff0b3579 None Lock "7c7d03d9-3244-4923-b72e-459677ee48aa-delete_volume" acquired by "delete_volume" :: waited 0.000s from (pid=480) _synchronized /opt/stack/cinder/cinder/coordination.py:300
2017-03-14 11:40:41.155 DEBUG cinder.volume.drivers.rbd req-774b4680-d861-4e16-bad4-a032ff0b3579 admin None connecting to ceph (timeout=-1). from (pid=480) _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:299
2017-03-14 11:40:42.376 DEBUG cinder.volume.drivers.rbd req-774b4680-d861-4e16-bad4-a032ff0b3579 None volume has no backup snaps from (pid=480) _delete_backup_snaps /opt/stack/cinder/cinder/volume/drivers/rbd.py:660
2017-03-14 11:40:42.377 DEBUG cinder.volume.drivers.rbd req-774b4680-d861-4e16-bad4-a032ff0b3579 admin None Volume volume-7c7d03d9-3244-4923-b72e-459677ee48aa is not a clone. from (pid=480) _get_clone_info /opt/stack/cinder/cinder/volume/drivers/rbd.py:683
2017-03-14 11:40:42.382 DEBUG cinder.volume.drivers.rbd req-774b4680-d861-4e16-bad4-a032ff0b3579 None deleting rbd volume volume-7c7d03d9-3244-4923-b72e-459677ee48aa from (pid=480) delete_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:781
2017-03-14 11:40:42.570 DEBUG cinder.utils req-774b4680-d861-4e16-bad4-a032ff0b3579 admin None Failed attempt 1 from (pid=480) _print_stop /opt/stack/cinder/cinder/utils.py:780
...
2017-03-14 11:41:12.950 WARNING cinder.volume.drivers.rbd req-774b4680-d861-4e16-bad4-a032ff0b3579 admin None ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.
2017-03-14 11:41:12.955 ERROR cinder.volume.manager req-774b4680-d861-4e16-bad4-a032ff0b3579 admin None Unable to delete busy volume.

Tags: cells
summary: - Volumes attached to local deleted ceph volume could not be correctly
+ Ceph volumes attached to local deleted instance could not be correctly
handled
description: updated
affects: nova → cinder
affects: cinder → nova
Revision history for this message
John Garbutt (johngarbutt) wrote :

Adding the cells tag, as the cells work has added lots of new code paths for local delete cases. We are seeing errors around inconsistent handling of quotas, notifications and other things that hit this code path.

tags: added: cells
Revision history for this message
melanie witt (melwitt) wrote :

I couldn't reproduce this issue; I found that the volumes were correctly detached during the local delete.

$ git rev-parse HEAD
edf51119fa59ff8a3337abb9107a06fa33d3c68f

$ nova boot --flavor m1.nano --block-device source=image,id=59f7eeb3-700a-456f-8d3f-9dfc4cce797b,dest=volume,size=1,shutdown=remove,bootindex=0 --poll hi

+--------------------------------------+-------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | hi |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-nk9ygaf0 |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | aiGbhNhGC7nB |
| config_drive | |
| created | 2017-03-15T04:27:47Z |
| description | - |
| flavor | m1.nano (42) |
| hostId | |
| host_status | |
| id | 957b1fa8-0b9c-483d-a049-499cd4545188 |
| image | Attempt to boot from volume - no image supplied |
| key_name ...


Changed in nova:
status: New → Incomplete
Revision history for this message
melanie witt (melwitt) wrote :

Just tried another case with shutdown=preserve and found I couldn't delete the volume from cinder, as you described. So far, it looks like a problem in cinder? The volume showed as "available" and not attached to anything in cinder, then the "cinder delete" failed to complete.

$ nova boot --flavor m1.nano --block-device source=image,id=59f7eeb3-700a-456f-8d3f-9dfc4cce797b,dest=volume,size=1,shutdown=preserve,bootindex=0 --poll hi

+--------------------------------------+-------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | hi |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-1j6d6cne |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | M6onRYNPbEJ4 |
| config_drive | |
| created | 2017-03-15T04:37:29Z |
| description | - |
| flavor | m1.nano (42) |
| hostId | |
| host_status | |
| id | 48c9f377-52d4-46b9-bd6b-d9de2d0dc540 |
| image ...

Revision history for this message
Zhenyu Zheng (zhengzhenyu) wrote :

@melanie witt, Hi, thanks for the test, but I think you might not be doing it the same way:
LVM volumes can be handled correctly, but Ceph volumes can't, and from looking at the code
it seems likely that only the LVM driver is handled correctly. I will ask my colleague to
add our test log to the report.

Changed in nova:
status: Incomplete → New
Revision history for this message
Zhenyu Zheng (zhengzhenyu) wrote :

@melanie witt, Ah, it looks like you are using the Ceph backend. Yes, what we tried was to
create a volume first and then attach it to an instance; that should be equivalent to the
shutdown=preserve scenario?

And it is a Nova-Cinder issue: nova didn't disconnect the volume because the compute service was down, and
currently cinder can only handle LVM volumes in this kind of scenario. So it is probably a common
issue for all volume backends other than LVM.

Revision history for this message
wangxiyuan (wangxiyuan) wrote :

LVM works well because the LVM/iSCSI target code removes the export and the iSCSI target by itself:

https://github.com/openstack/cinder/blob/master/cinder/volume/targets/iscsi.py#L220

But the Ceph (rbd) driver does nothing in these hooks:

https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1019
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L1043

So when nova-compute is down and nova deletes the VM with the local delete logic, the VM on the hypervisor is still alive and the volume is still attached to it. Only once the VM is destroyed can the volume be deleted successfully.

Looking at the cinder third-party drivers, most of them leave these two functions empty (except dell_emc, dothill, hpe and so on), so I guess most of the other drivers have this problem as well (see the sketch after the paste link below).

reproduce:
http://paste.openstack.org/show/602801/
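
For illustration, a paraphrased sketch of the difference (not the exact upstream code behind the links above; the iSCSI-side helper methods are abbreviated): the iSCSI target class tears its own export down, while the rbd driver's export hooks do nothing, so nothing on the cinder side releases the image that the still-running domain holds open.

# Paraphrased sketch, not the exact upstream cinder code.

class ISCSITarget(object):
    """LVM/iSCSI path: tears its own export down."""

    def remove_export(self, context, volume):
        # Look up the target/LUN backing this volume and delete the iSCSI
        # target, which drops any initiator connections to the export.
        iscsi_target, lun = self._get_target_and_lun(context, volume)
        self.remove_iscsi_target(iscsi_target, lun, volume['id'], volume['name'])

    # _get_target_and_lun() and remove_iscsi_target() exist on the real class
    # and are omitted here.


class RBDDriver(object):
    """Ceph/rbd path: the export hooks are effectively no-ops."""

    def create_export(self, context, volume, connector):
        # Nothing to do: clients talk to the Ceph cluster directly.
        pass

    def remove_export(self, context, volume):
        # Also nothing to do, so a client that was never torn down (here, the
        # libvirt domain left behind by the local delete) still holds the rbd
        # image open and delete_volume() later fails with ImageBusy.
        pass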

Revision history for this message
Lee Yarwood (lyarwood) wrote :

To be clear, the issue here is that the domain is still active on the compute node and accessing the rbd volume so any attempt to delete the rbd volume itself fails.

I can't see how that's anything other than a failure by the operator to ensure that the original compute node has been fenced before attempting a local delete of an active instance.

Revision history for this message
Matthew Booth (mbooth-9) wrote :

Well, the operator hasn't done a local delete, they've just done a delete. Nova did a local delete because nova compute wasn't running. We also wouldn't want to fence in this case because, in general, we don't want to kill running instances unless we really have to. Nova compute being down isn't 'really have to' under most circumstances because it can just be restarted: an HA response should focus on trying to get it back up again. I don't think we can reasonably blame the operator here.

As you say, though, this does mean that an attempt to delete the volume fails because ceph won't allow us to delete a volume that still has an active connection.

Presumably when nova compute eventually comes back up we will clean up the running instance which was deleted. At this point, presumably the ceph volume can also be deleted. IOW, if we wait a bit for normal maintenance to happen, this will resolve itself automatically. I think that's a cloudy 'working as expected', especially for a weird edge case like this.

Revision history for this message
melanie witt (melwitt) wrote :

Confirmed that the default cleanup periodic task in compute will destroy locally deleted domains when nova-compute is brought back up. The default action is "reap", which deletes orphaned domains, and the periodic task runs every 30 minutes by default. It doesn't run immediately when nova-compute starts, so it takes 30 minutes after nova-compute comes back up for the locally deleted instance to be reaped.

Afterward, the volume can be deleted from cinder.

The config options for cleaning up locally deleted instances are:

    cfg.StrOpt("running_deleted_instance_action",
        default="reap",
        choices=('noop', 'log', 'shutdown', 'reap'),
        help="""
The compute service periodically checks for instances that have been
deleted in the database but remain running on the compute node. The
above option enables action to be taken when such instances are
identified.

Possible values:

* reap: Powers down the instances and deletes them(default)
* log: Logs warning message about deletion of the resource
* shutdown: Powers down instances and marks them as non-
  bootable which can be later used for debugging/analysis
* noop: Takes no action
"""),

    cfg.IntOpt("running_deleted_instance_poll_interval",
        default=1800,
        help="""
Time interval in seconds to wait between runs for the clean up action.
If set to 0, above check will be disabled. If "running_deleted_instance
_action" is set to "log" or "reap", a value greater than 0 must be set.

Possible values:

* Any positive integer in seconds enables the option.
* 0: Disables the option.
* 1800: Default value.
"""),

    cfg.IntOpt("running_deleted_instance_timeout",
        default=0,
        help="""
Time interval in seconds to wait for the instances that have
been marked as deleted in database to be eligible for cleanup.

Possible values:

* Any positive integer in seconds(default is 0).
""")
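
Roughly, the periodic task applies the configured action like this (a simplified, illustrative sketch based on the option descriptions above, not the actual code in nova's compute manager):

# Simplified, illustrative sketch of the cleanup behaviour described above;
# names and flow are paraphrased, not copied from nova.compute.manager.
def cleanup_running_deleted_instances(driver, running_deleted_instances,
                                      action="reap"):
    for instance in running_deleted_instances:
        if action == "noop":
            continue
        elif action == "log":
            print("Instance %s is DELETED in the DB but still present on the "
                  "host" % instance)
        elif action == "shutdown":
            # Power the domain off but keep it around for debugging/analysis.
            driver.power_off(instance)
        elif action == "reap":
            # Destroy the orphaned domain; once it is gone nothing holds the
            # rbd image open and "cinder delete" succeeds.
            driver.destroy(instance)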

n-cpu.log:

2017-03-15 16:54:44.939 DEBUG oslo_service.periodic_task [req-24c2664b-fe22-44a8-8e72-0b809adf50f1 None None] Running periodic task ComputeManager._cleanup_running_deleted_instances from (pid=14524) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215
2017-03-15 16:54:44.947 DEBUG oslo_messaging._drivers.amqpdriver [req-24c2664b-fe22-44a8-8e72-0b809adf50f1 None None] CALL msg_id: b7adb6d039274979a8c970f83bbef78f exchange 'nova' topic 'conductor' from (pid=14524) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:442
2017-03-15 16:54:44.991 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: b7adb6d039274979a8c970f83bbef78f from (pid=14524) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:299
2017-03-15 16:54:44.995 INFO nova.compute.manager [req-24c2664b-fe22-44a8-8e72-0b809adf50f1 None None] [instance: 67244c6e-7e04-4bf1-8dc2-06d6dcfb9a89] Destroying instance with name label 'instance-00000002' which is marked as DELETED but still present on host.
2017-03-15 16:54:44.996 DEBUG oslo_messaging._drivers.amqpdriver [req-24c2664b-fe22-44a8-8e72-0b809adf50f1 None None] CALL msg_id: 8094e4320a41471b9e8d928c1fd23ef8 exchange 'nova' topic 'conductor' from (pid=14524) _send /usr/loca...


Revision history for this message
melanie witt (melwitt) wrote :

This behavior is expected in the scenario where nova-compute is brought down but the instance domain is still running and keeping the volume busy. Usually in practice, nova-compute should only be down in the case of a failed compute host or one that has been properly fenced by the operator. However, if nova-compute goes down temporarily and an instance is local deleted, the instance domain will be reaped by the nova-compute periodic task after nova-compute comes back up. After the domain is reaped, the volume can be successfully deleted from cinder.

Changed in nova:
status: New → Invalid
Revision history for this message
Zhenyu Zheng (zhengzhenyu) wrote :

This may also affect the "evacuate" operation.
