Volume status will be changed to "available" in spite of still attached to VM instance

Bug #1587285 reported by Rikimaru Honjo
Affects: OpenStack Compute (nova)
Status: Incomplete
Importance: Medium
Assigned to: Rikimaru Honjo

Bug Description

* (06/06/2016) I corrected the steps to reproduce based on comment #3.
* (15/09/2016) I improved the steps to reproduce based on comment #5.

[Summary]
The volume status is changed to "available" even though the volume is still attached to a VM instance.

[Version]
Later than 13.0.0

[Impact]
Under a specific condition, the volume status is changed to "available" even though the volume is still attached to a VM instance.
In this case, the guest OS of the VM instance can still perform I/O on the volume.

If this volume is then attached to another VM instance, the volume data can be corrupted by I/O from both VM instances.

[Steps to reproduce]

1. Create a volume named "volume-A".

2. Add the following breakpoint to nova-compute.
   Then restart nova-compute.

-------------------------------------------------------
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index 9783d39..948a02e 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
    def _build_and_run_instance(self, context, instance, image, injected_files,
            admin_password, requested_networks, security_groups,
            block_device_mapping, node, limits, filter_properties):
[...]
                 self._validate_instance_group_policy(context, instance,
                         filter_properties)
                 image_meta = objects.ImageMeta.from_dict(image)
+ import pdb;pdb.set_trace()
                 with self._build_resources(context, instance,
                         requested_networks, security_groups, image_meta,
                         block_device_mapping) as resources:

-------------------------------------------------------

3. Launch "VM-A" without a volume.
   Wait until "VM-A"'s status changes to "ACTIVE".

4. Launch "VM-B" with "volume-A".
   (Specify the "block-device-mapping" option.)

5. Kill the nova-compute process while it is stopped at the breakpoint.
   (Use the "kill" command; this simulates an unexpected failure.)
   After killing it, restart nova-compute.

6. "VM-B"'s status changes from "BUILD" to "ERROR" as a result.
   "volume-A"'s status is still "available".

7. Attach "volume-A" to "VM-A" by the volume-attach API.

8. The volume-attach API completes.
   "volume-A"'s status changes from "available" to "in-use".

9. Delete "VM-B".

10. Deleting "VM-B" completes.
    And "volume-A"'s status changes from "in-use" to "available"!
    Even though "volume-A" is still attached to "VM-A"!

I think "volume-A"'s status should not be changed by deleting "VM-B" at step 10, because "volume-A" never finished attaching to "VM-B" and is now attached to "VM-A".
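The sequence above can be modeled in a few lines: the stale bdm record left by VM-B's failed build is what later resets the volume status. A minimal sketch, using plain Python data structures as stand-ins for Nova's DB records (this is an illustration, not actual Nova code):

```python
# Minimal model of the reported sequence; all names are illustrative.
volume_status = "available"
bdms = []  # stand-in for block_device_mapping DB records

# Steps 4-6: boot VM-B with volume-A; nova-compute dies mid-build.
# The bdm record remains, VM-B goes to ERROR, volume-A stays "available".
bdms.append({"instance": "VM-B", "volume": "volume-A"})

# Steps 7-8: attach volume-A to VM-A; the attach completes normally.
bdms.append({"instance": "VM-A", "volume": "volume-A"})
volume_status = "in-use"

# Steps 9-10: delete VM-B. Its stale bdm still references volume-A,
# so the delete path "detaches" the volume and resets its status,
# even though VM-A is still using it.
for bdm in [b for b in bdms if b["instance"] == "VM-B"]:
    bdms.remove(bdm)
    volume_status = "available"  # wrong: VM-A's attachment is still live

print(volume_status)  # "available", despite the remaining VM-A bdm
print(bdms)           # [{'instance': 'VM-A', 'volume': 'volume-A'}]
```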

Revision history for this message
Prateek Arora (parora) wrote :

Rikimaru San

The case you are talking about looks unreproducible to me.

The code snippet posted by you

         volume_bdm = self._create_volume_bdm(
             context, instance, device, volume_id, disk_bus=disk_bus,
             device_type=device_type)
+ raise Exception

would raise an exception at that point that has no corresponding exception-handling code, so it would not be handled and Nova would not proceed further.

Assuming the first exception was not present and only the 2nd one was present, a proper cleanup would be done and the bdm database entry would be destroyed.

Can I please ask you whether you faced such a situation somewhere without the code hack ?

Changed in nova:
status: New → Incomplete
Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

Hi Prateek,

Thank you for trying to reproduce this.
And sorry for bothering you.
The steps to reproduce were incomplete.

> Assuming the first exception was not present and only the 2nd one was present, a proper cleanup would be done and the bdm database entry would be destroyed.
Yes, you are right, if the cleanup is done.
But I would like to point out that there is a possibility of failing to clean up the bdm records.

> Can I please ask you whether you faced such a situation somewhere without the code hack ?
For example, the nova-api process dies between creating the bdm record and executing the reserve-volume API.
In this case, the bdm record will remain.
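The window described here can be sketched abstractly: the `volume_bdm.destroy()` cleanup only covers failures raised inside the try block, so a process death after creating the record but before reserving the volume leaves the record behind. A minimal, self-contained sketch (`FakeBDM`, `created_bdms`, and the failure flags are illustrative stand-ins, not Nova's real code):

```python
class FakeBDM:
    """Stand-in for a block-device-mapping DB row (illustrative)."""
    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


created_bdms = []  # stand-in for rows persisted in the DB


def attach_volume(fail_before_try=False, fail_inside_try=False):
    """Mimics the create-bdm-then-reserve shape of nova/compute/api.py."""
    bdm = FakeBDM()
    created_bdms.append(bdm)  # the DB row exists from this point on
    if fail_before_try:
        return  # simulates the process dying here: no cleanup code runs
    try:
        if fail_inside_try:
            raise RuntimeError("reserve_volume failed")
        # ... reserve volume, RPC to nova-compute ...
    except Exception:
        bdm.destroy()  # cleanup covers only failures inside the try
        raise


# A failure inside the try block is cleaned up properly.
try:
    attach_volume(fail_inside_try=True)
except RuntimeError:
    pass

# A death before the try block leaves a garbage record behind.
attach_volume(fail_before_try=True)

print([b.destroyed for b in created_bdms])  # [True, False]
```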

The following steps are an improved reproduction procedure.

1. Add following break-point to nova-api.

-------------------------------------------------------
--- a/nova/compute/api.py
+++ b/nova/compute/api.py
@@ -3108,10 +3108,12 @@ class API(base.Base):
         volume_bdm = self._create_volume_bdm(
             context, instance, device, volume_id, disk_bus=disk_bus,
             device_type=device_type)
         try:
+ import pdb;pdb.set_trace()
             self._check_attach_and_reserve_volume(context, volume_id, instance)
             self.compute_rpcapi.attach_volume(context, instance, volume_bdm)
         except Exception:
             with excutils.save_and_reraise_exception():
                 volume_bdm.destroy()
-------------------------------------------------------

2. Launch two nova-api processes, as in a high-availability setup.(*1)
   (The two processes reference the same DB and listen on different addresses and ports.)

3. Attach "volume-A" to "VM-A" by volume-attach API.

4. Kill the nova-api process that received the volume-attach API call while it is stopped at the breakpoint.

   "volume-A"'s status is still "available" as a result.

5. Attach "volume-A" to "VM-B" by the volume-attach API.
   (Send the request to the nova-api process that was not killed.)

6. Press "c" at the breakpoint to continue the volume attach.

7. The volume-attach API completes.
   "volume-A"'s status changes from "available" to "in-use".

8. Delete "VM-A".
   (Send the request to the nova-api process that was not killed.)

9. Deleting "VM-A" completes.
   And "volume-A"'s status changes from "in-use" to "available"!

*1: If there is only one nova-api process, the remaining bdm record will be cleaned up when nova-api is restarted.

summary: - Unexpected volume-detach happen under a specific condition
+ Volume status will be change "available" in spite of still attached to
+ VM instance
summary: - Volume status will be change "available" in spite of still attached to
- VM instance
+ Volume status will be changed to "available" in spite of still attached
+ to VM instance
description: updated
description: updated
Changed in nova:
assignee: nobody → Rikimaru Honjo (honjo-rikimaru-c6)
Changed in nova:
assignee: Rikimaru Honjo (honjo-rikimaru-c6) → nobody
tags: added: volumes
tags: added: compute
Changed in nova:
assignee: nobody → Rikimaru Honjo (honjo-rikimaru-c6)
Changed in nova:
status: Incomplete → Opinion
status: Opinion → Incomplete
Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

I wrote another set of reproduction steps.
These are easier than the procedure above.

1. Create a volume named "volume-A".

2. Add the following breakpoint to nova-compute.
   Then restart nova-compute.

-------------------------------------------------------
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index 9783d39..948a02e 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
    def _build_and_run_instance(self, context, instance, image, injected_files,
            admin_password, requested_networks, security_groups,
            block_device_mapping, node, limits, filter_properties):
[...]
                 self._validate_instance_group_policy(context, instance,
                         filter_properties)
                 image_meta = objects.ImageMeta.from_dict(image)
+ import pdb;pdb.set_trace()
                 with self._build_resources(context, instance,
                         requested_networks, security_groups, image_meta,
                         block_device_mapping) as resources:

-------------------------------------------------------

3. Launch "VM-A" without a volume.
   Wait until "VM-A"'s status changes to "ACTIVE".

4. Launch "VM-B" with "volume-A".
   (Specify the "block-device-mapping" option.)

5. Kill the nova-compute process while it is stopped at the breakpoint.
   (Use the "kill" command; this simulates an unexpected failure.)
   After killing it, restart nova-compute.

6. "VM-B"'s status changes from "BUILD" to "ERROR" as a result.
   "volume-A"'s status is still "available".

7. Attach "volume-A" to "VM-A" by the volume-attach API.

8. The volume-attach API completes.
   "volume-A"'s status changes from "available" to "in-use".

9. Delete "VM-B".

10. Deleting "VM-B" completes.
    And "volume-A"'s status changes from "in-use" to "available"!
    Even though "volume-A" is still attached to "VM-A"!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/354617

Changed in nova:
status: Incomplete → In Progress
Revision history for this message
Maciej Szankin (mszankin) wrote :

I do not like the "under specific conditions" part. Intentionally killing the compute process plus pdb may simulate a possible unexpected disaster, but I just can't see anything but a very narrow corner case. I guess someone wiser has to take a look; adding the needs-attention tag.

tags: added: needs-attention
Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

Thank you for adding the tag!

Certainly this is a narrow corner case.
But I think this case is very dangerous.
Someone could attach the volume to another instance if the attached volume's status has been changed to "available".
And then the volume would be attached to two instances at the same time.

Revision history for this message
John Garbutt (johngarbutt) wrote :

Ah, so the problem is that when we do a delete, we clean up an attachment that we never actually owned. That seems like it should be fixed.
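One way to read this point is that the instance-delete path could verify ownership before touching the volume. The sketch below is purely hypothetical (the function name, record shapes, and fields are assumptions, not Nova's API):

```python
def should_detach_on_delete(bdm, current_attachments):
    """Hypothetical guard: detach only if the volume's current attachment
    actually belongs to the instance being deleted."""
    return any(att["instance_uuid"] == bdm["instance_uuid"]
               for att in current_attachments)


# volume-A is attached to VM-A, but deleting VM-B still finds VM-B's
# stale bdm record referencing volume-A.
attachments = [{"instance_uuid": "vm-a"}]
stale_bdm = {"instance_uuid": "vm-b", "volume_id": "volume-a"}

print(should_detach_on_delete(stale_bdm, attachments))  # False: skip detach
```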

Changed in nova:
importance: Undecided → Low
Revision history for this message
John Garbutt (johngarbutt) wrote :

While the impact might be high, the likelihood is low, so I have gone for medium.

Changed in nova:
importance: Low → Medium
tags: removed: needs-attention
tags: added: mitaka-backport-potential
description: updated
Revision history for this message
Lee Yarwood (lyarwood) wrote :

> 6. "VM-B"'s status is changed to "ERROR" from "BUILD" as a result.
> "Volume-A"'s status is still "available".

Shouldn't we focus on this part and ensure that when the instance is moved to an ERROR state we also clean up any associated bdms etc. from the DB, to avoid:

> 10. Deleting "VM-B" is completed.
> And "volume-A"'s status is changed to "available" from "in-use"!
> Even "volume-A" is still attached to "VM-A"!

Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

Hi Lee,

Sorry for the late reply.

Your idea is basically good.
But it doesn't resolve the following cases.

* case 1: Call the volume-attach API instead of creating VM-B, and the volume-attach API fails between creating the bdm and reserve_volume because nova-api dies.
In this case, VM-B is not changed to "ERROR", and the bdm record remains.

* case 2: Attach "volume-A" to "VM-A" before restarting nova-compute, and then delete VM-B.

Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

(Reprint from gerrit)

I have rethought how to resolve this problem since submitting patch set 9.
But I could not think of a better idea.
I think the current approach is the only feasible one.

The other ideas below have big problems.
--------------
(1) Delete garbage BDM records when a new BDM record is created.

 pros:
* This idea is very simple.
* It guards the BDM table from corruption.

 cons:
* There will be several correct BDM records for the same volume if we specify a multi-attached volume.[1] In this case, distinguishing garbage records from in-progress records is impossible.

[1] The "multi-attach-volume" blueprint (https://blueprints.launchpad.net/nova/+spec/multi-attach-volume) is in progress.

--------------
(2) Let Cinder decide whether to process the request.
Cinder decides by itself whether to process a terminate_connection API / detach API call when it receives one.

I think we can use volume_attachment records for this decision.
But a volume_attachment record is currently created during the detach API process. This is too late.
The volume_attachment record should be created at the start of the initialize_connection API process for this decision to work.

And we should add an "instance_uuid" parameter to the initialize_connection API.
Otherwise we could not find the volume_attachment record created by the initialize_connection API during the detach API process.

pros:
* Cinder already decides whether to change the status in the detach process, so adding conditional branches to Cinder looks natural to me.

cons:
* Changing the initialize-connection API interface breaks backward compatibility.

--------------
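Idea (2) could be sketched as a decision function on the Cinder side, matching the detach request against volume_attachment records keyed by instance_uuid. Everything below (names, record shapes, return values) is an illustrative assumption, not Cinder's actual API:

```python
def decide_detach(volume_attachments, instance_uuid):
    """Hypothetical Cinder-side decision for a detach request."""
    owned = [a for a in volume_attachments
             if a["instance_uuid"] == instance_uuid]
    if not owned:
        # No attachment belongs to this instance (e.g. a stale bdm in
        # Nova): ignore the request and keep the volume status as-is.
        return "ignore"
    if len(volume_attachments) > len(owned):
        # Other instances still attached (multi-attach): remove this
        # attachment but keep the volume "in-use".
        return "detach-keep-in-use"
    return "detach-make-available"


# Deleting VM-B while only VM-A holds a real attachment: ignored.
print(decide_detach([{"instance_uuid": "vm-a"}], "vm-b"))  # "ignore"
```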

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/354617
Reason: This review is > 4 weeks without comment and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → Incomplete
assignee: Rikimaru Honjo (honjo-rikimaru-c6) → nobody
Revision history for this message
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

I'm still working on this bug.
But I'm fixing the following related bug first.

Unnecessary bdm entry is created if same volume is attached twice to an instance
https://bugs.launchpad.net/nova/+bug/1427060

I'll return to this bug after finishing that.

Changed in nova:
assignee: nobody → Rikimaru Honjo (honjo-rikimaru-c6)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/354617
Reason: This is a pretty old patch with a negative review. Feel free to restore it (or ask gibi on IRC to restore it) if you are still working on it.
