VM re-scheduler mechanism will cause BDM-volumes conflict

Bug #1195947 reported by wingwj
This bug affects 6 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
wingwj
Havana
Fix Released
High
Nikola Đipanov

Bug Description

Due to the re-scheduler mechanism, when a user tries to
 create (in error) an instance using a volume
 which is already in use by another instance,
the error is correctly detected, but the recovery code
 will incorrectly affect the original instance.

The exception needs to be raised directly when this situation occurs.

------------------------
------------------------
We can create VM1 with BDM volumes (for example, one volume, which we will call “Vol-1”).

But when the already-attached volume (Vol-1) is passed in the BDM parameters to create a new VM2, the VM re-scheduler mechanism causes the volume to be re-attached to the new VM2 in both Nova & Cinder, instead of raising an “InvalidVolume” exception saying “Vol-1 is already attached to VM1”.

In fact, Vol-1 ends up attached to both VM1 and VM2 on the hypervisor. But when you make changes to Vol-1 from VM1, you cannot see any corresponding changes on VM2…

I reproduced it and documented the steps. Please check the attachment for details~

-------------------------
I checked the Nova code; the problem is caused by the VM re-scheduler mechanism:

Nova checks the state of BDM volumes in Cinder [def _setup_block_device_mapping() in manager.py]. If any volume is “in-use”, the request fails and triggers a VM re-schedule.

According to the existing process in Nova, before re-scheduling it will shut down the VM and detach all BDM volumes in Cinder as a rollback [def _shutdown_instance() in manager.py]. As a result, the state of Vol-1 changes from “in-use” to “available” in Cinder. However, no detach operation is performed on the Nova (hypervisor) side…

Therefore, after re-scheduling, the BDM-volume check passes on the second attempt to create VM2, and all of VM1’s BDM volumes (Vol-1) are taken over by VM2 and recorded in the Nova & Cinder DBs. But Vol-1 is still attached to VM1 on the hypervisor, and will also be attached to VM2 once the VM is created successfully…
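The broken flow above can be sketched as a tiny state simulation (all class, function, and variable names here are illustrative stand-ins, not actual Nova/Cinder code):

```python
# Simulate the Cinder-side volume state that the in-use check reads.
class FakeCinder:
    def __init__(self):
        self.state = {"vol-1": "in-use"}   # Vol-1 is attached to VM1

    def check_attach(self, vol):
        # The BDM check: refuse volumes that are already in use.
        if self.state[vol] == "in-use":
            raise Exception("InvalidVolume: %s is already attached" % vol)

    def detach(self, vol):
        self.state[vol] = "available"      # rollback touches Cinder only

hypervisor_attachments = {"vol-1": ["vm1"]}  # VM1 keeps the real device
cinder = FakeCinder()

# First boot attempt for VM2: the in-use check correctly fails ...
try:
    cinder.check_attach("vol-1")
    first_attempt_failed = False
except Exception:
    first_attempt_failed = True

# ... but the rollback before re-scheduling detaches Vol-1 in Cinder,
# with no matching detach on the hypervisor side.
cinder.detach("vol-1")

# Re-scheduled attempt: the check now passes, so VM2 takes the volume
# while VM1 still holds it on the hypervisor.
cinder.check_attach("vol-1")
hypervisor_attachments["vol-1"].append("vm2")
```

The end state matches the report: Cinder believes the volume moved to VM2, while the hypervisor has it attached to both instances.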

---------------

Moreover, the problem described above occurs when “delete_on_termination” of the BDMs is “False”. If the flag is “True”, all BDM volumes will instead be deleted in Cinder, because their states were already changed from “in-use” to “available” beforehand [def _cleanup_volumes() in manager.py].
(P.S. Whether the deletion succeeds depends on the specific Cinder driver implementation.)

Thanks~

Tags: bdm createvm
Revision history for this message
wingwj (wingwj) wrote :
  • Patch Test.docx (481.2 KiB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)

We can add a new “InvalidVolume” exception branch in _run_instance(). If it occurs, raise the exception directly instead of re-scheduling.
That’s the easiest way, in my opinion.
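A minimal sketch of that proposal (names and structure are illustrative, not the actual _run_instance() code in Nova's compute manager): the volume conflict is treated as fatal instead of being swallowed by the generic reschedule path.

```python
class InvalidVolume(Exception):
    """Stand-in for nova.exception.InvalidVolume."""

def run_instance(setup_block_device_mapping, reschedule):
    try:
        setup_block_device_mapping()
    except InvalidVolume:
        # A volume conflict is a user error, not a compute-host problem:
        # re-raise instead of rescheduling, so the rollback never detaches
        # the volume from the instance that legitimately owns it.
        raise
    except Exception:
        # Other build failures keep the existing reschedule behaviour.
        reschedule()
```

With this shape, a conflicting BDM surfaces the error to the user on the first attempt, and the reschedule path is only taken for failures a different host might actually fix.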

The new patch I made is based on the master branch as of Jun 29th. Please check the test doc~~

Thanks~

Revision history for this message
wingwj (wingwj) wrote :

Here is the patch I made. Please check it.

Thanks~

wingwj (wingwj)
Changed in nova:
assignee: nobody → wingwj (wingwj)
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Wow - this is pretty horrible!!!

Thanks for reporting this, and doing a fix. I will comment more there.

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Since the review wasn't picked up by LP - here it is: https://review.openstack.org/#/c/38073

Changed in nova:
status: New → Incomplete
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

I can't seem to reproduce this, actually.

Nova will block this in the API since 24fffd9d8b77e9b71e8013fc22c172f76bb4e84c on master, and this was backported to both Grizzly and Folsom stable branches.

Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Oooops - looks like I spoke too soon on this one - there is a race condition there.

If you fire off the two requests close to each other - but not like in the attached doc - the API will *NOT* see the volume as attached (depending on the race, of course), and the request will error out on the compute side and detach the volume from underneath the running instance.

To reproduce, try:

for x in 1 2; do nova boot --image 539b1a8a-f5f5-4f1b-afa0-f371337def9f --flavor 1 --block-device-mapping vdc=<VOLUME_ID>:None:1: testvm; done; watch nova list;

and see the volume become briefly attached and then unavailable as the other instance errors out and cleans it up in _reschedule_on_error.

We might need to come up with something different to avoid this completely.

Changed in nova:
importance: Undecided → High
tags: added: folsom-backport-potential grizzly-backport-potential
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

s/unavailable/available again/ in the previous comment

Revision history for this message
wingwj (wingwj) wrote :

I posted the patch to Gerrit, please check it~
https://review.openstack.org/#/c/38073/

Changed in nova:
status: Confirmed → In Progress
wingwj (wingwj)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/54916
Committed: http://github.com/openstack/nova/commit/a2487116d583e189dcbfe6f665ba360bf147163f
Submitter: Jenkins
Branch: stable/havana

commit a2487116d583e189dcbfe6f665ba360bf147163f
Author: Nikola Dipanov <email address hidden>
Date: Fri Nov 1 13:37:13 2013 +0100

    Prevent rescheduling on block device failure

    Due to a race condition - it is possible for more instances to race for
    the same volume. In such a scenario, the one that fails will get
    rescheduled, and in the process detach the volume of a successful
    instance.

    To prevent this, this patch makes nova not reschedule on block device
    failures. This is actually reasonable behaviour as block device failures
    are rarely related to the compute host itself and so rescheduling is not
    usually useful.

    This is a stable/havana only fix! This same issue is addressed on the
    master branch by Iefab71047996b7cc08107794d5bc628c11680a70.

    Closes-bug: 1195947

    Change-Id: I6b68965ac65cdb0e1da3b44e83428f056b1693aa
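The behaviour the commit describes can be sketched as follows (a hedged sketch under assumed names, not real Nova internals): a block device failure puts the instance into ERROR rather than rescheduling it.

```python
class BlockDeviceFailure(Exception):
    """Stand-in for a block device mapping failure during boot."""

def build_instance(instance, prep_block_device, reschedule):
    try:
        prep_block_device(instance)
    except BlockDeviceFailure:
        # Block device failures are rarely caused by the compute host,
        # so retrying elsewhere is unlikely to help, and the reschedule
        # rollback could detach a healthy instance's volume.
        instance["vm_state"] = "error"
        raise
    except Exception:
        reschedule(instance)
```

The design trade-off stated in the commit message is that a host-independent failure is surfaced immediately instead of being retried across hosts.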

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

We were hoping that https://blueprints.launchpad.net/nova/+spec/remove-cast-to-schedule-run-instance would be making Icehouse, but sadly that did not happen, so I believe it is reasonable to propose the Havana fix that is already merged to the Icehouse tree as well.

Changed in nova:
milestone: none → icehouse-rc1
Revision history for this message
Tracy Jones (tjones-i) wrote :

wingwj - can you propose this fix quickly, then? We are closing down on all but shipstoppers/regressions very soon.

Revision history for this message
Tracy Jones (tjones-i) wrote :

This bug could be pushed to icehouse-rc-potential if not merged by 2/24 12pm UTC.

Revision history for this message
wingwj (wingwj) wrote :

Hi Nikola & Tracy,

I got your message. I'll use Nikola's Havana patch to fix this issue ASAP.

Revision history for this message
wingwj (wingwj) wrote : Re: [Bug 1195947] Re: VM re-scheduler mechanism will cause BDM-volumes conflict

Hi Tracy,

First, sorry for my late reply.

I've already updated the patch for bug/1195947 on
https://review.openstack.org/#/c/38073/.
Please review it.

Thanks~


Revision history for this message
Nikola Đipanov (ndipanov) wrote :

I also posted a patch for this https://review.openstack.org/#/c/80945/

I have no idea why it did not get picked up by LP. At this moment we can use either.

Revision history for this message
Tracy Jones (tjones-i) wrote :

I think we can mark this as fix released since Nikola's patch got merged

https://review.openstack.org/#/c/80945/

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Tracy - I assume you meant "fix committed" (fix released is usually for once the release is actually cut).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/80945
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8f932311da19ea9de7ba1b344484ccdb748f5786
Submitter: Jenkins
Branch: master

commit 8f932311da19ea9de7ba1b344484ccdb748f5786
Author: Nikola Dipanov <email address hidden>
Date: Fri Nov 1 13:37:13 2013 +0100

    Prevent rescheduling on block device failure

    Due to a race condition - it is possible for more instances to race for
    the same volume. In such a scenario, the one that fails will get
    rescheduled, and in the process detach the volume of a successful
    instance.

    To prevent this, this patch makes nova not reschedule on block device
    failures. This is actually reasonable behaviour as block device failures
    are rarely related to the compute host itself and so rescheduling is not
    usually useful.

    This bug does not exist in the new boot code in the manager which will
    be used once remove-cast-to-schedule-run-instance bp lands (see
    Iefab71047996b7cc08107794d5bc628c11680a70). However, it is now clear
    that this will not be merged for Icehouse, so this patch is a
    "forward port" of a patch we already applied to stable/havana.

    Closes-bug: #1195947

    Change-Id: I6b68965ac65cdb0e1da3b44e83428f056b1693aa

Alan Pevec (apevec)
tags: removed: folsom-backport-potential grizzly-backport-potential
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-rc1 → 2014.1