cinder does not handle missing volume group gracefully (stuck in "creating")

Bug #1242942 reported by Phil Frost
Affects          Status        Importance  Assigned to     Milestone
Cinder           Fix Released  Low         Flavio Percoco
Cinder (Havana)  Fix Released  High        Flavio Percoco

Bug Description

Tested with Havana rc2 from the UCA on Precise.

If the cinder-volumes LVM volume group (the default configuration) does not exist, cinder will still attempt to create volumes when asked, which is bound to fail. The volume then gets stuck in the "creating" state and can't be deleted. The log will contain:

2013-10-17 09:29:58.188 16676 ERROR cinder.brick.local_dev.lvm [req-5c03777b-4acc-4784-b463-278dee0d2e08 None None] Unable to locate Volume Group cinder-volumes
2013-10-17 09:29:58.189 16676 ERROR cinder.volume.manager [req-5c03777b-4acc-4784-b463-278dee0d2e08 None None] Error encountered during initialization of driver: LVMISCSIDriver
2013-10-17 09:29:58.189 16676 ERROR cinder.volume.manager [req-5c03777b-4acc-4784-b463-278dee0d2e08 None None] Bad or unexpected response from the storage volume backend API: Volume Group cinder-volumes does not exist
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager Traceback (most recent call last):
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 190, in init_host
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager self.driver.check_for_setup_error()
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 94, in check_for_setup_error
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager raise exception.VolumeBackendAPIException(data=message)
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume Group cinder-volumes does not exist
2013-10-17 09:29:58.189 16676 TRACE cinder.volume.manager
2013-10-17 09:30:47.198 16676 WARNING cinder.volume.manager [req-d5f8a463-c097-4518-829b-504bf02763b2 None None] Unable to update stats, driver is uninitialized

Resetting the state with "cinder reset-state" will get the volume to the "available" state, which it isn't. Deleting or force-deleting will also fail, getting stuck in "deleting" state forever. The only solution I found was to directly kill the relevant rows in the volumes table and cinder-manage db sync.
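
A minimal sketch of that last-resort cleanup, assuming a default MySQL deployment with a database named "cinder" (the connection URL, credentials, and volume ID below are placeholders, not values from this report):

import sqlalchemy

# Mark the stuck row deleted directly in the database. This bypasses the
# API entirely, so use it only when delete and force-delete both hang.
engine = sqlalchemy.create_engine("mysql://cinder:CINDER_DBPASS@localhost/cinder")
with engine.begin() as conn:
    conn.execute(
        sqlalchemy.text(
            "UPDATE volumes SET status = 'deleted', deleted = 1 "
            "WHERE id = :id AND status = 'creating'"
        ),
        {"id": "00000000-0000-0000-0000-000000000000"},
    )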

I know that in grizzly, cinder-volume would refuse to start if it couldn't find the volume group. I thought that behavior was better. Failing when attempting to create the volume, instead of getting stuck in the "creating" state, would also be acceptable.

Revision history for this message
John Griffith (john-griffith) wrote :

So we made some changes that now allow the volume service to start, mostly to accommodate the case of multiple back-ends: there's no reason the entire service should refuse to run just because one backend isn't ready or available.

Delete should now work on a volume in the error state (i.e. you should not have needed to use the reset-state command). By changing the state to "available" you've created a situation where the manager is going to try to connect to the backend driver, which isn't running, and thus your hang.

To summarize, you should be able to do a regular delete in this scenario. That said, we should have a look and try to make things a bit more forgiving when the state has been changed, as in your case.
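
For reference, a hedged sketch of that regular delete via python-cinderclient (the credentials, auth URL, and volume ID are placeholders):

from cinderclient.v1 import client

# Placeholder credentials and endpoint -- substitute your own.
cinder = client.Client("admin", "ADMIN_PASS", "admin",
                       "http://keystone.example.com:5000/v2.0")

vol = cinder.volumes.get("00000000-0000-0000-0000-000000000000")
if vol.status == "error":
    # A volume in the error state should accept a plain delete;
    # no reset-state should be necessary.
    vol.delete()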

tags: added: service-startup
Changed in cinder:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Phil Frost (bitglue) wrote : Re: [Bug 1242942] Re: cinder does not handle missing volume group gracefully

On Oct 21, 2013, at 19:08, John Griffith <email address hidden> wrote:

> That being said we should have a look and try to make things
> a bit more forgiving if the state is changed like in your case.

I was also unable to delete before resetting the state. The state remains stuck at "creating" and never transitions to an error state. I cannot delete a volume in that state, and I also cannot delete it after resetting the state.

Revision history for this message
zhangyanzi (zhangyanzi) wrote : Re: cinder does not handle missing volume group gracefully

I think it would be better for the volume's state to become 'error' when the LVM volume group does not exist.

Revision history for this message
wanghao (wanghao749) wrote :

Maybe we can fix this bug: if the LVM volume group does not exist, set the volume state to 'error' during creation, so that the user can delete the volume without resetting its state.
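
A rough sketch of that idea (toy code with hypothetical names, not cinder's actual API): any failure on the creation path should flip the record to 'error' rather than leave it in 'creating'.

def create_volume(statuses, volume_id, do_create):
    """Toy creation path; `statuses` stands in for the volumes table."""
    try:
        do_create(volume_id)
    except Exception:
        # Never leave the record in 'creating': mark it 'error' so a
        # regular delete can still proceed.
        statuses[volume_id] = 'error'
        raise
    statuses[volume_id] = 'available'

def missing_vg_backend(volume_id):
    raise RuntimeError("Volume Group cinder-volumes does not exist")

statuses = {'vol-1': 'creating'}
try:
    create_volume(statuses, 'vol-1', missing_vg_backend)
except RuntimeError:
    pass
print(statuses['vol-1'])  # -> error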

Changed in cinder:
assignee: nobody → wanghao (wanghao749)
Revision history for this message
John Griffith (john-griffith) wrote :

So that's kind of my point here: I tested this on latest master by removing the VG and restarting the service. The result was as expected, with the volume going to the error state, not hanging in creating. I also tested this by setting an invalid VG name in my cinder.conf file.

Perhaps you can provide more details on how you got into this state?

Revision history for this message
Phil Frost (bitglue) wrote : Re: [Bug 1242942] Re: cinder does not handle missing volume group gracefully

On 10/22/2013 08:51 PM, John Griffith wrote:
> so that's kinda my point here, I tested this on latest master by
> removing the VG and restarting the service. The result was as expected
> with the volume going to error state, not hung in creating. I also
> tested this by setting an invalid VG name in my cinder.conf file.
>
> perhaps more details can be provided on how you got in this state?

I'll do some more testing. That's definitely not the behavior I saw. I
was testing with a slightly old Havana rc2, so maybe it's been fixed
since then.

Revision history for this message
Phil Frost (bitglue) wrote : Re: cinder does not handle missing volume group gracefully

The proper Havana release is in the UCA now, so I've upgraded to that. I'm still able to reproduce this. It's simple for me: I'm deploying with the puppet-openstack modules: https://forge.puppetlabs.com/puppetlabs/openstack. I can't think of anything unusual about the setup -- simply don't create a cinder-volumes volume group, and try to make a volume.

cinder-volume will log, at startup:

2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager Traceback (most recent call last):
2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 190, in init_host
2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager self.driver.check_for_setup_error()
2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 94, in check_for_setup_error
2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager raise exception.VolumeBackendAPIException(data=message)
2013-10-31 11:45:32.311 15709 TRACE cinder.volume.manager VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume Group cinder-volumes does not exist

Then, when a volume creation is requested:

2013-10-31 11:46:11.638 15709 ERROR cinder.openstack.common.rpc.amqp [req-70eebc51-fca6-410d-8caa-9415a4a21530 b7b8f92e13534c2bbd32b0ff1b801b76 a2e59bca1d7a48eb895f4f7806bb89d6] Exception during message handling
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/cinder/openstack/common/rpc/amqp.py", line 441, in _process_data
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp **args)
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 807, in wrapper
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp raise exception.DriverNotInitialized(driver=driver_name)
2013-10-31 11:46:11.638 15709 TRACE cinder.openstack.common.rpc.amqp DriverNotInitialized: Volume driver 'LVMISCSIDriver' not initialized.

At this point, there's a volume, and it's stuck in "creating", forever:

pfrost@os-controller01:~$ cinder list
+--------------------------------------+----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status   | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+----------+--------------+------+-------------+----------+-------------+
| c950bca7-5d31-465f-a0c7-9503f845b265 | creating | test         | 1    | None        | false    |             |
+--------------------------------------+----------+--------------+------+-------------+----------+-------------+


Revision history for this message
Phil Frost (bitglue) wrote :

version information:

pfrost@os-controller01:~$ apt-cache policy cinder-api
cinder-api:
  Installed: 1:2013.2-0ubuntu1~cloud0
  Candidate: 1:2013.2-0ubuntu1~cloud0
  Version table:
 *** 1:2013.2-0ubuntu1~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
        100 /var/lib/dpkg/status
pfrost@os-controller01:~$ cinder --version
1.0.6
pfrost@os-controller01:~$ sudo cinder-manage version list
2013.2

Revision history for this message
John Griffith (john-griffith) wrote :

Very odd. I guess I'll have to duplicate your setup exactly (i.e. Puppet, Ubuntu packages, etc.).

In devstack, having just commented out the vgcreate lines in lib/cinder, I get exactly the behavior described earlier:

=============================
vagrant@precise64 ~ $ sudo vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  precise64   1   2   0 wz--n- 79.76g     0
vagrant@precise64 ~ $ cinder create 1
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| created_at          | 2013-10-31T17:15:07.517469           |
| display_description | None                                 |
| display_name        | None                                 |
| id                  | e0206c48-c859-4c3f-860e-b9ca3493acf3 |
| metadata            | {}                                   |
| size                | 1                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| volume_type         | None                                 |
+---------------------+--------------------------------------+
vagrant@precise64 ~ $ cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+-------------+
| ID                                   | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+--------------+------+-------------+----------+-------------+
| e0206c48-c859-4c3f-860e-b9ca3493acf3 | error  | None         | 1    | None        | false    |             |
+--------------------------------------+--------+--------------+------+-------------+----------+-------------+
vagrant@precise64 ~ $ cd /opt/stack/cinder/
vagrant@precise64 /opt/stack/cinder $ git status
# On branch stable/havana
nothing to commit (working directory clean)
vagrant@precise64 /opt/stack/cinder $

Changed in cinder:
status: Triaged → Incomplete
Revision history for this message
Phil Frost (bitglue) wrote : Re: [Bug 1242942] Re: cinder does not handle missing volume group gracefully

On 10/31/2013 01:17 PM, John Griffith wrote:
> Very odd, I guess I'll have to duplicate your setup exactly (ie Puppet,
> Ubuntu pkg's etc)

I could try expressing the entire environment with Vagrant. Would that
be helpful?

Revision history for this message
John Griffith (john-griffith) wrote : Re: cinder does not handle missing volume group gracefully

Phil,
That would be great, and it means I'd get to it much sooner :)

Revision history for this message
Eric Harney (eharney) wrote :

My Fedora 19 environment does this as well. Invalid volume_group specified in cinder.conf -> create a volume -> stays in "creating" state. Calling "force-delete" then results in the volume staying in the "deleting" state.

Revision history for this message
wanghao (wanghao749) wrote :

I used devstack to build a new environment, and it looks like this bug still exists.

Here is the cinder version:
root@ubuntu:/opt/stack/cinder# cinder --version
1.0.7

root@ubuntu:/opt/stack/cinder# git status
# On branch master
nothing to commit (working directory clean)

The volume status stays 'creating' indefinitely.

root@ubuntu:/opt/stack/cinder# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 1b703aa5-2711-43e9-b419-01a9f07d933e | creating  | None         | 1    | None        | false    |             |
| f613155d-1094-4804-910e-dd510f53a9d0 | available | None         | 1    | None        | false    |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

Revision history for this message
Eric Harney (eharney) wrote :

I can reproduce this -- updating so it doesn't get lost.

Changed in cinder:
assignee: wanghao (wanghao749) → Eric Harney (eharney)
status: Incomplete → Confirmed
Revision history for this message
Eric Harney (eharney) wrote :

Related: bug 1211839

Revision history for this message
Eric Harney (eharney) wrote :

Related: bug 1053931

Eric Harney (eharney)
summary: - cinder does not handle missing volume group gracefully
+ cinder does not handle missing volume group gracefully (stuck in
+ "creating")
Revision history for this message
Flavio Percoco (flaper87) wrote :

I was able to replicate this issue. Since the driver was not initialized correctly - due to the non-existent VG - the RPC cast fails even before reaching the volume manager. This failure is caused by the check done in this decorator [0].

Unfortunately, this is true not just for volume creation but for all other decorated methods as well. One possible solution is to move the check inside the volume manager method and set the volume status to 'error' if there's no driver available. Another possible solution, perhaps a more complex one, would be to re-schedule the task and send it to another volume node in order to fulfill the volume creation request.

In both cases, I think this decorator should be moved into the method body.

[0] https://git.openstack.org/cgit/openstack/cinder/tree/cinder/volume/manager.py#n236
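
A simplified sketch of the contrast (toy code, not cinder's actual manager; the names are stand-ins for the decorator linked above):

class DriverNotInitialized(Exception):
    pass

def require_driver_initialized(fn):
    # Decorator-style check: raises before the method body ever runs,
    # so the volume's status in the database is never updated.
    def wrapper(self, *args, **kwargs):
        if not self.driver_initialized:
            raise DriverNotInitialized()
        return fn(self, *args, **kwargs)
    return wrapper

class VolumeManager(object):
    def __init__(self):
        self.driver_initialized = False
        self.statuses = {}

    @require_driver_initialized
    def create_volume_decorated(self, volume_id):
        self.statuses[volume_id] = 'available'  # never reached

    def create_volume_checked(self, volume_id):
        # In-method check: the failure path can set 'error' before
        # raising, leaving the volume deletable afterwards.
        if not self.driver_initialized:
            self.statuses[volume_id] = 'error'
            raise DriverNotInitialized()
        self.statuses[volume_id] = 'available'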

Changed in cinder:
assignee: Eric Harney (eharney) → Flavio Percoco (flaper87)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/61088

Changed in cinder:
status: Confirmed → In Progress
Mike Perez (thingee)
Changed in cinder:
milestone: none → icehouse-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/61088
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5be4620ae5bb50c8436de0e11269c85a095ed40b
Submitter: Jenkins
Branch: master

commit 5be4620ae5bb50c8436de0e11269c85a095ed40b
Author: Flavio Percoco <email address hidden>
Date: Tue Dec 10 12:31:50 2013 +0100

    Move driver initialization check into the method

    Volumes and backups managers' methods are decorated with
    `require_initialized_driver` which checks whether the driver has been
    initialized or not. The decorator fails with a `DriverNotInitialized`
    exception if the driver hasn't been initialized.

    This early failure leaves volumes and backups in a wrong status which is
    not just confusing for the user but it also makes it difficult to do
    anything with the resources after they've been left in a 'bogus' status.

    For example, when a volume creation is requested, the volume is first
    created in the database and its status is set to 'creating'. Then the
    scheduler will pick an available volume node and send the task to it. If
    the driver has not been initialized, the volume status will be left as
    'creating' instead of 'error'.

    This patch fixes that issue by moving the driver initialization check
    into the various manager's methods. In some cases this check is done at
    the very beginning of the method, in some others - either to avoid code
    duplication or because the lines above the check made sense to be
    executed first - this check is done later in the method.

    Change-Id: I2610be6ba1aa7df417f1a1f7bb27af30273e4814
    Closes-bug: #1242942

Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/67097

Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Revision history for this message
Alan Pevec (apevec) wrote :

This leaves volumes in an inconsistent status and is a user-visible issue; setting importance to High for the Havana branch.
It was reported in RHOS 4.0/Havana: https://bugzilla.redhat.com/show_bug.cgi?id=1016224

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/havana)

Reviewed: https://review.openstack.org/67097
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=4228c0ebc237d821a8784a1dcf165f238cddc56e
Submitter: Jenkins
Branch: stable/havana

commit 4228c0ebc237d821a8784a1dcf165f238cddc56e
Author: Flavio Percoco <email address hidden>
Date: Tue Dec 10 12:31:50 2013 +0100

    Move driver initialization check into the method

    Volumes and backups managers' methods are decorated with
    `require_initialized_driver` which checks whether the driver has been
    initialized or not. The decorator fails with a `DriverNotInitialized`
    exception if the driver hasn't been initialized.

    This early failure leaves volumes and backups in a wrong status which is
    not just confusing for the user but it also makes it difficult to do
    anything with the resources after they've been left in a 'bogus' status.

    For example, when a volume creation is requested, the volume is first
    created in the database and its status is set to 'creating'. Then the
    scheduler will pick an available volume node and send the task to it. If
    the driver has not been initialized, the volume status will be left as
    'creating' instead of 'error'.

    This patch fixes that issue by moving the driver initialization check
    into the various manager's methods. In some cases this check is done at
    the very beginning of the method, in some others - either to avoid code
    duplication or because the lines above the check made sense to be
    executed first - this check is done later in the method.

    NOTE: Regardless the conflicts noted below, this patch should be
    backported. The issue it fixes is a source of several bug reports, user
    frustration and confusion. The conflicts were related to some additions
    in the master branch. Resolving the conflicts was pretty
    straightforward.

    Conflicts:
     cinder/tests/test_volume.py
     cinder/utils.py
     cinder/volume/flows/create_volume/__init__.py
     cinder/volume/manager.py

    Closes-bug: #1242942
    (cherry picked from commit 5be4620ae5bb50c8436de0e11269c85a095ed40b)
    Change-Id: I2610be6ba1aa7df417f1a1f7bb27af30273e4814

Thierry Carrez (ttx)
Changed in cinder:
milestone: icehouse-2 → 2014.1
Revision history for this message
Bhargava (bhargava-nagaraj) wrote :

This bug still exists in devstack (openstack-juno)
