Volume hangs in "creating" status even though scheduler raises "No valid host" exception

Bug #1053931 reported by Dmitry Galkin
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
Cinder                    Fix Released  Undecided   Unassigned
OpenStack Compute (nova)  Fix Released  Medium      Russell Bryant

Bug Description

When the volume creation process fails during scheduling (i.e. there is no appropriate host), the status in the DB (and, as a result, in the nova volume-list output) stays stuck at "creating".

In that case, the only way to figure out that volume creation failed is to go and read /var/log/nova/nova-scheduler.log (which is not an obvious thing to do). Moreover, a volume stuck in the "creating" status cannot be deleted with nova volume-delete; to delete it, one has to change its status to "error" directly in the DB (a sketch of that workaround follows below).
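
For anyone else hitting this, here is a minimal sketch of that DB workaround in Python, assuming a MySQL nova database and the standard volumes table; the connection URL, credentials and volume id are placeholders, not values taken from this report:

    # Workaround sketch only: flip a stuck volume from 'creating' to 'error'
    # so that nova volume-delete will accept it. Adjust the connection URL
    # and the volume id to your own deployment.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://nova:NOVA_DBPASS@localhost/nova")
    with engine.begin() as conn:  # begin() opens a transaction and commits on success
        conn.execute(
            text("UPDATE volumes SET status = 'error' "
                 "WHERE id = :vid AND status = 'creating'"),
            {"vid": 15},
        )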

The simple scheduler is being used (nova.conf):

--scheduler_driver=nova.scheduler.simple.SimpleScheduler

Here is sample output from the DB:

*************************** 3. row ***************************
         created_at: 2012-09-21 09:55:42
         updated_at: NULL
         deleted_at: NULL
            deleted: 0
                 id: 15
             ec2_id: NULL
            user_id: b0aadfc80b094d94b78d68dcdc7e8757
         project_id: 3b892f660ea2458aa9aa9c9a21352632
               host: NULL
               size: 1
  availability_zone: nova
        instance_id: NULL
         mountpoint: NULL
        attach_time: NULL
             status: creating
      attach_status: detached
       scheduled_at: NULL
        launched_at: NULL
      terminated_at: NULL
       display_name: NULL
display_description: NULL
  provider_location: NULL
      provider_auth: NULL
        snapshot_id: NULL
     volume_type_id: NULL

Here is the part of interest in nova-scheduler.log:

pic': u'volume', u'filter_properties': {u'scheduler_hints': {}}, u'snapshot_id': None, u'volume_id': 16}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'3b892f660ea2458aa9aa9c9a21352632', u'_context_timestamp': u'2012-09-21T10:15:47.091307', u'_context_user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', u'method': u'create_volume', u'_context_remote_address': u'172.18.67.146'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-09-21 10:15:47 DEBUG nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] unpacked context: {'user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', 'roles': [u'admin'], 'timestamp': '2012-09-21T10:15:47.091307', 'auth_token': '<SANITIZED>', 'remote_address': u'172.18.67.146', 'is_admin': True, 'request_id': u'req-01f7dd30-0421-4ef3-a675-16b0cf1362eb', 'project_id': u'3b892f660ea2458aa9aa9c9a21352632', 'read_deleted': u'no'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-09-21 10:15:47 WARNING nova.scheduler.manager [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Failed to schedule_create_volume: No valid host was found. Is the appropriate service running?
2012-09-21 10:15:47 ERROR nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Exception during message handling
2012-09-21 10:15:47 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-09-21 10:15:47 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 253, in _process_data
2012-09-21 10:15:47 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-09-21 10:15:47 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 97, in _schedule
2012-09-21 10:15:47 TRACE nova.rpc.amqp context, ex, *args, **kwargs)
2012-09-21 10:15:47 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-21 10:15:47 TRACE nova.rpc.amqp self.gen.next()
2012-09-21 10:15:47 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 92, in _schedule
2012-09-21 10:15:47 TRACE nova.rpc.amqp return driver_method(*args, **kwargs)
2012-09-21 10:15:47 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/scheduler/simple.py", line 227, in schedule_create_volume
2012-09-21 10:15:47 TRACE nova.rpc.amqp raise exception.NoValidHost(reason=msg)
2012-09-21 10:15:47 TRACE nova.rpc.amqp NoValidHost: No valid host was found. Is the appropriate service running?
2012-09-21 10:15:47 TRACE nova.rpc.amqp

Revision history for this message
Russell Bryant (russellb) wrote :

What version of nova?

Changed in nova:
status: New → Incomplete
Revision history for this message
Russell Bryant (russellb) wrote :

I read through current nova master and this still appears to be a problem. It looks like it applies to Cinder as well.

Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → Medium
Revision history for this message
Russell Bryant (russellb) wrote :

Actually, this does not appear to affect Cinder as of this commit:

commit f758bde47439be52a743b2b4181d4900f2c1bc8a
Author: ZhuRongze <email address hidden>
Date: Sat Jul 28 08:14:03 2012 +0000

    Scheduler-clean-up

    Some instance related functions and opts were still present in Cinder
    Scheduler.

    Remove the functions and opts and modify some computer related comments.

    Change-Id: Ia01099e66bcbb129493870dfbf212d5e6669ebe1

Changed in cinder:
status: New → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/13479

Changed in nova:
assignee: nobody → Russell Bryant (russellb)
status: Confirmed → In Progress
tags: added: folsom-rc-potential
Changed in nova:
milestone: none → folsom-rc2
tags: removed: folsom-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/13496

Changed in cinder:
milestone: none → folsom-rc2
milestone: folsom-rc2 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/13479
Committed: http://github.com/openstack/nova/commit/75fa03557fd6f1e7c62079e9e89556f1af139202
Submitter: Jenkins
Branch: master

commit 75fa03557fd6f1e7c62079e9e89556f1af139202
Author: Russell Bryant <email address hidden>
Date: Fri Sep 21 13:51:15 2012 -0400

    Set volume status to error if scheduling fails.

    Fix bug 1053931.

    When scheduling volume creation fails, the volume was left with a status
    of 'creating'. This patch changes the scheduler manager to set the
    status to 'error' if scheduling fails. This matches the behavior of the
    cinder scheduler manager in this case.

    This particular issue was addressed in Cinder as a part of commit
    f758bde47439be52a743b2b4181d4900f2c1bc8a.

    Change-Id: Ieb453ab05b3b84de53f72323c536a9567555df1e
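
The shape of the fix described above is small: catch the scheduling failure in the scheduler manager and flip the volume's status to 'error' instead of leaving the row in 'creating'. A simplified, self-contained sketch of that idea (this is not the actual nova/scheduler/manager.py code; the driver object and the db_volume_update callable are illustrative stand-ins):

    # Sketch of the approach, not the real patch: on a scheduling failure,
    # record the failure in the DB before propagating the exception.

    class NoValidHost(Exception):
        pass

    def schedule_create_volume(context, volume_id, driver, db_volume_update):
        try:
            driver.schedule_create_volume(context, volume_id)
        except NoValidHost:
            # Previously the exception propagated and the volume stayed in
            # 'creating' forever; now the failed volume is marked 'error'.
            db_volume_update(context, volume_id, {'status': 'error'})
            raise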

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (milestone-proposed)

Reviewed: https://review.openstack.org/13496
Committed: http://github.com/openstack/nova/commit/fb1f3502da1839d7bb700af16cc3a107da5d8624
Submitter: Jenkins
Branch: milestone-proposed

commit fb1f3502da1839d7bb700af16cc3a107da5d8624
Author: Russell Bryant <email address hidden>
Date: Fri Sep 21 13:51:15 2012 -0400

    Set volume status to error if scheduling fails.

    Fix bug 1053931.

    When scheduling volume creation fails, the volume was left with a status
    of 'creating'. This patch changes the scheduler manager to set the
    status to 'error' if scheduling fails. This matches the behavior of the
    cinder scheduler manager in this case.

    This particular issue was addressed in Cinder as a part of commit
    f758bde47439be52a743b2b4181d4900f2c1bc8a.

    Change-Id: Ieb453ab05b3b84de53f72323c536a9567555df1e
    (cherry picked from commit 75fa03557fd6f1e7c62079e9e89556f1af139202)

Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
Dmitry Galkin (galkindmitrii) wrote :

I faced this with stable/essex nova (no particular milestone; however, I was unable to find such a bug in milestones 2012.1.1 or 2012.1.2).

Thanks a lot for the fix!

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc2 → 2012.2
Revision history for this message
Dafna Ron (dron-3) wrote :

This bug is happening again in the Havana release.
We had a power outage, and I sent a create command while the storage was unavailable; some other volume-related commands were also running.

2013-10-07 18:41:06.136 8288 WARNING cinder.scheduler.host_manager [req-462656bc-4bb2-478a-8fa4-90ac89e1c39e c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:41:06.152 8288 ERROR cinder.volume.flows.create_volume [req-462656bc-4bb2-478a-8fa4-90ac89e1c39e c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:44:31.280 8288 WARNING cinder.scheduler.host_manager [req-65c2f4e1-71da-4340-b9f0-afdd05ccdaa9 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:44:31.281 8288 ERROR cinder.volume.flows.create_volume [req-65c2f4e1-71da-4340-b9f0-afdd05ccdaa9 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:44:50.730 8288 WARNING cinder.scheduler.host_manager [req-1c132eb5-ca74-4ab5-91dc-73c25b305165 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:44:50.731 8288 ERROR cinder.volume.flows.create_volume [req-1c132eb5-ca74-4ab5-91dc-73c25b305165 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:47:01.577 8288 WARNING cinder.scheduler.host_manager [req-538ad552-0e19-4307-bea8-10e0a35a8a36 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:47:01.578 8288 ERROR cinder.volume.flows.create_volume [req-538ad552-0e19-4307-bea8-10e0a35a8a36 c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:47:18.421 8288 WARNING cinder.scheduler.host_manager [req-3a788eb8-56f5-45f6-b4fd-ade01a05cf9d c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:47:18.422 8288 ERROR cinder.volume.flows.create_volume [req-3a788eb8-56f5-45f6-b4fd-ade01a05cf9d c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:48:27.732 8288 WARNING cinder.scheduler.host_manager [req-1ef1b47a-27b8-4667-9823-2f91dcc0f29e c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] volume service is down or disabled. (host: cougar06.scl.lab.tlv.redhat.com)
2013-10-07 18:48:27.733 8288 ERROR cinder.volume.flows.create_volume [req-1ef1b47a-27b8-4667-9823-2f91dcc0f29e c02995f25ba44cfab1a3cbd419f045a1 c77235c29fd0431a8e6628ef6d18e07f] Failed to schedule_create_volume: No valid host was found.
2013-10-07 18:48:51.125 8288 WARNING cinder.scheduler.host_manager [re...


Changed in cinder:
status: Invalid → New
Revision history for this message
John Griffith (john-griffith) wrote :

Dafna,
Could you confirm whether the volume was stuck in "creating" or whether it went to "error"? The bug was actually that the scheduler failed but never marked the volume as such, leaving it sitting in the "creating" state. Let me know if this is what you're still seeing; based on the code and some quick tests, I believe this is still fixed.

Revision history for this message
gabriel staicu (gabriel-staicu) wrote :

Hi,

I can confirm that this is happening again in Havana. I tried to spawn a machine from a volume snapshot with the "create new volume" option. The first interesting thing is that cinder-scheduler chose to create the corresponding volume on an agent that was down, although other cinder-volume agents were up. As a result, the status of the volume is stuck in "creating":
 cinder list
+--------------------------------------+----------------+------------------+------+-------------+----------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+----------------+------------------+------+-------------+----------+--------------------------------------+
| 0c7332a2-4ddd-413c-828b-3ba93b2f98da | in-use | bla_ubuntu | 10 | None | false | 55840e8b-7cc8-43ea-b8c0-bfcdf82c769d |
| 20e0debb-55b8-4bde-9395-2e4179559536 | available | debian73 | 10 | None | false | |
| 46b9846c-c8ad-4847-a6b9-e8bc086b41b0 | available | ubuntu1204 | 10 | None | false | |
| 62b58aff-ce6e-4b7f-883e-589103ecaa87 | in-use | bla_00 | 100 | None | false | 55840e8b-7cc8-43ea-b8c0-bfcdf82c769d |
| 9a68af42-e73f-4fb7-8f87-621b2b4375f4 | error_deleting | ssdf | 1 | None | false | |
| a7728dbf-3752-438a-9395-f14efad98ccf | available | ubuntu1204_unity | 10 | None | false | |
| e9247a44-cb65-4164-8fa2-9c2433949278 | creating | | 10 | None | false | |
+--------------------------------------+----------------+------------------+------+-------------+----------+--------------------------------------+

Revision history for this message
Huang Zhiteng (zhiteng-huang) wrote :

Gabriel, creating a volume from a snapshot doesn't go through cinder-scheduler at all. In your case, the back-end serving the snapshot is down, so the RPC message that the Cinder API service sent to Cinder Volume is left sitting in the message queue. That's why the volume status is stuck in 'creating'. It's a different issue.

Revision history for this message
Huang Zhiteng (zhiteng-huang) wrote :

Dafna,

I think this bug has been fixed.

Let me explain the workflow of creating a volume in a bit more detail: 1) the user sends a request to the Cinder API service; 2) the API creates a DB entry for the volume and marks its status 'creating' (https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L545), then sends an RPC message to the scheduler; 3) the scheduler picks up the message, makes a placement decision and, if a back-end is available, sends the request via RPC to the volume service; 4) the volume service picks up the message and performs the real job of creating the volume for the user. (A simplified sketch of this flow is shown below.)
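
To make it concrete where things can get stuck, here is a minimal, self-contained toy of that flow; every name below is an illustrative stand-in (it is not Cinder code), the dict stands in for the volumes table, and a direct function call stands in for the RPC cast:

    # Toy model of the flow above (API -> scheduler -> volume service),
    # showing where a volume can end up stuck in 'creating'.

    class NoValidHost(Exception):
        pass

    VOLUMES = {}             # stands in for the volumes DB table
    AVAILABLE_BACKENDS = []  # an empty list simulates "no valid host"

    def api_create_volume(volume_id, size):
        # Step 2: the API writes the row as 'creating' and casts to the scheduler.
        VOLUMES[volume_id] = {'size': size, 'status': 'creating'}
        scheduler_create_volume(volume_id)   # in reality an async RPC cast

    def scheduler_create_volume(volume_id):
        # Step 3: placement decision. With the fix, a failure here marks the
        # volume 'error'; without it, the row would stay 'creating' forever.
        if not AVAILABLE_BACKENDS:
            VOLUMES[volume_id]['status'] = 'error'
            raise NoValidHost("No valid host was found.")
        # Step 4 would cast to the chosen volume service, which does the real work.

    if __name__ == '__main__':
        try:
            api_create_volume('vol-1', size=10)
        except NoValidHost as exc:
            print(exc, '->', VOLUMES['vol-1']['status'])   # error, not creating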

There are multiple cases in which a volume's status can be stuck in 'creating':

a) something goes wrong while the RPC message is being processed by the scheduler, e.g. the scheduler service is down (related to this change and bug: https://review.openstack.org/#/c/64014/), the message is lost, or the scheduler service goes down while it is processing the message;

b) something goes wrong AFTER the back-end is chosen, i.e. the scheduler successfully sends the message to the target back-end, but the message isn't picked up by the target volume service, or there is an unhandled exception while the volume service is handling the request.

If this bug somehow happens again, can you describe the steps to reproduce it?

Changed in cinder:
status: New → Fix Released