"Failure prepping block device" error when creating volumes

Bug #1401288 reported by Jakob Gillich
This bug affects 15 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I get a "Failure prepping block device" error whenever I create an instance with a volume, or just a volume. It happened a few times after setup, then I was able to create an instance once, and now it occurs every time again. I tried at least 10 times with various configurations (empty volume, from an image) and also tried rebooting in between.

I run RDO Juno on CentOS 7 with XFS on MD-RAID1. It is a fresh installation; I didn't really change much other than adding a few images.

SSH access can be provided for debugging purposes.

Relevant nova-compute log: https://gist.githubusercontent.com/jgillich/fac1667fac71f5f459a6/raw/2794e3434f39d0333e70145a30a5c18ebbe34db5/gistfile1.txt

Tags: volumes
Jakob Gillich (jgillich)
description: updated
melanie witt (melwitt)
tags: added: volumes
Revision history for this message
Jakob Gillich (jgillich) wrote :

I've tried another operating system (Fedora 21), another file system (ext4), and a completely different machine; I still get this error. The only thing these setups had in common was the hard disks, two WD Reds. No (other) issues with the disks so far, but I need to run a few more tests to be sure.

no longer affects: ubuntu
Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

I've seen the same issue in Juno. What did work for me was the following:

1. Create a volume manually using cinder, specifying an image to be copied into the volume. Let's name this newly created volume ubuntu14_04_volume.
2. Create a snapshot of the volume ubuntu14_04_volume.
3. Launch a new instance, specifying instance boot source "Volume snapshot (creates a new volume)".

This way I was still able to achieve much the same effect (see the CLI sketch below).
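
A hedged CLI sketch of those three steps with the Juno-era clients; the flavor, size, and IDs are placeholders, not values from this report:

  # 1. create a bootable volume from an image (20 GB here)
  cinder create --image-id <image-id> --display-name ubuntu14_04_volume 20
  # 2. snapshot it once the volume reaches the "available" state
  cinder snapshot-create --display-name ubuntu14_04_snap ubuntu14_04_volume
  # 3. boot a new instance, letting nova create a fresh volume from the snapshot
  nova boot --flavor m1.small \
    --block-device source=snapshot,id=<snapshot-id>,dest=volume,size=20,bootindex=0 \
    my-instance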

Possible issue: maybe a timeout occurs while the image is being downloaded to RBD and nova doesn't wait long enough for larger images, or for setups where Ceph is on a slow network.

Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

Update: I'm positive that the volume created by "Boot from image (creates a new volume)" is still in the downloading state when nova decides there was a timeout; the image was in fact still being downloaded. Nova shouldn't enforce a hard timeout if the image is still being downloaded, and the current default hard timeout is too low. Is there any place to configure this?

Changed in nova:
assignee: nobody → Sam Stoelinga (sammiestoel)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Bernd (bernd-vogt) wrote :

Same problem on our installation (Juno, Centos7, ext4).

Revision history for this message
Vallard Benincosa (vallard) wrote :

Same problem on our installation (Juno, Ubuntu 14.04).

Changed in nova:
assignee: Sam Stoelinga (sammiestoel) → nobody
Revision history for this message
Joshua Holmes (ethode) wrote :

Confirming same problem in Ubuntu 14.04 and Juno combination

Revision history for this message
Jervin R (revin) wrote :

I get the same error, however on cinder volume logs I saw:

ImageUnacceptable: Image 2963117e-3616-4b8a-b97f-97b30963b670 is unacceptable: Size is 8GB and doesn't fit in a volume of size 5GB.

It seems that the qcow2 image I was using has an uncompressed (virtual) size of 8 GB, if I am not mistaken. Increasing the volume size from 5 GB to 10 GB got me past this error.
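
A hedged way to check this up front is to inspect the image's virtual size with qemu-img (the path is a placeholder); the target volume has to be at least as large as the virtual size, not just the compressed qcow2 file size:

  qemu-img info /path/to/image.qcow2
  # look at the "virtual size" line and size the volume to at least that value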

Revision history for this message
Doug (darbymorrison) wrote :

I am seeing the same error on Ubuntu 14.04, Juno, and Ceph Firefly.

My understanding is that since I'm using Ceph for nova, glance, and cinder, I shouldn't have any timeout issues.

However, I ran into this issue while diagnosing another problem in our environment: spinning up instances from images takes on the order of minutes and scales with the size of the image.

This should not be happening, since the instance ephemeral volumes should be cloned within Ceph, which takes less than a second.

Creating a volume from an image and then booting an instance from it does the same thing.

I am also seeing the same behavior Sam Stoelinga (sammiestoel) reported above: if you make a snapshot of the volume (once it has finished downloading) and boot from that, it does make a clone in Ceph and it does boot rapidly.
Unfortunately, that scenario isn't feasible in our environment, since volumes and volume snapshots are not shareable between tenants.
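
A hedged way to check whether a given volume ended up as a COW clone in Ceph; the pool name "volumes" is an assumption, so adjust it to your deployment:

  rbd info volumes/volume-<volume-id>
  # a COW clone shows a "parent:" line pointing at the Glance image snapshot; a full copy does not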

In summary:

1. Booting an instance from an image.
2. Booting an instance from a volume created from an image via the --block-device source=image option to nova boot.

Both make a full copy, not a COW clone, of the image.

Booting an instance from a volume snapshot does operate correctly.
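
For reference, a hedged example of the second case with the Juno-era nova CLI; the flavor, image ID, and size are placeholders:

  nova boot --flavor m1.small \
    --block-device source=image,id=<image-id>,dest=volume,size=20,shutdown=preserve,bootindex=0 \
    my-instance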

Revision history for this message
Doug (darbymorrison) wrote :

There are 2 nova.conf options which can almost be used to fix this issue:

1) [xenserver]
block_device_creation_timeout = 10 (IntOpt) Time to wait for a block device to be created

Should work if you're using Xen; the same goes for the option to disable nova image caching.

2) [DEFAULT]
instance_build_timeout = 0 (IntOpt) Amount of time in seconds an instance can be in BUILD before going into ERROR status. Set to 0 to disable.

Almost what I was looking for, but the instance build still failed while in "Mapping Block Device", since it takes too long for the image to be copied onto the volume.
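
For reference, a hedged nova.conf sketch of the two options above; the values are illustrative only, and the [xenserver] option applies only to the Xen driver:

  [xenserver]
  # time in seconds to wait for a block device to be created (Xen only)
  block_device_creation_timeout = 60

  [DEFAULT]
  # seconds an instance may stay in BUILD before going to ERROR; 0 disables the check
  instance_build_timeout = 0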

Revision history for this message
Jason (jwitko1) wrote :

I am currently having this issue using NetApp as the backend on Ubuntu 14.04 LTS, Kilo release. I have yet to find any answers but will update here if I do.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

I am using Red Hat 7 and am also seeing this issue; I conducted the same tests.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

And I am using the Juno distribution.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

My observation (I am using the NetApp cDOT backend): whenever the cache image file doesn't exist for the source image, NetApp tries to prepare it first and then creates the volume. If I change the size of the destination volume, the cache image file is rebuilt, which takes longer than the nova timeout for accessing the boot volume, and it errors out. The image cache file can be created manually by copying the image file from the Glance directory to the volume directory and naming it the way NetApp expects (cache-img-*); a sketch follows below.
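
A minimal sketch of that manual workaround, assuming a file-backed Glance store and an NFS-mounted NetApp share; the paths and the exact cache file name are hypothetical, so check the cinder-volume log for the name the driver actually expects:

  # copy the image from the Glance store into the mounted share as a cache file
  cp /var/lib/glance/images/<image-id> /var/lib/cinder/mnt/<share-hash>/cache-img-<image-id>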

Revision history for this message
HT_Sergio (sergio-martins) wrote :

I'm seeing this issue using Kilo on Ubuntu 14.04. For me the cause of this error was slightly different. I'm posting in case anyone else comes here with the same cause:
http://paste.openstack.org/show/474506/

I was creating an instance by passing an image ID and telling nova to create a volume from it. Before nova finished setting up the instance, I was invalidating the user's keystone token, so when nova made the API call to Cinder (using the user's token) to create the volume, the token was no longer valid, causing my stack trace. Hopefully you folks were not making the same silly mistake I was :p

Revision history for this message
Jeffrey Guan (double12gzh) wrote :

We can also set the value of "hw_disk_discard" to unmap in nova.conf.

Details can be found at:
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/libvirt-disk-discard-option.html
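
A hedged snippet showing where that setting goes; per the linked spec it belongs in the [libvirt] section and needs a recent enough libvirt/QEMU:

  [libvirt]
  # pass guest discard/TRIM requests through to the backing storage
  hw_disk_discard = unmap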

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Low → Undecided
status: Confirmed → Expired