"Failure prepping block device" error when creating volumes

Bug #1401288 reported by Jakob Gillich
This bug affects 15 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I get a "Failure prepping block device" error whenever I create an instance with a volume, or just a volume. It happened a few times after setup, then I was able to create an instance once, and now it occurs every time again. I tried at least 10 times with various configurations (empty volume, from an image) and also tried rebooting in between.

I run RDO Juno on CentOS 7 with XFS on MD-RAID1. It is a fresh installation; I didn't really change much other than adding a few images.

SSH access can be provided for debugging purposes.

Relevant nova-compute log: https://gist.githubusercontent.com/jgillich/fac1667fac71f5f459a6/raw/2794e3434f39d0333e70145a30a5c18ebbe34db5/gistfile1.txt

Tags: volumes
Jakob Gillich (jgillich)
description: updated
melanie witt (melwitt)
tags: added: volumes
Revision history for this message
Jakob Gillich (jgillich) wrote :

I've tried another operating system (Fedora 21), another file system (ext4), and a completely different machine; I still get this error. The only thing these setups had in common was the hard disks, two WD Reds. No (other) issues with the disks so far, but I need to run a few more tests to be sure.

no longer affects: ubuntu
Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

I've seen the same issue in Juno. What did work for me was the following:

1. Create a volume manually using cinder, specifying an image to be copied into the volume. Let's name this newly created volume ubuntu14_04_volume.
2. Create a snapshot of the volume ubuntu14_04_volume.
3. Launch a new instance, specifying instance boot source "Volume snapshot (creates a new volume)".

This way I was still able to achieve much the same effect (see the CLI sketch below).
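
A hedged CLI sketch of those three steps with the Juno-era clients; the flavor, size, and IDs are placeholders, not values from this report:

  # 1. create a bootable volume from an image (20 GB here)
  cinder create --image-id <image-id> --display-name ubuntu14_04_volume 20
  # 2. snapshot it once the volume reaches the "available" state
  cinder snapshot-create --display-name ubuntu14_04_snap ubuntu14_04_volume
  # 3. boot a new instance, letting nova create a fresh volume from the snapshot
  nova boot --flavor m1.small \
    --block-device source=snapshot,id=<snapshot-id>,dest=volume,size=20,bootindex=0 \
    my-instance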

Possible issue: maybe a timeout occurs while the image is being downloaded to RBD and nova doesn't wait long enough for larger images, or for setups where Ceph is on a slow network.

Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

Update: I'm positive that the volume created by "Boot from image (creates a new volume)" is still in the downloading state when nova decides there was a timeout; the image was in fact still being downloaded. Nova shouldn't enforce a hard timeout if the image is still being downloaded, and the current default hard timeout is too low. Is there any place to configure this?

Changed in nova:
assignee: nobody → Sam Stoelinga (sammiestoel)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Bernd (bernd-vogt) wrote :

Same problem on our installation (Juno, Centos7, ext4).

Revision history for this message
Vallard Benincosa (vallard) wrote :

Same problem on our installation (Juno, Ubuntu 14.04).

Changed in nova:
assignee: Sam Stoelinga (sammiestoel) → nobody
Revision history for this message
Joshua Holmes (ethode) wrote :

Confirming same problem in Ubuntu 14.04 and Juno combination

Revision history for this message
Jervin R (revin) wrote :

I get the same error, however on cinder volume logs I saw:

ImageUnacceptable: Image 2963117e-3616-4b8a-b97f-97b30963b670 is unacceptable: Size is 8GB and doesn't fit in a volume of size 5GB.

It seems that the qcow2 image I was using has an uncompressed (virtual) size of 8 GB, if I am not mistaken. Increasing the volume size from 5 GB to 10 GB got me past this error.
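
A hedged way to check this up front is to inspect the image's virtual size with qemu-img (the path is a placeholder); the target volume has to be at least as large as the virtual size, not just the compressed qcow2 file size:

  qemu-img info /path/to/image.qcow2
  # look at the "virtual size" line and size the volume to at least that value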

Revision history for this message
Doug (darbymorrison) wrote :

I am seeing the same error on Ubuntu 14.04, Juno, and Ceph Firefly.

My understanding is that since I'm using Ceph for nova, glance, and cinder, I shouldn't have any timeout issues.

However, I ran into this issue while diagnosing another problem in our environment: spinning up instances from images takes on the order of minutes and scales with the size of the image.

This should not be happening, since the instance ephemeral volumes should be cloned within Ceph, which takes less than a second.

Creating a volume from an image and then booting an instance from it does the same thing.

I am also seeing the same behavior Sam Stoelinga (sammiestoel) reported above: if you make a snapshot of the volume (once it has finished downloading) and boot from that, it does make a clone in Ceph and it does boot rapidly.
Unfortunately, that scenario isn't feasible in our environment, since volumes and volume snapshots are not shareable between tenants.
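
A hedged way to check whether a given volume ended up as a COW clone in Ceph; the pool name "volumes" is an assumption, so adjust it to your deployment:

  rbd info volumes/volume-<volume-id>
  # a COW clone shows a "parent:" line pointing at the Glance image snapshot; a full copy does not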

In summary:

1. Booting an instance from an image.
2. Booting an instance from a volume created from an image via the --block-device source=image option to nova boot.

Both make a full copy, not a COW clone, of the image.

Booting an instance from a volume snapshot does operate correctly.
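
For reference, a hedged example of the second case with the Juno-era nova CLI; the flavor, image ID, and size are placeholders:

  nova boot --flavor m1.small \
    --block-device source=image,id=<image-id>,dest=volume,size=20,shutdown=preserve,bootindex=0 \
    my-instance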

Revision history for this message
Doug (darbymorrison) wrote :

There are 2 nova.conf options which can almost be used to fix this issue:

1) [xenserver]
block_device_creation_timeout = 10 (IntOpt) Time to wait for a block device to be created

Should work if you're using Xen; the same goes for the option to disable nova image caching.

2) [DEFAULT]
instance_build_timeout = 0 (IntOpt) Amount of time in seconds an instance can be in BUILD before going into ERROR status. Set to 0 to disable.

Almost what I was looking for, but the instance build still failed while in "Mapping Block Device", since it takes too long for the image to be copied onto the volume.
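
For reference, a hedged nova.conf sketch of the two options above; the values are illustrative only, and the [xenserver] option applies only to the Xen driver:

  [xenserver]
  # time in seconds to wait for a block device to be created (Xen only)
  block_device_creation_timeout = 60

  [DEFAULT]
  # seconds an instance may stay in BUILD before going to ERROR; 0 disables the check
  instance_build_timeout = 0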

Revision history for this message
Jason (jwitko1) wrote :

I am currently having this issue using NetApp as the backend on Ubuntu 14.04 LTS, Kilo release. I have yet to find any answers but will update here if I do.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

I am using Red Hat 7 and am also seeing this issue; I conducted the same tests.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

And I am using the Juno distribution.

Revision history for this message
Prashant Zanwar (pzanwar) wrote :

My observation (I am using the NetApp cDOT backend): whenever the cache image file doesn't exist for the source image, NetApp tries to prepare it first and then creates the volume. If I change the size of the destination volume, the cache image file is rebuilt, which takes longer than the nova timeout for accessing the boot volume, and it errors out. The image cache file can be created manually by copying the image file from the Glance directory to the volume directory and naming it the way NetApp expects (cache-img-*); a sketch follows below.
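
A minimal sketch of that manual workaround, assuming a file-backed Glance store and an NFS-mounted NetApp share; the paths and the exact cache file name are hypothetical, so check the cinder-volume log for the name the driver actually expects:

  # copy the image from the Glance store into the mounted share as a cache file
  cp /var/lib/glance/images/<image-id> /var/lib/cinder/mnt/<share-hash>/cache-img-<image-id>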

Revision history for this message
HT_Sergio (sergio-martins) wrote :

I'm seeing this issue using Kilo on Ubuntu 14.04. For me the cause of this error was slightly different. I'm posting in case anyone else comes here with the same cause:
http://paste.openstack.org/show/474506/

I was creating an instance by passing an image ID and telling nova to create a volume from it. Before nova finished setting up the instance, I was invalidating the user's keystone token, so when nova made the API call to Cinder (using the user's token) to create the volume, the token was no longer valid, causing my stack trace. Hopefully you folks were not making the same silly mistake I was :p

Revision history for this message
Jeffrey Guan (double12gzh) wrote :

We can also set the value of "hw_disk_discard" to unmap in nova.conf.

Details can be found at:
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/libvirt-disk-discard-option.html
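
A hedged snippet showing where that setting goes; per the linked spec it belongs in the [libvirt] section and needs a recent enough libvirt/QEMU:

  [libvirt]
  # pass guest discard/TRIM requests through to the backing storage
  hw_disk_discard = unmap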

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Low → Undecided
status: Confirmed → Expired