Boot from volume, block device allocate timeout cause VM error, but volume would be available later

Bug #1447884 reported by Lan Qi song
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Lan Qi song
Kilo
Fix Released
Undecided
Lan Qi song

Bug Description

When we try to boot multi instances from volume (with a large image source) at the same time, we usually got a block device allocate error as the logs in nova-compute.log:

2015-03-30 23:22:46.920 6445 WARNING nova.compute.manager [-] Volume id: 551ea616-e1c4-4ef2-9bf3-b0ca6d4474dc finished being created but was not set as 'available'
2015-03-30 23:22:47.131 6445 ERROR nova.compute.manager [-] [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Instance failed block device setup
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Traceback (most recent call last):
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1829, in _prep_block_device
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] do_check_attach=do_check_attach) +
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 407, in attach_block_devices
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] map(_log_and_attach, block_device_mapping)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 405, in _log_and_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] bdm.attach(*attach_args, **attach_kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 339, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] do_check_attach=do_check_attach)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 46, in wrapped
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] ret_val = method(obj, context, *args, **kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 229, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] volume_api.check_attach(context, volume, instance=instance)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] File "/usr/lib/python2.6/site-packages/nova/volume/cinder.py", line 305, in check_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] raise exception.InvalidVolume(reason=msg)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] InvalidVolume: Invalid volume: status must be 'available'
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]

This error cause the VM in "error" status:
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
| 1fa2d7aa-8bd9-4a22-8538-0a07d9dae8aa | inst02 | ERROR | block_device_mapping | NOSTATE | |
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
But the volume was in "available" status:
-----------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| a9ab2dc2-b117-44ef-8678-f71067a9e770 | available | None | 2 | None | true | |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
+--------------------------------------+------------+--------+----------------------+-------------+-------

And when we teminiate this VM, the volume will still exist, since there is no volume attachment info stored in this VM.

This can be easily reproduced:
1. add the following options to nova.conf in compute node ( make sure the error will be happened even with small image such as cirros):
block_device_allocate_retries = 1
block_device_allocate_retries_interval = 1

2. restart the nova-compute service and try boot one instance from volume:

nova boot --flavor 1 --block-device source=image,id=929dfa0b-0b9b-4819-9f84-e4ed22e81f33,dest=volume,size=2,shutdown=remove,bootindex=0 --nic net-id=a302dfa6-0c5d-46ab-9295-8f8b9119fe83 --availability-zone nova:compute01-juno.ibm.com inst02

Tags: volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/177084

Changed in nova:
assignee: nobody → Lan Qi song (lqslan)
status: New → In Progress
Matt Riedemann (mriedem)
tags: added: volumes
Changed in nova:
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/177084
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8772653ddec24ae68a0bcd9440e9e6ec81ea301b
Submitter: Jenkins
Branch: master

commit 8772653ddec24ae68a0bcd9440e9e6ec81ea301b
Author: Lan Qi song <email address hidden>
Date: Fri Apr 24 11:49:45 2015 +0800

    Remove useless volume when boot from volume failed

    Currently, when boot instance from block device, user can choose three
    kinds of sources: (blank, snapshot or image), and compute manager will
    create a volume from one of them. But the volume creation may fail or
    timeout at this time, then the process of instance boot will fail.

    During cleanup, the failed volume cannot be deleted, and becomes unusable.

    This patch removes the corresponding volume when block device allocation
    fails or times out.

    Closes-Bug: #1447884

    Change-Id: Ib3c915b5b5920c8289d5fc69cde4a3c485ed8600

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/189702

Matt Riedemann (mriedem)
Changed in nova:
importance: Low → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/189702
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=540221fc91fb5b5c0ac8a79cdc5d74a9ec8b492e
Submitter: Jenkins
Branch: stable/kilo

commit 540221fc91fb5b5c0ac8a79cdc5d74a9ec8b492e
Author: Lan Qi song <email address hidden>
Date: Fri Apr 24 11:49:45 2015 +0800

    Remove useless volume when boot from volume failed

    Currently, when boot instance from block device, user can choose three
    kinds of sources: (blank, snapshot or image), and compute manager will
    create a volume from one of them. But the volume creation may fail or
    timeout at this time, then the process of instance boot will fail.

    During cleanup, the failed volume cannot be deleted, and becomes unusable.

    This patch removes the corresponding volume when block device allocation
    fails or times out.

    Closes-Bug: #1447884

    Change-Id: Ib3c915b5b5920c8289d5fc69cde4a3c485ed8600
    (cherry picked from commit 8772653ddec24ae68a0bcd9440e9e6ec81ea301b)

Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.