block device mapping timeout in compute

Bug #1332382 reported by Jacob Cherkas
This bug affects 2 people
Affects                    Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)   Fix Released  Undecided   Akash Gangil
  Icehouse                 Fix Released  Undecided   Unassigned

Bug Description

When booting an instance with --block-device and an increased volume size, the instance can go into an error state if the volume takes longer to create than the hard-coded limit set in nova/compute/manager.py:

  def _await_block_device_map_created(self, context, vol_id, max_tries=180,
                                        wait_between=1):
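
For context, this method simply polls Cinder until the volume finishes building, sleeping wait_between seconds between polls, and gives up after max_tries polls. Below is a minimal sketch of that pattern; volume_api, the status handling, and the VolumeNotCreated class here are simplified stand-ins for the real nova/cinder plumbing, not the actual nova source:

    import time

    class VolumeNotCreated(Exception):
        """Stand-in for nova.exception.VolumeNotCreated."""

    def await_volume_created(volume_api, context, vol_id,
                             max_tries=180, wait_between=1):
        start = time.time()
        for attempt in range(1, max_tries + 1):
            volume = volume_api.get(context, vol_id)  # poll Cinder for status
            if volume['status'] == 'available':
                return attempt  # volume finished building
            time.sleep(wait_between)  # back off before the next poll
        # Total wall time is roughly max_tries * wait_between seconds plus
        # per-poll API latency, which is why the traceback below reports
        # "65 seconds or 60 attempts" for a 60-try, 1-second loop.
        raise VolumeNotCreated('Volume %s did not finish being created even'
                               ' after we waited %d seconds or %d attempts.'
                               % (vol_id, int(time.time() - start), attempt))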

Here is the command used to reproduce the issue:

nova boot --flavor ca8d889e-6a4e-48f8-81ce-0fa2d153db16 --image 438b3f1f-1b23-4b8d-84e1-786ffc73a298
--block-device source=image,id=438b3f1f-1b23-4b8d-84e1-786ffc73a298,dest=volume,size=128
--nic net-id=5f847661-edef-4dff-9f4b-904d1b3ac422 --security-groups d9ce9fe3-983f-42a8-899e-609c01977e32
Test_Image_Instance

max_tries should be made configurable.

Looking through the different releases, max_tries was 30 in Grizzly, 60 in Havana, and 180 in Icehouse.

Here is a traceback:

2014-06-19 06:54:24.303 17578 ERROR nova.compute.manager [req-050fc984-cfa2-4c34-9cde-c8aeea65e6ed d0b8f2c3cf70445baae994004e602e11 1e83429a8157489fb7ce087bd037f5d9] [instance: 74f612ea-9722-4796-956f-32defd417000] Instance failed block device setup
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000] Traceback (most recent call last):
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1394, in _prep_block_device
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]     self._await_block_device_map_created))
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]   File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 283, in attach_block_devices
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]     block_device_mapping)
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]   File "/usr/lib/python2.7/dist-packages/nova/virt/block_device.py", line 238, in attach
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]     wait_func(context, vol['id'])
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 909, in _await_block_device_map_created
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]     attempts=attempts)
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000] VolumeNotCreated: Volume 8489549e-d23e-45c2-ae6e-7fdb1a9c30d0 did not finish being created even after we waited 65 seconds or 60 attempts.
2014-06-19 06:54:24.303 17578 TRACE nova.compute.manager [instance: 74f612ea-9722-4796-956f-32defd417000]

Revision history for this message
Aaron Rosen (arosen) wrote :

Thanks for the bug report. I think we should make this timeout configurable so it can be set to a value higher than 60 seconds, since volume creation sometimes takes longer than that.

Changed in nova:
assignee: nobody → Aaron Rosen (arosen)
status: New → Confirmed
Matt Riedemann (mriedem)
tags: added: volumes
Changed in nova:
assignee: Aaron Rosen (arosen) → akash (akashg1611)
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102952

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by akash (<email address hidden>) on branch: master
Review: https://review.openstack.org/102952
Reason: Duplicate

Revision history for this message
Akash Gangil (akashg1611) wrote :

Gerrit Review Link: https://review.openstack.org/#/c/102891/

Awaiting +2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/102891
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=66721eb2c0f53fc4260b2f0aa9a3811da0f7ddbd
Submitter: Jenkins
Branch: master

commit 66721eb2c0f53fc4260b2f0aa9a3811da0f7ddbd
Author: Akash Gangil <email address hidden>
Date: Thu Jul 10 15:10:33 2014 -0700

    Make the block device mapping retries configurable

    When booting instances with --block-device and an increased volume
    size, instances can go into an error state if the volume takes longer
    to create than the hard-coded limit (max_tries=180, wait_between=1)
    set in nova/compute/manager.py:

    def _await_block_device_map_created(self,
                                        context,
                                        vol_id,
                                        max_tries=180,
                                        wait_between=1):

    To fix this, max_tries/wait_between should be made configurable.
    Looking through the different releases, max_tries was 30 in Grizzly,
    60 in Havana, and 180 in Icehouse.

    This change adds two configuration options:
    a) `block_device_allocate_retries` which can be set in nova.conf
    by the user to configure the number of block device mapping retries.
    It defaults to 60 and replaces the max_tries argument in the above method.
    b) `block_device_allocate_retries_interval` which allows the user
    to specify the time interval between consecutive retries. It defaults
    to 3 and replaces the wait_between argument in the above method.

    DocImpact
    Closes-Bug: #1332382
    Change-Id: I16e4cd1a572bc5c2cd91fc94be85e72f576a8c26
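
For operators, the practical upshot is two new nova.conf settings in the [DEFAULT] section. For example, to allow roughly 15 minutes for large volumes to build (the values below are illustrative, not recommendations):

    [DEFAULT]
    block_device_allocate_retries = 300
    block_device_allocate_retries_interval = 3

The total wait is approximately block_device_allocate_retries * block_device_allocate_retries_interval seconds, plus the per-poll API round-trip time.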

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/129276

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/129276
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5ab042143db18bd597b2c141db6ddf6dd0575f2f
Submitter: Jenkins
Branch: stable/icehouse

commit 5ab042143db18bd597b2c141db6ddf6dd0575f2f
Author: Akash Gangil <email address hidden>
Date: Thu Jul 10 15:10:33 2014 -0700

    Make the block device mapping retries configurable

    When booting instances with --block-device and an increased volume
    size, instances can go into an error state if the volume takes longer
    to create than the hard-coded limit (max_tries=180, wait_between=1)
    set in nova/compute/manager.py:

    def _await_block_device_map_created(self,
                                        context,
                                        vol_id,
                                        max_tries=180,
                                        wait_between=1):

    To fix this, max_tries/wait_between should be made configurable.
    Looking through the different releases, max_tries was 30 in Grizzly,
    60 in Havana, and 180 in Icehouse.

    This change adds two configuration options:
    a) `block_device_allocate_retries` which can be set in nova.conf
    by the user to configure the number of block device mapping retries.
    It defaults to 180 and replaces the max_tries argument in the above method.
    b) `block_device_allocate_retries_interval` which allows the user
    to specify the time interval between consecutive retries. It defaults
    to 1 and replaces the wait_between argument in the above method.

    The cherry-picked patch has been amended to set the default configuration
    options to the same values that were previously hard-coded:
      block_device_allocate_retries=180
      block_device_allocate_retries_interval=1

    DocImpact
    Closes-Bug: #1332382
    Change-Id: I16e4cd1a572bc5c2cd91fc94be85e72f576a8c26
    (cherry picked from commit 66721eb2c0f53fc4260b2f0aa9a3811da0f7ddbd)

tags: added: in-stable-icehouse