Expose block_device_allocate_retries as a dedicated config option

Bug #1758607 reported by Nobuto Murata
This bug affects 7 people
Affects                       Status        Importance  Assigned to    Milestone
OpenStack Charm Guide         Fix Released  Undecided   Nobuto Murata
OpenStack Nova Compute Charm  Fix Released  Wishlist    Nobuto Murata  22.04

Bug Description

It might be a good idea to expose "block_device_allocate_retries" as a dedicated config option in the charm, although it can already be tweaked easily via the "config-flags" option.
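
For example, the current workaround looks something like this (a sketch; it assumes the application is deployed under the name "nova-compute"):

    juju config nova-compute config-flags="block_device_allocate_retries=300"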

"block_device_allocate_retries" is an option I change in every engagement Windows guests are involved. The default block device attachment time out is 3 minutes (block_device_allocate_retries_interval=3 * block_device_allocate_retries=60 = 180 seconds). It may not be enough to download 10+ GB Windows image and convert it from QCOW2 to RAW (if the image was uploaded as QCOW2 originally and needs to be converted to RAW for Ceph backend).

I usually bump "block_device_allocate_retries" to 300 to set the timeout to 15 minutes to be safe (block_device_allocate_retries_interval=3 * block_device_allocate_retries=300 = 900 seconds).
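
In nova.conf terms, the tuned values look like this (a sketch; both options live in nova's [DEFAULT] section):

    [DEFAULT]
    # effective timeout = retries * interval = 300 * 3 = 900 seconds (15 min)
    block_device_allocate_retries = 300
    block_device_allocate_retries_interval = 3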

block_device_allocate_retries_interval = 3
> (Integer) Interval (in seconds) between block device allocation retries on failures.
>
> This option allows the user to specify the time interval between consecutive retries. ‘block_device_allocate_retries’ option specifies the maximum number of retries.

block_device_allocate_retries = 60
> (Integer) Number of times to retry block device allocation on failures. Starting with Liberty, Cinder can use image volume cache. This may help with block device allocation performance. Look at the cinder image_volume_cache_enabled configuration option.

James Page (james-page)
Changed in charm-nova-compute:
status: New → Triaged
importance: Undecided → Wishlist
Trent Lloyd (lathiat)
tags: added: sts
Revision history for this message
Trent Lloyd (lathiat) wrote :

I agree with changing this and I think we should set the default to 10 or 15 minutes.

Many environments use qcow2 images with Ceph volumes, which requires the cinder node to download the image, convert it, and then upload it, usually through a HDD on the root disk. This thrashes the HDD on those nodes, hurting performance, and if you try to create more than 2-6 VMs at once it will almost certainly time out with the default (180 seconds) or even 240.

Revision history for this message
Trent Lloyd (lathiat) wrote :

Recently hit this in two production environments when deploying Windows images. Even 240 seconds is not enough.

Revision history for this message
Jamon Camisso (jamon) wrote :

Likewise, supporting an arbitrarily large interval/retry combination (within reason) would be helpful. A customer ran into this timeout after 186 seconds in a cloud with an 80 GB Windows image.

Revision history for this message
Mark Maglana (mmaglana) wrote :

Does the charm have to expose block_device_allocate_retries directly, or can we make it friendlier so that the charm user can specify a timeout in seconds rather than having to compute it in their head via block_device_allocate_retries and block_device_allocate_retries_interval?

Revision history for this message
Nobuto Murata (nobuto) wrote :

As long as the config description mentions block_device_allocate_retries and block_device_allocate_retries_interval to give an idea of the upstream config values, a friendlier config name would be nice to have.
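
Such a friendlier option would only need a trivial mapping from a timeout in seconds to a retry count, along these lines (a hypothetical sketch; the option name and the fixed 3-second interval are assumptions):

    # hypothetical: operator supplies a timeout in seconds,
    # the charm derives the retry count from the interval
    timeout_seconds=900
    interval=3
    retries=$(( (timeout_seconds + interval - 1) / interval ))  # ceil(900/3) = 300
    echo "block_device_allocate_retries=${retries}"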

Mark Maglana (mmaglana)
Changed in charm-nova-compute:
assignee: nobody → Mark Maglana (mmaglana)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)

Fix proposed to branch: master
Review: https://review.opendev.org/668933

Changed in charm-nova-compute:
status: Triaged → In Progress
Revision history for this message
Mark Maglana (mmaglana) wrote :

The preceding proposed fix now exposes block_device_allocate_retries_interval and block_device_allocate_retries as configuration options for the charm.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/669060

Revision history for this message
Mark Maglana (mmaglana) wrote :

For clarity:

* https://review.opendev.org/#/c/668933/ - Addresses the bug directly by exposing two OpenStack options

* https://review.opendev.org/#/c/669060/ - Adds a friendlier alternative to the two options in the above change.

Ryan Beisner (1chb1n)
Changed in charm-nova-compute:
milestone: none → 19.10
Revision history for this message
Trent Lloyd (lathiat) wrote :

I would love to see a fix for this merged, including an increased default, as needing to raise this value is a common problem: not only for large images, but also when deploying multiple VMs at once, since images copying in parallel are often converted through the root disk of the cinder node and can bottleneck on slower root disks.

In the spirit of the charms' opinionated and simplified configuration, I think we should not merge 668933 and should prefer the simpler 669060 patch set, since users can already use config-flags to set the individual values if they really need to.

David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-nova-compute:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 20.05 → 20.08
Revision history for this message
David Coronel (davecore) wrote :

I just hit this problem in a lab environment with every OpenStack component virtualised in VMs on one bare-metal machine. It's a low-performance environment. I set block_device_allocate_retries = 300 and block_device_allocate_retries_interval = 3 and it worked for me.

James Page (james-page)
Changed in charm-nova-compute:
milestone: 20.08 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-nova-compute (master)

Change abandoned by "James Page <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/669060
Reason: This review is > 12 weeks without comment, and failed testing the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "James Page <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/668933
Reason: This review is > 12 weeks without comment, and failed testing the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Nobuto Murata (nobuto)
Changed in charm-nova-compute:
assignee: Mark Maglana (mmaglana) → nobody
Nobuto Murata (nobuto)
Changed in charm-nova-compute:
assignee: nobody → Nobuto Murata (nobuto)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-medium for tracking.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/828253
Committed: https://opendev.org/openstack/charm-nova-compute/commit/2283f12eddc45ab97701ece264391e545f6bda1c
Submitter: "Zuul (22348)"
Branch: master

commit 2283f12eddc45ab97701ece264391e545f6bda1c
Author: Nobuto Murata <email address hidden>
Date: Tue Feb 8 18:02:12 2022 +0900

    Expose block-device-allocate-retries and interval

    The upstream has 3 min as the timeout (60 retries at 3-seconds
    interval). It should work if an image is in a raw format to leverage
    Ceph's copy-on-write or an image is small enough to be copied quickly.
    However, there are some cases exceeding the 3 min deadline, such as a
    big enough image in Qcow2 or other formats like Windows images, or a
    storage backend that doesn't have copy-on-write from Glance.

    Let's bump the deadline to 15 min (300 retries at 3-seconds interval) to
    cover most of the cases out of the box, and let operators tune it
    further by exposing those options.

    Co-authored-by: Mark Maglana <email address hidden>
    Closes-Bug: 1758607
    Change-Id: I6f6da8e90c6bbcd031ee183ae86d88eccd392230
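
With the fix merged, operators can presumably tune the deadline directly, along these lines (a sketch; the charm option names are inferred from the commit title, and the 30-minute value is only an example):

    juju config nova-compute \
        block-device-allocate-retries=600 \
        block-device-allocate-retries-interval=3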

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Nobuto Murata (nobuto)
Changed in charm-guide:
assignee: nobody → Nobuto Murata (nobuto)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/836163

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/836163
Committed: https://opendev.org/openstack/charm-nova-compute/commit/c52d87f071874dfa55e84ea96c2d5d29df29fad9
Submitter: "Zuul (22348)"
Branch: stable/xena

commit c52d87f071874dfa55e84ea96c2d5d29df29fad9
Author: Nobuto Murata <email address hidden>
Date: Tue Feb 8 18:02:12 2022 +0900

    Expose block-device-allocate-retries and interval

    The upstream has 3 min as the timeout (60 retries at 3-seconds
    interval). It should work if an image is in a raw format to leverage
    Ceph's copy-on-write or an image is small enough to be copied quickly.
    However, there are some cases exceeding the 3 min deadline, such as a
    big enough image in Qcow2 or other formats like Windows images, or a
    storage backend that doesn't have copy-on-write from Glance.

    Let's bump the deadline to 15 min (300 retries at 3-seconds interval) to
    cover most of the cases out of the box, and let operators tune it
    further by exposing those options.

    Co-authored-by: Mark Maglana <email address hidden>
    Closes-Bug: 1758607
    Change-Id: I6f6da8e90c6bbcd031ee183ae86d88eccd392230
    (cherry picked from commit 2283f12eddc45ab97701ece264391e545f6bda1c)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-guide (master)
Changed in charm-guide:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-guide (master)

Reviewed: https://review.opendev.org/c/openstack/charm-guide/+/837651
Committed: https://opendev.org/openstack/charm-guide/commit/fa51adbbd5d1c91c342b75d1ce7df2398ff50729
Submitter: "Zuul (22348)"
Branch: master

commit fa51adbbd5d1c91c342b75d1ce7df2398ff50729
Author: Nobuto Murata <email address hidden>
Date: Wed Apr 13 11:46:11 2022 +0900

    release-notes: block-device-allocate timeout

    Closes-Bug: #1758607
    Change-Id: I056d79682213a39bcaa44b847cb78b84fbaf95de

Changed in charm-guide:
status: In Progress → Fix Released
Changed in charm-nova-compute:
milestone: none → 22.04
Changed in charm-nova-compute:
status: Fix Committed → Fix Released