[image based provisioning] incorrect "device is busy" handling for non-RAID devices

Bug #1410471 reported by Miroslav Anashkin
This bug affects 3 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Alexander Gordeev
6.0.x               Fix Committed  High        Alexander Gordeev
6.1.x               Fix Committed  High        Alexander Gordeev

Bug Description

The fuel-agent built-in partitioning algorithm has the correct "utils.execute('udevadm', 'settle',"... ) step for RAID devices only.

For single disks it still makes 30 attempts, one second apart, while waiting for the device to become free.
These 30 seconds may not be enough if the target disk is attached as a free/JBOD drive to an enterprise-class RAID controller.

Please implement the same wait-until-the-device-is-free procedure for single drives.
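
For illustration only, a minimal sketch of the requested behaviour: let udev drain its event queue first, then check for the partition node. The function name wait_for_partition, its parameters and the 120-second timeout are hypothetical and are not taken from fuel-agent; only the `udevadm settle --timeout=SECONDS` command itself is standard.

```python
import os
import subprocess
import time


def wait_for_partition(path, settle_timeout=120, attempts=30, delay=1.0,
                       run=subprocess.call, exists=os.path.exists,
                       sleep=time.sleep):
    """Hypothetical helper: settle udev, then poll for the device node."""
    # `udevadm settle` returns once the udev event queue is empty
    # or the timeout (in seconds) has expired.
    run(['udevadm', 'settle', '--timeout=%d' % settle_timeout])
    # Keep a short polling loop as a safety net after settling.
    for _ in range(attempts):
        if exists(path):
            return True
        sleep(delay)
    return False
```

The `run`, `exists` and `sleep` arguments are injectable so the helper can be exercised without root access or a real disk.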

Error message from Dell PERC controller and JBOD target:

2015-01-13T01:34:00.761586+00:00 info: 2015-01-13 01:34:00.304 7447 DEBUG fuel_agent.utils.partition_utils [-] Last time output contained "Device or resource busy". Trying to re-read partition table on device /dev/sda
2015-01-13T01:34:00.761697+00:00 info: 2015-01-13 01:34:00.304 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: partprobe /dev/sda
2015-01-13T01:34:00.761805+00:00 info: 2015-01-13 01:34:00.383 7447 DEBUG fuel_agent.utils.partition_utils [-] Partprobe output:
2015-01-13T01:34:00.761928+00:00 info:
2015-01-13T01:34:00.762053+00:00 info: 2015-01-13 01:34:00.384 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: partx -a /dev/sda
2015-01-13T01:34:00.762174+00:00 info: 2015-01-13 01:34:00.386 7447 DEBUG fuel_agent.utils.partition_utils [-] Partx output:
2015-01-13T01:34:00.762278+00:00 info:
2015-01-13T01:34:01.764229+00:00 info: 2015-01-13 01:34:01.388 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:02.765855+00:00 info: 2015-01-13 01:34:02.392 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:03.767398+00:00 info: 2015-01-13 01:34:03.396 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:04.768908+00:00 info: 2015-01-13 01:34:04.401 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:05.770497+00:00 info: 2015-01-13 01:34:05.406 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:06.772084+00:00 info: 2015-01-13 01:34:06.411 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:07.773810+00:00 info: 2015-01-13 01:34:07.415 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:08.775330+00:00 info: 2015-01-13 01:34:08.420 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:09.776862+00:00 info: 2015-01-13 01:34:09.425 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:10.778403+00:00 info: 2015-01-13 01:34:10.430 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:11.779934+00:00 info: 2015-01-13 01:34:11.435 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:12.781458+00:00 info: 2015-01-13 01:34:12.439 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:13.783053+00:00 info: 2015-01-13 01:34:13.444 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:14.784698+00:00 info: 2015-01-13 01:34:14.449 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:15.786339+00:00 info: 2015-01-13 01:34:15.453 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:16.788149+00:00 info: 2015-01-13 01:34:16.458 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:17.789863+00:00 info: 2015-01-13 01:34:17.463 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:18.791639+00:00 info: 2015-01-13 01:34:18.467 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:19.793116+00:00 info: 2015-01-13 01:34:19.472 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:20.794767+00:00 info: 2015-01-13 01:34:20.476 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:21.796541+00:00 info: 2015-01-13 01:34:21.481 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:22.798286+00:00 info: 2015-01-13 01:34:22.486 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:23.800021+00:00 info: 2015-01-13 01:34:23.491 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:24.801495+00:00 info: 2015-01-13 01:34:24.495 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:25.803133+00:00 info: 2015-01-13 01:34:25.500 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:26.804776+00:00 info: 2015-01-13 01:34:26.505 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:27.806522+00:00 info: 2015-01-13 01:34:27.509 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:28.807994+00:00 info: 2015-01-13 01:34:28.514 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:29.809612+00:00 info: 2015-01-13 01:34:29.518 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:30.811256+00:00 info: 2015-01-13 01:34:30.523 7447 DEBUG fuel_agent.utils.utils [-] Trying to execute command: test -e /dev/sda6
2015-01-13T01:34:31.812967+00:00 info: 2015-01-13 01:34:31.528 7447 ERROR fuel_agent.errors [-]
2015-01-13T01:34:31.813168+00:00 info: 2015-01-13 01:34:31.528 7447 CRITICAL fuel-agent [-] PartitionNotFoundError
2015-01-13T01:34:31.813328+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent Traceback (most recent call last):
2015-01-13T01:34:31.813468+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent File "/usr/bin/provision", line 10, in <module>
2015-01-13T01:34:31.813609+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent sys.exit(provision())
2015-01-13T01:34:31.813754+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent File "/usr/lib/python2.6/site-packages/fuel_agent/cmd/agent.py", line 37, in provision
2015-01-13T01:34:31.813891+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent main(['do_provisioning'])
2015-01-13T01:34:31.814032+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent File "/usr/lib/python2.6/site-packages/fuel_agent/cmd/agent.py", line 67, in main
2015-01-13T01:34:31.814185+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent getattr(mgr, action)()
2015-01-13T01:34:31.814292+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent File "/usr/lib/python2.6/site-packages/fuel_agent/manager.py", line 296, in do_provisioning
2015-01-13T01:34:31.814424+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent self.do_partitioning()
2015-01-13T01:34:31.814566+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent File "/usr/lib/python2.6/site-packages/fuel_agent/manager.py", line 95, in do_partitioning
2015-01-13T01:34:31.814699+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent 'Partition not found' % prt.name)
2015-01-13T01:34:31.814834+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent PartitionNotFoundError
2015-01-13T01:34:31.814970+00:00 info: 2015-01-13 01:34:31.528 7447 TRACE fuel-agent

Changed in fuel:
status: Triaged → In Progress
Vladimir Kozhukalov (kozhukalov) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/147083
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=d28128aa0283262c5f19daa95b4aa9feabb71758
Submitter: Jenkins
Branch: master

commit d28128aa0283262c5f19daa95b4aa9feabb71758
Author: Vladimir Kozhukalov <email address hidden>
Date: Wed Jan 14 10:20:50 2015 +0300

    fuel_agent: removed reread_partitions method

    This reread_partitions method was a desperate attempt
    to work around the udev-related "device is busy" error.
    The correct way to deal with it is to use
    udevadm settle, which blocks the thread until udev has
    finished handling its queued events.

    Closes-Bug: 1410471
    Change-Id: Idb0dccb35aab10d02c5ad942fd30d52a461e1a0e

Changed in fuel:
status: In Progress → Fix Committed
tags: added: based image provision
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/149236

Alexander Gordeev (a-gordeev) wrote :

The ETA for the fix is roughly next weekend. This bug is not as trivial as we thought.

tags: added: image-based
removed: based image
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/152609

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/152610

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/152622

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (refs/changes/22/152622/1)

Related fix proposed to branch: refs/changes/22/152622/1
Review: https://review.openstack.org/152625

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/152630

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (refs/changes/22/152622/1)

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: refs/changes/22/152622/1
Review: https://review.openstack.org/152625
Reason: git :(

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/6.0)

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/152622

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/152630

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/149236

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/152609
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=44c2a20c2a272a092643dd30a20129ab65bf4c5e
Submitter: Jenkins
Branch: master

commit 44c2a20c2a272a092643dd30a20129ab65bf4c5e
Author: Alexander Gordeev <email address hidden>
Date: Mon Feb 2 21:10:40 2015 +0300

    [IBP] Fix 'device or resource busy' error in fuel-agent

    In short, we've faced udev scaling issues when making a lot of `parted` calls.
    As a workaround, blacklisting of udev rules was chosen. It helps us to mitigate
    the scaling issues.

    The fix was stress tested on a VM with 25 disks by creating 30 partitions per
    disk.

    Change-Id: Ibec9c0485fba657ef592fee3aab4e7757a705e46
    Closes-Bug: #1410471
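
As a rough illustration of what rule blacklisting can look like: udev gives a rules file in /etc/udev/rules.d precedence over a same-named file in /lib/udev/rules.d, so an empty file there masks the packaged rule. The sketch below assumes those two directories and masks every packaged rule file; the function names are hypothetical, and which rules the actual fix disables is not shown in this report.

```python
import os


def blacklist_udev_rules(lib_dir='/lib/udev/rules.d',
                         etc_dir='/etc/udev/rules.d'):
    """Mask each packaged rule file with an empty override; return created paths."""
    created = []
    for name in sorted(os.listdir(lib_dir)):
        if not name.endswith('.rules'):
            continue
        override = os.path.join(etc_dir, name)
        if os.path.exists(override):
            # Leave overrides that already exist untouched.
            continue
        # An empty file with the same name shadows the packaged rule file.
        open(override, 'w').close()
        created.append(override)
    return created


def unblacklist_udev_rules(created):
    """Remove only the overrides created by blacklist_udev_rules()."""
    for path in created:
        os.remove(path)
```

Depending on the udev version, a rules reload (`udevadm control --reload-rules`) may be needed before the overrides take effect.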

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/6.0)

Reviewed: https://review.openstack.org/152622
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=92580cac42fb92021cb32b5477793fb76be39e6c
Submitter: Jenkins
Branch: stable/6.0

commit 92580cac42fb92021cb32b5477793fb76be39e6c
Author: Alexander Gordeev <email address hidden>
Date: Mon Feb 2 21:10:40 2015 +0300

    [IBP] Fix 'device or resource busy' error in fuel-agent

    In short, we've faced udev scaling issues when making a lot of `parted` calls.
    As a workaround, blacklisting of udev rules was chosen. It helps us to mitigate
    the scaling issues.

    The fix was stress tested on a VM with 25 disks by creating 30 partitions per
    disk.

    Change-Id: Ibec9c0485fba657ef592fee3aab4e7757a705e46
    Closes-Bug: #1410471
    (cherry picked from commit 44c2a20c2a272a092643dd30a20129ab65bf4c5e)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/155698

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/155708

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/155779

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/155708
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=fc26d9f661dd51ad71f76a80d93d617634a5a013
Submitter: Jenkins
Branch: master

commit fc26d9f661dd51ad71f76a80d93d617634a5a013
Author: Alexander Gordeev <email address hidden>
Date: Fri Feb 13 14:19:09 2015 +0300

    [IBP] add udevadm settle to partition_utils again

    We've decided to add it back as it works for us.

    Change-Id: Ifd117154b42792b7683130ea4c01c845495463f4
    Related-Bug: #1410471

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/6.0)

Reviewed: https://review.openstack.org/155779
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=fdde94b7fcfa1e6dfc32c3b0cc216bd5306d1346
Submitter: Jenkins
Branch: stable/6.0

commit fdde94b7fcfa1e6dfc32c3b0cc216bd5306d1346
Author: Alexander Gordeev <email address hidden>
Date: Fri Feb 13 14:19:09 2015 +0300

    [IBP] add udevadm settle to partition_utils again

    We've decided to add it back as it works for us.

    Change-Id: Ifd117154b42792b7683130ea4c01c845495463f4
    Related-Bug: #1410471

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/155698
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f0b0e8cac7291d8e5f0dcfbae4c527d34188579a
Submitter: Jenkins
Branch: master

commit f0b0e8cac7291d8e5f0dcfbae4c527d34188579a
Author: Vladimir Kozhukalov <email address hidden>
Date: Fri Feb 13 13:15:32 2015 +0300

    Pmanager: Added blacklisting udev rules

    The problem is that udev does not always have enough
    time to handle events generated by parted. It handles
    them much faster if we disable all non-kernel rules.

    Change-Id: I17a681cf4cff8f2d684b3ab8970aba8ede885c8c
    Related-Bug: #1410471

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/170579

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/170921

Alexander Gordeev (a-gordeev) wrote :

Technically, it's in the Fix Released state for 6.1 and 6.0 for IBP.

The change was backported to classic provisioning only for 6.1.

But there are still a few flaws that should be addressed, such as https://review.openstack.org/#/c/170921/ and https://review.openstack.org/#/c/170579/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/170579
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=10ab0e3dd1db9390339259a670a339eb5f49af25
Submitter: Jenkins
Branch: master

commit 10ab0e3dd1db9390339259a670a339eb5f49af25
Author: Alexander Gordeev <email address hidden>
Date: Fri Apr 3 21:20:41 2015 +0300

    Pmanager: Add udev settle after trigger

    Since udev is well known for its async nature, we need to be sure
    that it finishes its queue processing after calling 'udevadm trigger'.
    'udevadm settle' is the only way to achieve that sync.

    Otherwise, we could end up with unexpected changes in devfs in the
    middle of the rest of the script execution.

    Change-Id: Ie994b5fbb070314621ba369e7a4f4850cf400173
    Related-Bug: #1410471

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/170921
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=d032174c97a75979366e88c233c25acee22ed005
Submitter: Jenkins
Branch: master

commit d032174c97a75979366e88c233c25acee22ed005
Author: Alexander Gordeev <email address hidden>
Date: Mon Apr 6 19:34:49 2015 +0300

    [IBP] Add udevadm trigger

    Since we blacklist udev rules during disk partitioning, we
    need to call 'udevadm trigger' after all partitioning
    is done in order to re-create all the links which were skipped by
    udev while the rules were blacklisted.

    Since udev is async, we need to use 'udevadm settle' to be synced with
    udev workers.

    Related-Bug: #1410471
    Change-Id: I7bb8be57de699fffe5f96ef6fe3112cf5f2b5a20
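
The trigger-then-settle sequence described in this commit message could be sketched as follows. The function name refresh_udev_links is hypothetical, and limiting the trigger to the block subsystem with `--subsystem-match=block` is an assumption made for this example, not something stated in the commit.

```python
import subprocess


def refresh_udev_links(run=subprocess.check_call):
    """Replay block device events, then wait for udev to process them."""
    # Ask the kernel to re-emit events so udev re-applies its rules and
    # re-creates the symlinks that were skipped while rules were masked.
    run(['udevadm', 'trigger', '--subsystem-match=block'])
    # udev handles events asynchronously; block until its queue is empty.
    run(['udevadm', 'settle'])
```

The order matters: calling settle before trigger would return before the replayed events have even been queued.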

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Aleksandr Gordeev (<email address hidden>) on branch: master
Review: https://review.openstack.org/152610
