LVM driver initialisation race between cinder-volume and cinder-backup

Bug #1410341 reported by Duncan Thomas
Affects   Status         Importance   Assigned to      Milestone
Cinder    Fix Released   Medium       Duncan Thomas
Juno      Fix Released   Medium       Jordan Pittier

Bug Description

On an LVM thin-provisioned system, on first startup:

If cinder-volume and cinder-backup both call the driver's check_for_setup_error at exactly the same time, there is a race between checking whether the thin pool exists and creating it. Both look for it, don't find it, and try to create it. Whichever loses the race has its thin pool create fail, and that service bombs out.

There should be either a retry if the create fails (to see if we just lost a race) or a filesystem-based local lock around the check/create.

I prefer a simple retry-the-check if the create fails.
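
For illustration, the retry-the-check idea can be sketched as below. This is a minimal sketch, not the Cinder code: `FakeVolumeGroup`, `RacingVolumeGroup`, `PoolCreateError`, `ensure_thin_pool` and the pool name are all hypothetical names invented for the example, and the volume group is an in-memory stand-in rather than real LVM.

```python
class PoolCreateError(Exception):
    """Raised by the stand-in volume group when pool creation fails."""


class FakeVolumeGroup:
    """In-memory stand-in for an LVM volume group (hypothetical, not Cinder's API)."""

    def __init__(self):
        self.pools = set()

    def pool_exists(self, name):
        return name in self.pools

    def create_pool(self, name):
        if name in self.pools:
            raise PoolCreateError("pool %r already exists" % name)
        self.pools.add(name)


class RacingVolumeGroup(FakeVolumeGroup):
    """Simulates a peer service creating the pool right after our check."""

    def pool_exists(self, name):
        found = super().pool_exists(name)
        # The peer wins the race: the pool appears after we have looked.
        self.pools.add(name)
        return found


def ensure_thin_pool(vg, name):
    """Ensure the pool exists, tolerating a lost creation race."""
    if vg.pool_exists(name):
        return "found"
    try:
        vg.create_pool(name)
        return "created"
    except PoolCreateError:
        # The create failed. Look again before giving up: another service
        # may have created the pool between our check and our create.
        if vg.pool_exists(name):
            return "lost-race"
        raise  # The pool is still missing, so this is a real failure.


if __name__ == "__main__":
    print(ensure_thin_pool(FakeVolumeGroup(), "cinder-volumes-pool"))    # created
    print(ensure_thin_pool(RacingVolumeGroup(), "cinder-volumes-pool"))  # lost-race
```

The recheck only swallows the error when the pool is actually there afterwards, so a create that fails for any other reason still surfaces.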

John Griffith (john-griffith) wrote :

Should we even be calling this here? Shouldn't we just be checking if the volume-driver is initialized and that's it? We can surely retry that, but I don't think we should have multiple managers trying to perform setup and init on the same object.

Changed in cinder:
assignee: nobody → Duncan Thomas (duncan-thomas)
status: New → In Progress
John Griffith (john-griffith) wrote :

OK, I didn't realize that Backup Manager had its own instance of the Volume driver. Seems like this could be problematic, but changing that is definitely out of the scope of this bug.

Changed in cinder:
importance: Undecided → Medium
Jordan Pittier (jordan-pittier) wrote :

I can easily reproduce this with the Scality SOFS driver using:

cd ~/devstack && source functions && source stackrc && source openrc && source lib/cinder && stop_cinder && sleep 1 && sudo umount /sofs && USE_SCREEN=No start_cinder

I guess it should be the same with the LVM driver.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/147859

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/147859
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=4b136f574f465ce0fdd67299f5ed5f912bf5598b
Submitter: Jenkins
Branch: master

commit 4b136f574f465ce0fdd67299f5ed5f912bf5598b
Author: JordanP <email address hidden>
Date: Thu Jan 15 15:42:10 2015 +0100

    Scality: Lock around SOFS mount to avoid a race

    Both cinder-volume and cinder-backup could want to mount the
    SOFS at the same time (during init).

    Related-Bug: #1410341
    Change-Id: I75faa6eb283bc7c1f655cf5b051bed025af3d701
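
The change itself is not reproduced in this report. As a rough illustration of a local lock around a check-then-mount step, here is a minimal sketch using `fcntl.flock` from the Python standard library (Unix only). `local_file_lock`, `ensure_mounted`, the callbacks and the lock file name are hypothetical names for the example, not the Scality driver's code.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def local_file_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other holder has the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def ensure_mounted(is_mounted, do_mount, lock_path):
    """Run the check and the mount under one lock; return True if we mounted."""
    with local_file_lock(lock_path):
        if is_mounted():
            return False
        do_mount()
        return True


if __name__ == "__main__":
    state = {"mounted": False}
    # Hypothetical lock file location, created in a fresh temporary directory.
    lock_path = os.path.join(tempfile.mkdtemp(), "sofs-mount.lock")
    mount = lambda: state.update(mounted=True)
    print(ensure_mounted(lambda: state["mounted"], mount, lock_path))  # True
    print(ensure_mounted(lambda: state["mounted"], mount, lock_path))  # False
```

Because the check and the mount happen inside the same critical section, a second service on the same host waits for the first to finish and then sees the mount already in place.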

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/juno)

Related fix proposed to branch: stable/juno
Review: https://review.openstack.org/155241

Mike Perez (thingee)
Changed in cinder:
assignee: Duncan Thomas (duncan-thomas) → nobody
milestone: none → kilo-3
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/juno)

Reviewed: https://review.openstack.org/155241
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=6a950f1cf9044821bb5cfc78dd026edceaa3abc6
Submitter: Jenkins
Branch: stable/juno

commit 6a950f1cf9044821bb5cfc78dd026edceaa3abc6
Author: JordanP <email address hidden>
Date: Thu Jan 15 15:42:10 2015 +0100

    Scality: Lock around SOFS mount to avoid a race

    Both cinder-volume and cinder-backup could want to mount the
    SOFS at the same time (during init).

    A backport is needed because this bug is quite severe. When the system
    boots, cinder-backup and cinder-volume are likely to start at the same time.

    Related-Bug: #1410341
    Change-Id: I75faa6eb283bc7c1f655cf5b051bed025af3d701
    (cherry picked from commit 4b136f574f465ce0fdd67299f5ed5f912bf5598b)

tags: added: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/146917
Reason: no update for over a month.

Mike Perez (thingee)
Changed in cinder:
status: In Progress → Triaged
Changed in cinder:
assignee: nobody → Duncan Thomas (duncan-thomas)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/146917
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c86f2be2e217e82c80e7ff68904e90ec6c5cba7f
Submitter: Jenkins
Branch: master

commit c86f2be2e217e82c80e7ff68904e90ec6c5cba7f
Author: Duncan Thomas <email address hidden>
Date: Tue Jan 13 18:41:13 2015 +0200

    Fix LVM thin pool creation race

    In the event that two copies of the LVM driver have init called at
    the same time (e.g. cinder-volume and cinder-backup getting
    started in parallel, on the same host), it is possible for the
    thin pool check/create to race. Add a simple recheck if the create
    fails, to cover this window.

    Change-Id: I006970736ba0e62df383bacc79b5754dea2e9a3e
    Closes-Bug: #1410341

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: kilo-3 → 2015.1.0