lvm deactivate race in snapshot extend

Bug #1495560 reported by John Griffith
This bug affects 2 people
Affects    Status         Importance   Assigned to   Milestone
Cinder     Fix Released   High         Nate Potter
os-brick   Fix Released   Undecided    Nate Potter

Bug Description

There seem to be instances where the lvchange call is made and returns successfully, but the subsequent snapshot extend call fails because the volume is still active:

Example here:
http://logs.openstack.org/08/200108/4/check/gate-tempest-dsvm-neutron-full/361009b/logs/screen-c-vol.txt.gz#_2015-09-10_17_03_59_622

We might want to consider converting the deactivate method into a loop with a status check, so that we confirm the volume is actually deactivated rather than relying only on the lvchange response.
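A minimal sketch of what such a loop could look like, not the code that was eventually merged. It assumes the LV state can be read from the fifth character of the lv_attr field reported by lvs ('a' meaning active). The names deactivate_lv, lv_is_active, retries, delay, run and sleep are hypothetical and chosen for illustration; run is injectable so the logic can be exercised without a real volume group.

```python
import subprocess
import time


def lv_is_active(vg, lv, run=subprocess.check_output):
    """Return True if lvs reports the LV as active.

    Assumption: the fifth character of lv_attr is the state field,
    and 'a' means the LV is active.
    """
    out = run(["lvs", "--noheadings", "-o", "lv_attr", "%s/%s" % (vg, lv)])
    if isinstance(out, bytes):
        out = out.decode()
    attr = out.strip()
    return len(attr) > 4 and attr[4] == "a"


def deactivate_lv(vg, lv, run=subprocess.check_output,
                  retries=5, delay=1.0, sleep=time.sleep):
    """Deactivate an LV and poll until it is really inactive.

    Raises RuntimeError if the LV is still active after `retries` checks.
    """
    run(["lvchange", "-a", "n", "%s/%s" % (vg, lv)])
    for _ in range(retries):
        if not lv_is_active(vg, lv, run):
            return
        # Still active: wait a little before checking again.
        sleep(delay)
    raise RuntimeError("%s/%s is still active after lvchange -an" % (vg, lv))
```

With this shape, the caller only proceeds to lvextend once deactivate_lv has returned without raising, which closes the window between the lvchange response and the actual state change. In a real deployment the commands would go through rootwrap rather than subprocess directly.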

Changed in cinder:
status: New → Triaged
importance: Undecided → High
assignee: nobody → John Griffith (john-griffith)
Changed in cinder:
assignee: John Griffith (john-griffith) → Jordan Pittier (jordan-pittier)
Revision history for this message
Jordan Pittier (jordan-pittier) wrote :

How to reproduce:

1) Get this new tempest test : https://review.openstack.org/#/c/200108/
2) Run this test (tempest.api.volume.test_volumes_extend:VolumesV2ExtendTest.test_volume_extend_when_vol_has_snapshot) at least 4 times concurrently (to put some load on LVM)
3) Look for a stack trace in c-vol:

ProcessExecutionError: Unexpected error while running command.
cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvextend -L 2g stack-volumes-lvmdriver-1/volume-5a1b1cd1-b0e2-4ff2-8dfd-f01f000924c0
cinder.volume.manager Exit code: 5
cinder.volume.manager Stdout: u''
cinder.volume.manager Stderr: u' Snapshot origin volumes can be resized only while inactive: try lvchange -an\n'

Revision history for this message
Jordan Pittier (jordan-pittier) wrote :

Now this is getting weird.

If I tweak my devstack with VOLUME_BACKING_FILE_SIZE=40G (the default is 10G), I can't reproduce even with 10 'clients' running this test. With the default 10G, I only need 3 'clients' running that tempest test concurrently.

So I think we are seeing weird things only when LVM almost runs out of space.

Changed in cinder:
assignee: Jordan Pittier (jordan-pittier) → nobody
Revision history for this message
Justin A Wilson (justin-wilson) wrote :

Which source files does this problem affect?

Nate Potter (ntpttr)
Changed in cinder:
assignee: nobody → Nate Potter (ntpttr)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340655

Changed in cinder:
status: Triaged → In Progress
Revision history for this message
Nate Potter (ntpttr) wrote :
Changed in os-brick:
assignee: nobody → Nate Potter (ntpttr)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/340655
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=4373e98fa0881c0bf6fa747e05742143402f4c39
Submitter: Jenkins
Branch: master

commit 4373e98fa0881c0bf6fa747e05742143402f4c39
Author: Nate Potter <email address hidden>
Date: Tue Jul 12 03:55:26 2016 +0000

    Remove race condition from lvextend

    Currently it's possible for extend_volume in lvm to return
    from the deactivate_lv call and try to extend the volume before
    the lv has actually been deactivated. This patch adds logic to
    make sure that the lv is deactivated before returning from
    deactivate_lv.

    Change-Id: Ifc9dcb20e17c60a835e9f3b38c5bffd836fa5188
    Closes-bug: #1495560

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/341854
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=e1f9a5462512e3acd39689ffbff67d454e5db58d
Submitter: Jenkins
Branch: master

commit e1f9a5462512e3acd39689ffbff67d454e5db58d
Author: Nate Potter <email address hidden>
Date: Thu Jul 14 00:32:40 2016 +0000

    Remove race condition from lvextend

    Currently it's possible for extend_volume in lvm to return
    from the deactivate_lv call and try to extend the volume before
    the lv has actually been deactivated. This patch adds logic to
    make sure that the lv is deactivated before returning from
    deactivate_lv.

    Change-Id: I5c3671043df6e7474acdfcce342d655ac215a461
    Closes-bug: #1495560

Changed in os-brick:
status: In Progress → Fix Released
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/os-brick 1.6.0

This issue was fixed in the openstack/os-brick 1.6.0 release.

Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/cinder 9.0.0.0b3

This issue was fixed in the openstack/cinder 9.0.0.0b3 development milestone.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Seeing this pop up again, but I verified that these code changes are present. The only way I can see this failing is if the check for whether the volume is still active does not get the expected output. Keeping an eye on it for now.

http://logs.openstack.org/12/460512/1/check/gate-tempest-dsvm-py35-ubuntu-xenial/bf3d73e/

Revision history for this message
Matt Riedemann (mriedem) wrote :

The new failures are being tracked under bug 1687044.
