create lvm volume from snapshot fails with "device-mapper: reload ioctl on (252:4) failed: Invalid argument"

Bug #1642111 reported by Matt Riedemann
This bug affects 5 people
Affects    Status        Importance  Assigned to     Milestone
Cinder     Fix Released  Critical    Sean McGinnis
devstack   Fix Released  Critical    Matt Riedemann

Bug Description

Seen in the gate here:

http://logs.openstack.org/69/396469/7/check/gate-tempest-dsvm-cells-ubuntu-xenial/29e8b58/logs/screen-c-vol.txt.gz?level=TRACE

2016-11-15 22:06:36.791 ERROR oslo_messaging.rpc.server [req-97212d60-aa2a-47af-b90c-659e8d10e75f tempest-VolumesV2SnapshotTestJSON-541685595] Exception during message handling
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 225, in dispatch
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 195, in _do_dispatch
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "<decorator-gen-235>", line 2, in create_volume
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/objects/cleanable.py", line 191, in wrapper
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 613, in create_volume
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server _run_flow()
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/manager.py", line 602, in _run_flow
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server flow_engine.run()
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server for _state in self.run_iter(timeout=timeout):
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server failure.Failure.reraise_if_any(er_failures)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server failures[0].reraise()
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/taskflow/types/failure.py", line 343, in reraise
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server six.reraise(*self._exc_info)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server result = task.execute(**arguments)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/flows/manager/create_volume.py", line 829, in execute
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server **volume_spec)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/flows/manager/create_volume.py", line 437, in _create_from_snapshot
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server snapshot)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/drivers/lvm.py", line 412, in create_volume_from_snapshot
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server self.vg.activate_lv(snapshot['name'], is_snapshot=True)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/brick/local_dev/lvm.py", line 666, in activate_lv
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server run_as_root=True)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/os_brick/executor.py", line 49, in _execute
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server result = self.__execute(*args, **kwargs)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/utils.py", line 123, in execute
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server return processutils.execute(*cmd, **kwargs)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 394, in execute
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server cmd=sanitized_cmd)
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server ProcessExecutionError: Unexpected error while running command.
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/_snapshot-22290a16-1b74-4ec3-9610-7af1da40cf8a
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server Exit code: 5
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server Stdout: u''
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server Stderr: u' device-mapper: reload ioctl on (252:4) failed: Invalid argument\n'
2016-11-15 22:06:36.791 26225 ERROR oslo_messaging.rpc.server

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in cinder:
status: New → Confirmed
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

The command that is erroring out has not changed for quite a while, so that's another clue pointing to a package update.

https://github.com/openstack/cinder/blame/master/cinder/brick/local_dev/lvm.py#L652

Changed in cinder:
importance: Undecided → Critical
int32bit (int32bit)
Changed in cinder:
assignee: nobody → int32bit (int32bit)
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the top gate failure we have right now:

http://status.openstack.org//elastic-recheck/index.html#1642111

@int32bit, are you actually working on a fix for this? If you aren't, you shouldn't be assigned to this bug, so that someone else can look into fixing it.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Could changing the lvm_type to 'thin' in devstack have had something to do with this?

https://github.com/openstack-dev/devstack/commit/dddb2c7b5f85688de9c9b92f025df25d2f2d3016

Revision history for this message
Matt Riedemann (mriedem) wrote :

The changelog for lvm2 (the package that provides lvchange) is pretty sparse, nothing recent:

http://changelogs.ubuntu.com/changelogs/pool/main/l/lvm2/lvm2_2.02.133-1ubuntu10/changelog

The last update was in April:

lvm2 (2.02.133-1ubuntu10) xenial; urgency=medium

  * Cherry-pick change from lvm2 2.02.133-2 in Debian to move event plugins
    back onto the main library patch, which fixes problems with monitoring
    failing for snapshots and raid volumes. Closes: #807279, LP: #1556451,
    LP: #1561228.

 -- Steve Langasek <email address hidden> Sat, 16 Apr 2016 00:06:53 -0700

Which is the version we're using:

http://logs.openstack.org/69/396469/7/check/gate-tempest-dsvm-cells-ubuntu-xenial/29e8b58/logs/dpkg-l.txt.gz

ii lvm2 2.02.133-1ubuntu10 amd64 Linux Logical Volume Manager

Revision history for this message
Matt Riedemann (mriedem) wrote :

I see this in the syslog at the point of failure:

http://logs.openstack.org/69/396469/7/check/gate-tempest-dsvm-cells-ubuntu-xenial/29e8b58/logs/syslog.txt.gz#_Nov_15_22_06_36

Nov 15 22:06:36 ubuntu-xenial-osic-cloud1-s3700-5383810 kernel: device-mapper: table: 252:4: thin: Unable to activate thin device while pool is suspended
Nov 15 22:06:36 ubuntu-xenial-osic-cloud1-s3700-5383810 kernel: device-mapper: ioctl: error adding target to table

Revision history for this message
Matt Riedemann (mriedem) wrote :

devstack change to workaround this until there is a fix in cinder:

https://review.openstack.org/#/c/400465/

Changed in devstack:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (master)

Reviewed: https://review.openstack.org/400465
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=b6cbf922d79d7189dab7d68dc6014fa8682aad9d
Submitter: Jenkins
Branch: master

commit b6cbf922d79d7189dab7d68dc6014fa8682aad9d
Author: Matt Riedemann <email address hidden>
Date: Mon Nov 21 21:10:49 2016 -0500

    Change CINDER_LVM_TYPE back to 'default' as the default

    Change dddb2c7b5f85688de9c9b92f025df25d2f2d3016 recently
    changed devstack to enable the Cinder image cache by default
    and changed to use thinly provisioned LVM volumes by default.

    Since then we've had a spike in thin LVM snapshot test failures
    in the gate, which is by far our top gate bug at 219 hits in the
    last 10 days.

    So unless there is a fix on the Cinder side, this changes the
    default lvm_type back to 'default' for thick provisioning.

    Change-Id: I1c53bbe40177fe104ed0a222124bbc45c553b817
    Related-Bug: #1642111
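
For anyone who wants to pin this locally rather than rely on the changed default, the same setting can be overridden in local.conf. A minimal sketch (only the CINDER_LVM_TYPE variable comes from the commit above; the section header is standard devstack local.conf syntax):

```ini
[[local|localrc]]
# Thick provisioning avoids the thin-snapshot activation race
CINDER_LVM_TYPE=default
```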

Matt Riedemann (mriedem)
Changed in devstack:
status: In Progress → Fix Released
Revision history for this message
int32bit (int32bit) wrote :

So I think we need to check the LV pool's "Attr" field using 'lvs' on the pool; if the pool is thin and suspended, we need to abort the activation.
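
The check int32bit describes could look roughly like the sketch below, which only inspects an lv_attr string as reported by lvs (per lvs(8), character 1 is the volume type, 't' for a thin pool, and character 5 is the state, 's' for suspended). pool_is_safe_to_activate is a hypothetical helper, not existing Cinder code:

```python
def pool_is_safe_to_activate(lv_attr: str) -> bool:
    """Return False if the attr string describes a suspended thin pool.

    lv_attr is the 10-character attribute string from 'lvs -o lv_attr',
    e.g. 'twi-aotz--' for an active thin pool.
    """
    if len(lv_attr) < 5:
        raise ValueError("unexpected lv_attr: %r" % lv_attr)
    is_thin_pool = lv_attr[0] == 't'   # volume type field
    is_suspended = lv_attr[4] == 's'   # state field
    return not (is_thin_pool and is_suspended)
```

A caller would run this on the pool's attr string right before the lvchange activation and retry or abort if it returns False.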

int32bit (int32bit)
Changed in cinder:
assignee: int32bit (int32bit) → nobody
Revision history for this message
Steve Noyes (steve-noyes) wrote :

I am seeing this recently in the Citrix Xenserver CI -

tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_volume_from_snapshot

VolumeResourceBuildErrorException: volume 9f505c38-4c51-48cd-809d-7e04db7c8b48 failed to build and is in ERROR status

reload ioctl on (252:9) failed: Invalid argument

logs-
http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/98/389798/6/check/dsvm-tempest-neutron-network/a3081b5/

I also see this same error on a CI run on a different review-

http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/81/452181/1/check/dsvm-tempest-neutron-network/206ae71/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/501049

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (master)

Reviewed: https://review.openstack.org/501049
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=486376e91b1f9a7680371036e470b8692804e917
Submitter: Jenkins
Branch: master

commit 486376e91b1f9a7680371036e470b8692804e917
Author: Sean McGinnis <email address hidden>
Date: Tue Sep 5 19:56:06 2017 -0500

    Change CINDER_LVM_TYPE to 'auto' as the default

    This was previously set to thin as the default, but at the time
    there were failures seen with what appeared to be race conditions
    when creating snapshots.

    These failures are not seen locally, and we have a lot of installs
    using the default auto by this point with no reports from the field
    of seeing this failure. This is to be able to more extensively test
    this in the gate, and hopefully get this switched over to be able
    to thinly provision by default when possible.

    Change-Id: I3e99adadd1c37ba8b24b6cb71a8969ffc93f75a1
    Related-bug: #1642111

Revision history for this message
Stefan Nica (stefan.nica) wrote :

I'm using 'lvm_type = thin' in the cinder configuration file, and after switching from Newton to Pike I've just started seeing errors while running tempest that look very similar to those described here. The following failing tempest test cases are apparently affected:

tempest.api.volume.test_volumes_snapshots.VolumesSnapshotTestJSON.test_volume_from_snapshot
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern

The errors reported by cinder-volume:

2017-09-19 17:07:49.716 21672 ERROR oslo_messaging.rpc.server [req-b137810a-df14-4691-b9d0-8c332360a1d2 0bd1b641b41143ebb3d5aab6e50e7f17 256184305a744ef7ba8c23351934551c - default default] Exception during message handling: ProcessExecutionError: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -k n cinder-volumes/volume-b1277905-b6e5-47db-a3db-124cad97fcc0
Exit code: 5
Stdout: u' Logical volume "volume-b1277905-b6e5-47db-a3db-124cad97fcc0" changed.\n'
Stderr: u' device-mapper: reload ioctl on (254:5) failed: No data available\n'

Revision history for this message
Stefan Nica (stefan.nica) wrote :

I tracked down the following change introduced in Pike that is causing the "reload ioctl on (254:5) failed: No data available" tempest failures: https://review.openstack.org/#/c/488264

The problem seems to be that the snapshot volume is not active when the new volume is created from it. Another easy way to reproduce this is to run the steps manually:

    openstack volume create --size 1 test-volume
    openstack snapshot create --name test-snapshot test-volume
    openstack volume create --snapshot test-snapshot --size 2 test-snapshot-volume

If I manually activate the snapshot before running the third step, cinder-volume doesn't complain anymore.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/505601

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/506091

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/505601
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=17c447e9f06af0afb195d6a2879c06233de520c7
Submitter: Jenkins
Branch: master

commit 17c447e9f06af0afb195d6a2879c06233de520c7
Author: Stefan Nica <email address hidden>
Date: Wed Sep 20 13:14:29 2017 +0200

    LVM: Activate thin snapshot before clone activation

    LVM may be configured to not automatically activate
    thin-provisioned LVs. Ensure they are activated before
    activating a clone, otherwise activating the clone may fail.

    Change-Id: Iaedeb3cdc706daa34db8d50c48cf738f6edb9bcf
    Related-Bug: #1642111
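
The ordering this fix enforces can be sketched with stubbed-out LVM calls (all function names below are illustrative; in the actual driver the activation is vg.activate_lv(snapshot_name, is_snapshot=True), as seen in the traceback above):

```python
calls = []

def activate_lv(name, is_snapshot=False):
    # Stand-in for vg.activate_lv(); just records that it ran.
    calls.append(('activate', name))

def create_clone(volume_name, snapshot_name):
    # Activating a clone of an inactive thin snapshot is what failed,
    # so this stub insists the source snapshot is already active.
    assert ('activate', snapshot_name) in calls, "snapshot not active"
    calls.append(('clone', volume_name))

def create_volume_from_snapshot(volume_name, snapshot_name):
    activate_lv(snapshot_name, is_snapshot=True)  # the step the fix adds
    create_clone(volume_name, snapshot_name)

create_volume_from_snapshot('volume-1', '_snapshot-1')
```

Without the explicit activation step, the clone creation hits an inactive snapshot, which is exactly the manual-reproduction scenario in comment above.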

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Revision history for this message
Jan Zerebecki (jan-zerebecki) wrote :
Revision history for this message
Boden R (boden) wrote :

Based on [1], this error appears to be becoming more common.
It seems like we shouldn't be ignoring this defect, since it's impacting multiple projects.

[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22failed%20to%20build%20and%20is%20in%20ERROR%20status%5C%22

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Breadcrumb in the syslog of what may be causing the thin activation to fail:

kernel: device-mapper: table: 252:7: thin: Unable to activate thin device while pool is suspended

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Current hunch is that the snapshot has not actually finished creating yet. While the snapshot creation is happening, the thin pool is in the 'suspend' state. We may need a delay and/or additional checks to validate that the snapshot and pool are in a good state before continuing on.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Based on this logstash query:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22device-mapper%3A%20reload%20ioctl%20on%5C%22%20AND%20%20%20message%3A%5C%22failed%3A%20Invalid%20argument%5C%22%20AND%20%20%20tags%3A%5C%22screen-c-vol.txt%5C%22%20AND%20voting%3A1&from=7d

This started showing up around 10/3, so it could be related to the fact we're using the Pike UCA now:

https://github.com/openstack-dev/devstack/commit/b3b6c102d922ac638dbea51b22e30764031df76d

So pulling in new versions of lvm packages. Maybe check the diff in the changelog for the package before and after that change and see if there is anything suspect.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Before the Pike UCA change we were using:

ii lvm2 2.02.133-1ubuntu10 amd64 Linux Logical Volume Manager

After the Pike UCA change we are using:

ii lvm2 2.02.133-1ubuntu10 amd64 Linux Logical Volume Manager

So I guess that's probably not the issue...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/510168

Changed in cinder:
assignee: nobody → Sean McGinnis (sean-mcginnis)
status: Confirmed → In Progress
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Optimistically marking review 510168 as "Closes-bug". If that merges and we still see this I will reopen.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/510168
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a2b7b8d74a2dded1a38354321f5627d7375d7ee2
Submitter: Jenkins
Branch: master

commit a2b7b8d74a2dded1a38354321f5627d7375d7ee2
Author: Sean McGinnis <email address hidden>
Date: Fri Oct 6 10:55:22 2017 -0500

    Add retries to LVM logical volume activation

    We are running into failures activating snapshots where the syslog shows
    the output "thin: Unable to activate thin device while pool is suspended"
    when attempting to use it quickly after creation. This appears to be a race
    where there are still internal things being done after the snapshot is
    created.

    This is a bit of a punt, but with local testing the thin pool state either
    does not visibly change or transitions so fast that it is hard to capture
    the state transition in the vgdisplay. Since we know this operation works
    most of the time, it would seem we are just giving up before the pool gets
    back into the right state to do this activation.

    Rather than trying to get the thin pool state and parse the output of the
    command, just adding retries to the operation that back off between each
    attempt. Based on what we've seen with successful runs, this should allow
    it to fail while the pool is in this transitional state and attempt again
    later when hopefully things have settled.

    Change-Id: I3e7037b3571665251db8dee2cf22cab1297106c9
    Closes-bug: #1642111
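
Cinder implements this with its own retry decorator; below is a self-contained sketch of the same retry-with-backoff idea (the retry count, intervals, and exception type are illustrative, not the values the patch uses):

```python
import time

def retry(exc_type, retries=3, backoff=0.1):
    """Retry a call on exc_type, doubling the sleep between attempts."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            delay = backoff
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == retries - 1:
                        raise  # out of attempts; let the failure surface
                    time.sleep(delay)
                    delay *= 2  # back off before the next attempt
        return wrapper
    return deco

attempts = []

@retry(RuntimeError, retries=3, backoff=0.01)
def activate_lv():
    # Simulates an activation that hits the transient suspended-pool
    # error twice before the pool settles.
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("thin: Unable to activate thin device "
                           "while pool is suspended")
    return "activated"

print(activate_lv())  # succeeds on the third attempt
```

The design choice the commit message describes is to retry blindly rather than parse the pool state, on the reasoning that the suspended window is too short to observe reliably anyway.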

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
Matt Riedemann (mriedem) wrote :

Still seeing failures:

http://logs.openstack.org/75/508175/4/check/gate-tempest-dsvm-py35-ubuntu-xenial/d46ef74/logs/screen-c-vol.txt#_Oct_09_12_08_37_819653

Oct 09 12:08:37.819653 ubuntu-xenial-citycloud-kna1-11286426 cinder-volume[28155]: ERROR cinder.brick.local_dev.lvm [None req-be59c5a8-24bb-4bc7-af92-768b11f74368 tempest-VolumesSnapshotTestJSON-129534577 None] Error activating LV: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 09 12:08:37.820699 ubuntu-xenial-citycloud-kna1-11286426 cinder-volume[28155]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/_snapshot-fd094995-94ad-44f1-9ad7-6e903267c1d6
Oct 09 12:08:37.820779 ubuntu-xenial-citycloud-kna1-11286426 cinder-volume[28155]: Exit code: 5
Oct 09 12:08:37.820856 ubuntu-xenial-citycloud-kna1-11286426 cinder-volume[28155]: Stdout: ''
Oct 09 12:08:37.820930 ubuntu-xenial-citycloud-kna1-11286426 cinder-volume[28155]: Stderr: ' device-mapper: reload ioctl on (252:7) failed: Invalid argument\n'

Is there any indication what the invalid argument is from this?

lvchange -a y --yes -K stack-volumes-lvmdriver-1/_snapshot-fd094995-94ad-44f1-9ad7-6e903267c1d6

Revision history for this message
Matt Riedemann (mriedem) wrote :

Could running with --refresh help if the first attempt fails? Or at least dump lvdisplay before giving up, to check whether what you're trying to do is valid given the current state of the LVM?

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the failure:

http://logs.openstack.org/75/508175/4/check/gate-tempest-dsvm-py35-ubuntu-xenial/d46ef74/logs/syslog.txt.gz#_Oct_09_12_08_37

Oct 09 12:08:37 ubuntu-xenial-citycloud-kna1-11286426 kernel: device-mapper: table: 252:7: thin: Unable to activate thin device while pool is suspended
Oct 09 12:08:37 ubuntu-xenial-citycloud-kna1-11286426 kernel: device-mapper: ioctl: error adding target to table

Revision history for this message
Matt Riedemann (mriedem) wrote :

Ignore comment 28; we are going to see this in the logs if we're retrying. But the retry does seem to have made a positive impact on the failure rates.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.0.0b1

This issue was fixed in the openstack/cinder 12.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/pike)

Change abandoned by Sean McGinnis (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/506091

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/506091
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d4557dd4fc29d58d4032f3fcbd85a81a13991c40
Submitter: Zuul
Branch: stable/pike

commit d4557dd4fc29d58d4032f3fcbd85a81a13991c40
Author: Stefan Nica <email address hidden>
Date: Wed Sep 20 13:14:29 2017 +0200

    LVM: Activate thin snapshot before clone activation

    LVM may be configured to not automatically activate
    thin-provisioned LVs. Ensure they are activated before
    activating a clone, otherwise activating the clone may fail.

    Change-Id: Iaedeb3cdc706daa34db8d50c48cf738f6edb9bcf
    Related-Bug: #1642111
    (cherry picked from commit 17c447e9f06af0afb195d6a2879c06233de520c7)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/630923

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/630923
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c44607691e96752f76141d80ae815c49d5b61206
Submitter: Zuul
Branch: stable/pike

commit c44607691e96752f76141d80ae815c49d5b61206
Author: Sean McGinnis <email address hidden>
Date: Fri Oct 6 10:55:22 2017 -0500

    Add retries to LVM logical volume activation

    We are running into failures activating snapshots where the syslog shows
    the output "thin: Unable to activate thin device while pool is suspended"
    when attempting to use it quickly after creation. This appears to be a race
    where there are still internal things being done after the snapshot is
    created.

    This is a bit of a punt, but with local testing the thin pool state either
    does not visibly change or transitions so fast that it is hard to capture
    the state transition in the vgdisplay. Since we know this operation works
    most of the time, it would seem we are just giving up before the pool gets
    back into the right state to do this activation.

    Rather than trying to get the thin pool state and parse the output of the
    command, just adding retries to the operation that back off between each
    attempt. Based on what we've seen with successful runs, this should allow
    it to fail while the pool is in this transitional state and attempt again
    later when hopefully things have settled.

    Change-Id: I3e7037b3571665251db8dee2cf22cab1297106c9
    Closes-bug: #1642111
    (cherry picked from commit a2b7b8d74a2dded1a38354321f5627d7375d7ee2)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.2.1

This issue was fixed in the openstack/cinder 11.2.1 release.
