lvm commands crash, causing failures

Bug #1932188 reported by Eric Harney
This bug affects 2 people
Affects   Status        Importance  Assigned to     Milestone
Cinder    Fix Released  High        Sofia Enriquez
os-brick  New           High        Unassigned

Bug Description

https://review.opendev.org/c/openstack/cinder/+/783660 fixed this for some LVM calls; details can be found in that review and in
https://launchpad.net/bugs/1901783

That fix added a retry for when the "lvs" command crashes, but crashes are also occurring in other LVM calls, and those need a similar retry workaround.
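
A minimal sketch of the retry workaround described above, assuming the command runner returns a `subprocess.CompletedProcess`. The names `run_with_segfault_retry` and `make_scripted_runner` are hypothetical and are not Cinder's actual helpers; the real changes are in the reviews linked in this report. Only exit code 139 triggers a retry, so other failures surface immediately.

```python
import subprocess
import time
from typing import Callable, Iterable, Sequence

# Exit code observed in the logs when an LVM command segfaults.
SEGFAULT_EXIT_CODE = 139

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_with_segfault_retry(
    cmd: Sequence[str],
    runner: Runner,
    attempts: int = 3,
    interval: float = 0.5,
) -> subprocess.CompletedProcess:
    """Run cmd via runner, re-running it only when it exits with code 139.

    Any other non-zero exit code, or a 139 on the final attempt, raises
    subprocess.CalledProcessError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    attempt = 1
    while True:
        result = runner(cmd)
        if result.returncode == 0:
            return result
        if result.returncode != SEGFAULT_EXIT_CODE or attempt >= attempts:
            raise subprocess.CalledProcessError(
                result.returncode, list(cmd), result.stdout, result.stderr)
        attempt += 1
        time.sleep(interval)  # short pause before re-running the crashed command


def make_scripted_runner(codes: Iterable[int]) -> Runner:
    """Hypothetical fake runner that returns the given exit codes in order."""
    remaining = iter(codes)

    def runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
        return subprocess.CompletedProcess(list(cmd), next(remaining), "", "")

    return runner


# Simulate an "lvs" call that segfaults once and then succeeds.
result = run_with_segfault_retry(["lvs"], make_scripted_runner([139, 0]), interval=0)
print(result.returncode)  # prints 0
```

Retrying only on 139 keeps the workaround narrow: a command that fails for an ordinary reason is not masked by repeated attempts.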

In this case, an "lvs" call from _get_thin_pool_free_space crashed, causing the volume driver to report 0 free space.
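
The sketch below illustrates how a crashed query can turn into a report of 0 free space, assuming the driver starts from 0.0 and only overwrites it after successfully parsing the `lvs --noheadings --unit=g -o size,data_percent --separator : --nosuffix` output that appears in the c-vol log. `thin_pool_free_space_gb` and the `query_pool` callable are hypothetical stand-ins, not Cinder's `_get_thin_pool_free_space`.

```python
import logging
import subprocess
from typing import Callable

LOG = logging.getLogger(__name__)


def thin_pool_free_space_gb(query_pool: Callable[[], str]) -> float:
    """Return free space in a thin pool (in the units lvs reports), or 0.0 on failure.

    query_pool is assumed to return lvs output such as "  20.00:12.50"
    (size:data_percent) and to raise CalledProcessError on a non-zero exit.
    """
    free_space = 0.0  # this default is what gets reported if the query fails
    try:
        size_str, percent_str = query_pool().strip().split(":")
        pool_size = float(size_str)
        data_percent = float(percent_str)
        consumed = pool_size * data_percent / 100
        free_space = round(pool_size - consumed, 2)
    except (subprocess.CalledProcessError, OSError, ValueError) as exc:
        LOG.error("Error querying thin pool about data_percent: %s", exc)
    return free_space


print(thin_pool_free_space_gb(lambda: "  20.00:12.50\n"))  # prints 17.5
```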

This causes tempest tests to fail because volumes cannot be allocated, as seen in this build:
https://zuul.opendev.org/t/openstack/build/202ff0cc5a164ae3a67a1be0535ef8be/logs
(a check job on https://review.opendev.org/c/openstack/cinder/+/796680/1)

Eric Harney (eharney) wrote :

Tempest failure:

c-sch:
https://78bdfcd60d586f3ba39c-ebf431c06382d97dfdf40c8495b34507.ssl.cf2.rackcdn.com/796680/1/check/cinder-tempest-plugin-lvm-lio-barbican/202ff0c/controller/logs/screen-c-sch.txt
Jun 16 15:47:14.599337 ubuntu-focal-rax-dfw-0025145355 cinder-scheduler[111543]: WARNING cinder.scheduler.filters.capacity_filter [None req-51edf9ed-ba3c-47b8-b5db-d4e38110eec5 tempest-VolumesBackupsTest-2112242099 None] Insufficient free virtual space (0.0GB) to accommodate thin provisioned 1GB volume on host ubuntu-focal-rax-dfw-0025145355@lvmdriver-1#lvmdriver-1.
Jun 16 15:47:14.599617 ubuntu-focal-rax-dfw-0025145355 cinder-scheduler[111543]: DEBUG cinder.scheduler.base_filter [None req-51edf9ed-ba3c-47b8-b5db-d4e38110eec5 tempest-VolumesBackupsTest-2112242099 None] Filter CapacityFilter returned 0 host(s) {{(pid=111543) get_filtered_objects /opt/stack/cinder/cinder/scheduler/base_filter.py:123}}
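
A simplified model of the scheduler outcome in the c-sch log: a backend is kept only if its reported free virtual space covers the requested size. This is a sketch under that assumption; Cinder's real CapacityFilter also accounts for settings such as the reserved percentage and the over-subscription ratio. `BackendSpace` and `filter_hosts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackendSpace:
    """Hypothetical record of the free space a backend reported to the scheduler."""
    name: str
    free_virtual_gb: float


def filter_hosts(backends: List[BackendSpace], requested_gb: float) -> List[BackendSpace]:
    """Keep only backends with enough free virtual space for the request."""
    return [b for b in backends if b.free_virtual_gb >= requested_gb]


# After the crashed "lvs" query, the only LVM backend reported 0.0 GB free,
# so a 1 GB request leaves no candidate hosts.
backends = [BackendSpace("lvmdriver-1", 0.0)]
print(len(filter_hosts(backends, 1.0)))  # prints 0
```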

c-vol:
    Jun 16 15:47:08.429406 ubuntu-focal-rax-dfw-0025145355 cinder-volume[112538]: ERROR cinder.brick.local_dev.lvm [None req-dd4158cb-774c-4190-8e40-0357882b1bb9 None None] Error querying thin pool about data_percent: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
    Jun 16 15:47:08.429406 ubuntu-focal-rax-dfw-0025145355 cinder-volume[112538]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o size,data_percent --separator : --nosuffix /dev/stack-volumes-lvmdriver-1/stack-volumes-lvmdriver-1-pool
    Jun 16 15:47:08.429406 ubuntu-focal-rax-dfw-0025145355 cinder-volume[112538]: Exit code: 139

messages:
    Jun 16 15:47:08 ubuntu-focal-rax-dfw-0025145355 kernel: lvs[171824]: segfault at 800 ip 00007f96c2927860 sp 00007ffedbacb4a8 error 4 in libc-2.31.so[7f96c27d3000+178000]
    Jun 16 15:47:08 ubuntu-focal-rax-dfw-0025145355 kernel: Code: 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 48 89 f1 49 89 d0 48 89 fa 4d 85 c0 0f 84 ca 20 00 00 49 83 f8 08 0f 86 60 21 00 00 <80> 39 00 0f 84 c7 1c 00 00 80 79 01 00 0f 84 dd 1c 00 00 80 79 02
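
Exit code 139 in the c-vol log and the segfault in the kernel log describe the same event: by the common shell convention, a process killed by signal N is reported as 128 + N, and SIGSEGV is signal 11 on Linux. Which layer (sudo, rootwrap) performs that translation here is not shown in the logs. The hypothetical helper below decodes both that convention and Python's `subprocess` convention of a negative return code.

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[str]:
    """Return the name of the signal implied by an exit status, or None.

    Handles two conventions:
      * shells and many wrappers report 128 + N for a child killed by signal N;
      * Python's subprocess reports -N for a child it ran directly.
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # not a known signal number on this platform


print(signal_from_exit_status(139))  # prints SIGSEGV
```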

Eric Harney (eharney)
Changed in cinder:
assignee: nobody → Eric Harney (eharney)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/796889

Changed in cinder:
status: New → In Progress
Sofia Enriquez (lsofia-enriquez) wrote :
Changed in cinder:
importance: Undecided → High
tags: added: gate-failure lvm
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/797161

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/796889
Committed: https://opendev.org/openstack/cinder/commit/2da7d42b91fd0015f8bfb2b67067eaf5768691ca
Submitter: "Zuul (22348)"
Branch: master

commit 2da7d42b91fd0015f8bfb2b67067eaf5768691ca
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f

Eric Harney (eharney) wrote :

Similar-looking crash in "lvcreate":

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_91b/795994/1/check/tempest-full-py3/91bea07/controller/logs/syslog.txt

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_91b/795994/1/check/tempest-full-py3/91bea07/controller/logs/screen-c-vol.txt

Jun 11 13:27:24.927199 ubuntu-focal-rax-ord-0025074029 cinder-volume[111475]: ERROR cinder.brick.local_dev.lvm [None req-4741d105-caa1-4548-90e4-979b7f46cd79 tempest-GroupSnapshotsTest-1653523329 None] Error creating Volume: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Jun 11 13:27:24.927199 ubuntu-focal-rax-ord-0025074029 cinder-volume[111475]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvcreate -T -V 1g -n volume-ddc1ecc2-36ac-41b3-acab-1dcd8332ae09 stack-volumes-lvmdriver-1/stack-volumes-lvmdriver-1-pool
Jun 11 13:27:24.927199 ubuntu-focal-rax-ord-0025074029 cinder-volume[111475]: Exit code: 139
Jun 11 13:27:24.927199 ubuntu-focal-rax-ord-0025074029 cinder-volume[111475]: Stdout: ''
Jun 11 13:27:24.927199 ubuntu-focal-rax-ord-0025074029 cinder-volume[111475]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n WARNING: Scan ignoring device 8:0 with no paths.\n WARNING: Scan ignoring device 8:1 with no paths.\n'

Eric Harney (eharney)
Changed in cinder:
assignee: Eric Harney (eharney) → Sofia Enriquez (lsofia-enriquez)
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/797161
Committed: https://opendev.org/openstack/cinder/commit/9ad46817e42b577ae2625ce90a32ee96eb00f427
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 9ad46817e42b577ae2625ce90a32ee96eb00f427
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f
    (cherry picked from commit 410306efb82760191e76b4d40817f38842d87eb0)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/798356

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/798671

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/796950
Committed: https://opendev.org/openstack/cinder/commit/a8552ed2d69ae4c0f3ec97fa92395cd245a8d848
Submitter: "Zuul (22348)"
Branch: master

commit a8552ed2d69ae4c0f3ec97fa92395cd245a8d848
Author: Sofia Enriquez <email address hidden>
Date: Thu Jun 17 20:44:09 2021 +0000

    LVM: Retry lvdisplay and lvcreate calls on segfault

    This is a follow-up to I0a2420f3e4a411f5fa52ebe2d22859b138ef387f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works.

    Closes-Bug: #1932188
    Change-Id: I7c0f4d4ea7de635afede3c8514a5da9e85ad9b48

Changed in cinder:
status: In Progress → Fix Released
affects: cinder → os-brick
affects: os-brick → cinder
Changed in os-brick:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/798356
Committed: https://opendev.org/openstack/cinder/commit/b6d575b34b036c793727db081668156da1884a2b
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit b6d575b34b036c793727db081668156da1884a2b
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f
    (cherry picked from commit 410306efb82760191e76b4d40817f38842d87eb0)
    (cherry picked from commit 9ad46817e42b577ae2625ce90a32ee96eb00f427)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/798671
Committed: https://opendev.org/openstack/cinder/commit/77d4aa6a898d25e7fb3ffccd1386b84d3f1ba1d3
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 77d4aa6a898d25e7fb3ffccd1386b84d3f1ba1d3
Author: Sofia Enriquez <email address hidden>
Date: Thu Jun 17 20:44:09 2021 +0000

    LVM: Retry lvdisplay and lvcreate calls on segfault

    This is a follow-up to I0a2420f3e4a411f5fa52ebe2d22859b138ef387f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works.

    Closes-Bug: #1932188
    Change-Id: I7c0f4d4ea7de635afede3c8514a5da9e85ad9b48
    (cherry picked from commit a8552ed2d69ae4c0f3ec97fa92395cd245a8d848)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 19.0.0.0b1

This issue was fixed in the openstack/cinder 19.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/805026

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805026
Committed: https://opendev.org/openstack/cinder/commit/c4b8956763c2759bd2b9fe511efac5b3edfb226d
Submitter: "Zuul (22348)"
Branch: master

commit c4b8956763c2759bd2b9fe511efac5b3edfb226d
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/805144

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805144
Committed: https://opendev.org/openstack/cinder/commit/2425f3ef590524c357622748df79d87edebd17f3
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 2425f3ef590524c357622748df79d87edebd17f3
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436
    (cherry picked from commit c4b8956763c2759bd2b9fe511efac5b3edfb226d)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/805825

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/806008

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.1.0

This issue was fixed in the openstack/cinder 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/806008
Committed: https://opendev.org/openstack/cinder/commit/ee1674b5a806998514db83cb771c1781b0966716
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit ee1674b5a806998514db83cb771c1781b0966716
Author: Sofia Enriquez <email address hidden>
Date: Thu Jun 17 20:44:09 2021 +0000

    LVM: Retry lvdisplay and lvcreate calls on segfault

    This is a follow-up to I0a2420f3e4a411f5fa52ebe2d22859b138ef387f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works.

    Closes-Bug: #1932188
    Change-Id: I7c0f4d4ea7de635afede3c8514a5da9e85ad9b48
    (cherry picked from commit a8552ed2d69ae4c0f3ec97fa92395cd245a8d848)
    (cherry picked from commit 77d4aa6a898d25e7fb3ffccd1386b84d3f1ba1d3)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805825
Committed: https://opendev.org/openstack/cinder/commit/a7c34584af53b01bd359e1df37e3704288ebe30e
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit a7c34584af53b01bd359e1df37e3704288ebe30e
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436
    (cherry picked from commit c4b8956763c2759bd2b9fe511efac5b3edfb226d)
    (cherry picked from commit 2425f3ef590524c357622748df79d87edebd17f3)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.2.0

This issue was fixed in the openstack/cinder 17.2.0 release.
