volume delete fails because cinder-rootwrap lvs fails with exit code 139

Bug #1901783 reported by melanie witt
This bug affects 2 people
Affects             Status          Importance   Assigned to       Milestone
Cinder              Fix Released    Critical     Gorka Eguileor
Cinder (Victoria)   Fix Committed   Critical     Unassigned
Cinder (Wallaby)    Fix Committed   Critical     Gorka Eguileor
Cinder (Xena)       Fix Committed   Critical     Unassigned

Bug Description

Seen in a recent gate failure [1] where a request to delete a volume fails:

    testtools.runtest.MultipleExceptions: (<class 'tempest.lib.exceptions.DeleteErrorException'>, Resource %(resource_id)s failed to delete and is in ERROR status
Details: Volume 241d7f3b-a717-43e4-a8d7-ca730cf714ba failed to delete and is in error_deleting status, <traceback object at 0x7fc55737a2c0>)

Here is the beginning of the delete operation in c-vol, followed by the trace:

Oct 27 22:05:04.658050 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: DEBUG cinder.coordination [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Lock "/opt/stack/data/cinder/cinder-241d7f3b-a717-43e4-a8d7-ca730cf714ba-delete_volume" acquired by "delete_volume" :: waited 0.000s {{(pid=54880) _synchronized /opt/stack/cinder/cinder/coordination.py:146}}
Oct 27 22:05:04.698798 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: INFO cinder.volume.targets.iscsi [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Skipping remove_export. No iscsi_target is presently exported for volume: 241d7f3b-a717-43e4-a8d7-ca730cf714ba

...

Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Exit code: 139
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Stdout: ''
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Path /dev/disk/by-partuuid/a2a6d359-10a3-458d-a110-cdc318fe4a21 no longer valid for device(8,31)\n Path /dev/disk/by-uuid/1C73-D179 no longer valid for device(8,31)\n /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "<decorator-gen-754>", line 2, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/coordination.py", line 151, in _synchronized
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return f(*a, **k)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "<decorator-gen-753>", line 2, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/objects/cleanable.py", line 212, in wrapper
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/manager.py", line 915, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self._clear_db(context, is_migrating_dest, volume,
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise value
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/manager.py", line 898, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.driver.delete_volume(volume)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 460, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server if self._volume_not_present(volume['name']):
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 139, in _volume_not_present
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self.vg.get_volume(volume_name) is None
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 335, in get_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server ref_list = self.get_volumes(name)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 325, in get_volumes
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self.get_lv_info(self._root_helper,
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 308, in get_lv_info
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server out = None
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise value
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 298, in get_lv_info
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server (out, _err) = putils.execute(*cmd,
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py", line 431, in execute
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise ProcessExecutionError(exit_code=_returncode,
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Exit code: 139
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Stdout: ''
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Path /dev/disk/by-partuuid/a2a6d359-10a3-458d-a110-cdc318fe4a21 no longer valid for device(8,31)\n Path /dev/disk/by-uuid/1C73-D179 no longer valid for device(8,31)\n /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server

[1] https://zuul.opendev.org/t/openstack/build/84676423fdda4352a403f77817b2379a/log/compute1/logs/screen-c-vol.txt#5966

tags: added: rootwrap
Revision history for this message
Eric Harney (eharney) wrote :

This seems to indicate that "lvs" (or cinder-rootwrap?) is segfaulting:

Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Exit code: 139

We should confirm in syslog and see if there's an Ubuntu issue to chase there.
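
For reference, 139 is the conventional wait-status encoding for a process killed by signal 11 (SIGSEGV), i.e. 128 + 11, which matches the libc segfault in the syslog excerpt below. A small illustrative Python snippet (not part of Cinder) showing the two ways the same crash can be reported:

    import signal
    import subprocess

    # 139 is the shell convention for "killed by signal 11": 128 + SIGSEGV.
    # When the segfaulting process is a direct child (no shell in between),
    # the raw negative signal number shows up instead -- which is why the
    # nova privsep call later in this bug reports -11 for the same failure.
    print(128 + signal.SIGSEGV)                                    # 139

    direct = subprocess.run(["sh", "-c", "kill -s SEGV $$"])
    print(direct.returncode)                                       # -11

    via_shell = subprocess.run(["sh", "-c", "sh -c 'kill -s SEGV $$'"])
    print(via_shell.returncode)                                    # 139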

Revision history for this message
Eric Harney (eharney) wrote :

From syslog on the same job:
Oct 27 22:05:05 ubuntu-focal-ovh-bhs1-0021096768 kernel: lvs[72130]: segfault at 810 ip 00007f11332f71ce sp 00007ffc23711648 error 4 in libc-2.31.so[7f113318f000+178000]

Just before that, there are some odd GPT and I/O errors.

https://zuul.opendev.org/t/openstack/build/84676423fdda4352a403f77817b2379a/log/compute1/logs/syslog.txt#2998

Changed in cinder:
status: New → Triaged
Revision history for this message
melanie witt (melwitt) wrote :

I have tried to put together a unique fingerprint for an e-r (elastic-recheck) query for this, but the lvs exit code 139 does not always correspond to a job failure. My guess is that for a number of operations lvs is retried and eventually succeeds after initially failing with exit code 139, so jobs often end in SUCCESS even when they encounter lvs failures with exit code 139.

Revision history for this message
melanie witt (melwitt) wrote :

Even the segfault itself in syslog is not a guarantee of job failure. It happens for all node providers and on both ubuntu-focal and fedora-32 [1]:

2020-12-17T15:28:29.000-08:00 lvs[102392]: segfault at 800 ip 00007fa46b0b3860 sp 00007fff518238e8 error 4 in libc-2.31.so[7fa46af... SUCCESS ubuntu-focal rax-dfw
2020-12-17T15:18:20.000-08:00 lvs[150462]: segfault at 810 ip 00007f923e699a50 sp 00007ffda8a5ec38 error 4 in libc-2.31.so[7f923e5... SUCCESS fedora-32 rax-dfw
2020-12-17T14:35:05.000-08:00 lvs[156376]: segfault at 800 ip 00007f2ae51ac860 sp 00007ffd952947a8 error 4 in libc-2.31.so[7f2ae50... FAILURE ubuntu-focal rax-iad
2020-12-17T13:27:20.000-08:00 lvs[141354]: segfault at 800 ip 00007fb785c9b01e sp 00007ffedf1548f8 error 4 in libc-2.31.so[7fb785b... FAILURE fedora-32 ovh-bhs1
2020-12-17T11:47:10.000-08:00 lvs[125849]: segfault at 800 ip 00007fd9fe98c860 sp 00007ffdd7859fa8 error 4 in libc-2.31.so[7fd9fe8... FAILURE ubuntu-focal rax-dfw

[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22lvs%5C%22%20AND%20message%3A%5C%22segfault%5C%22%20AND%20filename%3A%5C%22controller%2Flogs%2Fsyslog.txt%5C%22&from=7d

Revision history for this message
Lee Yarwood (lyarwood) wrote :

https://zuul.opendev.org/t/openstack/build/db6362e6c2c34d8f9f06e543430d7beb/log/controller/logs/syslog.txt

Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: sda: sda1 sda15
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18552 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18552 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 120, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18553 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 121, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 122, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 123, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 124, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 125, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 126, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 127, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: sd 2:0:0:1: [sda] Synchronizing SCSI cache
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: lvs[130388]: segfault at 800 ip 00007f01a0fa2555 sp 00007ffcf4d396b8 error 4 in libc-2.31.so[7f01a0f0d000+178000]

https://zuul.opendev.org/t/openstack/build/db6362e6c2c34d8f9f06e543430d7beb/log/controller/logs/screen-c-vol.txt

Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager Traceback (most recent call last):
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/usr/local/lib/python3.8/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager result = task.execute(**arguments)
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/flows/manager/manage_existing_snapshot.py", line 118, in execute
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager size = self.driver.manage_existing_snapshot_get_size(
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 662, in manage_existing_snapshot_get_size
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manage...


Revision history for this message
Lee Yarwood (lyarwood) wrote :

Weird, I wonder if this has anything to do with the underlying PV (a loopback device) filling up?

Revision history for this message
Lee Yarwood (lyarwood) wrote :

FWIW we are also seeing this in the nova-lvm job where n-cpu calls lvs via privsep:

https://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt

Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1525, in _cleanup
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server self._cleanup_lvm(instance, block_device_info)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1615, in _cleanup_lvm
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server disks = self._lvm_disks(instance)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1633, in _lvm_disks
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server logical_volumes = lvm.list_volumes(vg)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/storage/lvm.py", line 111, in list_volumes
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server out, err = nova.privsep.fs.lvlist(vg)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py", line 224, in remote_call
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server raise exc_type(*result[2])
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Command: lvs --noheadings -o lv_name /dev/stack-volumes-default
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Exit code: -11
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Stdout: ''
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for de...


Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

We're also seeing this fail in nova, since that has calls (via privsep) to 'lvs'.

https://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt

  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [None req-4c1200aa-cc35-4249-a6ba-538d7a763b07 tempest-SecurityGroupsTestJSON-739885543 tempest-SecurityGroupsTestJSON-739885543-project] [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] Setting instance vm_state to ERROR: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Command: lvs --noheadings -o lv_name /dev/stack-volumes-default
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Exit code: -11
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Stdout: ''
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n WARNING: Scan ignoring device 8:0 with no paths.\n WARNING: Scan ignoring device 8:1 with no paths.\n'
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] Traceback (most recent call last):
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 3040, in do_terminate_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._delete_instance(context, instance, bdms)
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 2975, in _delete_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._shutdown_instance(context, instance, bdms)
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 2867, in _shutdown_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._try_deallocate_network(context, instance,
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 n...

Revision history for this message
Gorka Eguileor (gorka) wrote :

According to the LVM code [1] this seems to be caused by a race condition between LVM and udev:

    /*
     * FIXME: remove this, the theory for this retry is that
     * there may be a udev race that we can sometimes mask by
     * retrying. This is here until we can figure out if it's
     * needed and if so fix the real problem.
     */

[1]: https://github.com/lvmteam/lvm2/blob/b84a9927b78727efffbb257a61e9e95a648cdfab/lib/label/label.c#L562-L567
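
The workaround direction here is to treat exit code 139 as retryable, on the theory that an immediate retry masks the udev race. A minimal sketch of that idea, assuming oslo.concurrency's processutils semantics (the helper name and retry count are illustrative, not the exact Cinder patch):

    from oslo_concurrency import processutils as putils


    def run_lvm_command(*cmd, root_helper=None, run_as_root=True, retries=3):
        """Run an LVM command, retrying when it dies with SIGSEGV.

        Exit code 139 (128 + SIGSEGV) appears to come from a transient
        LVM/udev race, so an immediate retry usually succeeds.
        """
        for attempt in range(retries):
            try:
                return putils.execute(*cmd, root_helper=root_helper,
                                      run_as_root=run_as_root)
            except putils.ProcessExecutionError as exc:
                # Re-raise anything other than the segfault exit code,
                # and give up after the last attempt.
                if exc.exit_code != 139 or attempt == retries - 1:
                    raise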

tags: added: lvm
Revision history for this message
Gorka Eguileor (gorka) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/783660

Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
status: Triaged → In Progress
Changed in cinder:
importance: Undecided → High
importance: High → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.0.0.0rc2

This issue was fixed in the openstack/cinder 18.0.0.0rc2 release candidate.

Revision history for this message
Lee Yarwood (lyarwood) wrote :

I'm still seeing this on master, for example:

https://zuul.opendev.org/t/openstack/build/ebdd70c78b97462a933f58fc31adff05/log/controller/logs/screen-c-vol.txt#6684

May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr stack-volumes-lvmdriver-1/volume-e2706dc2-1f26-4296-89f8-3944eed092f0
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Exit code: 139
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Stdout: ''
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'

Looks like this is also required around all of the lvdisplay commands?

https://opendev.org/openstack/cinder/src/branch/master/cinder/brick/local_dev/lvm.py#L773
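
Applying the same treatment around the activation check would look roughly like the sketch below (standalone and illustrative; the real method lives in cinder/brick/local_dev/lvm.py and goes through the class's executor):

    from oslo_concurrency import processutils as putils


    def lv_is_active(vg_name, lv_name, root_helper, retries=3):
        """Return True if the LV is active, tolerating a transient segfault."""
        cmd = ('env', 'LC_ALL=C', 'lvdisplay', '--noheading', '-C',
               '-o', 'Attr', '%s/%s' % (vg_name, lv_name))
        for attempt in range(retries):
            try:
                out, _err = putils.execute(*cmd, root_helper=root_helper,
                                           run_as_root=True)
                break
            except putils.ProcessExecutionError as exc:
                # Retry only on the SIGSEGV exit code seen in this bug.
                if exc.exit_code != 139 or attempt == retries - 1:
                    raise
        out = out.strip()
        # The fifth character of lv_attr is the state field; 'a' = active.
        return bool(out) and out[4] == 'a'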

Revision history for this message
melanie witt (melwitt) wrote :

Just saw this on stable/victoria tempest-integrated-compute [1] with the lvdisplay command:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/volume/admin/test_volumes_actions.py", line 61, in test_volume_force_delete_when_volume_is_creating
    self._create_reset_and_force_delete_temp_volume('creating')
  File "/opt/stack/tempest/tempest/api/volume/admin/test_volumes_actions.py", line 39, in _create_reset_and_force_delete_temp_volume
    self.volumes_client.wait_for_resource_deletion(temp_volume['id'])
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 905, in wait_for_resource_deletion
    if self.is_resource_deleted(id, *args, **kwargs):
  File "/opt/stack/tempest/tempest/lib/services/volume/v3/volumes_client.py", line 227, in is_resource_deleted
    raise lib_exc.DeleteErrorException(
tempest.lib.exceptions.DeleteErrorException: Resource %(resource_id)s failed to delete and is in ERROR status
Details: Volume 231e6cea-69a9-4bac-b6c4-7ed21465178d failed to delete and is in error_deleting status

screen-c-vol.txt:

Jun 15 21:11:12.680660 ubuntu-focal-limestone-regionone-0025129752 cinder-volume[81073]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr stack-volumes-lvmdriver-1/volume-231e6cea-69a9-4bac-b6c4-7ed21465178d
Jun 15 21:11:12.680660 ubuntu-focal-limestone-regionone-0025129752 cinder-volume[81073]: ERROR oslo_messaging.rpc.server Exit code: 139

[1] https://zuul.opendev.org/t/openstack/build/054fe45381564005a547877ee219fce9/log/controller/logs/screen-c-vol.txt#5807

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/796889

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/797161

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/796889
Committed: https://opendev.org/openstack/cinder/commit/2da7d42b91fd0015f8bfb2b67067eaf5768691ca
Submitter: "Zuul (22348)"
Branch: master

commit 2da7d42b91fd0015f8bfb2b67067eaf5768691ca
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/797161
Committed: https://opendev.org/openstack/cinder/commit/9ad46817e42b577ae2625ce90a32ee96eb00f427
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 9ad46817e42b577ae2625ce90a32ee96eb00f427
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f
    (cherry picked from commit 410306efb82760191e76b4d40817f38842d87eb0)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/798356

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/798356
Committed: https://opendev.org/openstack/cinder/commit/b6d575b34b036c793727db081668156da1884a2b
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit b6d575b34b036c793727db081668156da1884a2b
Author: Eric Harney <email address hidden>
Date: Sat Jul 17 15:39:31 2021 +0000

    Retry "lvs" call on segfault for _get_thin_pool_free_space

    This is a follow-up to I6824ba4f.

    LVM commands segfault occasionally, exiting with code 139.
    Change I6824ba4f introduced a workaround to retry the command
    when code 139 is returned, which generally works. This expands
    that retry to the case where thin pool space is queried, which
    currently results in the LVM driver reporting no free space to
    the scheduler.

    Further work is needed to expand this to other LVM calls, but
    this patch is narrow in scope to target a particular gate
    failure.

    Related-Bug: #1901783
    Partial-Bug: #1932188
    Closes-Bug: #1932287
    Change-Id: I0a2420f3e4a411f5fa52ebe2d22859b138ef387f
    (cherry picked from commit 410306efb82760191e76b4d40817f38842d87eb0)
    (cherry picked from commit 9ad46817e42b577ae2625ce90a32ee96eb00f427)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 19.0.0.0b1

This issue was fixed in the openstack/cinder 19.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/805026

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805026
Committed: https://opendev.org/openstack/cinder/commit/c4b8956763c2759bd2b9fe511efac5b3edfb226d
Submitter: "Zuul (22348)"
Branch: master

commit c4b8956763c2759bd2b9fe511efac5b3edfb226d
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/805144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 5.0.0

This issue was fixed in the openstack/os-brick 5.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805144
Committed: https://opendev.org/openstack/cinder/commit/2425f3ef590524c357622748df79d87edebd17f3
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 2425f3ef590524c357622748df79d87edebd17f3
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436
    (cherry picked from commit c4b8956763c2759bd2b9fe511efac5b3edfb226d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/805825

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/805825
Committed: https://opendev.org/openstack/cinder/commit/a7c34584af53b01bd359e1df37e3704288ebe30e
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit a7c34584af53b01bd359e1df37e3704288ebe30e
Author: Eric Harney <email address hidden>
Date: Wed Aug 18 09:19:58 2021 -0400

    LVM: Retry lvextend commands on code 139

    Retry lvextend commands upon segfault, similar to other
    LVM calls. This affects the volume extend path.

    Change-Id: I0c0cb5308246a3dce736eade67b40be063aa78bb
    Related-Bug: #1901783
    Related-Bug: #1932188
    Closes-Bug: #1940436
    (cherry picked from commit c4b8956763c2759bd2b9fe511efac5b3edfb226d)
    (cherry picked from commit 2425f3ef590524c357622748df79d87edebd17f3)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.2.0

This issue was fixed in the openstack/cinder 17.2.0 release.

Eric Harney (eharney)
Changed in cinder:
status: Fix Committed → Fix Released