volume delete fails because cinder-rootwrap lvs fails with exit code 139

Bug #1901783 reported by melanie witt on 2020-10-27
Affects: Cinder (status tracked in Xena)
    Victoria: Critical, Unassigned
    Wallaby: Critical, assigned to Gorka Eguileor
    Xena: Critical, Unassigned

Bug Description

Seen in a recent gate failure [1] where a request to delete a volume fails:

    testtools.runtest.MultipleExceptions: (<class 'tempest.lib.exceptions.DeleteErrorException'>, Resource %(resource_id)s failed to delete and is in ERROR status
Details: Volume 241d7f3b-a717-43e4-a8d7-ca730cf714ba failed to delete and is in error_deleting status, <traceback object at 0x7fc55737a2c0>)

Here is the beginning of delete in c-vol and the trace:

Oct 27 22:05:04.658050 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: DEBUG cinder.coordination [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Lock "/opt/stack/data/cinder/cinder-241d7f3b-a717-43e4-a8d7-ca730cf714ba-delete_volume" acquired by "delete_volume" :: waited 0.000s {{(pid=54880) _synchronized /opt/stack/cinder/cinder/coordination.py:146}}
Oct 27 22:05:04.698798 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: INFO cinder.volume.targets.iscsi [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Skipping remove_export. No iscsi_target is presently exported for volume: 241d7f3b-a717-43e4-a8d7-ca730cf714ba

...

Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server [None req-4ff1322e-a638-4372-a5f7-b0358e5cbc53 tempest-AttachVolumeShelveTestJSON-1486147230 None] Exception during message handling: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Exit code: 139
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Stdout: ''
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Path /dev/disk/by-partuuid/a2a6d359-10a3-458d-a110-cdc318fe4a21 no longer valid for device(8,31)\n Path /dev/disk/by-uuid/1C73-D179 no longer valid for device(8,31)\n /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "<decorator-gen-754>", line 2, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/coordination.py", line 151, in _synchronized
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return f(*a, **k)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "<decorator-gen-753>", line 2, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/objects/cleanable.py", line 212, in wrapper
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/manager.py", line 915, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self._clear_db(context, is_migrating_dest, volume,
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise value
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/manager.py", line 898, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.driver.delete_volume(volume)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 460, in delete_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server if self._volume_not_present(volume['name']):
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 139, in _volume_not_present
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self.vg.get_volume(volume_name) is None
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 335, in get_volume
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server ref_list = self.get_volumes(name)
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 325, in get_volumes
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server return self.get_lv_info(self._root_helper,
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 308, in get_lv_info
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server out = None
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server self.force_reraise()
Oct 27 22:05:05.251873 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise value
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/brick/local_dev/lvm.py", line 298, in get_lv_info
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server (out, _err) = putils.execute(*cmd,
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py", line 431, in execute
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server raise ProcessExecutionError(exit_code=_returncode,
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Exit code: 139
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Stdout: ''
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Path /dev/disk/by-partuuid/a2a6d359-10a3-458d-a110-cdc318fe4a21 no longer valid for device(8,31)\n Path /dev/disk/by-uuid/1C73-D179 no longer valid for device(8,31)\n /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/scsi-360000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15: stat failed: No such file or directory\n Path /dev/disk/by-id/wwn-0x60000000000000000e00000000020001-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15: stat failed: No such file or directory\n Path /dev/disk/by-path/ip-158.69.74.86:3260-iscsi-iqn.2010-10.org.openstack:volume-3a7b250e-a0d3-41f1-8769-8f88df83c6aa-lun-1-part15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server

[1] https://zuul.opendev.org/t/openstack/build/84676423fdda4352a403f77817b2379a/log/compute1/logs/screen-c-vol.txt#5966

tags: added: rootwrap
Revision history for this message
Eric Harney (eharney) wrote :

This seems to indicate that "lvs" (or cinder-rootwrap?) is segfaulting:

Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvs --noheadings --unit=g -o vg_name,name,size --nosuffix stack-volumes-lvmdriver-1/volume-241d7f3b-a717-43e4-a8d7-ca730cf714ba
Oct 27 22:05:05.253823 ubuntu-focal-ovh-bhs1-0021096768 cinder-volume[54880]: ERROR oslo_messaging.rpc.server Exit code: 139

We should confirm in syslog and see if there's an Ubuntu issue to chase there.
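
For reference, exit code 139 is the shell-style encoding of death by signal: 128 + 11, i.e. SIGSEGV, which is why syslog is the right place to confirm which binary actually crashed. A small decoder for these codes (illustrative snippet, not cinder code):

    import signal

    def decode_exit_code(code):
        # Codes above 128 are shell-style "killed by signal (code - 128)";
        # negative codes are Python's subprocess convention for the same.
        if code > 128:
            return signal.Signals(code - 128).name
        if code < 0:
            return signal.Signals(-code).name
        return 'exited normally with status %d' % code

    print(decode_exit_code(139))   # SIGSEGV
    print(decode_exit_code(-11))   # SIGSEGV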

Revision history for this message
Eric Harney (eharney) wrote :

From syslog on the same job:
Oct 27 22:05:05 ubuntu-focal-ovh-bhs1-0021096768 kernel: lvs[72130]: segfault at 810 ip 00007f11332f71ce sp 00007ffc23711648 error 4 in libc-2.31.so[7f113318f000+178000]

Just before that, there are some odd GPT and I/O errors.

https://zuul.opendev.org/t/openstack/build/84676423fdda4352a403f77817b2379a/log/compute1/logs/syslog.txt#2998

Changed in cinder:
status: New → Triaged
Revision history for this message
melanie witt (melwitt) wrote :

I have tried to put together a unique fingerprint for an elastic-recheck (e-r) query for this, but lvs exit code 139 does not always correspond to a job failure. My guess is that in a number of operations lvs is retried and eventually succeeds after initially failing with exit code 139, so jobs often end in SUCCESS even when they hit lvs failures with exit code 139.

Revision history for this message
melanie witt (melwitt) wrote :

Even the segfault itself in syslog is not a guarantee of job failure. It happens across all node providers, and on both ubuntu-focal and fedora-32 [1]:

2020-12-17T15:28:29.000-08:00 lvs[102392]: segfault at 800 ip 00007fa46b0b3860 sp 00007fff518238e8 error 4 in libc-2.31.so[7fa46af... SUCCESS ubuntu-focal rax-dfw
2020-12-17T15:18:20.000-08:00 lvs[150462]: segfault at 810 ip 00007f923e699a50 sp 00007ffda8a5ec38 error 4 in libc-2.31.so[7f923e5... SUCCESS fedora-32 rax-dfw
2020-12-17T14:35:05.000-08:00 lvs[156376]: segfault at 800 ip 00007f2ae51ac860 sp 00007ffd952947a8 error 4 in libc-2.31.so[7f2ae50... FAILURE ubuntu-focal rax-iad
2020-12-17T13:27:20.000-08:00 lvs[141354]: segfault at 800 ip 00007fb785c9b01e sp 00007ffedf1548f8 error 4 in libc-2.31.so[7fb785b... FAILURE fedora-32 ovh-bhs1
2020-12-17T11:47:10.000-08:00 lvs[125849]: segfault at 800 ip 00007fd9fe98c860 sp 00007ffdd7859fa8 error 4 in libc-2.31.so[7fd9fe8... FAILURE ubuntu-focal rax-dfw

[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22lvs%5C%22%20AND%20message%3A%5C%22segfault%5C%22%20AND%20filename%3A%5C%22controller%2Flogs%2Fsyslog.txt%5C%22&from=7d

Revision history for this message
Lee Yarwood (lyarwood) wrote :

https://zuul.opendev.org/t/openstack/build/db6362e6c2c34d8f9f06e543430d7beb/log/controller/logs/syslog.txt

Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: sda: sda1 sda15
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18552 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18552 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 120, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: blk_update_request: I/O error, dev sda, sector 18553 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 121, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 122, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 123, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 124, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 125, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 126, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: Buffer I/O error on dev sda1, logical block 127, async page read
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: sd 2:0:0:1: [sda] Synchronizing SCSI cache
Feb 18 07:54:14 ubuntu-focal-rax-iad-0023075527 kernel: lvs[130388]: segfault at 800 ip 00007f01a0fa2555 sp 00007ffcf4d396b8 error 4 in libc-2.31.so[7f01a0f0d000+178000]

https://zuul.opendev.org/t/openstack/build/db6362e6c2c34d8f9f06e543430d7beb/log/controller/logs/screen-c-vol.txt

Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager Traceback (most recent call last):
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/usr/local/lib/python3.8/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager result = task.execute(**arguments)
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/flows/manager/manage_existing_snapshot.py", line 118, in execute
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager size = self.driver.manage_existing_snapshot_get_size(
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/lvm.py", line 662, in manage_existing_snapshot_get_size
Feb 18 07:54:14.597590 ubuntu-focal-rax-iad-0023075527 cinder-volume[85652]: ERROR cinder.volume.manage...


Revision history for this message
Lee Yarwood (lyarwood) wrote :

Weird, I wonder if this has anything to do with the underlying PV (a loopback device) filling up?

Revision history for this message
Lee Yarwood (lyarwood) wrote :

FWIW we are also seeing this in the nova-lvm job where n-cpu calls lvs via privsep:

https://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt

Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1525, in _cleanup
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server self._cleanup_lvm(instance, block_device_info)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1615, in _cleanup_lvm
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server disks = self._lvm_disks(instance)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1633, in _lvm_disks
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server logical_volumes = lvm.list_volumes(vg)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/storage/lvm.py", line 111, in list_volumes
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server out, err = nova.privsep.fs.lvlist(vg)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs)
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py", line 224, in remote_call
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server raise exc_type(*result[2])
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Command: lvs --noheadings -o lv_name /dev/stack-volumes-default
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Exit code: -11
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Stdout: ''
Mar 09 16:45:48.496630 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR oslo_messaging.rpc.server Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for de...

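Note the exit code here is -11 rather than 139: in the rootwrap path the segfault is observed through sudo, which reports 128 + 11 = 139, while privsep execs lvs directly and Python reports the negative signal number. Both identify the same SIGSEGV, as a quick local demonstration shows (assumes a POSIX host with python3):

    import subprocess

    # Child deliberately dereferences NULL so it dies with SIGSEGV.
    CRASH = 'python3 -c "import ctypes; ctypes.string_at(0)"'

    direct = subprocess.run(["python3", "-c",
                             "import ctypes; ctypes.string_at(0)"])
    print(direct.returncode)         # -11: negative signal number (privsep style)

    via_shell = subprocess.run(CRASH + "; echo $?", shell=True,
                               capture_output=True, text=True)
    print(via_shell.stdout.strip())  # 139: shell encodes 128 + SIGSEGV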

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

We're also seeing this fail in nova, since that has calls (via privsep) to 'lvs'.

https://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt

  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [None req-4c1200aa-cc35-4249-a6ba-538d7a763b07 tempest-SecurityGroupsTestJSON-739885543 tempest-SecurityGroupsTestJSON-739885543-project] [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] Setting instance vm_state to ERROR: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Command: lvs --noheadings -o lv_name /dev/stack-volumes-default
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Exit code: -11
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Stdout: ''
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n WARNING: Scan ignoring device 8:0 with no paths.\n WARNING: Scan ignoring device 8:1 with no paths.\n'
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] Traceback (most recent call last):
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 3040, in do_terminate_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._delete_instance(context, instance, bdms)
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 2975, in _delete_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._shutdown_instance(context, instance, bdms)
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] File "/opt/stack/nova/nova/compute/manager.py", line 2867, in _shutdown_instance
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 nova-compute[91725]: ERROR nova.compute.manager [instance: 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c43] self._try_deallocate_network(context, instance,
  Mar 09 16:45:48.078761 ubuntu-focal-rax-ord-0023393373 n...

Revision history for this message
Gorka Eguileor (gorka) wrote :

According to the LVM code [1], this seems to be caused by a race condition between LVM and udev:

    /*
     * FIXME: remove this, the theory for this retry is that
     * there may be a udev race that we can sometimes mask by
     * retrying. This is here until we can figure out if it's
     * needed and if so fix the real problem.
     */

[1]: https://github.com/lvmteam/lvm2/blob/b84a9927b78727efffbb257a61e9e95a648cdfab/lib/label/label.c#L562-L567
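
Given that theory, retrying the whole command when it dies with SIGSEGV should mask the race the same way LVM's internal retry does. A minimal sketch of that approach (helper name and retry parameters are illustrative, not the actual patch):

    import time

    from oslo_concurrency import processutils as putils

    SEGFAULT_CODES = (139, -11)  # shell-encoded and subprocess-style SIGSEGV

    def run_lvm(*cmd, root_helper=None, attempts=3, interval=1):
        # Retry LVM query commands that are killed by the LVM/udev race;
        # any other failure (or exhaustion of attempts) is re-raised.
        for attempt in range(1, attempts + 1):
            try:
                return putils.execute(*cmd, run_as_root=True,
                                      root_helper=root_helper)
            except putils.ProcessExecutionError as exc:
                if exc.exit_code not in SEGFAULT_CODES or attempt == attempts:
                    raise
                time.sleep(interval)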

tags: added: lvm
Revision history for this message
Gorka Eguileor (gorka) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/783660

Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
status: Triaged → In Progress
Changed in cinder:
importance: Undecided → High
importance: High → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.0.0.0rc2

This issue was fixed in the openstack/cinder 18.0.0.0rc2 release candidate.

Revision history for this message
Lee Yarwood (lyarwood) wrote :

I'm still seeing this on master, for example:

https://zuul.opendev.org/t/openstack/build/ebdd70c78b97462a933f58fc31adff05/log/controller/logs/screen-c-vol.txt#6684

May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr stack-volumes-lvmdriver-1/volume-e2706dc2-1f26-4296-89f8-3944eed092f0
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Exit code: 139
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Stdout: ''
May 20 20:08:52.251811 ubuntu-focal-inap-mtl01-0024757875 cinder-volume[194862]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sdb1.\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n /dev/sdb15: stat failed: No such file or directory\n Path /dev/sdb15 no longer valid for device(8,31)\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb 8:16 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n Device open /dev/sdb1 8:17 failed errno 2\n WARNING: Scan ignoring device 8:16 with no paths.\n WARNING: Scan ignoring device 8:17 with no paths.\n'

Looks like the retry is also required around all of the lvdisplay commands?

https://opendev.org/openstack/cinder/src/branch/master/cinder/brick/local_dev/lvm.py#L773
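
If so, routing those lvdisplay probes through the same retrying wrapper sketched above would look roughly like this (illustrative only; the command and root helper are copied from the log above):

    # Hypothetical: call the retrying helper instead of processutils.execute.
    out, _err = run_lvm(
        'env', 'LC_ALL=C', 'lvdisplay', '--noheading', '-C', '-o', 'Attr',
        'stack-volumes-lvmdriver-1/volume-e2706dc2-1f26-4296-89f8-3944eed092f0',
        root_helper='sudo cinder-rootwrap /etc/cinder/rootwrap.conf')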

Revision history for this message
melanie witt (melwitt) wrote :

Just saw this on stable/victoria tempest-integrated-compute [1] with the lvdisplay command:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/api/volume/admin/test_volumes_actions.py", line 61, in test_volume_force_delete_when_volume_is_creating
    self._create_reset_and_force_delete_temp_volume('creating')
  File "/opt/stack/tempest/tempest/api/volume/admin/test_volumes_actions.py", line 39, in _create_reset_and_force_delete_temp_volume
    self.volumes_client.wait_for_resource_deletion(temp_volume['id'])
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 905, in wait_for_resource_deletion
    if self.is_resource_deleted(id, *args, **kwargs):
  File "/opt/stack/tempest/tempest/lib/services/volume/v3/volumes_client.py", line 227, in is_resource_deleted
    raise lib_exc.DeleteErrorException(
tempest.lib.exceptions.DeleteErrorException: Resource %(resource_id)s failed to delete and is in ERROR status
Details: Volume 231e6cea-69a9-4bac-b6c4-7ed21465178d failed to delete and is in error_deleting status

screen-c-vol.txt:

Jun 15 21:11:12.680660 ubuntu-focal-limestone-regionone-0025129752 cinder-volume[81073]: ERROR oslo_messaging.rpc.server Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C lvdisplay --noheading -C -o Attr stack-volumes-lvmdriver-1/volume-231e6cea-69a9-4bac-b6c4-7ed21465178d
Jun 15 21:11:12.680660 ubuntu-focal-limestone-regionone-0025129752 cinder-volume[81073]: ERROR oslo_messaging.rpc.server Exit code: 139

[1] https://zuul.opendev.org/t/openstack/build/054fe45381564005a547877ee219fce9/log/controller/logs/screen-c-vol.txt#5807

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/796889

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/797161
