nova-lvm lvs return -11 and fails with Failed to get udev device handler for device

Bug #1931710 reported by Lee Yarwood
Affects                            Status        Importance  Assigned to
OpenStack Compute (nova)           Fix Released  Medium      Balazs Gibizer
OpenStack Compute (nova)/wallaby   Fix Released  Undecided   Unassigned

Bug Description

Description
===========

Tests within the nova-lvm job fail during cleanup with the following trace visible in n-cpu:

https://797b12f7389a12861990-09e4be48fe62aca6e4b03d954e19defe.ssl.cf5.rackcdn.com/795992/3/check/nova-lvm/99a7b1f/controller/logs/screen-n-cpu.txt

Jun 11 13:04:38.733030 ubuntu-focal-inap-mtl01-0025074127 nova-compute[106254]: Command: lvs --noheadings -o lv_name /dev/stack-volumes-default
Jun 11 13:04:38.733030 ubuntu-focal-inap-mtl01-0025074127 nova-compute[106254]: Exit code: -11
Jun 11 13:04:38.733030 ubuntu-focal-inap-mtl01-0025074127 nova-compute[106254]: Stdout: ''
Jun 11 13:04:38.733030 ubuntu-focal-inap-mtl01-0025074127 nova-compute[106254]: Stderr: ' WARNING: Failed to get udev device handler for device /dev/sda1.\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n /dev/sda15: stat failed: No such file or directory\n Path /dev/sda15 no longer valid for device(8,15)\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda 8:0 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n Device open /dev/sda1 8:1 failed errno 2\n WARNING: Scan ignoring device 8:0 with no paths.\n WARNING: Scan ignoring device 8:1 with no paths.\n'

Bug #1901783 details something similar to this in Cinder, but as the above comes from native Nova ephemeral storage code and carries a different return code, I'm going to treat this as a separate issue for now.
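For reference, oslo's processutils surfaces the raw subprocess returncode, and in Python's subprocess convention a negative value means the child was terminated by that signal number, which is how an exit code of -11 can appear. A minimal stdlib demonstration (assumes a POSIX shell is available):

```python
import signal
import subprocess

# A negative returncode in Python's subprocess module means the child was
# terminated by that signal number rather than exiting normally.
proc = subprocess.run(["sh", "-c", "kill -11 $$"])

print(proc.returncode)       # -11 on Linux: killed by signal 11
print(signal.SIGSEGV.value)  # 11
```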

Steps to reproduce
==================

Only seen as part of the nova-lvm job at present.

Expected result
===============

nova-lvm and the removal of instances succeeds.

Actual result
=============

nova-lvm and the removal of instances fails.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

libvirt

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

LVM (ephemeral)

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

N/A

Logs & Configs
==============

As above.

Lee Yarwood (lyarwood)
summary: - lvs return -11 and fails with Failed to get udev device handler for
- device
+ nova-lvm lvs return -11 and fails with Failed to get udev device handler
+ for device
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The same issue can happen in other scenarios e.g:

Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager Traceback (most recent call last):
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9856, in _update_available_resource_for_node
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager self.rt.update_available_resource(context, nodename,
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 879, in update_available_resource
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8858, in get_available_resource
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager disk_info_dict = self._get_local_gb_info()
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7299, in _get_local_gb_info
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager info = lvm.get_volume_group_info(
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/storage/lvm.py", line 92, in get_volume_group_info
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager out, err = nova.privsep.fs.vginfo(vg)
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager return self.channel.remote_call(name, args, kwargs)
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager File "/usr/local/lib/python3.8/dist-packages/oslo_privsep/daemon.py", line 224, in remote_call
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager raise exc_type(*result[2])
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager Command: vgs --noheadings --nosuffix --separator | --units b -o vg_size,vg_free stack-volumes-default
Jun 03 05:20:33.248693 ubuntu-focal-inap-mtl01-0024946450 nova-compute[90971]: ERROR nova.compute.manager Exit code: -11
J...


Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

After discussing this with Lee, it seems to be a transient error where a retry could help.
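The approach ultimately taken is to retry the failing vgs/lvs calls. As a rough illustration of the idea (a stdlib sketch only; the actual patch enables the retry support built into oslo_concurrency.processutils, and `run_with_retries` here is a hypothetical helper):

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, delay=1.0):
    """Run cmd, retrying on a non-zero exit, and return its stdout.

    Sketches the retry behaviour the fix relies on; names and defaults
    are illustrative, not taken from the nova patch.
    """
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(
        f"{cmd!r} failed after {attempts} attempts "
        f"(last exit code {proc.returncode}): {proc.stderr}")

# e.g. run_with_retries(["lvs", "--noheadings", "-o", "lv_name", vg_path])
```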

Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/796269

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/796269
Committed: https://opendev.org/openstack/nova/commit/e59fc77c3d924cfab6ddcd780ae0604a84e64c30
Submitter: "Zuul (22348)"
Branch: master

commit e59fc77c3d924cfab6ddcd780ae0604a84e64c30
Author: Balazs Gibizer <email address hidden>
Date: Mon Jun 14 14:05:05 2021 +0200

    Retry lvm volume and volume group query

    We observed that the vgs and lvs queries the our lvm driver uses can
    intermittently fail with error code 11 (EAGAIN). So this patch enabled
    the retry in oslo_concurrency.processutils for these calls.

    Change-Id: I93da6cb1675d77bcdcd1075641dea9e2afc0ee1a
    Closes-Bug: #1931710

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/796707

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/796707
Committed: https://opendev.org/openstack/nova/commit/2b2434c173959aa728de8bfc1a0f1e1027ee0466
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 2b2434c173959aa728de8bfc1a0f1e1027ee0466
Author: Balazs Gibizer <email address hidden>
Date: Mon Jun 14 14:05:05 2021 +0200

    Retry lvm volume and volume group query

    We observed that the vgs and lvs queries the our lvm driver uses can
    intermittently fail with error code 11 (EAGAIN). So this patch enabled
    the retry in oslo_concurrency.processutils for these calls.

    Change-Id: I93da6cb1675d77bcdcd1075641dea9e2afc0ee1a
    Closes-Bug: #1931710
    (cherry picked from commit e59fc77c3d924cfab6ddcd780ae0604a84e64c30)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.1.0

This issue was fixed in the openstack/nova 23.1.0 release.
