update-status hook and list-disks action break when a device has no device_node in udev

Bug #1866956 reported by Jose Guedez
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
Ceph OSD Charm  Fix Released  Undecided   Unassigned

Bug Description

The update-status hook and the list-disks action raise an exception when udev returns a device whose "device_node" is None, as can happen with NVMe devices. Example:

Traceback (most recent call last):
  File "hooks/update-status", line 873, in <module>
    assess_status()
  File "hooks/update-status", line 826, in assess_status
    for dev in list(set(ceph.unmounted_disks()) - set(osd_journals)):
  File "lib/ceph/utils.py", line 180, in unmounted_disks
    if block_type in device.device_node:
TypeError: argument of type 'NoneType' is not iterable

This happens in unmounted_disks (https://github.com/openstack/charm-ceph-osd/blob/master/lib/charms_ceph/utils.py#L180): pyudev returns a device whose device_node is None, and the membership test 'block_type in device.device_node' fails because None is not iterable.
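The failure, and the kind of guard that avoids it, can be reproduced without pyudev. This is a hypothetical sketch, not the charm's code or the committed fix: FakeDevice stands in for pyudev.Device (only device_node is modelled), and the EXCLUDED_BLOCK_TYPES fragments and the candidate_disks name are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical stand-in for pyudev.Device; only the fields used here.
@dataclass
class FakeDevice:
    sys_path: str
    device_node: Optional[str]


# Illustrative virtual block-device name fragments to skip (assumption).
EXCLUDED_BLOCK_TYPES = ("dm-", "loop", "ram", "nbd")


def candidate_disks(devices: Iterable[FakeDevice]) -> List[str]:
    """Return device nodes of usable disks, skipping nodeless devices."""
    disks = []
    for device in devices:
        node = device.device_node
        # A substring test against None raises TypeError (as in the
        # traceback above), so devices without a /dev node are skipped first.
        if node is None:
            continue
        if any(block_type in node for block_type in EXCLUDED_BLOCK_TYPES):
            continue
        disks.append(node)
    return disks


if __name__ == "__main__":
    nvme_path = ("/sys/devices/pci0000:5d/0000:5d:02.0/0000:5e:00.0"
                 "/nvme/nvme0/nvme0c33n1")
    devices = [
        FakeDevice("/sys/block/sdd", "/dev/sdd"),
        FakeDevice(nvme_path, None),
        FakeDevice("/sys/block/loop0", "/dev/loop0"),
    ]
    try:
        _ = "loop" in devices[1].device_node  # the unguarded check
    except TypeError as exc:
        print("unguarded check failed:", exc)
    print("guarded:", candidate_disks(devices))  # guarded: ['/dev/sdd']
```

Skipping the device, rather than converting None to an empty string, keeps nodeless devices out of the disk list entirely, since there is no /dev path to hand to later checks.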

This code is invoked at the end of every hook via 'assess_status' (https://github.com/openstack/charm-ceph-osd/blob/master/hooks/ceph_hooks.py#L831), and by the list-disks action (https://github.com/openstack/charm-ceph-osd/blob/master/actions/list_disks.py#L49).

In our client's case, this happens when pyudev returns an NVMe device whose device_node is None, like this:

Device('/sys/devices/pci0000:5d/0000:5d:02.0/0000:5e:00.0/nvme/nvme0/nvme0c33n1')
device.device_node is None => True
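To find which devices on a host trigger this, one option is to ask pyudev for block disks that have no device node. The sketch below is an assumption-labelled diagnostic, not part of the charm: nodeless_disks is a hypothetical helper, and it only uses the pyudev calls Context(), list_devices(), and the Device attributes sys_path, sys_name and device_node. The "nvme0c33n1" name looks like a per-controller path of a native NVMe multipath namespace, which would likely explain why sysfs lists it while /dev has no node for it, but that is not confirmed in this report.

```python
"""List block disks that udev reports without a device node."""
from typing import Iterable, List, Tuple

try:
    import pyudev  # third-party; the charm itself imports it
except ImportError:  # lets the helper be imported where pyudev is absent
    pyudev = None


def nodeless_disks(devices: Iterable) -> List[Tuple[str, str]]:
    """Return (sys_path, sys_name) for devices whose device_node is None."""
    return [
        (device.sys_path, device.sys_name)
        for device in devices
        if device.device_node is None
    ]


if __name__ == "__main__":
    if pyudev is None:
        print("pyudev is not installed; nothing to inspect")
    else:
        context = pyudev.Context()
        disks = context.list_devices(subsystem="block", DEVTYPE="disk")
        for sys_path, sys_name in nodeless_disks(disks):
            print(f"{sys_name}: no device node ({sys_path})")
```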

We identified the issue with revision 292 (19.07), but the code in master appears to be the same (see all the links above).

Tags: field-high
Michael Skalka (mskalka) wrote :

Commenting so the field-high subscribers are properly notified.

tags: added: field-high
Andrew McLeod (admcleod) wrote :

Could you please test https://review.opendev.org/712886 and see if it resolves the issue?

Jose Guedez (jfguedez) wrote :

admcleod: Thanks, I tested the fix and it does allow the action to complete (since it essentially filters out the device), and I expect the unit to leave the "error" state.

However, we are now getting unexpected "non-pristine" disks in the output:

  results:
    blacklist: '[]'
    disks: '[''/dev/sdd'', ''/dev/bcache7'']'
    non-pristine: '[''/dev/bcache7'']'

These disks are actually associated with an OSD that appears healthy, so this is perhaps caused by the device being NVMe or by the bcache setup itself. Let me know if you have any pointers.
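If "non-pristine" means that the leading bytes of a device are not all zero (an assumption here; the exact check the action performs is not shown in this thread), then a disk already carrying an OSD would be expected to be reported that way. A minimal sketch of such a check, where the 2048-byte window and the looks_pristine name are hypothetical choices:

```python
"""Check whether the start of a block device (or file) is all zeros."""
import os
import tempfile

# Assumed rule: "pristine" means the first 2048 bytes are all zero.
PRISTINE_CHECK_BYTES = 2048


def looks_pristine(path: str, nbytes: int = PRISTINE_CHECK_BYTES) -> bool:
    """Return True if the first `nbytes` of `path` are readable and zero."""
    try:
        with open(path, "rb") as handle:
            data = handle.read(nbytes)
    except OSError:
        return False  # unreadable devices are treated as not pristine
    if len(data) != nbytes:
        return False  # short read: too small to judge
    return not any(data)  # any non-zero byte means "in use"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        blank = os.path.join(tmp, "blank.img")
        used = os.path.join(tmp, "used.img")
        with open(blank, "wb") as f:
            f.write(bytes(4096))
        with open(used, "wb") as f:
            f.write(b"\x01" + bytes(4095))  # a non-zero leading byte
        print(looks_pristine(blank), looks_pristine(used))  # True False
```

Run against a real device path (for example /dev/bcache7) as root, this would show whether its first sectors hold data, which is what the assumed rule would flag.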

Xav Paice (xavpaice) wrote :
Changed in charm-ceph-osd:
status: New → Fix Committed
James Page (james-page)
Changed in charm-ceph-osd:
milestone: none → 20.05
David Ames (thedac)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released