python-rtslib-fb needs to handle new attribute cpus_allowed_list

Bug #1988366 reported by Sergio Durigan Junior
This bug affects 1 person
Affects                    Status        Importance  Assigned to            Milestone
python-rtslib-fb (Ubuntu)  Fix Released  Critical    Sergio Durigan Junior
  Jammy                    Incomplete    Undecided   Unassigned

Bug Description

[ Impact ]

* Getting information about "attached_luns" via python3-rtslib-fb fails when running the HWE kernel on jammy, because of the new kernel module attribute cpus_allowed_list, which shows up as a regular file in /sys/kernel/config/target/iscsi/

* As a consequence, the following operations fail on jammy:

  - creating an iSCSI target with the Ceph-iSCSI service
   https://docs.ceph.com/en/quincy/rbd/iscsi-target-cli/

(LUN.allocate) created test-iscsi-pool/disk_1 successfully
(LUN.add_dev_to_lio) Adding image 'test-iscsi-pool/disk_1' to LIO backstore user:rbd
tcmu-runner: tcmu_rbd_open:1162 rbd/test-iscsi-pool.disk_1: address: {172.16.12.185:0/2337103748}
(LUN.add_dev_to_lio) Successfully added test-iscsi-pool/disk_1 to LIO
LUN alloc problem - Delete from LIO/backstores failed - [Errno 20] Not a directory: '/sys/kernel/config/target/iscsi/cpus_allowed_list'

  - targetcli clearconfig confirm=True

[Errno 20] Not a directory: '/sys/kernel/config/target/iscsi/cpus_allowed_list'

  - targetctl clear

$ sudo targetctl clear
Traceback (most recent call last):
  File "/usr/bin/targetctl", line 82, in <module>
    main()
  File "/usr/bin/targetctl", line 79, in main
    funcs[sys.argv[1]](savefile)
  File "/usr/bin/targetctl", line 57, in clear
    RTSRoot().clear_existing(confirm=True)
  File "/usr/lib/python3/dist-packages/rtslib_fb/root.py", line 318, in clear_existing
    so.delete()
  File "/usr/lib/python3/dist-packages/rtslib_fb/tcm.py", line 269, in delete
    for lun in self._gen_attached_luns():
  File "/usr/lib/python3/dist-packages/rtslib_fb/tcm.py", line 215, in _gen_attached_luns
    for tpgt_dir in listdir(tpgts_base):
NotADirectoryError: [Errno 20] Not a directory: '/sys/kernel/config/target/iscsi/cpus_allowed_list'
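The traceback comes down to `listdir()` being called on an entry of the fabric directory that is a regular file rather than a target directory. Assuming, as the `target_names_excludes` import quoted later in this report suggests, that rtslib-fb skips a fixed set of known attribute names when enumerating targets, the failure mode can be reproduced against a fake directory tree. `list_tpgts`, `make_hwe_like_layout`, and the two exclude sets below are hypothetical, simplified stand-ins, not rtslib-fb's actual code.

```python
import os
import tempfile

# Simplified, assumed stand-ins for rtslib-fb's exclude set before and after
# the fix; the real definitions live in rtslib_fb/fabric.py.
OLD_EXCLUDES = {"lio_version", "version", "discovery_auth"}
NEW_EXCLUDES = OLD_EXCLUDES | {"cpus_allowed_list"}


def list_tpgts(fabric_dir, excludes):
    """Hypothetical model of the loop in _gen_attached_luns: treat every
    non-excluded entry of the fabric directory as a target and list its
    TPG directories."""
    found = []
    for target in sorted(os.listdir(fabric_dir)):
        if target in excludes:
            continue
        tpgts_base = os.path.join(fabric_dir, target)
        # Raises NotADirectoryError (errno 20) if tpgts_base is a regular file.
        for tpgt_dir in sorted(os.listdir(tpgts_base)):
            found.append((target, tpgt_dir))
    return found


def make_hwe_like_layout(root):
    """Fake iSCSI fabric directory: one target with one TPG, the
    discovery_auth group, and two plain attribute files."""
    os.makedirs(os.path.join(root, "iqn.2006-04.com.example:test-target", "tpgt_1"))
    os.makedirs(os.path.join(root, "discovery_auth"))
    for name in ("lio_version", "cpus_allowed_list"):
        with open(os.path.join(root, name), "w") as f:
            f.write("0\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_hwe_like_layout(root)
        try:
            list_tpgts(root, OLD_EXCLUDES)
        except NotADirectoryError as exc:
            print("without the fix:", exc)
        print("with the fix:", list_tpgts(root, NEW_EXCLUDES))
```

Without `cpus_allowed_list` in the exclude set, the loop reaches that file and fails with the same errno as the traceback; with it, only the real target is listed.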

[ Test Plan ]

## create two VMs, one for the GA kernel and the other for the HWE kernel
for kernel in ga hwe; do
    uvt-kvm create \
        --cpu=4 --memory=4096 \
        rtslib-fb-sru-testing-$kernel \
        release=jammy

    uvt-kvm wait rtslib-fb-sru-testing-$kernel
    uvt-kvm ssh rtslib-fb-sru-testing-$kernel 'sudo apt-get update && sudo apt-get upgrade -y'
    uvt-kvm ssh rtslib-fb-sru-testing-$kernel 'sudo apt-get install -y python3-rtslib-fb targetcli-fb'
done

## Install the HWE kernel and reboot
uvt-kvm ssh rtslib-fb-sru-testing-hwe 'sudo apt-get install -y linux-generic-hwe-22.04 && sudo reboot'

## Upgrade python3-rtslib-fb to the version in -proposed on both VMs

## in each VM, create the test iSCSI target based on the quickstart guide in targetcli(8)
## https://manpages.ubuntu.com/manpages/jammy/en/man8/targetcli.8.html
cat <<EOF | sudo targetcli
backstores/fileio create test /tmp/test.img 100m;
iscsi/ create iqn.2006-04.com.example:test-target;
cd iscsi/iqn.2006-04.com.example:test-target/tpg1/;
luns/ create /backstores/fileio/test;
set attribute generate_node_acls=1;
EOF

## confirm the test iSCSI target is discoverable locally (the discovered target should be printed to the terminal)
sudo iscsiadm --mode discoverydb --type sendtargets \
    --portal 127.0.0.1 --discover

## tear down the test iSCSI target and confirm there is no error returned
sudo targetcli clearconfig confirm=True
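Because the failure only appears when the kernel exposes the new attribute, it may help to confirm that each VM actually exercises the intended code path. A minimal sketch, assuming the attribute appears as a regular file at the path shown in the tracebacks, and only once the target modules are loaded (for example after the targetcli step); `exposes_cpus_allowed_list` is a hypothetical helper, not part of rtslib-fb.

```python
import os

# Path taken from the tracebacks in this report; it only exists once the
# target modules are loaded.
ISCSI_FABRIC_DIR = "/sys/kernel/config/target/iscsi"


def exposes_cpus_allowed_list(fabric_dir=ISCSI_FABRIC_DIR):
    """Return True if the fabric directory contains cpus_allowed_list as a
    regular file (expected on the HWE VM, not on the GA VM)."""
    return os.path.isfile(os.path.join(fabric_dir, "cpus_allowed_list"))


if __name__ == "__main__":
    kernel = os.uname().release
    print(f"{kernel}: cpus_allowed_list present = {exposes_cpus_allowed_list()}")
```

On the HWE VM this should report `True` and on the GA VM `False`; if both report `False`, the HWE VM may not have rebooted into the new kernel.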

[ Where problems could occur ]

In the worst case, the fix could cause a regression in environments still running the GA kernel, since it targets newer kernels.

To mitigate this risk, the same test case will be run on both the GA kernel and HWE kernel machines with the -proposed package.
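The GA-kernel concern can be illustrated with a small model: if the fix only adds `cpus_allowed_list` to the names skipped when enumerating targets, it should be a no-op on a GA-like layout where that file does not exist. The sketch below builds GA-like and HWE-like fake fabric directories and compares the target names found in each; the exclude set and helper names are illustrative assumptions, not rtslib-fb's code.

```python
import os
import tempfile

# Assumed post-fix exclude set (simplified; the real one is in
# rtslib_fb/fabric.py).
EXCLUDES = {"lio_version", "version", "discovery_auth", "cpus_allowed_list"}


def target_names(fabric_dir, excludes=EXCLUDES):
    """Entries of the fabric directory that would be treated as targets."""
    return sorted(n for n in os.listdir(fabric_dir) if n not in excludes)


def build_layout(root, with_cpus_allowed_list):
    """Fake fabric directory: GA-like (no cpus_allowed_list) or HWE-like."""
    os.makedirs(os.path.join(root, "iqn.2006-04.com.example:test-target", "tpgt_1"))
    os.makedirs(os.path.join(root, "discovery_auth"))
    attrs = ["lio_version"] + (["cpus_allowed_list"] if with_cpus_allowed_list else [])
    for name in attrs:
        with open(os.path.join(root, name), "w") as f:
            f.write("0\n")


with tempfile.TemporaryDirectory() as ga, tempfile.TemporaryDirectory() as hwe:
    build_layout(ga, with_cpus_allowed_list=False)
    build_layout(hwe, with_cpus_allowed_list=True)
    # The extra exclude never matches on the GA layout, so both layouts
    # yield the same targets.
    assert target_names(ga) == target_names(hwe)
```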

[ Other Info ]

* upstream fix https://github.com/open-iscsi/rtslib-fb/commit/8d2543c4da62e962661011fea5b19252b9660822

====

python-rtslib-fb needs to properly handle the new kernel module attribute cpus_allowed_list.

This is causing a problem during targetcli-fb's autopkgtest on s390x:

https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/s390x/t/targetcli-fb/20220830_075622_04113@/log.gz

Sergio Durigan Junior (sergiodj) wrote :
Changed in python-rtslib-fb (Ubuntu):
importance: Undecided → Critical
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-rtslib-fb - 2.1.74-0ubuntu5

---------------
python-rtslib-fb (2.1.74-0ubuntu5) kinetic; urgency=medium

  * d/p/handle-target-kernel-module-new-attribute-cpus_allow.patch:
    Handle new Linux kernel module attribute "cpus_allowed_list".
    (LP: #1988366)

 -- Sergio Durigan Junior <email address hidden> Wed, 31 Aug 2022 22:41:11 -0400

Changed in python-rtslib-fb (Ubuntu):
status: In Progress → Fix Released
Nobuto Murata (nobuto) wrote :

The latest LTS (jammy) is missing this patch, which causes LUN operations to fail when the host is running the HWE kernel (v6.5).

 python3-rtslib-fb | 2.1.74-0ubuntu4 | jammy | all
 python3-rtslib-fb | 2.1.74-0ubuntu5 | mantic | all
 python3-rtslib-fb | 2.1.74-0ubuntu5 | noble | all

These are the log lines from the ceph-iscsi use case (exposing an RBD volume over iSCSI/LIO). The export creation fails to complete and is left in an unrecoverable state unless gateway.conf in RADOS is fixed manually by deleting the half-broken volume.

====
(LUN.allocate) created test-iscsi-pool/disk_1 successfully
(LUN.add_dev_to_lio) Adding image 'test-iscsi-pool/disk_1' to LIO backstore user:rbd
tcmu-runner: tcmu_rbd_open:1162 rbd/test-iscsi-pool.disk_1: address: {172.16.12.185:0/2337103748}
(LUN.add_dev_to_lio) Successfully added test-iscsi-pool/disk_1 to LIO
LUN alloc problem - Delete from LIO/backstores failed - [Errno 20] Not a directory: '/sys/kernel/config/target/iscsi/cpus_allowed_list'
====

Similar report:
https://bugs.launchpad.net/python-cinderclient/yoga/+bug/2008010

Nobuto Murata (nobuto) wrote :

The workaround is to switch back to the GA kernel (v5.15), but that is far from ideal for newer generations of servers (less than two years old).

Nobuto Murata (nobuto) wrote :

Ceph-iSCSI is a somewhat complicated reproducer:
https://docs.ceph.com/en/quincy/rbd/iscsi-overview/
The simplest reproducer is `targetctl clear` on jammy with the HWE kernel.

$ sudo targetctl clear
Traceback (most recent call last):
  File "/usr/bin/targetctl", line 82, in <module>
    main()
  File "/usr/bin/targetctl", line 79, in main
    funcs[sys.argv[1]](savefile)
  File "/usr/bin/targetctl", line 57, in clear
    RTSRoot().clear_existing(confirm=True)
  File "/usr/lib/python3/dist-packages/rtslib_fb/root.py", line 318, in clear_existing
    so.delete()
  File "/usr/lib/python3/dist-packages/rtslib_fb/tcm.py", line 269, in delete
    for lun in self._gen_attached_luns():
  File "/usr/lib/python3/dist-packages/rtslib_fb/tcm.py", line 215, in _gen_attached_luns
    for tpgt_dir in listdir(tpgts_base):
NotADirectoryError: [Errno 20] Not a directory: '/sys/kernel/config/target/iscsi/cpus_allowed_list'

Nobuto Murata (nobuto)
description: updated
James Page (james-page) wrote :

Thanks Nobuto - uploaded to jammy UNAPPROVED for SRU team review.

Robie Basak (racb) wrote :

SRU review

> +-version_attributes = set(["lio_version", "version"])
> +-discovery_auth_attributes = set(["discovery_auth"])

These might be accessed by an API caller somewhere, so dropping them could represent a regression. I spent some time digging around and didn't find a direct example, so I was going to leave it, but then I came across this in rtslib/tcm.py:

> from .fabric import target_names_excludes

target_names_excludes is being maintained, so I think this is OK. However, I think it demonstrates that "stuff" does make use of these names arbitrarily, so we probably shouldn't drop items from the module namespace if we don't have to. Would there be a problem with simply not dropping these two lines? That would reduce the risk of breaking something somewhere, including something outside the archive that we cannot find. It should be trivial to do and could save users quite a bit of pain if they are impacted, however unlikely that is.

Or, put another way: if we were writing this fix directly for Jammy, we certainly wouldn't do the refactoring being done here, because it is riskier. So it seems to me that we shouldn't do it here either, and the change I suggest should be trivially safe. So I think we should do that.

Apart from this, +1 from an SRU perspective.

If you disagree let's discuss, so I'll leave this in the queue for now. If you agree, please upload an adjustment.
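A minimal sketch of the shape the review asks for, assuming (as the quoted diff and the `target_names_excludes` import suggest) that `rtslib_fb/fabric.py` builds `target_names_excludes` from module-level sets. The two existing names stay importable and the new attribute is simply added to the union; `cpus_allowed_list_attributes` is a hypothetical name, and the actual upload may differ.

```python
# Hypothetical sketch of the relevant module-level names in
# rtslib_fb/fabric.py after the suggested adjustment.

# Existing names, kept so external callers importing them do not break.
version_attributes = set(["lio_version", "version"])
discovery_auth_attributes = set(["discovery_auth"])

# New kernel attribute exposed by newer (e.g. HWE) kernels as a regular
# file next to the target directories; the variable name is an assumption.
cpus_allowed_list_attributes = set(["cpus_allowed_list"])

# Names under a fabric directory that must not be treated as targets.
target_names_excludes = (
    version_attributes
    | discovery_auth_attributes
    | cpus_allowed_list_attributes
)
```

Under this assumption, external code doing `from rtslib_fb.fabric import version_attributes` would keep working, while `target_names_excludes` gains the new entry.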

Changed in python-rtslib-fb (Ubuntu Jammy):
status: New → Incomplete