Report failure to parse results in _get_iscsi_nodes

Bug #1814849 reported by Ian Wienand
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
os-brick
Invalid
Undecided
Unassigned

Bug Description

Testing on Fedora 29 I see in [1]

---
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [None req-c1dcfaa8-7718-4b54-a354-3d57080eb102 tempest-ServersTestBootFromVolume-1030256664 tempest-ServersTestBootFromVolume-1030256664] [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] Instance failed to spawn: IndexError: list index out of range
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] Traceback (most recent call last):
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/compute/manager.py", line 2379, in _build_resources
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] yield resources
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/compute/manager.py", line 2142, in _build_and_run_instance
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] block_device_info=block_device_info)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3132, in spawn
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] mdevs=mdevs)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5521, in _get_guest_xml
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] context, mdevs)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5314, in _get_guest_config
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] flavor, guest.os_type)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4080, in _get_guest_storage_config
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] self._connect_volume(context, connection_info, instance)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1275, in _connect_volume
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] vol_driver.connect_volume(connection_info, instance)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/opt/stack/nova/nova/virt/libvirt/volume/iscsi.py", line 64, in connect_volume
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] device_info = self.connector.connect_volume(connection_info['data'])
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/os_brick/utils.py", line 150, in trace_logging_wrapper
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] result = f(*args, **kwargs)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] return f(*args, **kwargs)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 517, in connect_volume
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] self._cleanup_connection(connection_properties, force=True)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 886, in _cleanup_connection
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] ips_iqns_luns)
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 803, in _get_connection_devices
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] nodes = self._get_iscsi_nodes()
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py", line 159, in _get_iscsi_nodes
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] lines.append((info[0].split(',')[0], info[1]))
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae] IndexError: list index out of range
Feb 06 00:24:43.586976 fedora-29-ovh-gra1-0002396116 nova-compute[22282]: ERROR nova.compute.manager [instance: e57913ea-b40c-44e5-b0ba-91ac78c949ae]
---

I'm guessing that the iscsiadm code has run correctly (did hit the error case in _execute) but has unexpected output

---
        out, err = self._execute('iscsiadm', '-m', 'node', run_as_root=True,
                                 root_helper=self._root_helper,
                                 check_exit_code=False)
---

It would be much more helpful if we could see the output to diagnose what was wrong...

[1] http://logs.openstack.org/59/619259/2/check/devstack-platform-fedora-latest/3eaee4d/controller/logs/screen-n-cpu.txt.gz

Revision history for this message
Ian Wienand (iwienand) wrote :

hrm, there is another log, perhaps related

The failure is in "setUpClass (tempest.api.compute.servers.test_create_server"

---
tempest.exceptions.BuildErrorException: Server e57913ea-b40c-44e5-b0ba-91ac78c949ae failed to build and is in ERROR status
---

That seems to relate to tempest-ServersTestBootFromVolume-1030256664

It's rather mangled in the raw logs [1] but there's another error in there:

---
result = self.__execute(*args, **kwargs)', ' File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
    return execute_root(*cmd, **kwargs)
    File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 241, in _wrap
    return self.channel.remote_call(name, args, kwargs)
    File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
    raise exc_type(*result[2])
    ProcessExecutionError: Unexpected error while running command.

    Command: iscsiadm -m session
    Exit code: 1
    Stdout: u\'\'
    Stderr: u"iSCSI ERROR: Failed to read \'address\': error when reading \'/sys/class/iscsi_connection/connection1:0/address\': 107 # sysfs.c:_sysfs_prop_get_str():169
    iSCSI ERROR: Error when reading \'/sys/class/iscsi_connection/connection1:0/port\': 107 # sysfs.c:iscsi_sysfs_prop_get_ll():230
    iscsiadm: Failed to query iSCSI sessions, error 1: BUG of libopeniscsiusr library

 IndexError: list index out of range
---

[1] http://paste.openstack.org/show/744600/

Revision history for this message
Ian Wienand (iwienand) wrote :

Accounting for timestamps, the cinder setup from init_cinder between [1] (working, f28) and [2] (not working, f29) is exactly the same. so it's not related to that or the lvm volume setup there

[1] http://logs.openstack.org/59/619259/2/check/devstack-platform-fedora-latest/3eaee4d/controller/logs/devstacklog.txt.gz#_2019-02-05_23_56_39_430
[2] http://logs.openstack.org/93/629493/2/check/devstack-platform-fedora-latest/62fec41/controller/logs/devstacklog.txt.gz

Revision history for this message
Ian Wienand (iwienand) wrote :

There isn't a big difference between the package versions:

2019-01-25 20:45:41.285 | Package iscsi-initiator-utils-6.2.0.874-9.git86e8892.fc28.x86_64 is already installed, skipping.

2019-02-05 23:36:39.718 | Package iscsi-initiator-utils-6.2.0.876-5.gitf3c8e90.fc29.x86_64 is already installed.

Revision history for this message
Ian Wienand (iwienand) wrote :

iscsid is clearly running from [1] ... we don't really capture it's logs I don't think

[1] http://logs.openstack.org/59/619259/2/check/devstack-platform-fedora-latest/3eaee4d/controller/logs/screen-peakmem_tracker.txt.gz

Revision history for this message
Ian Wienand (iwienand) wrote :

Ok, a re-run shows the same thing. I'm guessing [1] is the start of the problems

---
Feb 06 23:13:23.838875 fedora-29-rax-dfw-0002417621 nova-compute[22870]: WARNING os_brick.initiator.connectors.iscsi [None req-53e99380-f98b-4404-9e4f-a5ba744f840d tempest-VolumesAdminNegativeTest-948589734 tempest-VolumesAdminNegativeTest-948589734] iscsiadm stderr output when getting sessions: iscsiadm: No active sessions.
---

[1] http://logs.openstack.org/59/619259/2/check/devstack-platform-fedora-latest/7dc20b0/controller/logs/screen-n-cpu.txt.gz?#_Feb_06_23_13_23_838875

Revision history for this message
Ian Wienand (iwienand) wrote :

---
[root@fedora-29-limestone-regionone-0002419661 ~]# iscsiadm -m session
iSCSI ERROR: Failed to read 'address': error when reading '/sys/class/iscsi_connection/connection1:0/address': 107 # sysfs.c:_sysfs_prop_get_str():169
iSCSI ERROR: Error when reading '/sys/class/iscsi_connection/connection1:0/port': 107 # sysfs.c:iscsi_sysfs_prop_get_ll():230
iscsiadm: Failed to query iSCSI sessions, error 1: BUG of libopeniscsiusr library
[root@fedora-29-limestone-regionone-0002419661 ~]# ls /sys/class/iscsi_connection/connection1\:0/
address device/ header_digest max_recv_dlength persistent_address ping_tmo power/ subsystem/
data_digest exp_statsn local_port max_xmit_dlength persistent_port port recv_tmo uevent
[root@fedora-29-limestone-regionone-0002419661 ~]# cat /sys/class/iscsi_connection/connection1\:0/address
cat: '/sys/class/iscsi_connection/connection1:0/address': Transport endpoint is not connected
[root@fedora-29-limestone-regionone-0002419661 ~]#
---

Revision history for this message
Ian Wienand (iwienand) wrote :

There is clearly something going on with the iscsi setup which I don't understand. Because launchpad mangles output, here is a paste of

 - journalctl -u iscsid.service
 - targetcli
 - iscsiadm -m session
 - some dmesg errors

 http://paste.openstack.org/show/744658/

This is from the host running the devstack instance with logs at

 http://logs.openstack.org/59/619259/2/check/devstack-platform-fedora-latest/aef382e/

Revision history for this message
Attila Fazekas (afazekas) wrote :

fedora29 has https://bugzilla.redhat.com/show_bug.cgi?id=1672620 issue with libvirt, I am using ceph, I did not tried iscsi, but it can be related.
The easiest way to get newer libvirt is 'dnf copr enable @virtmaint-sig/virt-preview -y' until a new libvirt released in f29, or you can try to downgrade.

Revision history for this message
Matthew Booth (mbooth-9) wrote :

I've also hit this and I spent a bunch of time on it. Unfortunately I had real work to do and after hacking on it for about a day I trashed that system and moved on, so I can no longer cut/paste from it. However, I did discover 2 important things:

1. devstack doesn't don't start the 'target' service, which is literally called 'target' on F29. It's a kernel thing, and explicitly not iscsid:
  # systemctl start target

Once you start it, you can see targets created correctly when you attach volumes.

2. The os-brick bug is caused by the target portal running into the tgt in the output. IIRC normal output is something like:

1.2.3.4:3296,1 iqn-.....

However, on F29 the number after the comma was enormous:

1.2.4.4:3296,49292949iqn-...

The above IS NOT CUT/PASTE. That's not the real number. However, it was causing the parsing issue.

Revision history for this message
Ian Wienand (iwienand) wrote :

@attila thanks ... it does seem to be just related to iscsi at this point, but could be more errors lurking!

@matt yeah I wondered about the target service. the service itself is a oneshot that just calls targetctl restore, that bit hasn't changed in years afaics

---
[Unit]
Description=Restore LIO kernel target configuration
Requires=sys-kernel-config.mount
After=sys-kernel-config.mount network.target local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/targetctl restore
ExecStop=/usr/bin/targetctl clear
SyslogIdentifier=target

[Install]
WantedBy=multi-user.target
---

It seems on a fresh system, there's nothing to restore, but possibly this initialises something?

The different numbers are the type of thing I hope fedora is good at catching, if new kernels tickle bad assumptions etc in code it's good to chase down early :)

Revision history for this message
Ian Wienand (iwienand) wrote :

Ok, so the system is not fully hosed; there's some iscsi things recorded by cinder, and i could add new per

 http://paste.openstack.org/show/744723/

In fact I can see a lot of volumes with iscsiadm

 http://paste.openstack.org/show/744724/

It's just the connections seem dead

---
[root@fedora-29-limestone-regionone-0002419661 target]# iscsiadm -m session
iSCSI ERROR: Failed to read 'address': error when reading '/sys/class/iscsi_connection/connection1:0/address': 107 # sysfs.c:_sysfs_prop_get_str():169
iSCSI ERROR: Error when reading '/sys/class/iscsi_connection/connection1:0/port': 107 # sysfs.c:iscsi_sysfs_prop_get_ll():230
iscsiadm: Failed to query iSCSI sessions, error 1: BUG of libopeniscsiusr library
---

Looking at

 http://paste.openstack.org/show/744725/

something seems to go wrong at around

---
Feb 07 01:14:47 fedora-29-limestone-regionone-0002419661 kernel: connection2:0: detected conn error (1020)
Feb 07 01:14:49 fedora-29-limestone-regionone-0002419661 kernel: Unable to locate Target IQN: iqn.2010-10.org.openstack:volume-81e32260-2895-4dff-bacb-a70062d7de89 in Storage Node
Feb 07 01:14:49 fedora-29-limestone-regionone-0002419661 kernel: iSCSI Login negotiation failed.
---

then it seems like not much works after that. That paste has a bunch from the logs.

I'm trying to get tcpdump running in the background, so we can see what happens. Hitting a few other issues :/

Revision history for this message
Gorka Eguileor (gorka) wrote :
Changed in os-brick:
status: New → Invalid
Revision history for this message
Matthew Booth (mbooth-9) wrote :

The above was merged into open-iscsi:

[mbooth@workstation open-iscsi (master)]$ git tag --contains baa0cb45cfcf10a81283c191b0b236cd1a2f66ee
2.0.877

Revision history for this message
Matthew Booth (mbooth-9) wrote :

I've created a patched iscsi-initiator-utils locally containing the fix referenced in comment 12, and I can confirm this fixes the issue.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.