Comment 82 for bug 1569925

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Matthijs

Unfortunately the best way to make this not to happen is by fixing the kernel hang situation, when kernel calls sd_sync_cache() to every configured device before the shutdown. There is a single I/O cmd hanging in all scsi paths and the I/O error is never propagated to block layer (despite iscsi having proper I/O error settings). I'm finishing analysing some kernel dumps so I can finally understand what is happening in the transport layer (this happens with more recent kernels also).

The workaround was to create a script that would restore the iscsi connection, wait for the login to happen again and the paths are back online, and cleanly logout, allowing the sd_sync_cache() operation to be finalized.

If you are facing this problem, I know for sure that your iscsi connections are not being finalized before the network is off. This means that you have to pay attention on how you configured your iscsi disks:

- guarantee that iscsiadm was configured with "interfaces" so it works on startup:

  sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
  sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b

- the discovery/login has to be made AFTER the iscsiadm had interfaces added

  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2

  # iscsiadm -m node --loginall=automatic HAS TO WORK or else init scripts will fail

  http://pastebin.ubuntu.com/25894472/

- configure the volumes in /etc/fstab with "_netdev" parameter for systemd unit ordering

  LABEL=BLUE /blue ext4 defaults,_netdev 0 1
  LABEL=GREEN /green ext4 defaults,_netdev 0 1
  LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
  LABEL=RED /red ext4 defaults,_netdev 0 1
  LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1

You have to make sure open-iscsi and iscsid systemd units are started after the network is available and are stopped before they disappear. That might be your problem, if configuration above is correct.

inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service

The defaults are:

[Unit]
Description=iSCSI initiator daemon (iscsid)
Documentation=man:iscsid(8)
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target

and

[Unit]
Description=Login to default iSCSI targets
Documentation=man:iscsiadm(8) man:iscsid(8)
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target

So you can see that iscsid.service runs BEFORE open-iscsi.service. In my case, I'm configuring network using rc-local.service (since this is my lab) and I had to guarantee the ordering also:

If, after configuring your system like this, you still face problems, you can use this script:

http://pastebin.ubuntu.com/25894592/

And provide me the DEBUG=/.shutdown.log file, created after its execution, attached to this launchpad case. Its likely that you will have hang iscsi connections for some reason (services ordering, lack of volumes in fstab so umounts are not done, etc).

Hope it helps for now.