Shutdown hang on 16.04 with iscsi targets

Bug #1569925 reported by Matt Schulte on 2016-04-13
This bug affects 25 people
Affects: linux (Ubuntu) - status tracked in Bionic
  Trusty: importance Medium, assigned to Rafael David Tinoco
  Xenial: importance Medium, assigned to Rafael David Tinoco
  Artful: importance Medium, assigned to Rafael David Tinoco
  Bionic: importance Medium, assigned to Rafael David Tinoco

Affects: open-iscsi (Ubuntu) - status tracked in Bionic
  Trusty: importance Medium, assigned to Rafael David Tinoco
  Xenial: importance Medium, assigned to Rafael David Tinoco
  Artful: importance Medium, assigned to Rafael David Tinoco
  Bionic: importance Medium, assigned to Rafael David Tinoco

Bug Description

[Impact]

 * open-iscsi users might face hangs during OS shutdown.
 * hangs can be caused by manual iSCSI configuration/setup.
 * hangs can also be caused by bad systemd unit ordering.
 * if the transport layer interface vanishes before the LUN is
   disconnected, then the hang will happen.
 * check comment #89 for the fix decision.

[Test Case]

 * a simple way of reproducing the kernel hang is to disable
   the open-iscsi logouts. this simulates a situation where
   a service has shut down the network interface used by
   the transport layer before a proper iSCSI logout was done.

   $ # first, log into all iSCSI LUNs

   $ systemctl edit --full open-iscsi.service
   ...
   #ExecStop=/lib/open-iscsi/logout-all.sh
   ...

   $ sudo reboot # this will make the server hang
                 # forever on shutdown
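   As an alternative to `systemctl edit --full`, the same effect can be
   sketched with a systemd drop-in that clears ExecStop=. This is an
   illustration, not part of the original test case: the drop-in file name
   is hypothetical, clearing *all* ExecStop= lines (not just logout-all.sh)
   is an assumption, and the sketch writes to a scratch directory so it can
   run unprivileged.

   ```shell
   # Sketch: disable open-iscsi's stop commands via a drop-in, simulating
   # a shutdown where no iSCSI logout happens. On a real system the file
   # would live in /etc/systemd/system/open-iscsi.service.d/ (root
   # required), followed by `systemctl daemon-reload`.
   dropin_dir="$(mktemp -d)"   # real path: /etc/systemd/system/open-iscsi.service.d
   cat > "$dropin_dir/no-logout.conf" <<'EOF'
   [Service]
   # An empty ExecStop= resets ALL inherited stop commands, including
   # /lib/open-iscsi/logout-all.sh, so sessions stay logged in at shutdown.
   ExecStop=
   EOF
   cat "$dropin_dir/no-logout.conf"
   ```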

[Regression Potential]

 * the regression potential is low because the change acts on the
   iscsi transport layer code ONLY when the server is in shutdown
   state.

 * any error in logic would only appear during shutdown and
   would not cause any harm to data.

[Other Info]

 * ORIGINAL BUG DESCRIPTION

I have 4 servers running the latest 16.04 updates from the development branch (as of right now).

Each server is connected to NetApp storage using iscsi software initiator. There are a total of 56 volumes spread across two NetApp arrays. Each volume has 4 paths available to it which are being managed by device mapper.

While logged into the iscsi sessions all I have to do is reboot the server and I get a hang.

I see a message that says:

  "Reached target Shutdown"

followed by

  "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

and then I see 8 lines that say:

  "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  NOTE: the actual values of the *'s differ for each line above.

This seems like a bug somewhere but I am unaware of any additional logging that I could turn on to pinpoint the problem.

Note I also have similar setups that are not doing iscsi and they don't have this problem.

Here is a screenshot of what I see on the shell when I try to reboot:

(https://launchpadlibrarian.net/291303059/Screenshot.jpg)

This is being tracked in NetApp bug tracker CQ number 860251.

If I log out of all iscsi sessions before rebooting then I do not experience the hang:

iscsiadm -m node -U all

We are wondering if this could be some kind of shutdown ordering problem. Like the network devices have already disappeared and then iscsi tries to perform some operation (hence the ping timeouts).

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1569925/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
tags: added: xenial
affects: ubuntu → systemd (Ubuntu)
Martin Pitt (pitti) wrote :

My first suspicion is that the interface that the iscsi device is on lives in /etc/network/interfaces (or interfaces.d/*) somewhere and thus is being shut down by networking.service. Ordinarily this network interface is being set up in initramfs by open-iscsi and should not otherwise be configured/touched by the OS (i. e. not by ifupdown, NetworkManager, or networkd).

 - Which network interface is the iscsi device on, and how did you configure this?
 - Please attach /etc/network/interfaces*

It would also be useful to attach a complete shutdown log. Do a reboot and attach /var/log/syslog, that should have sufficient data for the network interface shutdowns (it won't cover the open-iscsi error though as that happens too late, but we have that part in the screenshot).

Changed in systemd (Ubuntu):
status: New → Incomplete
Matt Schulte (gimpyestrada) wrote :

root@ICTM1612S02H1:/etc/network# cat interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet dhcp

auto enp130s0
iface enp130s0 inet6 static
address fe80::192:12:21:115
netmask 64
mtu 9000

auto enp4s0f0
iface enp4s0f0 inet6 static
address fe80::192:12:22:115
netmask 64
mtu 9000

Matt Schulte (gimpyestrada) wrote :

In the above, the two NICs used for iSCSI traffic are the enp130s0 and the enp4s0f0.

In case you were wondering I have another server running ipv4 and it also hangs.

Martin Pitt (pitti) wrote :

Do you have your root partition on iscsi, or just some auxiliary ones like /opt?

If it's the root partition, then you can't put that on an interface which is "auto" in /etc/network/interfaces -- ifupdown will tear it down during shutdown and sever the connection, explaining the hang. In this case, open-iscsi's initramfs integration will already bring it up. (Hence my question how you set this up).

Matt Schulte (gimpyestrada) wrote :

Here is the requested syslog.

Reboot happens after:

Apr 14 09:49:42 ICTM1612S02H1 root: MDS: Rebooting Now

Matt Schulte (gimpyestrada) wrote :

Nothing for the system is on iscsi. They are just extra volumes that I use for generating block or filesystem IO to our storage.

Martin Pitt (pitti) wrote :

Unfortunately rsyslog gets shut down too early, so we don't see much from this. Can you please re-try with enabling persistent journal:

    sudo mkdir -p /var/log/journal
    sudo systemd-tmpfiles --create --prefix /var/log/journal

then reboot once to actually enable on-disk journal, then reboot again to record the shutdown sequence. After that, please do

   sudo journalctl -b -1 > /tmp/journal.txt

and attach /tmp/journal.txt here.

And again, how exactly did you set this up? Is this reproducible in a VM somehow? Can you please give me /etc/fstab and other files where you configured this?

Thanks!

Matt Schulte (gimpyestrada) wrote :

Here is the requested journal.txt file.

I had a bit of difficulty after performing the requested operations; my server kept booting into the (initramfs) prompt.

The journal log may contain more than one reboot attempt.

Matt Schulte (gimpyestrada) wrote :

This contains the configuration of my iscsi sessions. It is a tar of the entire /etc/iscsi folder, it should show what we've got going on.

Matt Schulte (gimpyestrada) wrote :

And just for good measure this is my /etc/network/interfaces file:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet dhcp

auto enp130s0
iface enp130s0 inet6 static
address fe80::192:12:21:115
netmask 64
mtu 9000

auto enp4s0f0
iface enp4s0f0 inet6 static
address fe80::192:12:22:115
netmask 64
mtu 9000

Matt Schulte (gimpyestrada) wrote :

Contents of fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda1 during installation
UUID=3f62d8b3-3afe-4cd7-8d36-49101b1d7b02 / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=9c54f202-7e90-4633-9134-706e6d7080fc none swap sw 0 0
10.113.242.59:/IOP /opt/iop nfs noauto 0 0

#IOMNT START#
/dev/mapper/3600a09800059d66a0000fa5c56f3e78f /iomnt-3600a09800059d66a0000fa5c56f3e78f ext3 nobarrier,_netdev 0 0
/dev/mapper/3600a09800059d7b40000223656f3ef42 /iomnt-3600a09800059d7b40000223656f3ef42 ext4 nobarrier,_netdev,discard 0 0
/dev/mapper/3600a09800059d66a0000fa5456f3e782 /iomnt-3600a09800059d66a0000fa5456f3e782 xfs nobarrier,_netdev,discard 0 0
/dev/mapper/3600a09800059d66a0000fa5056f3e77b /iomnt-3600a09800059d66a0000fa5056f3e77b btrfs nobarrier,_netdev,discard 0 0
/dev/mapper/360080e50003415e60000147d5605f7fb /iomnt-360080e50003415e60000147d5605f7fb ext3 nobarrier,_netdev 0 0
/dev/mapper/360080e500034153e0000ed065605faa2 /iomnt-360080e500034153e0000ed065605faa2 ext4 nobarrier,_netdev,discard 0 0
#IOMNT END#

sles (slesru) wrote :

I have the same problem; the target is an HPE MSA, and the problem does not happen every time - roughly 1 in 5 reboots. I run 16.04, so I just tested...
It looks like iscsi is not always stopped before the network interfaces are down.

sles (slesru) wrote :

>Looks like iscsi not always stopped before network interfaces are down.

No, just reproduced this, iscsi was logged out...

sles (slesru) wrote :

Because it is not always reproducible, I guess this is a dependency problem.
I looked into /etc/init.d/iscsid:

# Required-Stop: $network $local_fs sendsigs

I guess we need $remote_fs here?
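(If that guess is right, the header change would look like the fragment below. This is only a sketch of the suggestion, untested, and the fix eventually taken in this bug went a different way.)

```
### hypothetical edit to the LSB header in /etc/init.d/iscsid
# Required-Stop:    $network $remote_fs $local_fs sendsigs
```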

sles (slesru) wrote :

OK, I keep trying reboot after reboot, and now I think this is a kernel bug and it is multipath related...

Matt Schulte (gimpyestrada) wrote :

I definitely have remote filesystems mounted (using _netdev if that matters) when I experience this problem.

Matt Schulte (gimpyestrada) wrote :

I have finally been able to verify that this is reproducible with the released 16.04 from the ISO.

What else can be provided here to move this bug along?

(This is being tracked internally in CQ 860251)

rc_micha (m-niendorf) wrote :

I have the same problem with my multipath iscsi session. I just logged in without mounting the volumes.
While rebooting I always get the message "systemd-shutdown[1]: Failed to finalize DM devices, ignoring", but in most cases the server just ignores that and reboots.

The iscsi multipath volumes are blacklisted in lvm config and are currently not mounted.

Matt Schulte (gimpyestrada) wrote :

rc_micha: Your volumes aren't mounted, but the iscsi sessions are logged in? If you log out of your iscsi sessions does it shutdown clean?

Andrew Morton (amorton12) wrote :

Seeing what I believe is this bug on a 16.04 install that boots from iSCSI LUN. When I shut down, it goes just past shutting down the network interfaces, then slows down and starts throwing task blocked messages from the kernel. It eventually gets down to "Reached Target Shutdown" before getting completely stuck. I have to manually shut down or reboot the machine from that point.

daigang (daigang0701) wrote :

I got the same problem.
It looks like iscsi does not log out before NetworkManager cuts the network,

so iscsi cannot log out.

Matt Schulte (gimpyestrada) wrote :

There is clearly something wrong here.

FWIW This issue did not exist in 14.04.

Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired
Matt Schulte (gimpyestrada) wrote :

If this is still a problem, what do we do now? Do I just keep testing with new releases and file new bugs?

Matt Schulte (gimpyestrada) wrote :

For completeness, I upgraded to 16.04.1, the issue is still present.

I upgraded to the latest xenial-updates packages as of 10/24/16 and the issue is still present.

I then re-installed with 16.10 and the issue is still present. I guess I'll open a new bug for 16.10 and refer back to this one.

Changed in systemd (Ubuntu):
status: Expired → Confirmed
description: updated
Changed in systemd (Ubuntu):
importance: Undecided → High

Seeing a similar issue here with so-called "diskless" clients. That is, all of their filesystems are network mounts, including the root filesystem. These systems use iscsi targets for the root filesystem, and during shutdown the network is turned off before the file I/O operations are completed, which causes the system to hang with filesystem I/O errors.

I'm currently looking to see if there is some way to make systemd keep the network up no matter what, all the way until the machine physically reboots or shuts down (ie. disable the network shutdown completely).

This worked in 14.04 and got broken in 16.04, apparently due to the switch to systemd, as far as I can tell (great tool, isn't it?).

Matt Schulte (gimpyestrada) wrote :

Any idea on this one folks?

Nish Aravamudan (nacc) wrote :

So let's separate out two issues in this bug.

@gimpyestrada, I believe your issue was always with iSCSI disks that were not in use as the root/boot disk, correct? Looking at your fstab, I think also you don't have multipath over iSCSI, correct?

For anyone else indicating this bug is also affecting them: if @gimpyestrada replies in the affirmative to the above (not using iSCSI for the root disk, not using multipath), then I would like to use separate bugs for those issues. They may all have the same root cause, but it becomes rather noisy otherwise.

@gimpyestrada, have you tried 16.10 or 17.04 to see if it has been fixed already? Just curious.

Changed in systemd (Ubuntu):
assignee: nobody → Nish Aravamudan (nacc)
Matt Schulte (gimpyestrada) wrote :

iSCSI disks are not the root/boot disks.

Multipath IS in use. Hence the /dev/mapper at the beginning of the disk names.

Yes I have tried 16.10, it is still present and I have another bug for that release. https://bugs.launchpad.net/bugs/1636862

Nish Aravamudan (nacc) wrote :

@gimpyestrada: thank you for the prompt response!

Err, right on multipath. Is it at all possible to test without multipath in use? To help narrow down if it's really in iSCSI or if it's in the multipath layer?

Note additional bugs aren't required (I understand at some point this one may have expired, but we'll end up creating tasks here for all affected releases).

Matt Schulte (gimpyestrada) wrote :

It is possible to test without multipath; this will take some time, as I no longer have anything with 16.04 installed. For fun I may go ahead and try with 17.04 so we can figure out if it still exists there; that way we'll kind of kill two birds with one stone.

Nish Aravamudan (nacc) wrote :

No problem -- also, you can just test in 16.10 without multipath. I can spend time reproducing it, etc. later to verify whatever fix we come up with is correct.

Matt Schulte (gimpyestrada) wrote :

Didn't take as long as I thought. Installed the most recent build of Zesty (zesty-server-amd64.iso 2017-02-22 06:54 676M).

I was able to:

1. Confirm the issue still exists.
2. Confirm that the issue is not caused by multipath (i.e. it still occurs during reboot when multipathing is disabled).

Scott (scopa) wrote :

Fresh install 16.04 download ISO from ubuntu
Can confirm this issue as well.
If I manually log out of the iscsi session reboot no issue
If I am logged into iscsi session reboot hangs. Need to physically power cycle server

On Wed, May 24, 2017 at 1:45 PM, Scott <email address hidden> wrote:
> Fresh install 16.04 download ISO from ubuntu
> Can confirm this issue as well.
> If I manually log out of the iscsi session reboot no issue

Interesting data point.

> If I am logged into iscsi session reboot hangs. Need to physically power cycle server

I think the open-iscsi or iscsid services have a killall sessions
portion when the service shuts down. I wonder if that's either a) not
running or b) somehow incorrect. I could imagine for iSCSI root it'd
be an issue, but for non-root, I'm not sure why that's not working.

Nish Aravamudan (nacc) wrote :

@gimpyestrada: Thank you for providing your journal logs (in the future, providing the raw journal file is better, as then we can run `journalctl` on it locally). From the logs:

Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: initiator reported error (8 - connection timed out)
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: Could not log into all portals
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: :0113,3260] successful.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Child 3234 belongs to open-iscsi.service
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Main process exited, code=exited, status=8/n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Changed start -> failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd-logind[2801]: Got message type=signal sender=:1.2 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Job open-iscsi.service/start finished, result=failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Failed to start Login to default iSCSI targets.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=937 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Unit entered failed state.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Failed with result 'exit-code'.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: cgroup is empty

I *think* the issue in this case is that open-iscsi.service is actually in a failed state before shutdown, which I think means it does not run the ExecStop commands? Can you verify that is the case (boot your system, `systemctl is-system-running` should indicate "degraded", `systemctl status open-iscsi.service` should indicate Failed). The problem is that the open-iscsi.service has:

ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh

This indicates that, on 'start' of the service, it will try to log in to and activate all configured iSCSI targets (the two ExecStart lines). However, if either of those fails (as they did in the journal in this case), the ExecStop lines *do not* run. From `man systemd.service`:

           Note that if any of the commands specified in ExecStartPre=,
           ExecStart=, or ExecStartPost= fail (and are not prefixed with "-",
           see above) or time out before the service is fully up, execution
           continues with commands specified in ExecStopPost=, the commands in
           ExecStop= are skipped.

So, I think that we should be using...
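(The comment is truncated here, but the "-" prefix mentioned in the man page excerpt above is systemd's usual escape hatch: a command prefixed with "-" is treated as successful even when it exits non-zero, so the unit does not enter the failed state and ExecStop= still runs. A hypothetical sketch of the unit with that change - an assumption for illustration, not necessarily what was actually proposed or shipped:)

```ini
# Hypothetical open-iscsi.service excerpt: the leading "-" tells systemd
# to ignore a non-zero exit from the login step, so a partial login
# failure no longer prevents the ExecStop= logout commands from running.
[Service]
ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=-/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh
```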


Matt Schulte (gimpyestrada) wrote :

My test configuration has long since been re-tasked. I will eventually be able to come back and test as you ask, but it may take a few weeks. If anyone else on the bug has their systems up and available, please test and reply.

Changed in systemd (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
importance: Undecided → High
Changed in systemd (Ubuntu Xenial):
status: New → Confirmed
Changed in systemd (Ubuntu Xenial):
assignee: Dimitri John Ledkov (xnox) → nobody
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
status: Confirmed → In Progress
Rafael David Tinoco (inaddy) wrote :

Hello Matt,

Good to read your feedback. When you have multipath AND iscsi, the error propagation timeout can be given by one or both; that is why I explained both. This issue is not an iscsi or multipath problem: it is a systemd problem related to the fact that the network is removed before the disks are unmounted. Good to know you don't have containers; that is something that could hold the disk superblock reference as well (even without containers, filesystems mounted with _netdev can be held by different mount namespaces started by systemd itself - not the case here).

Could you do me a favor? Are you reproducing this issue on a consistent basis? If you are, could you please reduce your replacement_timeout to 0 and see if you can reproduce it again? Please be aware that, after changing replacement_timeout = 0 in iscsid.conf, you will have to change the parameter for the disks already discovered:

## example

Change iscsid.conf AND:

$ sudo iscsiadm -m node
192.168.48.1:3260,1 iqn.2017.tgtd:tid2.lun
192.168.48.1:3260,1 iqn.2017.tgtd:tid1.lun

$ sudo iscsiadm -m node -o show | grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 20
node.session.timeo.replacement_timeout = 20

$ sudo iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 0
$ sudo iscsiadm -m node -o show | grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 0
node.session.timeo.replacement_timeout = 0

This will cause the I/O errors to be propagated right away - when systemd disconnects the interface - and allow the later umount - done by another systemd service - to succeed without waiting for pagecache timeout + iscsi timeout (20 for you). Hopefully the "0" will be fast enough to let systemd continue every time.

Could you please let me know if it mitigates the issue? If it does, I'll work on the actual fix. This will just confirm the hypothesis and help with diagnosis (and serve as a workaround if anyone cares).

Thank you
-Rafael
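(The iscsid.conf side of that workaround can be sketched as follows. This is an illustration, not from the thread: it edits a throwaway copy so it runs unprivileged; on a real system you would edit /etc/iscsi/iscsid.conf as root and then run the `iscsiadm -m node -o update` command shown above for already-discovered nodes.)

```shell
# Sketch: set replacement_timeout = 0 so I/O errors are propagated
# immediately when the transport disappears. Operating on a copy here;
# the starting value of 120 is only an example.
conf="$(mktemp)"
printf 'node.session.timeo.replacement_timeout = 120\n' > "$conf"
sed -i 's/^\(node\.session\.timeo\.replacement_timeout\).*/\1 = 0/' "$conf"
cat "$conf"
```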

Matt Schulte (gimpyestrada) wrote :

Please let me explain who I am and what we do to possibly shed some light on this.

I am an Interoperability engineer at NetApp. We test our storage products with many popular operating systems, adapters, protocols and switches, in varying combinations.

We test software initiator iscsi continuously (on other distros) with exactly the same configuration as I have described above and have never seen this behavior. Indeed whatever the problem is seems fixed in
Ubuntu upstream as I indicated above. This is also new behavior introduced between 14.04 and 16.04 because it did not occur in older versions.

The purpose of me asking to get this issue resolved is not to fix MY system, it is to fix it for my customers' systems. We will not post support for Ubuntu 16.04 (iscsi) without resolving this issue so that we don't put our customers in the same position.

With all that said, I do not currently have a configuration available or setup to reproduce this problem. If it is required, I can add it to the queue. When it is up and running, it reproduces every single time.

Other folks on the thread may actually have it up and running right now and be able to test the immediate fail before I get to it.

Rafael David Tinoco (inaddy) wrote :

Hypothesis,

Test (1) - The error is NEVER propagated to upper layers:

# xfs and ext4 mounted automatically

inaddy@iscsihang:~$ mount | grep _netde
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered,_netdev)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota,_netdev)

# no error propagation

inaddy@iscsihang:~$ sudo iscsiadm -m node -o show | grep timeo.replace
node.session.timeo.replacement_timeout = -1
node.session.timeo.replacement_timeout = -1

# target server can't give any more packets to guest:

inaddy@machete:~$ sudo iptables -A INPUT -s 192.168.49.8 -p tcp --destination-port 3260 -j DROP

# reboot can't succeed

inaddy@iscsihang:~$ sudo reboot

[ 27.596135] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896692, last ping 4294897944, now 4294899196
[ 27.628109] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896700, last ping 4294897952, now 4294899204

Systemd hangs forever:

[ OK ] Stopped target Remote File Systems.
         Unmounting /ext4...
         Unmounting /xfs...

OBS: There is a tight relationship between the connection disappearing before the umount service runs and the capability of systemd to shut down the machine entirely. I would say that, in the case of no error propagation, it is even worse, since the kernel would be locked up forever:

[ 240.132208] INFO: task systemd:1094 blocked for more than 120 seconds.
[ 240.133499] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.134544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.136092] INFO: task umount:1199 blocked for more than 120 seconds.
[ 240.137262] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.138302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.139742] INFO: task umount:1201 blocked for more than 120 seconds.
[ 240.140898] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.141953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Systemd is still trying...

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Stopped File System Check on /dev/disk/by-label/XFS.
[ OK ] Stopped File System Check on /dev/disk/by-label/EXT4.
[ OK ] Removed slice system-systemd\x2dfsck.slice.
[ OK ] Stopped target Remote File Systems (Pre).
         Stopping Login to default iSCSI targets...

[ 360.140109] INFO: task systemd:1094 blocked for more than 120 seconds.
[ 360.141219] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.142100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.143377] INFO: task umount:1199 blocked for more than 120 seconds.
[ 360.144451] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.145333] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.146576] INFO: task umount:1201 blocked for more than 120 seconds.
[ 360.147586] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.148472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

This will happen forever. I still have to find a way of causing systemd to shutdown network and cause this hang because error, li...


Rafael David Tinoco (inaddy) wrote :

Hypothesis,

Test (2) - The error is propagated to upper layers after X seconds.

In this case I'm testing:

$ sudo iscsiadm -m node -o show | grep node.session.timeo.replac
node.session.timeo.replacement_timeout = 60
node.session.timeo.replacement_timeout = 60

And the same "locking" behavior is observed, but for a limited period of time. Now the shutdown procedure stayed locked for exactly 60 seconds before allowing the machine to shut down. This means that the systemd logic NEEDS either a clean shutdown for the _netdev filesystems OR the error to be propagated so the umount() can go on:

[ 17.612020] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294894197, last ping 4294895448, now 4294896700
[ 17.644128] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294894204, last ping 4294895456, now 4294896708

<60 seconds>

[ 78.044264] sd 2:0:0:1: rejecting I/O to offline device
[ 78.045297] blk_update_request: I/O error, dev sdb, sector 0
[ 78.046315] sd 3:0:0:1: rejecting I/O to offline device
[ 78.046357] sd 2:0:0:1: rejecting I/O to offline device
[ 78.046363] blk_update_request: I/O error, dev sdb, sector 0
[ 78.046487] sd 2:0:0:1: rejecting I/O to offline device
[ 78.050117] blk_update_request: I/O error, dev sda, sector 20481074
[ 78.051219] XFS (sda1): metadata I/O error: block 0x1387c32 ("xlog_iodone") error 5 numblks 64
[ 78.052727] XFS (sda1): Log I/O Error Detected. Shutting down filesystem
[ 78.053901] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 78.060723] sd 3:0:0:1: rejecting I/O to offline device
[ 78.061759] blk_update_request: I/O error, dev sda, sector 0

<error was propagated>
<filesystems can proceed with umount>

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Stopped File System Check on /dev/disk/by-label/XFS.
[ OK ] Stopped File System Check on /dev/disk/by-label/EXT4.
[ OK ] Removed slice system-systemd\x2dfsck.slice.

Rafael David Tinoco (inaddy) wrote :

Might be unrelated, but I discovered that:

## root namespace

root@iscsihang:~$ mount | egrep -E "sd.*(xfs|ext4)"
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered,_netdev)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota,_netdev)

## specific mnt namespace

root@iscsihang:~$ ip netns exec testnamespace /bin/bash
root@iscsihang:~$ sudo umount /xfs
root@iscsihang:~$ sudo umount /ext4

## root namespace

root@iscsihang:~$ mount | egrep -E "sd.*(xfs|ext4)"
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota)

When a mount namespace (mntns) umounts a --shared filesystem from the root namespace, the _netdev flag is gone. That might stop init scripts from correctly identifying when those filesystems should be umounted (before the network is gone, for example) -> this might be "another" bug.

Rafael David Tinoco (inaddy) wrote :

Okay, besides everything I posted above - which is definitely right, will possibly be addressed in other bugs, and also shows other problems that could make systemd hang because of iscsi disks - I was able to cause the same symptom of this bug:

[ OK ] Stopped LSB: ebtables ruleset management.
[ OK ] Stopped Login to default iSCSI targets.
         Stopping iSCSI initiator daemon (iscsid)...
[ OK ] Stopped iSCSI initiator daemon (iscsid).
[ OK ] Stopped target Network is Online.
[ OK ] Stopped target Network.
         Stopping Raise network interfaces...
         Stopping ifup for bond0.20...
[ OK ] Stopped ifup for bond0.20.
         Stopping ifup for bond0.10...
[ OK ] Stopped ifup for bond0.10.
         Stopping ifup for bond0...
[ OK ] Stopped ifup for bond0.
         Stopping ifup for internal...
[ OK ] Stopped ifup for internal.
[ OK ] Stopped Raise network interfaces.
[ OK ] Stopped target Network (Pre).
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Stopped Load Kernel Modules.
[ OK ] Stopped target Local File Systems.
         Unmounting /run/cgmanager/fs...
         Unmounting /run/user/1000...
         Unmounting /xfs...
         Unmounting /ext4...
[ OK ] Unmounted /run/cgmanager/fs.
[ OK ] Unmounted /run/user/1000.

[ 522.972114] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.974085] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.975905] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.976087] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040

[ 720.156456] INFO: task umount:11860 blocked for more than 120 seconds.
[ 720.157477] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 720.158282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.159695] INFO: task umount:11861 blocked for more than 120 seconds.
[ 720.160680] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 720.161587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 733.180181] sd 2:0:0:1: timing out command, waited 180s
[ 733.181191] blk_update_request: I/O error, dev sdc, sector 0
[ 733.182164] sd 4:0:0:1: timing out command, waited 180s
[ 733.183084] blk_update_request: I/O error, dev sdb, sector 0

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Reached target Shutdown.
[ 805.018886] systemd-shutdown[1]: Failed to finalize DM devices, ignoring

<system hangs forever>

And that is likely because I added 2 paths to each of my nics - using multipath, likely unrelated - and configured bonding + vlan within "ifupdown" - likely what causes this. After that change, I was able to make systemd freeze with (likely) the same output as this bug (and the one...


Matt Schulte (gimpyestrada) wrote :

As I said I don't have one up right now, but Comment 3 above has my interfaces for one of the recreates.

Nothing bothered me in any way. I wanted to make sure that you knew that I wasn't one guy who was looking to fix his one system, that's all.

You said "Usually those type of "partnership" product enablements/fixes"...what do you mean by that? Is there some avenue that I could be pursuing to get my bugs addressed sooner?

Rafael David Tinoco (inaddy) wrote :

This is what I discovered:

This systemd issue is caused by iscsi connections still being logged in (not by multipath, bonding, vlans, etc). One way of reproducing it is simply editing the open-iscsi.service unit and disabling the logout script. That has, every time for me, caused the same hang we are talking about.

inaddy@iscsihang:~$ sudo systemctl edit --full open-iscsi.service

...
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
#ExecStop=/lib/open-iscsi/logout-all.sh

But of course no one does this. Reading logout-all.sh you will discover that it logs out all paths, with two caveats: ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN (from /etc/default/open-iscsi) and the flag file /run/open-iscsi/shutdown-keep-sessions both cause those iscsi sessions NOT to be logged out.
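The guard logic described above can be sketched like this (a paraphrase, not the actual logout-all.sh; the echo stands in for the real "iscsiadm -m node --logoutall=all"):

```shell
# Sketch of the shutdown guard: keep sessions if either the
# ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN default is set or the
# /run/open-iscsi/shutdown-keep-sessions flag file exists.
should_logout() {
    keep_flag="$1"   # value of ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN
    keep_file="$2"   # path of the shutdown-keep-sessions flag file
    if [ "$keep_flag" = "yes" ] || [ -e "$keep_file" ]; then
        echo keep    # sessions are intentionally left open
    else
        echo logout  # the real script would run iscsiadm here
    fi
}

should_logout ""  /nonexistent   # -> logout
should_logout yes /nonexistent   # -> keep
```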

Still, I was using neither of those while reproducing, so I found something else. Even with the iscsi systemd services disabled:

inaddy@iscsihang:~$ systemctl list-unit-files --full | grep iscsi
iscsi.service disabled (alias for iscsid)
iscsid.service disabled
open-iscsi.service disabled

I was still able to reproduce the issue:

http://pastebin.ubuntu.com/25388348/

By issuing a logout when the iscsid daemon is gone. It looks like iscsid loses track of sessions that were established before it died or was killed: if you log in again, it is completely lost when trying to log out, leaving leftovers (open iscsi sessions) that cause systemd to hang (even with no daemons running, in this case).

I'm investigating:

- Should systemd hang on those leftovers? Since there are caveats that explicitly allow you to leave sessions open (for the root filesystem, for example), I doubt it.

- Does the iscsid daemon contain a bug that causes sessions to remain unmapped and open if the daemon is restarted without logging out previously opened sessions?

Now that I'm able to reproduce this at will, I'll be able to answer those questions soon.

Best,
Rafael

PS: Matt, I asked partnership program managers to reach out to understand better your future needs.

Matt Schulte (gimpyestrada) wrote :

Excellent, thank you for the details.

Rafael David Tinoco (inaddy) wrote :

For now, the workaround for this bug is this:

inaddy@iscsihang:~$ sudo su -

# disconnect what iscsid has mapped
root@iscsihang:~$ iscsiadm -m node --logoutall=all

# disconnect all leftovers (quote the -name pattern so the shell does not glob-expand it)
root@iscsihang:~$ for file in $(find /sys/devices -name "*delete"); do echo 1 > "$file"; done

# shutdown
root@iscsihang:~$ shutdown -h now

(as long as you don't have iscsi root device)

Concentrating efforts on why systemd doesn't finish when iscsi connections and paths are still online. Will also check why open-iscsi (even recent versions) loses track of connections if killed/restarted before a new login.

Rafael David Tinoco (inaddy) wrote :

Okay, I removed almost everything out of equation:

- removed networking interfaces from systemd
- removed open-iscsi login/logout logic from systemd
- used a single network interface for the logins, on a single iscsi portal

was still able to reproduce the issue by:

- doing a simple login after configuring the network device:
  ./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login

- shutting down the network device before any logout:
  ./net-stop.sh ; sudo shutdown -h now

There was no systemd service ordering in play (between open-iscsi, iscsid and the networking / ifupdown scripts) and I was able to cause the same issue 100% of the time, with the same messages and symptoms. This tells us that taking down the network while iscsi sessions are still logged in is definitely what causes the hangs (check #3 below for why).
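For reference, the net-start.sh / net-stop.sh helpers are not shown in the comment; something like the minimal sketch below would do (the interface name ens4 and the address are assumptions, and the scripts are written to /tmp here so the sketch is self-contained):

```shell
# Hypothetical contents for the net-start.sh / net-stop.sh helpers used
# in the reproduction steps above. ens4 and 10.0.0.2/24 are assumed.
cat > /tmp/net-start.sh <<'EOF'
#!/bin/sh
ip link set ens4 up
ip addr add 10.0.0.2/24 dev ens4
EOF

cat > /tmp/net-stop.sh <<'EOF'
#!/bin/sh
ip addr flush dev ens4
ip link set ens4 down
EOF

chmod +x /tmp/net-start.sh /tmp/net-stop.sh
```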

Summary of Causes

1)

network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is shutdown
or
network shuts down -> iscsi daemon is shutdown -> iscsi luns are logged out

- logout would fail due to lack of network
- iscsi daemon would die and/or logout wouldn't work
- iscsi sessions would still be there
- system would hang (transport layer can't logout)

2)

iscsi daemon is shutdown -> iscsi luns are logged out -> network shuts down

- this would cause the bug I'm mentioning: the daemon dies, so the logout doesn't work.
- some iscsi sessions would still be there
- system would hang (transport layer can't logout)
- (to check: open-iscsi.service ExecStop= running in parallel ?!?)

3)

I used KVM watchdog virtual device + NMI from host to crash the guest during the hang

http://pastebin.ubuntu.com/25394744/

And, finally, the hang happens because the kernel itself is hung in its shutdown logic.

> 0 0 0 ffffffff81e11500 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff8801a6a20e00 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff8801a6a21c00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff8801a6a22a00 RU 0.0 0 0 [swapper/3]

ALL CPUs were idling during the hang.

crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
  CURRENT: PID: 0 TASK: ffffffff81e11500 COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff8801ae016fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae016e98
     [no tasks queued]

CPU 1 RUNQUEUE: ffff8801ae096e00
  CURRENT: PID: 0 TASK: ffff8801a6a20e00 COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff8801ae096fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae096e98
     [no tasks queued]

CPU 2 RUNQUEUE: ffff8801ae116e00
  CURRENT: PID: 0 TASK: ffff8801a6a21c00 COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff8801ae116fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae116e98
     [no tasks queued]

CPU 3 RUNQUEUE: ffff8801ae196e00
  CURRENT: PID: 0 TASK: ffff8801a6a22a00 COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff8801ae196fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae196e98
     [no tasks queued]

NO task was scheduled to run.

crash> ps -u
   PID PPID CPU TASK ST %MEM VSZ RSS COMM
      1 0 1 ffff8801a69b8000 UN 0.0 15484 3204 systemd-shutdow

There was just ONE SINGLE user task running (systemd-shutdown)

crash> set ffff8801a69b8000
 ...


This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1569925

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
Changed in linux (Ubuntu Zesty):
status: New → Incomplete
Rafael David Tinoco (inaddy) wrote :

## SUMMARY 1

Affects open-iscsi (NOT systemd):

* ISCSID NEEDS TO BE UP & RUNNING (AND LOGGED IN TO THE PORTAL) FOR THE LOGOUT TO WORK

https://pastebin.canonical.com/197359/

Daemon shutdown documentation in iscsiadm command says (about stopping iscsid daemon with iscsiadm -k):
""" ... This will immediately stop all iscsid operations and shutdown iscsid. It does not logout any sessions. Running this command is the same as doing "killall iscsid". Neither should normally be used, because if iscsid is doing error recovery or if there is an error while iscsid is not running, the system may not be able to recover. This command and iscsid's SIGTERM handling are experimental. """

This basically means you cannot shut the daemon down cleanly while you have connected sessions. And, as the pastebin shows, a running iscsid daemon is required for a logout through the iscsiadm command to work.

So from:

(systemctl edit --full open-iscsi.service)
...
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target
...
ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh
...

and

(systemctl edit --full iscsid.service)
...
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target
...
ExecStartPre=/lib/open-iscsi/startup-checks.sh
ExecStart=/sbin/iscsid
ExecStop=/sbin/iscsiadm -k 0 2
...

If umountiscsi.sh (which does not kill applications using a mount point) can't umount a mount point (from the fstab generator or a mount unit) that is still in use, then logout-all.sh will fail to log out the session backing that disk. The service unit would nevertheless be considered stopped, allowing the "iscsid.service" unit to be shut down as well (killing the iscsid daemon and leaving an open iscsi session behind).

This would be problematic for iscsi root disks AND for the kernel issue (comment #73), and it is being taken care of by Francis Ginther (fginther) in the next iscsi SRU: https://pastebin.canonical.com/196463/ creates an iscsi-cleanup.service that runs a script logging out all remaining iscsi sessions (including /) right before kernel shutdown is invoked by systemd-shutdown, as shown in comment #73.

Changed in open-iscsi (Ubuntu Xenial):
importance: Undecided → Medium
status: New → In Progress
Changed in open-iscsi (Ubuntu Zesty):
importance: Undecided → Medium
status: New → In Progress
Changed in open-iscsi (Ubuntu Artful):
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
status: Incomplete → In Progress
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: Incomplete → In Progress
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
status: Incomplete → In Progress
no longer affects: systemd (Ubuntu)
Changed in systemd (Ubuntu Xenial):
status: In Progress → Triaged
Changed in systemd (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Triaged
Changed in systemd (Ubuntu Artful):
assignee: Nish Aravamudan (nacc) → nobody
importance: High → Medium
status: Confirmed → Triaged
no longer affects: systemd (Ubuntu Xenial)
no longer affects: systemd (Ubuntu Zesty)
no longer affects: systemd (Ubuntu Artful)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Artful):
assignee: nobody → Rafael David Tinoco (inaddy)
Rafael David Tinoco (inaddy) wrote :

2 things for previous comment #75:

1)
I pasted the wrong pastebin URL. I'll describe the fix: it's a service unit that runs at the very end of shutdown, re-using /run (tmpfs). The service has a preparation script that copies binaries from the initrd image - so a minimum execution environment is available - and then runs a second script. That second script starts the iscsid daemon and waits in a loop for all portals to be ONLINE and existing sessions to be LOGGED in. After that it logs out all existing sessions (in this case, the leftovers from my previous comment AND / root disks).

2)
I tested this approach and it works well for iscsi root disks, because a network interface stays connected the whole time. In the case of leftovers from previous unsuccessful umount attempts, however, network.target is already gone (networkd / ifupdown interfaces already shut down). IF I keep my interfaces configured, this approach works for root disks AND for leftovers. For it to succeed in the leftover case, the "cleanup" service would have to bring the existing interfaces back up so iscsid login and logout can work (OR the kernel hang would have to be resolved, in which case leftovers could simply be left there, without logouts).
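The "wait in a loop until sessions are logged in, then log out" idea described above can be sketched as a generic poll-with-timeout helper; the iscsiadm usage in the trailing comment is illustrative, not taken from the actual cleanup script:

```shell
# Poll a condition command until it succeeds or a timeout (seconds)
# expires. Returns 0 on success, 1 on timeout.
wait_until() {
    _cmd="$1"; _timeout="$2"
    _end=$(( $(date +%s) + _timeout ))
    until eval "$_cmd"; do
        [ "$(date +%s)" -ge "$_end" ] && return 1
        sleep 1
    done
    return 0
}

# Illustrative use (assumed invocation, requires iscsid to be running):
#   wait_until "iscsiadm -m session >/dev/null 2>&1" 60 \
#       && iscsiadm -m node --logoutall=all
wait_until true 1 && echo "condition met"
```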

Changed in open-iscsi (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in open-iscsi (Ubuntu Zesty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in open-iscsi (Ubuntu Artful):
assignee: nobody → Rafael David Tinoco (inaddy)
tags: added: rls-aa-notfixing
Rafael David Tinoco (inaddy) wrote :

I created a workaround for this issue while its being worked on:

http://pastebin.ubuntu.com/25519820/

Create a file /lib/systemd/system-shutdown/debug.sh, place the contents in it, and make it executable (systemd-shutdown only runs executable files from that directory).

This workaround (based on some ideas from sil2100 and slangasek for iscsi / umount) basically brings the interfaces back up and remounts / so it can finally clean everything up.

Things to note:

- It ONLY RUNS if there are iscsi leftovers
- It uses ifupdown only for networking (/etc/network/interfaces)
- It has to remount / to do networking and to run iscsid
- If it fails to bring the network up, it will hang like before (kernel issue)
- It waits for iscsi to be logged in again (so shutdown might take a while)
- If the logout fails, it hangs again (unless the network is left configured; I could change that)

How is this script different from altering open-iscsi.service?

It runs at the very end of systemd shutdown, when it is very unlikely that any services still hold references to iscsi mounts, preventing them from being logged out.

Now I'll test Debian sid and check whether this happens there too, so I can open a bug in the Debian project as well.
Before moving on to the open-iscsi services - to create a cleanup unit file for the open-iscsi package, like this workaround - I'll dig into the kernel issue. I'm afraid no fix will be as good as making sure the kernel lets the queued I/O command go. I also have to make sure this workaround is changed to allow root iscsi disks to be logged out.

tags: added: id-598b6543459f8ccf5dfbc04c
Matthijs Hoekstra (mahoekst) wrote :

Even with the script a shutdown/reboot takes a very long time. Any expectations when a fix will be ready? I hit this all the time on 2 installed servers.

Rafael David Tinoco (inaddy) wrote :

Hello Matthijs

Unfortunately the best way to prevent this is to fix the kernel hang itself, which occurs when the kernel calls sd_sync_cache() for every configured device before shutdown. A single I/O command hangs on every scsi path and the I/O error is never propagated to the block layer (despite iscsi having proper I/O error settings). I'm finishing analysing some kernel dumps so I can finally understand what is happening in the transport layer (this happens with more recent kernels as well).

The workaround was to create a script that would restore the iscsi connection, wait for the login to happen again and the paths are back online, and cleanly logout, allowing the sd_sync_cache() operation to be finalized.

If you are facing this problem, I know for sure that your iscsi connections are not being finalized before the network goes down. This means you have to pay attention to how you configured your iscsi disks:

- guarantee that iscsiadm was configured with "interfaces" so it works on startup:

  sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
  sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b

- the discovery/login has to be made AFTER the iscsiadm had interfaces added

  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2

  # iscsiadm -m node --loginall=automatic HAS TO WORK or else init scripts will fail

  http://pastebin.ubuntu.com/25894472/

- configure the volumes in /etc/fstab with "_netdev" parameter for systemd unit ordering

  LABEL=BLUE /blue ext4 defaults,_netdev 0 1
  LABEL=GREEN /green ext4 defaults,_netdev 0 1
  LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
  LABEL=RED /red ext4 defaults,_netdev 0 1
  LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1
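A quick self-contained way to double-check which fstab entries carry _netdev (plain awk on a sample file standing in for /etc/fstab; field 4 holds the mount options):

```shell
# Sample fstab: one _netdev entry, one plain local entry.
cat > /tmp/fstab.sample <<'EOF'
LABEL=BLUE /blue ext4 defaults,_netdev 0 1
LABEL=RED /red ext4 defaults 0 1
EOF

# Print the mount points whose options include _netdev.
awk '$4 ~ /_netdev/ {print $2}' /tmp/fstab.sample   # prints: /blue
```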

You have to make sure the open-iscsi and iscsid systemd units are started after the network is available and stopped before it disappears. If the configuration above is correct, that ordering might be your problem.

inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service

The defaults are:

[Unit]
Description=iSCSI initiator daemon (iscsid)
Documentation=man:iscsid(8)
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target

and

[Unit]
Description=Login to default iSCSI targets
Documentation=man:iscsiadm(8) man:iscsid(8)
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target

So you can see that iscsid.service runs BEFORE open-iscsi.service. In my case, I'm configuring the network using rc-local.service (since this is my lab) and I had to guarantee the ordering there as well.
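One way to pin such ordering is a systemd drop-in (hypothetical example; the unit names are assumptions and the file is written to /tmp here so the sketch is self-contained - the real location would be /etc/systemd/system/iscsid.service.d/). Since systemd stops units in reverse start order, declaring After= at startup means "stopped before" at shutdown:

```shell
# Hypothetical drop-in keeping iscsid ordered after the units that
# provide its network, so it is stopped before they are at shutdown.
mkdir -p /tmp/iscsid.service.d
cat > /tmp/iscsid.service.d/ordering.conf <<'EOF'
[Unit]
After=network-online.target rc-local.service
EOF

cat /tmp/iscsid.service.d/ordering.conf
```

After placing a real drop-in under /etc, a `systemctl daemon-reload` would be needed for it to take effect.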

If, after configuring your system like this, you still face problems, you can use this script:

http://pastebin.ubuntu.com/25894592/

And provide me the DEBUG=/.shutdown.log file, created after its execution, attached to this launchpad case. It's likely that you have hung iscsi connections for some reason (service ordering, lack of volume...


Amanda Gartner (amandagartner) wrote :

I followed all of the setup steps, but was still encountering the ping timeout issue at reboot. I then ran the script http://pastebin.ubuntu.com/25894592/ you provided and am no longer hitting the issue. Was the script supposed to fix the issue? Please let me know. Thank you!

Rafael David Tinoco (inaddy) wrote :

Hello Amanda,

I'm actively working on this and will provide better feedback soon. The script is just a workaround for now, so that people facing this can shut down properly.

Despite what userland does, the kernel can always hang with iscsi sessions left open during shutdown. If you still hit the issue after following the setup steps, it's possible that your systemd shutdown unit ordering is wrong (your mount points are being unmounted before they are released). I'm working on the kernel issue, which should cover all the userland-created situations.

Rafael David Tinoco (inaddy) wrote :

Long story short -> https://goo.gl/vPyh8C

Basically, the block layer enqueues one last request (a SYNC scsi command coming from sd_shutdown) for every scsi device on the system. Unfortunately, since the OS is shutting down, between the block request and its execution userland (systemd) kills iscsid without a proper logout, and/or takes the network down.

What happens next is that the mid layer (SCSI) tries to deliver the request through the transport layer (iscsi_tcp_sw) but it fails since the transport layer checks the session status and finds out that the session is not in LOGIN state.

The default behaviour of the transport layer (iscsi_tcp_sw) in such situation is to tell the mid-layer to keep resetting the request timeout timer while it tries to recover (something that will never happen because the network is gone).

Changing that default behavior to state that the scsi command was NOT handled by the transport layer (iscsi_tcp_sw) means the scsi timeout function will try to "abort" the scsi command, which in turn creates further commands that also time out because of the transport layer.

The best scenario so far was to return BLK_EH_HANDLED instead of BLK_EH_NOT_HANDLED in the scsi_times_out function, which lets the kernel shut down. By doing that, though, I'm confirming to the block layer something that DID NOT happen, since the command never actually left the transport layer.

This might be ONE possible way to fix this: mark in the transport layer that the timeout happened DURING the shutdown procedure, cancelling all the block device requests without invoking the scsi error handling mechanism, which would otherwise generate more transport-layer traffic (and thus more timeouts, looping back into the problematic sequence).

Anyway, I'll get back to this next week and hopefully identify best course of action.

Rafael David Tinoco (inaddy) wrote :

I have proposed a fix to the kernel iscsi transport code. You can follow upstream discussion, if any, in the following address:

https://marc.info/?t=151268397600008&r=1&w=2

The patch proposal is here:

https://marc.info/?l=linux-scsi&m=151268396227285&w=2

If upstream chooses to accept my proposal, which fixes the issue like demonstrated here:

https://marc.info/?l=linux-scsi&m=151274369910906&w=2

Then I'll propose Canonical kernel-team a SRU based on the upstream patch.

For now this case is on hold for upstream discussion/decision.

Thank you
-Rafael

That's awesome sir, thank you for your diligence.

Rafael David Tinoco (inaddy) wrote :

The scsi/iscsi team has accepted my fix proposal and merged the patch for the 4.16 merge window:

https://<email address hidden>/msg10111.html

I'll propose the cherry-pick as SRU for Ubuntu kernels as soon as it is merged.

tags: added: sts
Rafael David Tinoco (inaddy) wrote :

Changed open-iscsi to Opinion since I chose to fix the kernel instead of userland. No matter what you do in userland, the kernel could still freeze and hang in different scenarios depending on how userland disconnected the transport layer. I kept the "linux" source package as affected and will SRU it through the Ubuntu Kernel Team.

Changed in open-iscsi (Ubuntu Xenial):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu Zesty):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu Artful):
status: In Progress → Opinion
description: updated
Changed in open-iscsi (Ubuntu Trusty):
status: New → Opinion
Changed in linux (Ubuntu Trusty):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
Changed in open-iscsi (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
no longer affects: open-iscsi (Ubuntu Zesty)
no longer affects: linux (Ubuntu Zesty)
Rafael David Tinoco (inaddy) wrote :

I have just submitted the SRU proposal to kernel team mailing list. I'll wait for the SRU to be made and this public bug to be marked as Fix Committed before verifying the fix again for the final release.

Seth Forshee (sforshee) on 2018-01-23
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Rafael David Tinoco (inaddy) wrote :

Commit has been merged upstream:

commit d754941225a7dbc61f6dd2173fa9498049f9a7ee
Author: Rafael David Tinoco <email address hidden>
Date: Thu Dec 7 19:59:13 2017 -0200

    scsi: libiscsi: Allow sd_shutdown on bad transport

    If, for any reason, userland shuts down iscsi transport interfaces
    before proper logouts - like when logging in to LUNs manually, without
    logging out on server shutdown, or when automated scripts can't
    umount/logout from logged LUNs - kernel will hang forever on its
    sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
    still existent paths.

    PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
     #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
     #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
     #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
     #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
     #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
     #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
     #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
     #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
     #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
     #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

    This happens because iscsi_eh_cmd_timed_out(), the transport layer
    timeout helper, would tell the queue timeout function (scsi_times_out)
    to reset the request timer over and over, until the session state is
    back to logged in state. Unfortunately, during server shutdown, this
    might never happen again.

    Other option would be "not to handle" the issue in the transport
    layer. That would trigger the error handler logic, which would also need
    the session state to be logged in again.

    Best option, for such case, is to tell upper layers that the command was
    handled during the transport layer error handler helper, marking it as
    DID_NO_CONNECT, which will allow completion and inform about the
    problem.

    After the session was marked as ISCSI_STATE_FAILED, due to the first
    timeout during the server shutdown phase, all subsequent cmds will fail
    to be queued, allowing upper logic to fail faster.

    Signed-off-by: Rafael David Tinoco <email address hidden>
    Reviewed-by: Lee Duncan <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

Applied in Bionic:

https://lists.ubuntu.com/archives/kernel-team/2018-January/089509.html

And Fix Committed.

Ack-ed for Trusty, Xenial, Artful:

https://lists.ubuntu.com/archives/kernel-team/2018-January/089774.html
https://lists.ubuntu.com/archives/kernel-team/2018-February/089809.html

Waiting for them to be Fix Committed there. As soon as it is released to -proposed I'll verify it and provide feedback so it can be Fix Released.

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Matt Schulte (gimpyestrada) wrote :

I apologize for my ignorance, but now that all these check-ins have occurred, how do I know when an installable package has been built and is available in the various repos?

And thank you @inaddy for the diligence getting this one resolved.

Rafael David Tinoco (inaddy) wrote :

Hello Matt, this is part of the SRU procedure (https://wiki.ubuntu.com/StableReleaseUpdates). For the kernel, the SRU process is a bit different (https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat, https://wiki.ubuntu.com/Kernel/StableReleaseCadence). Now that the bug has been marked as "Fix Committed", it means that the patch has been merged into the "master-next" branch of each of the following Ubuntu versions:

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed

When the new kernel package is built, it will be placed into -proposed for each of the Ubuntu versions and you will be able to test the kernel and provide feedback here. Since I was the one that fixed it, I'll verify each of the kernels to make sure it fixed the issue. After verification, and at least 7 days in -proposed, the packages shall be moved to -updates.

Hope it helps you understand the fixing steps. Cheers o/

Rafael David Tinoco (inaddy) wrote :

BTW, the kernel team usually updates all affected cases with the version that was built to address the issue (so you know what to test in -proposed).

Matt Schulte (gimpyestrada) wrote :

Thanks, and will notifications appear here in the bug during these stages?

Rafael David Tinoco (inaddy) wrote :

Yep, soon the kernel team will advise (here) when this patch is released in a specific kernel version in -proposed and will call for testing/verification. I'll give all my feedback here as well and, if appropriate, mark the case as "verification-done" so the new kernel is recorded as solving this issue.

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
tags: added: verification-needed-xenial

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Rafael David Tinoco (inaddy) wrote :

Hello Kleber,

Thanks for the feedback. Working on verification.

Rafael David Tinoco (inaddy) wrote :

# xenial - 4.4.0-112

[ OK ] Reached target Shutdown.
[ 70.272171] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907361, last ping 4294908612, now 4294909864
[ 70.274189] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907361, last ping 4294908612, now 4294909864
[ 70.288082] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907362, last ping 4294908614, now 4294909868
[ 70.290066] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907363, last ping 4294908614, now 4294909868
[ 70.292039] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907362, last ping 4294908614, now 4294909868

<hang for more than 5 minutes>

# xenial - 4.4.0-116

[ OK ] Reached target Shutdown.
[ 38.384175] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899386, last ping 4294900640, now 4294901892
[ 38.386107] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 38.388023] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 38.388159] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899386, last ping 4294900640, now 4294901892
[ 38.388159] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 89.825203] reboot: Restarting system

-> xenial-verification-done
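The timestamps in the "ping timeout" messages above are raw jiffies counters, and the gaps between them line up with the 5-second recv/ping timeouts being reported. A quick check of the arithmetic, assuming the Ubuntu generic kernel's usual CONFIG_HZ=250 (an assumption; verify against your kernel config):

```python
# Jiffies values taken from the 4.4.0-112 log lines above (connection3:0).
HZ = 250  # assumed tick rate; Ubuntu generic kernels typically build with CONFIG_HZ=250
last_rx, last_ping, now = 4294907361, 4294908612, 4294909864

# Gap between the last received PDU and the NOP-Out ping being sent:
recv_gap = (last_ping - last_rx) / HZ   # ~5.0 s (the "recv timeout 5")
# Gap between sending the ping and declaring the timeout:
ping_gap = (now - last_ping) / HZ       # ~5.0 s (the "ping timeout of 5 secs")
print(recv_gap, ping_gap)
```

Both gaps come out to roughly 5 seconds, which is consistent: on the fixed kernels the connections still time out (the network is already down), but the shutdown path no longer waits on them, so the reboot proceeds about a minute later instead of hanging.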

tags: added: verification-done-xenial
removed: verification-needed-xenial
Rafael David Tinoco (inaddy) wrote :


# artful - 4.13.0-32

[ OK ] Reached target Shutdown.
[ 194.277712] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.279684] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.281483] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.283265] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.285067] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864

<hang for more than 15 minutes>

# artful - 4.13.0-35

[ OK ] Reached target Shutdown.
[ 416.109989] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.111922] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.113869] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.115741] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.117650] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 469.614166] reboot: Restarting system

-> artful-verification-done

tags: added: verification-done-artful
removed: verification-needed-artful
Rafael David Tinoco (inaddy) wrote :

Checking whether Trusty's fix is in -proposed (no comment from the Kernel Team about it yet).
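One way to check this without waiting for a bot comment is to query the Ubuntu archive directly. A minimal sketch using `rmadison` from the devscripts package (the grep pattern is illustrative; the exact version string will differ):

```shell
# Query the archive for the linux source package and keep only trusty-proposed rows.
# Requires network access and the devscripts package installed.
rmadison linux | grep trusty-proposed
```

If the command prints a row, the candidate kernel has been accepted into trusty-proposed and verification can begin; no output means it has not landed yet.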
