Okay, I removed almost everything from the equation:

 - removed the networking interfaces from systemd
 - removed the open-iscsi login/logout logic from systemd
 - used a single network interface for the logins, on a single iSCSI portal

I was still able to reproduce the issue by:

 - doing a simple login after configuring the network device:

   ./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login

 - shutting down the network device before any logout:

   ./net-stop.sh ; sudo shutdown -h now

There was no systemd service ordering in play (between open-iscsi, iscsid and the networking / ifupdown scripts) and I was able to cause the same issue 100% of the time, with the same messages and the same symptoms. This tells us that taking the network down while iSCSI sessions are still logged in is definitely what causes the hangs (check #3 for why).

Summary of Causes

1) network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is shut down
   or
   network shuts down -> iscsid daemon is shut down -> iscsi luns are logged out

   - the logout fails due to lack of network
   - the iscsi daemon dies and/or the logout doesn't work
   - the iscsi sessions are still there
   - the system hangs (the transport layer can't log out)

2) iscsid daemon is shut down -> iscsi luns are logged out -> network shuts down

   - this causes the bug I'm describing: the daemon dies, the logout doesn't work
   - some iscsi sessions are still there
   - the system hangs (the transport layer can't log out)
   - (to check: is open-iscsi.service ExecStop= running in parallel ?!?)

3) I used the KVM virtual watchdog device + an NMI from the host to crash the guest during the hang:

   http://pastebin.ubuntu.com/25394744/

   And, finally, the hang happens because the kernel gets stuck in its own shutdown logic.

> 0 0 0 ffffffff81e11500 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff8801a6a20e00 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff8801a6a21c00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff8801a6a22a00 RU 0.0 0 0 [swapper/3]

ALL CPUs were idling during the hang.

crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
  CURRENT: PID: 0  TASK: ffffffff81e11500  COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff8801ae016fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae016e98
     [no tasks queued]

CPU 1 RUNQUEUE: ffff8801ae096e00
  CURRENT: PID: 0  TASK: ffff8801a6a20e00  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff8801ae096fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae096e98
     [no tasks queued]

CPU 2 RUNQUEUE: ffff8801ae116e00
  CURRENT: PID: 0  TASK: ffff8801a6a21c00  COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff8801ae116fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae116e98
     [no tasks queued]

CPU 3 RUNQUEUE: ffff8801ae196e00
  CURRENT: PID: 0  TASK: ffff8801a6a22a00  COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff8801ae196fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae196e98
     [no tasks queued]

NO task was scheduled to run.
crash> ps -u
   PID    PPID  CPU       TASK        ST  %MEM    VSZ   RSS  COMM
     1       0    1  ffff8801a69b8000  UN   0.0  15484  3204  systemd-shutdow

There was just ONE SINGLE user task running (systemd-shutdown).

crash> set ffff8801a69b8000
    PID: 1
COMMAND: "systemd-shutdow"
   TASK: ffff8801a69b8000  [THREAD_INFO: ffff8801a69c0000]
    CPU: 1
  STATE: TASK_UNINTERRUPTIBLE

crash> bt
PID: 1  TASK: ffff8801a69b8000  CPU: 1  COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
#10 [ffff8801a69c3dc0] device_shutdown at ffffffff8156112c
#11 [ffff8801a69c3df0] kernel_power_off at ffffffff810a32f5
#12 [ffff8801a69c3e00] SYSC_reboot at ffffffff810a34d3
#13 [ffff8801a69c3f40] sys_reboot at ffffffff810a359e
#14 [ffff8801a69c3f50] entry_SYSCALL_64_fastpath at ffffffff818431f2
    RIP: 00007face7d8f856  RSP: 00007ffd4c271d08  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 00007ffd4c271240  RCX: 00007face7d8f856
    RDX: 000000004321fedc  RSI: 0000000028121969  RDI: fffffffffee1dead
    RBP: 00007ffd4c271230   R8: 0000000000001400   R9: 0000000000000005
    R10: 00007face88d36c0  R11: 0000000000000202  R12: 00007ffd4c2713d0
    R13: 000000b7e6795c83  R14: 00007ffd4c271c30  R15: 0000000000000001
    ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

So systemd-shutdown entered the "kernel_power_off" logic from a system call. This logic (device_shutdown) iterates over all devices, calling:

    /* prefer the bus shutdown hook, fall back to the driver's */
    if (dev->bus && dev->bus->shutdown) {
            dev->bus->shutdown(dev);
    } else if (dev->driver && dev->driver->shutdown) {
            dev->driver->shutdown(dev);
    }

For our disk this called sd_shutdown, which called sd_sync_cache, which sent a SYNCHRONIZE_CACHE SCSI opcode to the device in question (one of the iSCSI paths).

scsi_execute called blk_execute_rq to put the request on the device's queue for execution:

    /* Prevent hang_check timer from firing at us during very long I/O */
    hang_check = sysctl_hung_task_timeout_secs;
    if (hang_check)
            /* retries as long as the wait keeps timing out */
            while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));

Each wait round is hang_check * (HZ/2) jiffies: with the default hung_task timeout of 120 seconds and HZ=250 that is 120 * 125 = 15000 jiffies, i.e. 60 seconds. wait_for_completion_io_timeout returns 0 on timeout, and !0 == 1, so as long as the completion never fires the loop retries forever.

So basically the kernel is trying to complete the SYNCHRONIZE_CACHE operation, the SCSI command is already on the device's queue, and there is no return (for obvious reasons: the transport underneath it is gone).

Obs 1) I have tested this with kernel 4.11 with the same symptom. What I have to check now: is there any error propagation logic to cancel this so the system can shut down? We are not getting a SCSI sense message because the "bus" (iscsi/tcp) is gone. Should we hard flush the device's queue?

Obs 2) This isn't necessarily related to systemd. IMO this should also be addressed in the kernel, allowing a broken transport layer - especially tcp/ip - to let the shutdown happen.

Obs 3) The cause of this can be related to systemd (if there are leftovers) or to the iscsid daemon (if it died and/or got restarted before logging out; is there a way of keeping a history of sessions so it knows what to log out after the daemon starts again?).

I'll get back to this next week, after reading this dump some more.
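PS: to make the endless-retry point concrete, here is a minimal userspace model of that blk_execute_rq wait loop. This is a sketch, not kernel code: wait_stub() and the 5-round cap are mine, purely illustrative, and wait_stub() always times out to simulate the dead iSCSI transport.

    #include <stdio.h>

    /*
     * Userspace model of the wait_for_completion_io_timeout() contract:
     * it returns the time left (> 0) if the completion fired, or 0 on
     * timeout.  With the iSCSI transport gone the completion never
     * fires, so every call times out -- modeled by always returning 0.
     */
    static unsigned long wait_stub(unsigned long timeout_jiffies)
    {
        (void)timeout_jiffies;
        return 0; /* no NIC, no SCSI response, no completion */
    }

    int main(void)
    {
        const unsigned long hz = 250;          /* jiffies per second (CONFIG_HZ=250) */
        const unsigned long hang_check = 120;  /* sysctl_hung_task_timeout_secs default */
        unsigned int rounds = 0;

        /*
         * Same shape as the loop in blk_execute_rq(): keep waiting as
         * long as the wait times out.  With a dead transport that is
         * forever; capped at 5 rounds here so the program terminates.
         */
        while (!wait_stub(hang_check * (hz / 2))) {
            rounds++;
            printf("round %u: waited %lus, request still not completed\n",
                   rounds, hang_check / 2);
            if (rounds == 5) {
                printf("would keep retrying forever -> shutdown hang\n");
                break;
            }
        }
        return 0;
    }

Note that the real loop has no exit other than the completion firing - there is no error path that could notice the transport is dead - which is exactly what the question in Obs 1 is about.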