Shutdown hang on 16.04 with iscsi targets

Bug #1569925 reported by Matt Schulte on 2016-04-13
This bug affects 25 people
Affects: linux (Ubuntu) - status tracked in Bionic
  Trusty: importance Medium, assigned to Rafael David Tinoco
  Xenial: importance Medium, assigned to Rafael David Tinoco
  Artful: importance Medium, assigned to Rafael David Tinoco
  Bionic: importance Medium, assigned to Rafael David Tinoco

Affects: open-iscsi (Ubuntu) - status tracked in Bionic
  Trusty: importance Medium, assigned to Rafael David Tinoco
  Xenial: importance Medium, assigned to Rafael David Tinoco
  Artful: importance Medium, assigned to Rafael David Tinoco
  Bionic: importance Medium, assigned to Rafael David Tinoco

Bug Description

[Impact]

 * open-iscsi users might face hangs during OS shutdown.
 * hangs can be caused by manual iSCSI configuration/setup.
 * hangs can also be caused by bad systemd unit ordering.
 * if the transport layer interface vanishes before the LUN is
   disconnected, then the hang will happen.
 * check comment #89 for the fix decision.

[Test Case]

 * a simple way of reproducing the kernel hang is to disable
   the open-iscsi logouts. this simulates a situation where
   a service has shut down the network interface used by
   the transport layer before a proper iSCSI logout was done.

   $ # first, log into all iSCSI LUNs

   $ systemctl edit --full open-iscsi.service
   ...
   #ExecStop=/lib/open-iscsi/logout-all.sh
   ...

   $ sudo reboot # this will make the server hang
                 # forever on shutdown
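   As an alternative to `systemctl edit --full`, the same effect can be
   sketched with a systemd drop-in that clears ExecStop=. This is an
   illustration, not part of the original test case: the drop-in file name
   is hypothetical, clearing *all* ExecStop= lines (not just logout-all.sh)
   is an assumption, and the sketch writes to a scratch directory so it can
   run unprivileged.

   ```shell
   # Sketch: disable open-iscsi's stop commands via a drop-in, simulating
   # a shutdown where no iSCSI logout happens. On a real system the file
   # would live in /etc/systemd/system/open-iscsi.service.d/ (root
   # required), followed by `systemctl daemon-reload`.
   dropin_dir="$(mktemp -d)"   # real path: /etc/systemd/system/open-iscsi.service.d
   cat > "$dropin_dir/no-logout.conf" <<'EOF'
   [Service]
   # An empty ExecStop= resets ALL inherited stop commands, including
   # /lib/open-iscsi/logout-all.sh, so sessions stay logged in at shutdown.
   ExecStop=
   EOF
   cat "$dropin_dir/no-logout.conf"
   ```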

[Regression Potential]

 * the regression potential is low because the change acts on the
   iscsi transport layer code ONLY when the server is in shutdown
   state.

 * any error in logic would only appear during shutdown and
   would not cause any harm to data.

[Other Info]

 * ORIGINAL BUG DESCRIPTION

I have 4 servers running the latest 16.04 updates from the development branch (as of right now).

Each server is connected to NetApp storage using iscsi software initiator. There are a total of 56 volumes spread across two NetApp arrays. Each volume has 4 paths available to it which are being managed by device mapper.

While logged into the iscsi sessions all I have to do is reboot the server and I get a hang.

I see a message that says:

  "Reached target Shutdown"

followed by

  "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

and then I see 8 lines that say:

  "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  NOTE: the actual values of the *'s differ for each line above.

This seems like a bug somewhere but I am unaware of any additional logging that I could turn on to pinpoint the problem.

Note I also have similar setups that are not doing iscsi and they don't have this problem.

Here is a screenshot of what I see on the shell when I try to reboot:

(https://launchpadlibrarian.net/291303059/Screenshot.jpg)

This is being tracked in NetApp bug tracker CQ number 860251.

If I log out of all iscsi sessions before rebooting then I do not experience the hang:

iscsiadm -m node -U all

We are wondering if this could be some kind of shutdown ordering problem. Like the network devices have already disappeared and then iscsi tries to perform some operation (hence the ping timeouts).

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1569925/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
tags: added: xenial
affects: ubuntu → systemd (Ubuntu)
Martin Pitt (pitti) wrote :

My first suspicion is that the interface that the iscsi device is on lives in /etc/network/interfaces (or interfaces.d/*) somewhere and thus is being shut down by networking.service. Ordinarily this network interface is being set up in initramfs by open-iscsi and should not otherwise be configured/touched by the OS (i. e. not by ifupdown, NetworkManager, or networkd).

 - Which network interface is the iscsi device on, and how did you configure this?
 - Please attach /etc/network/interfaces*

It would also be useful to attach a complete shutdown log. Do a reboot and attach /var/log/syslog, that should have sufficient data for the network interface shutdowns (it won't cover the open-iscsi error though as that happens too late, but we have that part in the screenshot).

Changed in systemd (Ubuntu):
status: New → Incomplete
Matt Schulte (gimpyestrada) wrote :

root@ICTM1612S02H1:/etc/network# cat interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet dhcp

auto enp130s0
iface enp130s0 inet6 static
address fe80::192:12:21:115
netmask 64
mtu 9000

auto enp4s0f0
iface enp4s0f0 inet6 static
address fe80::192:12:22:115
netmask 64
mtu 9000

Matt Schulte (gimpyestrada) wrote :

In the above, the two NICs used for iSCSI traffic are the enp130s0 and the enp4s0f0.

In case you were wondering I have another server running ipv4 and it also hangs.

Martin Pitt (pitti) wrote :

Do you have your root partition on iscsi, or just some auxiliary ones like /opt?

If it's the root partition, then you can't put that on an interface which is "auto" in /etc/network/interfaces -- ifupdown will tear it down during shutdown and sever the connection, explaining the hang. In this case, open-iscsi's initramfs integration will already bring it up. (Hence my question how you set this up).

Matt Schulte (gimpyestrada) wrote :

Here is the requested syslog.

Reboot happens after:

Apr 14 09:49:42 ICTM1612S02H1 root: MDS: Rebooting Now

Matt Schulte (gimpyestrada) wrote :

Nothing for the system is on iscsi. They are just extra volumes that I use for generating block or filesystem IO to our storage.

Martin Pitt (pitti) wrote :

Unfortunately rsyslog gets shut down too early, so we don't see much from this. Can you please re-try with enabling persistent journal:

    sudo mkdir -p /var/log/journal
    sudo systemd-tmpfiles --create --prefix /var/log/journal

then reboot once to actually enable on-disk journal, then reboot again to record the shutdown sequence. After that, please do

   sudo journalctl -b -1 > /tmp/journal.txt

and attach /tmp/journal.txt here.

And again, how exactly did you set this up? Is this reproducible in a VM somehow? Can you please give me /etc/fstab and other files where you configured this?

Thanks!

Matt Schulte (gimpyestrada) wrote :

Here is the requested journal.txt file.

I had a bit of difficulty after performing the requested operations; my server kept booting into the (initramfs) prompt.

The journal log may contain more than one reboot attempt.

Matt Schulte (gimpyestrada) wrote :

This contains the configuration of my iscsi sessions. It is a tar of the entire /etc/iscsi folder, it should show what we've got going on.

Matt Schulte (gimpyestrada) wrote :

And just for good measure this is my /etc/network/interfaces file:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet dhcp

auto enp130s0
iface enp130s0 inet6 static
address fe80::192:12:21:115
netmask 64
mtu 9000

auto enp4s0f0
iface enp4s0f0 inet6 static
address fe80::192:12:22:115
netmask 64
mtu 9000

Matt Schulte (gimpyestrada) wrote :

Contents of fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda1 during installation
UUID=3f62d8b3-3afe-4cd7-8d36-49101b1d7b02 / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=9c54f202-7e90-4633-9134-706e6d7080fc none swap sw 0 0
10.113.242.59:/IOP /opt/iop nfs noauto 0 0

#IOMNT START#
/dev/mapper/3600a09800059d66a0000fa5c56f3e78f /iomnt-3600a09800059d66a0000fa5c56f3e78f ext3 nobarrier,_netdev 0 0
/dev/mapper/3600a09800059d7b40000223656f3ef42 /iomnt-3600a09800059d7b40000223656f3ef42 ext4 nobarrier,_netdev,discard 0 0
/dev/mapper/3600a09800059d66a0000fa5456f3e782 /iomnt-3600a09800059d66a0000fa5456f3e782 xfs nobarrier,_netdev,discard 0 0
/dev/mapper/3600a09800059d66a0000fa5056f3e77b /iomnt-3600a09800059d66a0000fa5056f3e77b btrfs nobarrier,_netdev,discard 0 0
/dev/mapper/360080e50003415e60000147d5605f7fb /iomnt-360080e50003415e60000147d5605f7fb ext3 nobarrier,_netdev 0 0
/dev/mapper/360080e500034153e0000ed065605faa2 /iomnt-360080e500034153e0000ed065605faa2 ext4 nobarrier,_netdev,discard 0 0
#IOMNT END#

sles (slesru) wrote :

I have the same problem; the target is an HPE MSA, and the problem does not happen every time - roughly 1 in 5 reboots. I run 16.04, so I just tested...
It looks like iscsi is not always stopped before the network interfaces are down.

sles (slesru) wrote :

>Looks like iscsi not always stopped before network interfaces are down.

No, just reproduced this, iscsi was logged out...

sles (slesru) wrote :

Because it is not always reproducible, I guess this is a dependency problem.
I looked into /etc/init.d/iscsid:

# Required-Stop: $network $local_fs sendsigs

I guess we need $remote_fs here?
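(If that guess is right, the header change would look like the fragment below. This is only a sketch of the suggestion, untested, and the fix eventually taken in this bug went a different way.)

```
### hypothetical edit to the LSB header in /etc/init.d/iscsid
# Required-Stop:    $network $remote_fs $local_fs sendsigs
```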

sles (slesru) wrote :

OK, I keep trying reboot after reboot, and now I think this is a kernel bug and it is multipath related...

Matt Schulte (gimpyestrada) wrote :

I definitely have remote filesystems mounted (using _netdev if that matters) when I experience this problem.

Matt Schulte (gimpyestrada) wrote :

I have finally been able to verify that this is reproducible with the released 16.04 from the ISO.

What else can be provided here to move this bug along?

(This is being tracked internally in CQ 860251)

rc_micha (m-niendorf) wrote :

I have the same problem with my multipath iscsi session. I just logged in without mounting the volumes.
While rebooting I always get the message "systemd-shutdown[1]: Failed to finalize DM devices, ignoring", but in most cases the server just ignores that and reboots.

The iscsi multipath volumes are blacklisted in lvm config and are currently not mounted.

Matt Schulte (gimpyestrada) wrote :

rc_micha: Your volumes aren't mounted, but the iscsi sessions are logged in? If you log out of your iscsi sessions does it shutdown clean?

Andrew Morton (amorton12) wrote :

Seeing what I believe is this bug on a 16.04 install that boots from iSCSI LUN. When I shut down, it goes just past shutting down the network interfaces, then slows down and starts throwing task blocked messages from the kernel. It eventually gets down to "Reached Target Shutdown" before getting completely stuck. I have to manually shut down or reboot the machine from that point.

daigang (daigang0701) wrote :

I got the same problem.
It looks like iscsi does not log out before NetworkManager cuts the network,

so iscsi cannot log out.

Matt Schulte (gimpyestrada) wrote :

There is clearly something wrong here.

FWIW This issue did not exist in 14.04.

Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired
Matt Schulte (gimpyestrada) wrote :

If this is still a problem, what do we do now? Do I just keep testing with new releases and file new bugs?

Matt Schulte (gimpyestrada) wrote :

For completeness, I upgraded to 16.04.1, the issue is still present.

I upgraded to the latest xenial-updates packages as of 10/24/16 and the issue is still present.

I then re-installed with 16.10 and the issue is still present. I guess I'll open a new bug for 16.10 and refer back to this one.

Changed in systemd (Ubuntu):
status: Expired → Confirmed
description: updated
Changed in systemd (Ubuntu):
importance: Undecided → High

Seeing a similar issue here with so-called "diskless" clients. That is, all of their filesystems are network mounts, including the root filesystem. These systems use iscsi targets for the root filesystem, and during shutdown the network is turned off before the file I/O operations are completed, which causes the system to hang with filesystem I/O errors.

I'm currently looking to see if there is some way to make systemd keep the network up no matter what, all the way until the machine physically reboots or shuts down (ie. disable the network shutdown completely).

This worked in 14.04 and got broken in 16.04, apparently due to the switch to systemd, as far as I can tell (great tool, isn't it?).

Matt Schulte (gimpyestrada) wrote :

Any idea on this one folks?

Nish Aravamudan (nacc) wrote :

So let's separate out two issues in this bug.

@gimpyestrada, I believe your issue was always with iSCSI disks that were not in use as the root/boot disk, correct? Looking at your fstab, I think also you don't have multipath over iSCSI, correct?

For anyone else indicating this bug is also affecting them: if @gimpyestrada replies in the affirmative to the above (not using iSCSI for the root disk, not using multipath), then I would like to use separate bugs for those issues. They may all have the same root cause, but it becomes rather noisy otherwise.

@gimpyestrada, have you tried 16.10 or 17.04 to see if it has been fixed already? Just curious.

Changed in systemd (Ubuntu):
assignee: nobody → Nish Aravamudan (nacc)
Matt Schulte (gimpyestrada) wrote :

iSCSI disks are not the root/boot disks.

Multipath IS in use. Hence the /dev/mapper at the beginning of the disk names.

Yes I have tried 16.10, it is still present and I have another bug for that release. https://bugs.launchpad.net/bugs/1636862

Nish Aravamudan (nacc) wrote :

@gimpyestrada: thank you for the prompt response!

Err, right on multipath. Is it at all possible to test without multipath in use? To help narrow down if it's really in iSCSI or if it's in the multipath layer?

Note additional bugs aren't required (I understand at some point this one may have expired, but we'll end up creating tasks here for all affected releases).

Matt Schulte (gimpyestrada) wrote :

It is possible to test without multipath; this will take some time, as I no longer have anything with 16.04 installed. For fun I may go ahead and try with 17.04 so we can figure out if it still exists there; that way we'll kind of kill two birds with one stone.

Nish Aravamudan (nacc) wrote :

No problem -- also, you can just test in 16.10 without multipath. I can spend time reproducing it, etc. later to verify whatever fix we come up with is correct.

Matt Schulte (gimpyestrada) wrote :

Didn't take as long as I thought. Installed the most recent build of Zesty (zesty-server-amd64.iso 2017-02-22 06:54 676M).

I was able to:

1. Confirm the issue still exists.
2. Confirm that the issue is not caused by multipath (i.e. it still occurs during reboot when multipathing is disabled).

Scott (scopa) wrote :

Fresh install 16.04 download ISO from ubuntu
Can confirm this issue as well.
If I manually log out of the iscsi session reboot no issue
If I am logged into iscsi session reboot hangs. Need to physically power cycle server

On Wed, May 24, 2017 at 1:45 PM, Scott <email address hidden> wrote:
> Fresh install 16.04 download ISO from ubuntu
> Can confirm this issue as well.
> If I manually log out of the iscsi session reboot no issue

Interesting data point.

> If I am logged into iscsi session reboot hangs. Need to physically power cycle server

I think the open-iscsi or iscsid services have a killall sessions
portion when the service shuts down. I wonder if that's either a) not
running or b) somehow incorrect. I could imagine for iSCSI root it'd
be an issue, but for non-root, I'm not sure why that's not working.

Nish Aravamudan (nacc) wrote :

@gimpyestrada: Thank you for providing your journal logs (in the future, providing the raw journal file is better, as then we can run `journalctl` on it locally). From the logs:

Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: initiator reported error (8 - connection timed out)
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: Could not log into all portals
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: :0113,3260] successful.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Child 3234 belongs to open-iscsi.service
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Main process exited, code=exited, status=8/n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Changed start -> failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd-logind[2801]: Got message type=signal sender=:1.2 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Job open-iscsi.service/start finished, result=failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Failed to start Login to default iSCSI targets.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=937 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Unit entered failed state.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Failed with result 'exit-code'.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: cgroup is empty

I *think* the issue in this case is that open-iscsi.service is actually in a failed state before shutdown, which I think means it does not run the ExecStop commands? Can you verify that is the case (boot your system, `systemctl is-system-running` should indicate "degraded", `systemctl status open-iscsi.service` should indicate Failed). The problem is that the open-iscsi.service has:

ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh

This indicates that, on 'start' of the service, it will try to log in to and activate all configured iSCSI targets (the two ExecStart lines). However, if either of those fails (as they did in the journal in this case), the ExecStop lines *do not* run. From `man systemd.service`:

           Note that if any of the commands specified in ExecStartPre=,
           ExecStart=, or ExecStartPost= fail (and are not prefixed with "-",
           see above) or time out before the service is fully up, execution
           continues with commands specified in ExecStopPost=, the commands in
           ExecStop= are skipped.

So, I think that we should be using...
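(The comment is truncated here, but the "-" prefix mentioned in the man page excerpt above is systemd's usual escape hatch: a command prefixed with "-" is treated as successful even when it exits non-zero, so the unit does not enter the failed state and ExecStop= still runs. A hypothetical sketch of the unit with that change - an assumption for illustration, not necessarily what was actually proposed or shipped:)

```ini
# Hypothetical open-iscsi.service excerpt: the leading "-" tells systemd
# to ignore a non-zero exit from the login step, so a partial login
# failure no longer prevents the ExecStop= logout commands from running.
[Service]
ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=-/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh
```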


Matt Schulte (gimpyestrada) wrote :

My test configuration has long since been re-tasked. I will eventually be able to come back and test as you ask, but it may take a few weeks. If anyone else on the bug has their systems up and available, please test and reply.

Changed in systemd (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
importance: Undecided → High
Changed in systemd (Ubuntu Xenial):
status: New → Confirmed
Changed in systemd (Ubuntu Xenial):
assignee: Dimitri John Ledkov (xnox) → nobody
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
status: Confirmed → In Progress
Rafael David Tinoco (inaddy) wrote :

Hello Matt,

Good to read your feedback. When you have multipath AND iscsi, the error propagation timeout can be given by one or both; that is why I explained both. This issue is not an iscsi or multipath problem: it is a systemd problem related to the fact that the network is removed before the disks are unmounted. Good to know you don't have containers; that is something that could hold the disk superblock reference as well (even without containers, filesystems mounted with _netdev can be held by different mount namespaces started by systemd itself - not the case here).

Could you do me a favor? Are you reproducing this issue on a consistent basis? If you are, could you please reduce your replacement_timeout to 0 and see if you can reproduce it again? Please be aware that, after changing replacement_timeout = 0 in iscsid.conf, you will have to change the parameter for the disks already discovered:

## example

Change iscsid.conf AND:

$ sudo iscsiadm -m node
192.168.48.1:3260,1 iqn.2017.tgtd:tid2.lun
192.168.48.1:3260,1 iqn.2017.tgtd:tid1.lun

$ sudo iscsiadm -m node -o show | grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 20
node.session.timeo.replacement_timeout = 20

$ sudo iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 0
$ sudo iscsiadm -m node -o show | grep node.session.timeo.replacement_timeout
node.session.timeo.replacement_timeout = 0
node.session.timeo.replacement_timeout = 0

This will cause the I/O errors to be propagated right away - when systemd disconnects the interface - and allow the later umount - done by another systemd service - to succeed without waiting for pagecache timeout + iscsi timeout (20 for you). Hopefully the "0" will be fast enough to let systemd continue every time.

Could you please let me know if it mitigates the issue? If it does, I'll work on the actual fix. This will just confirm the hypothesis and help with diagnosis (and serve as a workaround if anyone cares).

Thank you
-Rafael
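(The iscsid.conf side of that workaround can be sketched as follows. This is an illustration, not from the thread: it edits a throwaway copy so it runs unprivileged; on a real system you would edit /etc/iscsi/iscsid.conf as root and then run the `iscsiadm -m node -o update` command shown above for already-discovered nodes.)

```shell
# Sketch: set replacement_timeout = 0 so I/O errors are propagated
# immediately when the transport disappears. Operating on a copy here;
# the starting value of 120 is only an example.
conf="$(mktemp)"
printf 'node.session.timeo.replacement_timeout = 120\n' > "$conf"
sed -i 's/^\(node\.session\.timeo\.replacement_timeout\).*/\1 = 0/' "$conf"
cat "$conf"
```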

Matt Schulte (gimpyestrada) wrote :

Please let me explain who I am and what we do to possibly shed some light on this.

I am an Interoperability engineer at NetApp. We test our storage products with many popular operating systems, adapters, protocols and switches, in varying combinations.

We test software initiator iscsi continuously (on other distros) with exactly the same configuration as I have described above and have never seen this behavior. Indeed whatever the problem is seems fixed in
Ubuntu upstream as I indicated above. This is also new behavior introduced between 14.04 and 16.04 because it did not occur in older versions.

The purpose of me asking to get this issue resolved is not to fix MY system, it is to fix it for my customers' systems. We will not post support for Ubuntu 16.04 (iscsi) without resolving this issue so that we don't put our customers in the same position.

With all that said, I do not currently have a configuration available or setup to reproduce this problem. If it is required, I can add it to the queue. When it is up and running, it reproduces every single time.

Other folks on the thread may actually have it up and running right now and be able to test the immediate fail before I get to it.

Rafael David Tinoco (inaddy) wrote :

Hypothesis,

Test (1) - The error is NEVER propagated to upper layers:

# xfs and ext4 mounted automatically

inaddy@iscsihang:~$ mount | grep _netde
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered,_netdev)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota,_netdev)

# no error propagation

inaddy@iscsihang:~$ sudo iscsiadm -m node -o show | grep timeo.replace
node.session.timeo.replacement_timeout = -1
node.session.timeo.replacement_timeout = -1

# target server can't give any more packets to guest:

inaddy@machete:~$ sudo iptables -A INPUT -s 192.168.49.8 -p tcp --destination-port 3260 -j DROP

# reboot can't succeed

inaddy@iscsihang:~$ sudo reboot

[ 27.596135] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896692, last ping 4294897944, now 4294899196
[ 27.628109] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896700, last ping 4294897952, now 4294899204

Systemd hangs forever:

[ OK ] Stopped target Remote File Systems.
         Unmounting /ext4...
         Unmounting /xfs...

OBS: There is a tight relationship between the connection disappearing before the umount service runs and the capability of systemd to shut down the machine entirely. I would say that, in the case of no error propagation, it is even worse, since the kernel would be locked up forever:

[ 240.132208] INFO: task systemd:1094 blocked for more than 120 seconds.
[ 240.133499] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.134544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.136092] INFO: task umount:1199 blocked for more than 120 seconds.
[ 240.137262] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.138302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.139742] INFO: task umount:1201 blocked for more than 120 seconds.
[ 240.140898] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 240.141953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Systemd is still trying...

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Stopped File System Check on /dev/disk/by-label/XFS.
[ OK ] Stopped File System Check on /dev/disk/by-label/EXT4.
[ OK ] Removed slice system-systemd\x2dfsck.slice.
[ OK ] Stopped target Remote File Systems (Pre).
         Stopping Login to default iSCSI targets...

[ 360.140109] INFO: task systemd:1094 blocked for more than 120 seconds.
[ 360.141219] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.142100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.143377] INFO: task umount:1199 blocked for more than 120 seconds.
[ 360.144451] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.145333] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.146576] INFO: task umount:1201 blocked for more than 120 seconds.
[ 360.147586] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 360.148472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

This will happen forever. I still have to find a way of causing systemd to shutdown network and cause this hang because error, li...


Rafael David Tinoco (inaddy) wrote :

Hypothesis,

Test (2) - The error is propagated to upper layers after X seconds.

In this case I'm testing:

$ sudo iscsiadm -m node -o show | grep node.session.timeo.replac
node.session.timeo.replacement_timeout = 60
node.session.timeo.replacement_timeout = 60

And the same "locking" behavior is observed, but for a limited period of time. Now the shutdown procedure stayed locked for exactly 60 seconds before allowing the machine to shut down. This means that the systemd logic NEEDS either a clean shutdown for the _netdev filesystems OR the error to be propagated so the umount() can go on:

[ 17.612020] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294894197, last ping 4294895448, now 4294896700
[ 17.644128] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294894204, last ping 4294895456, now 4294896708

<60 seconds>

[ 78.044264] sd 2:0:0:1: rejecting I/O to offline device
[ 78.045297] blk_update_request: I/O error, dev sdb, sector 0
[ 78.046315] sd 3:0:0:1: rejecting I/O to offline device
[ 78.046357] sd 2:0:0:1: rejecting I/O to offline device
[ 78.046363] blk_update_request: I/O error, dev sdb, sector 0
[ 78.046487] sd 2:0:0:1: rejecting I/O to offline device
[ 78.050117] blk_update_request: I/O error, dev sda, sector 20481074
[ 78.051219] XFS (sda1): metadata I/O error: block 0x1387c32 ("xlog_iodone") error 5 numblks 64
[ 78.052727] XFS (sda1): Log I/O Error Detected. Shutting down filesystem
[ 78.053901] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 78.060723] sd 3:0:0:1: rejecting I/O to offline device
[ 78.061759] blk_update_request: I/O error, dev sda, sector 0

<error was propagated>
<filesystems can proceed with umount>

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Stopped File System Check on /dev/disk/by-label/XFS.
[ OK ] Stopped File System Check on /dev/disk/by-label/EXT4.
[ OK ] Removed slice system-systemd\x2dfsck.slice.

Rafael David Tinoco (inaddy) wrote :

Might be unrelated, but I discovered that:

## root namespace

root@iscsihang:~$ mount | egrep -E "sd.*(xfs|ext4)"
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered,_netdev)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota,_netdev)

## specific mnt namespace

root@iscsihang:~$ ip netns exec testnamespace /bin/bash
root@iscsihang:~$ sudo umount /xfs
root@iscsihang:~$ sudo umount /ext4

## root namespace

root@iscsihang:~$ mount | egrep -E "sd.*(xfs|ext4)"
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota)

When a mount namespace (mntns) umounts a --shared filesystem from the root namespace, the _netdev flag is gone. That might stop init scripts from correctly identifying when those filesystems should be umounted (before the network is gone, for example) -> this might be "another" bug.

Rafael David Tinoco (inaddy) wrote :

Okay, besides everything I posted above - which is definitely right, will possibly be addressed in other bugs, and also shows other problems that could make systemd hang because of iscsi disks - I was able to cause the same symptom of this bug:

[ OK ] Stopped LSB: ebtables ruleset management.
[ OK ] Stopped Login to default iSCSI targets.
         Stopping iSCSI initiator daemon (iscsid)...
[ OK ] Stopped iSCSI initiator daemon (iscsid).
[ OK ] Stopped target Network is Online.
[ OK ] Stopped target Network.
         Stopping Raise network interfaces...
         Stopping ifup for bond0.20...
[ OK ] Stopped ifup for bond0.20.
         Stopping ifup for bond0.10...
[ OK ] Stopped ifup for bond0.10.
         Stopping ifup for bond0...
[ OK ] Stopped ifup for bond0.
         Stopping ifup for internal...
[ OK ] Stopped ifup for internal.
[ OK ] Stopped Raise network interfaces.
[ OK ] Stopped target Network (Pre).
[ OK ] Stopped Apply Kernel Variables.
[ OK ] Stopped Load Kernel Modules.
[ OK ] Stopped target Local File Systems.
         Unmounting /run/cgmanager/fs...
         Unmounting /run/user/1000...
         Unmounting /xfs...
         Unmounting /ext4...
[ OK ] Unmounted /run/cgmanager/fs.
[ OK ] Unmounted /run/user/1000.

[ 522.972114] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.974085] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.975905] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040
[ 522.976087] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295020535, last ping 4295021788, now 4295023040

[ 720.156456] INFO: task umount:11860 blocked for more than 120 seconds.
[ 720.157477] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 720.158282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.159695] INFO: task umount:11861 blocked for more than 120 seconds.
[ 720.160680] Not tainted 4.4.0-93-generic #116-Ubuntu
[ 720.161587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 733.180181] sd 2:0:0:1: timing out command, waited 180s
[ 733.181191] blk_update_request: I/O error, dev sdc, sector 0
[ 733.182164] sd 4:0:0:1: timing out command, waited 180s
[ 733.183084] blk_update_request: I/O error, dev sdb, sector 0

[ OK ] Unmounted /ext4.
[ OK ] Unmounted /xfs.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Reached target Shutdown.
[ 805.018886] systemd-shutdown[1]: Failed to finalize DM devices, ignoring

<system hangs forever>

And that is likely because I added 2 paths to each of my nics - using multipath, likely unrelated - and configured bonding + vlan within "ifupdown" - likely what causes this. After that change, I was able to make systemd freeze with (likely) the same output as this bug (and the one...


Matt Schulte (gimpyestrada) wrote :

As I said I don't have one up right now, but Comment 3 above has my interfaces for one of the recreates.

Nothing bothered me in any way. I wanted to make sure that you knew that I wasn't one guy who was looking to fix his one system, that's all.

You said "Usually those type of "partnership" product enablements/fixes"...what do you mean by that? Is there some avenue that I could be pursuing to get my bugs addressed sooner?

Rafael David Tinoco (inaddy) wrote :

This is what I discovered:

This systemd issue is caused by iscsi connections still being logged in (not by multipath, bonding, vlans, etc). One way of reproducing it is simply editing the open-iscsi.service unit and disabling the logout script. That has, every time for me, caused the same hang we are talking about.

inaddy@iscsihang:~$ sudo systemctl edit --full open-iscsi.service

...
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
#ExecStop=/lib/open-iscsi/logout-all.sh

But of course no one does this. Reading logout-all.sh you will discover that it logs out all paths, with two caveats: ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN (from /etc/default/open-iscsi) and the flag file /run/open-iscsi/shutdown-keep-sessions both cause those iscsi sessions NOT to be logged out.
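The guard logic described above can be sketched like this (a paraphrase, not the actual logout-all.sh; the echo stands in for the real "iscsiadm -m node --logoutall=all"):

```shell
# Sketch of the shutdown guard: keep sessions if either the
# ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN default is set or the
# /run/open-iscsi/shutdown-keep-sessions flag file exists.
should_logout() {
    keep_flag="$1"   # value of ISCSI_ROOT_KEEP_ALL_SESSIONS_AT_SHUTDOWN
    keep_file="$2"   # path of the shutdown-keep-sessions flag file
    if [ "$keep_flag" = "yes" ] || [ -e "$keep_file" ]; then
        echo keep    # sessions are intentionally left open
    else
        echo logout  # the real script would run iscsiadm here
    fi
}

should_logout ""  /nonexistent   # -> logout
should_logout yes /nonexistent   # -> keep
```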

Still, I was using neither of those while reproducing, so I found something else. Even with the iscsi systemd services disabled:

inaddy@iscsihang:~$ systemctl list-unit-files --full | grep iscsi
iscsi.service disabled (alias for iscsid)
iscsid.service disabled
open-iscsi.service disabled

I was still able to reproduce the issue:

http://pastebin.ubuntu.com/25388348/

By issuing a logout when the iscsid daemon is gone. It looks like iscsid loses track of sessions that were established before it died or was killed: if you log in again, it is completely lost when trying to log out, leaving leftovers (open iscsi sessions) that cause systemd to hang (even with no daemons running, in this case).

I'm investigating:

- Should systemd hang on those leftovers? Since there are caveats that explicitly allow you to leave sessions open (for the root filesystem, for example), I doubt it.

- Does the iscsid daemon contain a bug that causes sessions to remain unmapped and open if the daemon is restarted without logging out previously opened sessions?

Now that I'm able to reproduce this at will, I'll be able to answer those questions soon.

Best,
Rafael

PS: Matt, I asked partnership program managers to reach out to understand better your future needs.

Matt Schulte (gimpyestrada) wrote :

Excellent, thank you for the details.

Rafael David Tinoco (inaddy) wrote :

For now, the workaround for this bug is this:

inaddy@iscsihang:~$ sudo su -

# disconnect what iscsid has mapped
root@iscsihang:~$ iscsiadm -m node --logoutall=all

# disconnect all leftovers (quote the -name pattern so the shell does not glob-expand it)
root@iscsihang:~$ for file in $(find /sys/devices -name "*delete"); do echo 1 > "$file"; done

# shutdown
root@iscsihang:~$ shutdown -h now

(as long as you don't have iscsi root device)

Concentrating efforts on why systemd doesn't finish when iscsi connections and paths are still online. Will also check why open-iscsi (even recent versions) loses track of connections if killed/restarted before a new login.

Rafael David Tinoco (inaddy) wrote :

Okay, I removed almost everything out of equation:

- removed networking interfaces from systemd
- removed open-iscsi login/logout logic from systemd
- used a single network interface for the logins, on a single iscsi portal

was still able to reproduce the issue by:

- doing a simple login after configuring the network device:
  ./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login

- shutting down the network device before any logout:
  ./net-stop.sh ; sudo shutdown -h now

There was no systemd service ordering in play (between open-iscsi, iscsid and the networking / ifupdown scripts) and I was able to cause the same issue 100% of the time, with the same messages and symptoms. This tells us that taking down the network while iscsi sessions are still logged in is definitely what causes the hangs (check #3 below for why).
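For reference, the net-start.sh / net-stop.sh helpers are not shown in the comment; something like the minimal sketch below would do (the interface name ens4 and the address are assumptions, and the scripts are written to /tmp here so the sketch is self-contained):

```shell
# Hypothetical contents for the net-start.sh / net-stop.sh helpers used
# in the reproduction steps above. ens4 and 10.0.0.2/24 are assumed.
cat > /tmp/net-start.sh <<'EOF'
#!/bin/sh
ip link set ens4 up
ip addr add 10.0.0.2/24 dev ens4
EOF

cat > /tmp/net-stop.sh <<'EOF'
#!/bin/sh
ip addr flush dev ens4
ip link set ens4 down
EOF

chmod +x /tmp/net-start.sh /tmp/net-stop.sh
```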

Summary of Causes

1)

network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is shutdown
or
network shuts down -> iscsi daemon is shutdown -> iscsi luns are logged out

- logout would fail due to lack of network
- iscsi daemon would die and/or logout wouldn't work
- iscsi sessions would still be there
- system would hang (transport layer can't logout)

2)

iscsi daemon is shutdown -> iscsi luns are logged out -> network shuts down

- this would cause the bug I'm mentioning: the daemon dies, so the logout doesn't work.
- some iscsi sessions would still be there
- system would hang (transport layer can't logout)
- (to check: open-iscsi.service ExecStop= running in parallel ?!?)

3)

I used KVM watchdog virtual device + NMI from host to crash the guest during the hang

http://pastebin.ubuntu.com/25394744/

And, finally, the hang happens because the kernel itself is hung in its shutdown logic.

> 0 0 0 ffffffff81e11500 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff8801a6a20e00 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff8801a6a21c00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff8801a6a22a00 RU 0.0 0 0 [swapper/3]

ALL CPUs were idling during the hang.

crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
  CURRENT: PID: 0 TASK: ffffffff81e11500 COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff8801ae016fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae016e98
     [no tasks queued]

CPU 1 RUNQUEUE: ffff8801ae096e00
  CURRENT: PID: 0 TASK: ffff8801a6a20e00 COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff8801ae096fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae096e98
     [no tasks queued]

CPU 2 RUNQUEUE: ffff8801ae116e00
  CURRENT: PID: 0 TASK: ffff8801a6a21c00 COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff8801ae116fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae116e98
     [no tasks queued]

CPU 3 RUNQUEUE: ffff8801ae196e00
  CURRENT: PID: 0 TASK: ffff8801a6a22a00 COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff8801ae196fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae196e98
     [no tasks queued]

NO task was scheduled to run.

crash> ps -u
   PID PPID CPU TASK ST %MEM VSZ RSS COMM
      1 0 1 ffff8801a69b8000 UN 0.0 15484 3204 systemd-shutdow

There was just ONE SINGLE user task running (systemd-shutdown)

crash> set ffff8801a69b8000
 ...


This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1569925

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
Changed in linux (Ubuntu Zesty):
status: New → Incomplete
Rafael David Tinoco (inaddy) wrote :

## SUMMARY 1

Affects open-iscsi (NOT systemd):

* ISCSID NEEDS TO BE UP & RUNNING (AND LOGGED IN TO THE PORTAL) FOR THE LOGOUT TO WORK

https://pastebin.canonical.com/197359/

Daemon shutdown documentation in iscsiadm command says (about stopping iscsid daemon with iscsiadm -k):
""" ... This will immediately stop all iscsid operations and shutdown iscsid. It does not logout any sessions. Running this command is the same as doing "killall iscsid". Neither should normally be used, because if iscsid is doing error recovery or if there is an error while iscsid is not running, the system may not be able to recover. This command and iscsid's SIGTERM handling are experimental. """

This basically means you cannot shut the daemon down cleanly while you have connected sessions. And, as the pastebin shows, a running iscsid daemon is required for a logout through the iscsiadm command to work.

So from:

(systemctl edit --full open-iscsi.service)
...
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target
...
ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh
...

and

(systemctl edit --full iscsid.service)
...
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target
...
ExecStartPre=/lib/open-iscsi/startup-checks.sh
ExecStart=/sbin/iscsid
ExecStop=/sbin/iscsiadm -k 0 2
...

If umountiscsi.sh (which does not kill applications using a mount point) can't umount a mount point (from the fstab generator or a mount unit) that is still in use, then logout-all.sh will fail to log out the session backing that disk. The service unit would nevertheless be considered stopped, allowing the "iscsid.service" unit to be shut down as well (killing the iscsid daemon and leaving an open iscsi session behind).

This would be problematic for iscsi root disks AND for the kernel issue (comment #73), and it is being taken care of by Francis Ginther (fginther) in the next iscsi SRU: https://pastebin.canonical.com/196463/ creates an iscsi-cleanup.service that runs a script logging out all remaining iscsi sessions (including /) right before kernel shutdown is invoked by systemd-shutdown, as shown in comment #73.

Changed in open-iscsi (Ubuntu Xenial):
importance: Undecided → Medium
status: New → In Progress
Changed in open-iscsi (Ubuntu Zesty):
importance: Undecided → Medium
status: New → In Progress
Changed in open-iscsi (Ubuntu Artful):
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
status: Incomplete → In Progress
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: Incomplete → In Progress
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
status: Incomplete → In Progress
no longer affects: systemd (Ubuntu)
Changed in systemd (Ubuntu Xenial):
status: In Progress → Triaged
Changed in systemd (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Triaged
Changed in systemd (Ubuntu Artful):
assignee: Nish Aravamudan (nacc) → nobody
importance: High → Medium
status: Confirmed → Triaged
no longer affects: systemd (Ubuntu Xenial)
no longer affects: systemd (Ubuntu Zesty)
no longer affects: systemd (Ubuntu Artful)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu Artful):
assignee: nobody → Rafael David Tinoco (inaddy)
Rafael David Tinoco (inaddy) wrote :

2 things for previous comment #75:

1)
I pasted the wrong pastebin URL. I'll describe the fix: it's a service unit that runs at the very end of shutdown, re-using /run (tmpfs). The service has a preparation script that copies binaries from the initrd image - so a minimum execution environment is available - and then runs a second script. That second script starts the iscsid daemon and waits in a loop for all portals to be ONLINE and existing sessions to be LOGGED in. After that it logs out all existing sessions (in this case, the leftovers from my previous comment AND / root disks).

2)
I tested this approach and it works well for iscsi root disks, because a network interface stays connected the whole time. In the case of leftovers from previous unsuccessful umount attempts, however, network.target is already gone (networkd / ifupdown interfaces already shut down). IF I keep my interfaces configured, this approach works for root disks AND for leftovers. For it to succeed in the leftover case, the "cleanup" service would have to bring the existing interfaces back up so iscsid login and logout can work (OR the kernel hang would have to be resolved, in which case leftovers could simply be left there, without logouts).
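The "wait in a loop until sessions are logged in, then log out" idea described above can be sketched as a generic poll-with-timeout helper; the iscsiadm usage in the trailing comment is illustrative, not taken from the actual cleanup script:

```shell
# Poll a condition command until it succeeds or a timeout (seconds)
# expires. Returns 0 on success, 1 on timeout.
wait_until() {
    _cmd="$1"; _timeout="$2"
    _end=$(( $(date +%s) + _timeout ))
    until eval "$_cmd"; do
        [ "$(date +%s)" -ge "$_end" ] && return 1
        sleep 1
    done
    return 0
}

# Illustrative use (assumed invocation, requires iscsid to be running):
#   wait_until "iscsiadm -m session >/dev/null 2>&1" 60 \
#       && iscsiadm -m node --logoutall=all
wait_until true 1 && echo "condition met"
```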

Changed in open-iscsi (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in open-iscsi (Ubuntu Zesty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in open-iscsi (Ubuntu Artful):
assignee: nobody → Rafael David Tinoco (inaddy)
tags: added: rls-aa-notfixing
Rafael David Tinoco (inaddy) wrote :

I created a workaround for this issue while its being worked on:

http://pastebin.ubuntu.com/25519820/

Create a file /lib/systemd/system-shutdown/debug.sh, place the contents in it, and make it executable (systemd-shutdown only runs executable files from that directory).

This workaround (based on some ideas from sil2100 and slangasek for iscsi / umount) basically brings the interfaces back up and remounts / so it can finally clean everything up.

Things to note:

- It ONLY RUNS if there are iscsi leftovers
- It uses ifupdown only for networking (/etc/network/interfaces)
- It has to remount / to do networking and to run iscsid
- If it fails to bring the network up, it will hang like before (kernel issue)
- It waits for iscsi to be logged in again (so shutdown might take a while)
- If the logout fails, it hangs again (unless the network is left configured; I could change that)

How is this script different from altering open-iscsi.service?

It runs at the very end of systemd shutdown, when it is very unlikely that any services still hold references to iscsi mounts, preventing them from being logged out.

Now I'll test Debian sid and check whether this happens there too, so I can open a bug in the Debian project as well.
Before moving on to the open-iscsi services - to create a cleanup unit file for the open-iscsi package, like this workaround - I'll dig into the kernel issue. I'm afraid no fix will be as good as making sure the kernel lets the queued I/O command go. I also have to make sure this workaround is changed to allow root iscsi disks to be logged out.

tags: added: id-598b6543459f8ccf5dfbc04c
Matthijs Hoekstra (mahoekst) wrote :

Even with the script a shutdown/reboot takes a very long time. Any expectations when a fix will be ready? I hit this all the time on 2 installed servers.

Rafael David Tinoco (inaddy) wrote :

Hello Matthijs

Unfortunately the best way to prevent this is to fix the kernel hang itself, which occurs when the kernel calls sd_sync_cache() for every configured device before shutdown. A single I/O command hangs on every scsi path and the I/O error is never propagated to the block layer (despite iscsi having proper I/O error settings). I'm finishing analysing some kernel dumps so I can finally understand what is happening in the transport layer (this happens with more recent kernels as well).

The workaround was to create a script that would restore the iscsi connection, wait for the login to happen again and the paths are back online, and cleanly logout, allowing the sd_sync_cache() operation to be finalized.

If you are facing this problem, I know for sure that your iscsi connections are not being finalized before the network goes down. This means you have to pay attention to how you configured your iscsi disks:

- guarantee that iscsiadm was configured with "interfaces" so it works on startup:

  sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
  sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b

- the discovery/login has to be made AFTER the iscsiadm had interfaces added

  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
  sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2

  # iscsiadm -m node --loginall=automatic HAS TO WORK or else init scripts will fail

  http://pastebin.ubuntu.com/25894472/

- configure the volumes in /etc/fstab with "_netdev" parameter for systemd unit ordering

  LABEL=BLUE /blue ext4 defaults,_netdev 0 1
  LABEL=GREEN /green ext4 defaults,_netdev 0 1
  LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
  LABEL=RED /red ext4 defaults,_netdev 0 1
  LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1
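A quick self-contained way to double-check which fstab entries carry _netdev (plain awk on a sample file standing in for /etc/fstab; field 4 holds the mount options):

```shell
# Sample fstab: one _netdev entry, one plain local entry.
cat > /tmp/fstab.sample <<'EOF'
LABEL=BLUE /blue ext4 defaults,_netdev 0 1
LABEL=RED /red ext4 defaults 0 1
EOF

# Print the mount points whose options include _netdev.
awk '$4 ~ /_netdev/ {print $2}' /tmp/fstab.sample   # prints: /blue
```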

You have to make sure the open-iscsi and iscsid systemd units are started after the network is available and stopped before it disappears. If the configuration above is correct, that ordering might be your problem.

inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service

The defaults are:

[Unit]
Description=iSCSI initiator daemon (iscsid)
Documentation=man:iscsid(8)
Wants=network-online.target remote-fs-pre.target
Before=remote-fs-pre.target
After=network.target network-online.target

and

[Unit]
Description=Login to default iSCSI targets
Documentation=man:iscsiadm(8) man:iscsid(8)
Wants=network-online.target remote-fs-pre.target iscsid.service
After=network-online.target iscsid.service
Before=remote-fs-pre.target

So you can see that iscsid.service runs BEFORE open-iscsi.service. In my case, I'm configuring the network using rc-local.service (since this is my lab) and I had to guarantee the ordering there as well.
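One way to pin such ordering is a systemd drop-in (hypothetical example; the unit names are assumptions and the file is written to /tmp here so the sketch is self-contained - the real location would be /etc/systemd/system/iscsid.service.d/). Since systemd stops units in reverse start order, declaring After= at startup means "stopped before" at shutdown:

```shell
# Hypothetical drop-in keeping iscsid ordered after the units that
# provide its network, so it is stopped before they are at shutdown.
mkdir -p /tmp/iscsid.service.d
cat > /tmp/iscsid.service.d/ordering.conf <<'EOF'
[Unit]
After=network-online.target rc-local.service
EOF

cat /tmp/iscsid.service.d/ordering.conf
```

After placing a real drop-in under /etc, a `systemctl daemon-reload` would be needed for it to take effect.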

If, after configuring your system like this, you still face problems, you can use this script:

http://pastebin.ubuntu.com/25894592/

And provide me the DEBUG=/.shutdown.log file, created after its execution, attached to this launchpad case. It's likely that you have hung iscsi connections for some reason (service ordering, lack of volume...


Amanda Gartner (amandagartner) wrote :

I followed all of the setup steps, but was still encountering the ping timeout issue at reboot. I then ran the script http://pastebin.ubuntu.com/25894592/ you provided and am no longer hitting the issue. Was the script supposed to fix the issue? Please let me know. Thank you!

Rafael David Tinoco (inaddy) wrote :

Hello Amanda,

I'm actively working on this and will provide better feedback soon. The script is just a workaround for now, so that people facing this can shut down properly.

Despite what userland does, the kernel can always hang with iscsi sessions left open during shutdown. If you still hit the issue after following the setup steps, it's possible that your systemd shutdown unit ordering is wrong (your mount points are being unmounted before they are released). I'm working on the kernel issue, which should cover all the userland-created situations.

Rafael David Tinoco (inaddy) wrote :

Long story short -> https://goo.gl/vPyh8C

Basically, the block layer enqueues one last request (a SYNC scsi command coming from sd_shutdown) for every scsi device on the system. Unfortunately, since the OS is shutting down, between the block request and its execution userland (systemd) kills iscsid without a proper logout, and/or takes the network down.

What happens next is that the mid layer (SCSI) tries to deliver the request through the transport layer (iscsi_tcp_sw) but it fails since the transport layer checks the session status and finds out that the session is not in LOGIN state.

The default behaviour of the transport layer (iscsi_tcp_sw) in such situation is to tell the mid-layer to keep resetting the request timeout timer while it tries to recover (something that will never happen because the network is gone).

Changing that default behavior to state that the scsi command was NOT handled by the transport layer (iscsi_tcp_sw) means the scsi timeout function will try to "abort" the scsi command, which in turn creates further commands that also time out because of the transport layer.

The best scenario so far was to return BLK_EH_HANDLED instead of BLK_EH_NOT_HANDLED in the scsi_times_out function, which lets the kernel shut down. By doing that, though, I'm confirming to the block layer something that DID NOT happen, since the command never actually left the transport layer.

This might be ONE possible way to fix this: mark in the transport layer that the timeout happened DURING the shutdown procedure, cancelling all the block device requests without invoking the scsi error handling mechanism, which would otherwise generate more transport-layer traffic (and thus more timeouts, looping back into the problematic sequence).

Anyway, I'll get back to this next week and hopefully identify best course of action.

Rafael David Tinoco (inaddy) wrote :

I have proposed a fix to the kernel iscsi transport code. You can follow upstream discussion, if any, in the following address:

https://marc.info/?t=151268397600008&r=1&w=2

The patch proposal is here:

https://marc.info/?l=linux-scsi&m=151268396227285&w=2

If upstream chooses to accept my proposal, which fixes the issue like demonstrated here:

https://marc.info/?l=linux-scsi&m=151274369910906&w=2

Then I'll propose Canonical kernel-team a SRU based on the upstream patch.

For now this case is on hold for upstream discussion/decision.

Thank you
-Rafael

That's awesome sir, thank you for your diligence.

Rafael David Tinoco (inaddy) wrote :

The scsi/iscsi team has accepted my fix proposal and merged the patch for the 4.16 merge window:

https://<email address hidden>/msg10111.html

I'll propose the cherry-pick as SRU for Ubuntu kernels as soon as it is merged.

tags: added: sts
Rafael David Tinoco (inaddy) wrote :

Changed open-iscsi to Opinion since I chose to fix the kernel instead of userland. No matter what you do in userland, the kernel could still freeze and hang in different scenarios depending on how userland disconnected the transport layer. I kept the "linux" source package as affected and will SRU it through the Ubuntu Kernel Team.

Changed in open-iscsi (Ubuntu Xenial):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu Zesty):
status: In Progress → Opinion
Changed in open-iscsi (Ubuntu Artful):
status: In Progress → Opinion
description: updated
Changed in open-iscsi (Ubuntu Trusty):
status: New → Opinion
Changed in linux (Ubuntu Trusty):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
Changed in open-iscsi (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
no longer affects: open-iscsi (Ubuntu Zesty)
no longer affects: linux (Ubuntu Zesty)
Rafael David Tinoco (inaddy) wrote :

I have just submitted the SRU proposal to kernel team mailing list. I'll wait for the SRU to be made and this public bug to be marked as Fix Committed before verifying the fix again for the final release.

Seth Forshee (sforshee) on 2018-01-23
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Rafael David Tinoco (inaddy) wrote :

Commit has been merged upstream:

commit d754941225a7dbc61f6dd2173fa9498049f9a7ee
Author: Rafael David Tinoco <email address hidden>
Date: Thu Dec 7 19:59:13 2017 -0200

    scsi: libiscsi: Allow sd_shutdown on bad transport

    If, for any reason, userland shuts down iscsi transport interfaces
    before proper logouts - like when logging in to LUNs manually, without
    logging out on server shutdown, or when automated scripts can't
    umount/logout from logged LUNs - kernel will hang forever on its
    sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
    still existent paths.

    PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
     #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
     #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
     #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
     #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
     #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
     #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
     #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
     #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
     #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
     #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

    This happens because iscsi_eh_cmd_timed_out(), the transport layer
    timeout helper, would tell the queue timeout function (scsi_times_out)
    to reset the request timer over and over, until the session state is
    back to logged in state. Unfortunately, during server shutdown, this
    might never happen again.

    Other option would be "not to handle" the issue in the transport
    layer. That would trigger the error handler logic, which would also need
    the session state to be logged in again.

    Best option, for such case, is to tell upper layers that the command was
    handled during the transport layer error handler helper, marking it as
    DID_NO_CONNECT, which will allow completion and inform about the
    problem.

    After the session was marked as ISCSI_STATE_FAILED, due to the first
    timeout during the server shutdown phase, all subsequent cmds will fail
    to be queued, allowing upper logic to fail faster.

    Signed-off-by: Rafael David Tinoco <email address hidden>
    Reviewed-by: Lee Duncan <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

Applied in Bionic:

https://lists.ubuntu.com/archives/kernel-team/2018-January/089509.html

And Fix Committed.

Ack-ed for Trusty, Xenial, Artful:

https://lists.ubuntu.com/archives/kernel-team/2018-January/089774.html
https://lists.ubuntu.com/archives/kernel-team/2018-February/089809.html

Waiting for them to be Fix Committed there. As soon as it is released to -proposed I'll verify it and provide feedback so it can be Fix Released.

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Matt Schulte (gimpyestrada) wrote :

I apologize for my ignorance, but now that all these check-ins have occurred, how do I know when an installable package has been built and is available in the various repos?

And thank you @inaddy for the diligence getting this one resolved.

Rafael David Tinoco (inaddy) wrote :

Hello Matt, this is part of the SRU procedure (https://wiki.ubuntu.com/StableReleaseUpdates). For the kernel, the SRU process is a bit different (https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat, https://wiki.ubuntu.com/Kernel/StableReleaseCadence). Now that the bug has been marked as "Fix Committed", it means that the patch has been merged into the "master-next" branch of each of the following Ubuntu versions:

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed

When the new kernel package is built, it will be placed into -proposed for each of the Ubuntu versions and you will be able to test the kernel and provide feedback here. Since I was the one that fixed it, I'll verify each of the kernels to make sure it fixed the issue. After verification, and at least 7 days in -proposed, the packages shall be moved to -updates.

Hope it helps you understand the fixing steps. Cheers o/

Rafael David Tinoco (inaddy) wrote :

BTW, the kernel team usually updates all affected cases with the version that was built to address the issue (so you know what to test in -proposed).

Matt Schulte (gimpyestrada) wrote :

Thanks, and will notifications appear here in the bug during these stages?

Rafael David Tinoco (inaddy) wrote :

Yep, soon the kernel team will advise (here) when this patch is released in a specific kernel version in -proposed and will call for testing/verification. I'll give all my feedback here as well and, if appropriate, mark the case as "verification-done" so the new kernel is recorded as solving this issue.

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
tags: added: verification-needed-xenial

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Rafael David Tinoco (inaddy) wrote :

Hello Kleber,

Thanks for the feedback. Working on verification.

Rafael David Tinoco (inaddy) wrote :

# xenial - 4.4.0-112

[ OK ] Reached target Shutdown.
[ 70.272171] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907361, last ping 4294908612, now 4294909864
[ 70.274189] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907361, last ping 4294908612, now 4294909864
[ 70.288082] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907362, last ping 4294908614, now 4294909868
[ 70.290066] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907363, last ping 4294908614, now 4294909868
[ 70.292039] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294907362, last ping 4294908614, now 4294909868

<hang for more than 5 minutes>

# xenial - 4.4.0-116

[ OK ] Reached target Shutdown.
[ 38.384175] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899386, last ping 4294900640, now 4294901892
[ 38.386107] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 38.388023] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 38.388159] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899386, last ping 4294900640, now 4294901892
[ 38.388159] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294899387, last ping 4294900640, now 4294901892
[ 89.825203] reboot: Restarting system

-> xenial-verification-done
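The timestamps in the "ping timeout" messages above are raw jiffies counters, and the gaps between them line up with the 5-second recv/ping timeouts being reported. A quick check of the arithmetic, assuming the Ubuntu generic kernel's usual CONFIG_HZ=250 (an assumption; verify against your kernel config):

```python
# Jiffies values taken from the 4.4.0-112 log lines above (connection3:0).
HZ = 250  # assumed tick rate; Ubuntu generic kernels typically build with CONFIG_HZ=250
last_rx, last_ping, now = 4294907361, 4294908612, 4294909864

# Gap between the last received PDU and the NOP-Out ping being sent:
recv_gap = (last_ping - last_rx) / HZ   # ~5.0 s (the "recv timeout 5")
# Gap between sending the ping and declaring the timeout:
ping_gap = (now - last_ping) / HZ       # ~5.0 s (the "ping timeout of 5 secs")
print(recv_gap, ping_gap)
```

Both gaps come out to roughly 5 seconds, which is consistent: on the fixed kernels the connections still time out (the network is already down), but the shutdown path no longer waits on them, so the reboot proceeds about a minute later instead of hanging.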

tags: added: verification-done-xenial
removed: verification-needed-xenial
Rafael David Tinoco (inaddy) wrote :


# artful - 4.13.0-32

[ OK ] Reached target Shutdown.
[ 194.277712] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.279684] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.281483] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.283265] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864
[ 194.285067] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294938304, last ping 4294939584, now 4294940864

<hang for more than 15 minutes>

# artful - 4.13.0-35

[ OK ] Reached target Shutdown.
[ 416.109989] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.111922] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.113869] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.115741] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 416.117650] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294993856, last ping 4294995136, now 4294996416
[ 469.614166] reboot: Restarting system

-> artful-verification-done

tags: added: verification-done-artful
removed: verification-needed-artful
Rafael David Tinoco (inaddy) wrote :

Checking whether Trusty's fix is in -proposed (no comment from the Kernel Team about it yet).
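One way to check this without waiting for a bot comment is to query the Ubuntu archive directly. A minimal sketch using `rmadison` from the devscripts package (the grep pattern is illustrative; the exact version string will differ):

```shell
# Query the archive for the linux source package and keep only trusty-proposed rows.
# Requires network access and the devscripts package installed.
rmadison linux | grep trusty-proposed
```

If the command prints a row, the candidate kernel has been accepted into trusty-proposed and verification can begin; no output means it has not landed yet.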
