[2.5] iSCSI systemd services fails and blocks for 1 min 30 seconds

Bug #1792905 reported by Blake Rouse on 2018-09-17
This bug affects 2 people
Affects                           Importance   Assigned to
MAAS                              High         Unassigned
cloud-images                      Critical     Unassigned
cloud-initramfs-tools (Ubuntu)    High         Unassigned
  Xenial                          High         Scott Moser
  Bionic                          High         Scott Moser
  Cosmic                          High         Unassigned
livecd-rootfs (Ubuntu)            Undecided    Unassigned
  Xenial                          Undecided    Unassigned
  Bionic                          Undecided    Unassigned
  Cosmic                          Undecided    Unassigned
open-iscsi (Ubuntu)               Low          Unassigned
  Xenial                          Low          Unassigned
  Bionic                          Low          Unassigned
  Cosmic                          Low          Unassigned

Bug Description

[Impact]

 * Affects environments where the base image is read-only but kernel modules are copied from the initramfs to the real root via cloud-initramfs-copymods package.

 * This affects users of our stable release images available from http://cloud-images.ubuntu.com.

 * The attached fixes ensure /lib/modules always exists by creating it explicitly instead of relying on it to come from a package.
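
The fix described in the last point amounts to creating the directory explicitly at image build time. A minimal sketch follows; the helper name `ensure_lib_modules` and the idea of passing the rootfs path as an argument are assumptions for illustration and are not taken from the actual livecd-rootfs hooks:

```shell
#!/bin/sh
# Hypothetical helper, NOT the livecd-rootfs implementation.
# Ensures <rootfs>/lib/modules exists so that copymods has a mount point.
ensure_lib_modules() {
    rootfs="$1"
    # mkdir -p is idempotent: it succeeds whether or not the directory exists.
    mkdir -p "$rootfs/lib/modules"
}
```

Because `mkdir -p` does not fail on an existing directory, a call like this is harmless on builds where a package already ships /lib/modules.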

[Test Case]

 * Download http://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.squashfs

 * Unpack it via `sudo unsquashfs bionic-server-cloudimg-amd64.squashfs`

 * Inspect the unpacked root filesystem and find that '/lib/modules' is missing.

 * Install local build scripts as described at https://github.com/chrisglass/ubuntu-old-fashioned (note: you will need ubuntu-old-fashioned master for cosmic)

 * Re-build the images using the updated livecd-rootfs package.

 * Unpack the resulting livecd.ubuntu-cpc.squashfs artifact using unsquashfs again.

 * Inspect the unpacked root filesystem and find that '/lib/modules' exists.

 * It is pure luck that package purges which are done analogously in Cosmic image builds do not remove '/lib/modules', hence this fix is introduced there, as well.

 * Xenial is not affected.

 * Test builds were carried out for Cosmic and Bionic with the expected results.

[Regression Potential]

 * This is a fix to a regression. The existence of the directory had previously been ensured, but the mkdir call got lost in recent re-factoring. See also:

https://bazaar.launchpad.net/~ubuntu-core-dev/livecd-rootfs/bionic-proposed/revision/1678

https://bazaar.launchpad.net/~ubuntu-core-dev/livecd-rootfs/trunk/revision/1681

 * Packaging tools should not take offense at the existence of a directory, even if it was not part of a package. So the potential for unforeseeable regressions is very low.

===ORIGINAL BUG DESCRIPTION===

Let me first start with saying MAAS is *not* using iSCSI anymore and is *NOT* in this case either.

For some reason, during enlistment, commissioning, and deployment the ephemeral environment now blocks for 1 min 30 seconds waiting for the iSCSI daemon to start, which it never does.

This increases the boot time drastically.


no longer affects: open-iscsi
Blake Rouse (blake-rouse) wrote :

Here is the output log:

http://paste.ubuntu.com/p/KfTWC5ghwR/

You can't really see the iSCSI error in the console log.

Scott Moser (smoser) wrote :

See bug 1543204 for more information.

The change that caused this regression was in the released Ubuntu bionic images.
Between 20180831 and 20180911 the /lib/modules directory disappeared.

That means that 'copymods' cannot copy modules from initramfs into the
root filesystem, and without that, open-iscsi is behaving oddly.

We can/should make open-iscsi not block in this case as whatever it is
waiting for is probably not going to ever arrive (possibly a kernel module?).

But we want/need the /lib/modules directory in the images or copymods
can't really do what it does.

Changed in open-iscsi (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Changed in cloud-images:
status: New → Triaged
importance: Undecided → High
Scott Moser (smoser) wrote :

I want to be clear and state that there will be other fallout from not having any modules in the ephemeral environment. Just fixing open-iscsi to not run or not fail doesn't solve the problem.

Also note that if you boot the ephemeral environment without 'ro' on the command line, then this particular case will not be a problem, but that isn't really a solution either.
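
To tell whether a given boot falls into the 'ro' case mentioned above, one can look for a standalone `ro` token on the kernel command line. This is a minimal sketch; the function name `cmdline_has_ro` is hypothetical, and it only inspects the string it is given:

```shell
#!/bin/sh
# cmdline_has_ro CMDLINE
# Returns 0 if the standalone token "ro" appears in CMDLINE, 1 otherwise.
# Padding with spaces avoids matching substrings such as "root=" or "iprob=1".
cmdline_has_ro() {
    case " $1 " in
        *" ro "*) return 0 ;;
    esac
    return 1
}
```

On a running system this could be called as `cmdline_has_ro "$(cat /proc/cmdline)"`.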

Scott Moser (smoser) wrote :

just for reference, here is iscsid status when it fails.
the error is 'can not create NETLINK_ISCSI socket'.

I'm confused as to why, but it will start later in boot fine.

$ systemctl status iscsid.service --no-pager --full
● iscsid.service - iSCSI initiator daemon (iscsid)
   Loaded: loaded (/lib/systemd/system/iscsid.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2018-09-19 10:20:28 UTC; 2min 38s ago
     Docs: man:iscsid(8)

Sep 19 10:18:57 ubuntu systemd[1]: Starting iSCSI initiator daemon (iscsid)...
Sep 19 10:18:57 ubuntu iscsid[758]: iSCSI logger with pid=765 started!
Sep 19 10:18:57 ubuntu systemd[1]: iscsid.service: Failed to parse PID from file /run/iscsid.pid: Invalid argument
Sep 19 10:18:57 ubuntu iscsid[765]: iSCSI daemon with pid=767 started!
Sep 19 10:18:57 ubuntu iscsid[765]: can not create NETLINK_ISCSI socket
Sep 19 10:20:28 ubuntu systemd[1]: iscsid.service: Start operation timed out. Terminating.
Sep 19 10:20:28 ubuntu systemd[1]: iscsid.service: Failed with result 'timeout'.
Sep 19 10:20:28 ubuntu systemd[1]: Failed to start iSCSI initiator daemon (iscsid).
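
In the log above, the gap between 10:18:57 and 10:20:28 is 91 seconds, which is consistent with a 90-second start timeout (assumed here to be the systemd default; the unit file itself is not shown in this report). A small sketch for spotting this failure mode in captured log text, with the hypothetical function name `start_timed_out`:

```shell
#!/bin/sh
# start_timed_out
# Reads unit log text on stdin and returns 0 if systemd reported that the
# unit failed with result 'timeout', as in the iscsid log above.
start_timed_out() {
    grep -q "Failed with result 'timeout'"
}
```

For example, `journalctl -b -u iscsid.service --no-pager | start_timed_out && echo "iscsid start timed out"`.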

Francis Ginther (fginther) wrote :

It appears that `/lib/modules` is missing because the kernel image and modules were removed in a slightly different order than usual. In the case of 20180911, the kernel image was removed first, then the kernel modules. This resulted in complete clean-up of `/lib/modules`. This order appears to be arbitrary and we just got lucky (or unlucky) this time.

We can make image build changes to ensure that `/lib/modules` is present in the bionic and xenial squashfs. However, is there a different solution we should be pursuing for cosmic that doesn't require this directory to be present?

Finally, what is the urgency of getting this issue resolved?

Blake Rouse (blake-rouse) wrote :

Urgency is high: this slows down every machine MAAS boots by 1 min 30 seconds, which is a long delay just to install the OS or commission the machine.

Scott Moser (smoser) wrote :

Urgency is critical.
Customer deployments that use modules will be broken. This would include vfat, zfs... many things.

I believe this is the root cause of Christian's maas deployment with console log at
 http://paste.ubuntu.com/p/BMS4dbW4XD/

Changed in cloud-images:
importance: High → Critical
Scott Moser (smoser) wrote :

I put a pull request to upstream open-iscsi to improve its failure path here : https://github.com/open-iscsi/open-iscsi/pull/127

description: updated
Francis Ginther (fginther) wrote :

Change not needed for xenial, livecd-rootfs is already creating '/lib/modules' on the rootfs tarball.

Changed in livecd-rootfs (Ubuntu Xenial):
status: New → Invalid
Tobias Koch (tobijk) on 2018-09-20
description: updated
Tobias Koch (tobijk) on 2018-09-20
description: updated
description: updated
description: updated
Tobias Koch (tobijk) wrote :

debdiff for Cosmic.

description: updated
description: updated
description: updated
Tobias Koch (tobijk) wrote :

debdiff for Bionic.

Tobias Koch (tobijk) on 2018-09-20
description: updated

The attachment "debdiff for Cosmic." seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Scott Moser (smoser) on 2018-09-20
Changed in open-iscsi (Ubuntu Xenial):
importance: Undecided → Low
status: New → Confirmed
Changed in open-iscsi (Ubuntu Bionic):
importance: Undecided → Low
status: New → Confirmed
description: updated
Scott Moser (smoser) wrote :

Hi,
Can you please add a gating check on image publication that /lib/modules directory exists?
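
One possible shape for such a gating check is sketched below. It is not the check that was actually adopted (the report does not show one). It assumes the image listing comes from `unsquashfs -l`, whose paths are assumed to carry the default `squashfs-root/` prefix; the function name `listing_has_lib_modules` is hypothetical:

```shell
#!/bin/sh
# listing_has_lib_modules
# Reads a file listing on stdin (one path per line) and returns 0 if the
# top-level lib/modules directory is present.
# Assumption: paths are prefixed with "squashfs-root/", the default
# extraction directory name used by unsquashfs.
listing_has_lib_modules() {
    grep -Eq '^squashfs-root/lib/modules/?$'
}
```

A publication step could then refuse an image with something like `unsquashfs -l image.squashfs | listing_has_lib_modules || exit 1`, where `image.squashfs` is a placeholder name.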

Robert C Jennings (rcj) wrote :

smoser,

The cloud-initramfs-copymods package is in the server seed and it depends on /lib/modules existing for RO root FS. So it can own the directory and we'll fix this in the least magic way possible.

https://code.launchpad.net/~rcj/cloud-initramfs-tools/+git/cloud-initramfs-tools/+merge/355409

Changed in livecd-rootfs (Ubuntu Cosmic):
status: New → Fix Committed
Changed in livecd-rootfs (Ubuntu Bionic):
status: New → In Progress
Dimitri John Ledkov (xnox) wrote :

All the things sound great here, imho we should do all of them.

Steve Langasek (vorlon) on 2018-09-20
summary: - [2.5] iSCSI systemd services fails and blocks for 1 min 30 secconds
+ [2.5] iSCSI systemd services fails and blocks for 1 min 30 seconds
tags: added: id-5ba344692475d642386a6bcb
Scott Moser (smoser) on 2018-09-20
Changed in cloud-initramfs-tools (Ubuntu Cosmic):
importance: Undecided → High
status: New → Confirmed
Changed in cloud-initramfs-tools (Ubuntu Bionic):
importance: Undecided → High
status: New → Confirmed
Changed in cloud-initramfs-tools (Ubuntu Xenial):
importance: Undecided → High
status: New → Confirmed

Hello Blake, or anyone else affected,

Accepted livecd-rootfs into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.525.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in livecd-rootfs (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-initramfs-tools - 0.43ubuntu1

---------------
cloud-initramfs-tools (0.43ubuntu1) cosmic; urgency=medium

  [ Robert Jennings ]
  * copymods: Take ownership of lib/modules (LP: #1792905)
  * debian/control: Update Vcs-* to point to git.

 -- Scott Moser <email address hidden> Thu, 20 Sep 2018 08:43:53 -0400

Changed in cloud-initramfs-tools (Ubuntu Cosmic):
status: Confirmed → Fix Released
Scott Moser (smoser) on 2018-09-20
Changed in cloud-initramfs-tools (Ubuntu Xenial):
assignee: nobody → Scott Moser (smoser)
status: Confirmed → In Progress
Changed in cloud-initramfs-tools (Ubuntu Bionic):
assignee: nobody → Scott Moser (smoser)
status: Confirmed → In Progress
Scott Moser (smoser) wrote :

I uploaded to
 * cosmic 0.43ubuntu1
 * xenial 0.27ubuntu1.5
 * bionic 0.40ubuntu1.1

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.540

---------------
livecd-rootfs (2.540) cosmic; urgency=medium

  * Ensure /lib/modules exists in root tarballs and sqashfs.
    (LP: #1792905)

 -- Tobias Koch <email address hidden> Thu, 20 Sep 2018 09:38:34 +0200

Changed in livecd-rootfs (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

This is still broken in bionic images 20180927.
I realize it is just 'fix committed', but we really need the fix.
Customer MAAS deployments are delayed by ~90 seconds in the best case and will fail installation in other cases.

What all needs to happen to get an updated bionic image with the fix?

Adam Conrad (adconrad) wrote :

Hello Blake, or anyone else affected,

Accepted cloud-initramfs-tools into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-initramfs-tools/0.40ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-initramfs-tools (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in cloud-initramfs-tools (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Adam Conrad (adconrad) wrote :

Hello Blake, or anyone else affected,

Accepted cloud-initramfs-tools into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-initramfs-tools/0.27ubuntu1.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Scott Moser (smoser) wrote :

For the cloud-initramfs-tools upload, I'm providing the following as an SRU template,
rather than updating the description above.

=== Begin SRU Template cloud-initramfs-tools ===
[Impact]
The designed purpose of cloud-initramfs-copymods is to copy modules
from the initramfs to /lib/modules/<uname -r> if there are not any modules
there. The reasoning is that if there are no modules in the root
filesystem then it is better to have whatever modules were in the initramfs
than no modules at all.

This is accomplished by mounting a tmpfs over /lib/modules and then
putting the contents there.

The problem as seen in this bug is that if there is no /lib/modules directory
in the root image, then we can't mount the tmpfs over the top.
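
The decision described above can be sketched as follows. This is an illustration under stated assumptions, not the actual cloud-initramfs-copymods script: the function name `copymods_precheck` and the three result labels are hypothetical, and the sketch only inspects directories without mounting anything.

```shell
#!/bin/sh
# copymods_precheck ROOT KVER
# Hypothetical illustration of the preconditions described above.
# Prints one of:
#   no-mountpoint  ROOT/lib/modules is missing, so a tmpfs cannot be mounted
#   skip           ROOT already has a modules directory for KVER
#   copy           the mount point exists but holds no modules for KVER
copymods_precheck() {
    root="$1"
    kver="$2"
    if [ ! -d "$root/lib/modules" ]; then
        echo "no-mountpoint"
    elif [ -d "$root/lib/modules/$kver" ]; then
        echo "skip"
    else
        echo "copy"
    fi
}
```

The images affected by this bug correspond to the first case: the mount point itself is absent, so nothing can be copied.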

The solution provided for this package was to add the directory to
debian/dirs, so that the directory is created and "owned" by the
cloud-initramfs-copymods package. Wherever the package is installed, the
/lib/modules directory will then be present.

[Test Case]
The simplest test case is to just install the package and then:

  $ dpkg -S /lib/modules | fmt --width=1 | grep cloud-initramfs-copymods

Expected output is:

  $ dpkg -S /lib/modules | fmt --width=1 | grep cloud-initramfs-copymods
  cloud-initramfs-copymods,

if cloud-initramfs-copymods is not listed (with or without the trailing ',')
then the fix is not present.
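
As an alternative to the fmt/grep pipeline, the owner list can be parsed directly. This sketch assumes `dpkg -S` prints owners as a comma-separated list followed by ": " and the path; the function name `path_owned_by` is hypothetical:

```shell
#!/bin/sh
# path_owned_by PKG
# Reads `dpkg -S <path>` output on stdin and returns 0 if PKG appears in
# the owner list. Assumed line format: "pkg-a, pkg-b: /some/path".
path_owned_by() {
    pkg="$1"
    # Drop everything from ": " onward, put one owner per line, trim
    # surrounding spaces, then require an exact whole-line match.
    sed 's/: .*$//' | tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -Fxq -- "$pkg"
}
```

For example, `dpkg -S /lib/modules | path_owned_by cloud-initramfs-copymods && echo "fix present"`. Unlike a plain grep, the exact match does not depend on whether a trailing ',' follows the package name.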

[Regression Potential]
There should be extremely low potential here for regression.
There are already multiple packages (linux-*) that include /lib/modules
in their list of files so simply having multiple packages own that path
is known not to be a problem.

=== End SRU Template cloud-initramfs-tools ===

Scott Moser (smoser) wrote :

I've verified the cloud-initramfs-tools fix (cloud-initramfs-copymods) is in place in xenial, see attached.

Scott Moser (smoser) wrote :

verified cloud-initramfs-tools in bionic. see attached.

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done-bionic
removed: verification-needed-bionic
Scott Moser (smoser) wrote :

I've marked this verification-done-bionic based on the fix in cloud-initramfs-tools.
I'm not sure what the right tag-status is to indicate that the livecd-rootfs change has not actually been verified.

Anyone who feels motivated can fix that, just please justify.

Francis Ginther (fginther) wrote :

verification-done-bionic is also complete for livecd-rootfs. Steps taken to test:

 * Get the appropriate source package:
$ pull-lp-source livecd-rootfs bionic-proposed
pull-lp-source: Downloading livecd-rootfs version 2.525.9
pull-lp-source: Downloading livecd-rootfs_2.525.9.tar.xz from archive.ubuntu.com (0.098 MiB)
dpkg-source: info: extracting livecd-rootfs in livecd-rootfs-2.525.9
dpkg-source: info: unpacking livecd-rootfs_2.525.9.tar.xz

 * Execute an image build:
$ sudo -E old-fashioned-image-build --series bionic

 * Extract the resulting squashfs:
$ unsquashfs -d /tmp/squashfs/bionic livecd.ubuntu-cpc.squashfs

 * Check for /lib/modules
$ ls -la /tmp/squashfs/bionic/lib/modules
total 8
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 2 18:47 .
drwxr-xr-x 20 ubuntu ubuntu 4096 Oct 2 18:47 ..

The verification of the Stable Release Update for livecd-rootfs has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.525.9

---------------
livecd-rootfs (2.525.9) bionic; urgency=medium

  * Ensure /lib/modules exists in root tarballs and sqashfs.
    (LP: #1792905)

 -- Tobias Koch <email address hidden> Thu, 20 Sep 2018 09:30:34 +0200

Changed in livecd-rootfs (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in maas:
status: Triaged → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-initramfs-tools - 0.40ubuntu1.1

---------------
cloud-initramfs-tools (0.40ubuntu1.1) bionic; urgency=medium

  [ Robert Jennings ]
  * copymods: Take ownership of lib/modules (LP: #1792905)
  * debian/control: Update Vcs-* to point to git.

 -- Scott Moser <email address hidden> Thu, 20 Sep 2018 09:29:41 -0400

Changed in cloud-initramfs-tools (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-initramfs-tools - 0.27ubuntu1.6

---------------
cloud-initramfs-tools (0.27ubuntu1.6) xenial; urgency=medium

  [ Robert Jennings ]
  * copymods: Take ownership of lib/modules (LP: #1792905)
  * debian/control: Update Vcs-* to point to git.

 -- Scott Moser <email address hidden> Thu, 20 Sep 2018 09:39:52 -0400

Changed in cloud-initramfs-tools (Ubuntu Xenial):
status: Fix Committed → Fix Released
Scott Moser (smoser) on 2019-01-07
Changed in open-iscsi (Ubuntu):
status: Confirmed → Invalid
Changed in open-iscsi (Ubuntu Xenial):
status: Confirmed → Invalid
Changed in open-iscsi (Ubuntu Bionic):
status: Confirmed → Invalid
Changed in open-iscsi (Ubuntu Cosmic):
status: Confirmed → Invalid
Changed in cloud-images:
status: Triaged → Fix Released