timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

Bug #1732028 reported by Scott Moser on 2017-11-13
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
open-iscsi (Ubuntu)
Undecided
Unassigned
systemd (Ubuntu)
Medium
Unassigned

Bug Description

This issue keeps cropping up. It shows itself in open-iscsi autopkg tests.
I think it might just be "really slow system". It seems the timeout is only
 1 minute 30 seconds for the disk to appear, and in a happy run you
might see something very close:

[K[ [0;31m*[0;1;31m*[0m[0;31m*[0m] A start job is running for dev-disk…-UEFI.device (1min 29s / 1min 32s)
[K[ [0;31m*[0;1;31m*[0m[0;31m* [0m] A start job is running for dev-disk…-UEFI.device (1min 30s / 1min 32s)
[K[[0;32m OK [0m] Found device VIRTUAL-DISK UEFI.
         Mounting /boot/efi...

---
There is information on the open-iscsi tests at [1].
  [1] https://git.launchpad.net/~usd-import-team/ubuntu/+source/open-iscsi/tree/debian/tests/README-boot-test.md

The tests set up an iscsi target and boot a kvm guest off that read-only root with overlayroot.

# get open-iscsi source
$ apt-get source open-iscsi
$ cd open-iscsi-2.0.874/

$ sudo apt-get install -qy simplestreams tgt qemu-system-x86 \
     cloud-image-utils distro-info
$ cd open-iscsi-2.0.874/

## we're now mostly following debian/tests/README-boot-test.md
# download the image and get kernel/initrd
$ PATH=$PWD/debian/tests:$PATH
$ get-image bionic.d bionic
$ sudo `which patch-image` \
    --kernel=bionic.d/kernel --initrd=bionic.d/initrd bionic.d/disk.img

$ tgt-boot-test -v bionic.d/disk.img bionic.d/kernel bionic.d/initrd
....

Success is being able to log in with 'ubuntu' and 'passw0rd'.
Failure as seen in the log is dropping into an emergency shell.

Once inside (this was successful) you'll see a mostly sane system.
Some things to note:
a.) tgt-boot-test boots without kvm enabled. This is because using
 kvm with qemu in nested virt would cause system lockups. Its slower
but more reliable to go wtihout.
b.) under bug 1723183 I made overlayroot comment out the root filesystem
from the rendered /etc/fstab. That was because systemd got confused and
assumed that /media/root-ro had to be on top of /.
c.) you can enable or disable kvm by setting _USE_KVM=0 or _USE_KVM=1
   in your environment.

$ grep -v "^# " /etc/fstab
#
#
#LABEL=cloudimg-rootfs /media/root-ro/ ext4 ro,defaults,noauto 0 0
/media/root-ro/ / overlay lowerdir=/media/root-ro/,upperdir=/media/root-rw/overl
ay/,workdir=/media/root-rw/overlay-workdir/_ 0 0
LABEL=UEFI /boot/efi vfat defaults 0 0 # overlayroot:fs-unsupported

$ sudo blkid
/dev/sda1: LABEL="cloudimg-rootfs" UUID="7b1980bd-9102-4356-8df0-ec7a0c062411" TYPE="ext4" PARTUUID="c0b5ace0-4703-4667-babb-3d38137cab88"
/dev/sda15: LABEL="UEFI" UUID="B177-3CC9" TYPE="vfat" PARTUUID="0ab0b9fd-2c28-4724-857a-1559f0cf76ea"
/dev/sda14: PARTUUID="221662d6-cab0-4290-ba1c-e72acf2bf193"

$ cat /run/systemd/generator/local-fs.target.requires/boot-efi.mount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
Before=local-fs.target

[Mount]
Where=/boot/efi
What=/dev/disk/by-label/UEFI
Type=vfat

Related bugs:
 * bug 1680197: Zesty deployments failing sporadically
 * bug 1723183: transient systemd ordering issue when using overlayroot
 * bug 1666573: transient systemd ordering cycle in boot with overlayroot ver read-only open-iscsi root

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: systemd 234-2ubuntu12
ProcVersionSignature: User Name 4.13.0-16.19-generic 4.13.4
Uname: Linux 4.13.0-16-generic x86_64
ApportVersion: 2.20.7-0ubuntu4
Architecture: amd64
Date: Mon Nov 13 21:06:36 2017
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcEnviron:
 TERM=vt220
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: nomodeset iscsi_initiator=maas-enlist iscsi_target_name=tgt-boot-test-7xuhwl iscsi_target_ip=10.0.12.2 iscsi_target_port=3260 iscsi_initiator=maas-enlist ip=::::maas-enlist:BOOTIF ro net.ifnames=0 BOOTIF_DEFAULT=eth0 root=/dev/disk/by-path/ip-10.0.12.2:3260-iscsi-tgt-boot-test-7xuhwl-lun-1-part1 overlayroot=tmpfs console=ttyS0 ds=nocloud-net;seedfrom=http://10.0.12.2:32600/
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.10.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-artful
dmi.modalias: dmi:bvnSeaBIOS:bvr1.10.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-artful:cvnQEMU:ct1:cvrpc-i440fx-artful:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-artful
dmi.sys.vendor: QEMU

Related branches

Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :
Changed in systemd (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Scott Moser (smoser) wrote :

Note that this failure could just be due to slowness at this point.
The open-iscsi tests that are running are running in nested qemu with kvm disabled. So it is quite slow to boot.

We seem to be getting usually around 40 seconds for that to arrive. My statement is based on
messages like:

[ **] A start job is running for dev-disk…label-UEFI.device (34s / 1min 30s)
[ OK ] Found device VIRTUAL-DISK UEFI.

There isn't a good report on exactly what time it arrived other than the last 'XXs' second listed'
perhaps we could/should collect the journal from inside the vm into artifacts.

Scott Moser (smoser) wrote :

Is there a way to bump that 1m30s timeout from the kernel command line? Even image modification would be ok, as we're already modifying the image to insert the deb.

Scott Moser (smoser) wrote :

It looks like:
 x-systemd.device-timeout=
might be useful
https://www.freedesktop.org/software/systemd/man/systemd.mount.html

we'd have to update the /etc/fstab entry in 'patch-image' of the open-iscsi test, but we could do that and just bump it to 6m or something.

As some evidence, that this might have just been slow, here is one run that took 1m11 seconds:
 https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/amd64/o/open-iscsi/20171112_225036_bf82c@/log.gz

Scott Moser (smoser) wrote :

Well, I tried updating /etc/fstab
 https://code.launchpad.net/~smoser/ubuntu/+source/open-iscsi/+git/open-iscsi/+ref/fix/get-journal-publish-artifacts

but that doesnt help. It still fails and goes on after 90 seconds. I verified this by just changing the LABEL= to a different string that would never be there. It should have waited 6m, but didnt.

Scott Moser (smoser) wrote :

Bummer.
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/amd64/o/open-iscsi/20171122_204418_8ac74@/log.gz
shows

[ 43.793658] systemd[1]: media-root\x2dro.mount: Found ordering cycle on -.mount/start
[ 43.801914] systemd[1]: media-root\x2dro.mount: Found dependency on media-root\x2dro.mount/start
[ 43.812260] systemd[1]: media-root\x2dro.mount: Unable to break cycle starting with media-root\x2dro.mount/start
[ 43.822135] systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Resource deadlock avoided

Yet, the *same* set of packages
 https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/amd64/o/open-iscsi/20171122_211728_e853a@/log.gz

this time had no mention of 'ordering cycle' and took 1m7 seconds for uefi device to appear.
So I really do think the failures are a matter of simply needing more time.

However, the 'ordering cycle' is concerning as I had thought that it shoudl have been fixed.

summary: - transient boot fail with overlayroot
+ transient boot fail with overlayroot [open-iscsi dep8 tests]
summary: - transient boot fail with overlayroot [open-iscsi dep8 tests]
+ transient boot fail with overlayroot [open-iscsi autopkg tests]

CE: Moving over some of the discussion that started on a related MP [1].

Smoser:
I put together some doc on how to run the https://hackmd.io/E0ydu7Y7QEe-kroPb6-OOA

That should help you to possibly recreate outside of the adt test harness.

[1]: https://code.launchpad.net/~paelzer/ubuntu/+source/open-iscsi/+git/open-iscsi/+merge/349799

I could confirm that I can run the guest that way and it uses the intended root=/dev/disk/by-path/ip-10.0.12.2:3260-iscsi-tgt-boot-test-b7X8g2-lun-1-part1.

The only and unfortunate difference to the issue when run on LP infra stays, that it works all of the time :-/
Note: I ran most of them in background after verifying once how they behave. But wanted to make sure they complete reproducibly.
a - manual login, no user data case 3/3
b - user data collect data to disk and shut down 3/3
c - user data collect just shut down 3/3
d - user data collect data to disk and shut down, no tty (nohup CMD > log 2>&1) 3/3
e - non interactive like autopkgtest would run it 3/3
f - non interactive like autopkgtest would run it forcing KVM mode 3/3

Next I was parsing all the logs that the fails on LP accumulated recently.
I found this errors whichi is interesting:
  iscsistart: initiator reported error (15 - session exists)
I realized this is present on ALL logs that we gathered.
But after thinking I had a lead on this I realized that the good cases had those messages as well.
Also found the mentioned "ordering cycle on media-root\x2dro.mount/start"
In good as well as bad logs.
So neither of these "is it"

Essentially the boot around the iscsi root has these steps with some noise in between - looking for differences in good/bad cases. They start the same even with sharing a few errors that seem to be red herrings:
[...] (early boot)
all logs (id changes) - Logging into tgt-boot-test-o3PlsL 10.1.1.2:3260,1
all logs - mounted filesystem with ordered data
on 7/17 logs (also good) - Found ordering cycle ...
only all bad cases - Dependency failed for Local File Systems
only all bad cases - Timed out waiting for device (devices change)
only all bad cases - Started Emergency Shell

That matches this bug here.

There also was a slight remaining uncertainty if this might be the detection of KVM being broken and running in nested KVm on Launchpad.
But I was able to confirm that it runs as intended in TCG mode:
   should_try_kvm = no. virt=kvm (nested kvm is finicky). set _USE_KVM=1 to force.

[1]: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-cosmic-ci-train-ppa-service-3325/cosmic/amd64/o/open-iscsi/20180719_104631_271d7@/log.gz

FYI: In Cosmic this currently blocks (and will block more soon due to python bound on it): qemu, debconf, python3, targetcli-fb, netifaces

It also is no more transient, but happens always on LP infrastructure recently (~20 reruns now).
Knowing it likely is "too slow" even being unsure why LP in particular is affected we can try to recreate knowing that.
I checked CPU consumption in:
- KVM mode (super low)
- TCG mode (~1.2 CPUs)

Adapting title as well ...

summary: - transient boot fail with overlayroot [open-iscsi autopkg tests]
+ timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests]
summary: - timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests]
+ timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on
+ LP Infra]

It does not consume "a lot" of CPU but lets try still to slow it down into the error.
$ sudo apt install cgroup-tools
$ sudo cgcreate -g cpu:/cpulimited
# you might want to ensure that cpu.cfs_period_us: 100000
# e.g. Set 10% as hard limit
$ echo 10000 | sudo tee /sys/fs/cgroup/cpu,cpuacct/cpulimited/cpu.cfs_quota_us
# This was done on the Host, all tests run in the qemu guest (as on LP), move qemu in those cgroups
$ for task in $(ls -1 /proc/$(pidof qemu-system-x86_64)/task/); do echo $task | sudo tee -a /sys/fs/cgroup/cpu,cpuacct/cpulimited/tasks; done;

I tested different shares using TCG in the guest.
50% seemed to be a tradeoff that isn't too slow to totally fail or having you wait too much.
It gets slower, TCG&Emulation blocked at 50%, vcpus at ~12%

With that I was able to hit this locally:
[ 432.218071] systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-UEFI.device.
[ TIME ] Timed out waiting for device dev-disk-by\x2dlabel-UEFI.device.
[DEPEND] Dependency failed for /boot/efi.
[DEPEND] Dependency failed for Local File Systems.
[...]
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Press Enter for maintenance
(or press Control-D to continue):

From here going step by step down to the smallest test (this was full autopkgtest).
And there we can again start trying tweaks as in comment #5 and comment #6 but hopefully iterate faster on them.

Running just with tgt-boot-test in interactive mode, no user-data and no extra disk.
Issue still reproducible with that.

Saving the image to try different modifications while being able to get back to the current state.

Then I started where Scott already experimented. The timeouts.
I was trying the big hammer first setting all of them.

# Inside:
sudo mount-image-callback --system-mounts disk.img /bin/bash

/etc/fstab:
LABEL=cloudimg-rootfs / ext4 defaults,x-systemd.device-timeout=600,x-systemd.mount-timeout=600,_netdev 0 0

$ mkdir etc/systemd/user.conf.d
$ echo DefaultTimeoutStartSec=600s > etc/systemd/user.conf.d/longtimeout.conf

I (accidentally) verified that without CPU limit this modified root disk would boot fine.
And will now run it with 40% over lunchtime (a bit less to more surely trigger it - and since I don't "wait" for it to complete lower % won't hurt).

But that still ran into the issues.
At least I have an emergency shell there interactively now that I can take a look before the next tries.
Also not only disk fails but also (probably dependency) Timed out waiting for device dev-ttyS0.device.

Scott Moser (smoser) wrote :
description: updated
Scott Moser (smoser) wrote :

i attached as reference
 https://hackmd.io/E0ydu7Y7QEe-kroPb6-OOA
at some point we can improve the doc in the debian/tests directory with that.

I worked on the two messages and related boot messages:
1. the ordering cycle is about media-root, maybe the config is not yet perfect
2. timeouts on
     dev-ttyS0.device
     dev-disk-by\x2dlabel-UEFI.device
Hmm, that is:
  Dependency failed for /boot/efi.
  Dependency failed for Local File Systems.
  local-fs.target: Job local-fs.target/start failed

I checked for network coming up and was first confused between
  systemd[1]: Failed to start Network Service.
And network in general coming up very late.
In a log:
  eth0: DHCPv4 address 10.0.12.15/24
  systemd-networkd[529]: eth0: Configured
In one of my cases ~10 minutes later than I got into the emergency console.

Afterwards boot-efi.mount autocompleted just fine btw.

Early boot is a swarm of red herrings:
I was checking more like
  Started Remount Root and Kernel File Systems.
before
  Started Network Service
It seems the latter just depends on time synchronization and therefore waits.
  Reached target Network is Online.
is before it as it should be.

And as trivial as that may sound, the network seems to be related at least.
Of all the logs only the good cases reach:
  Reached target Network is Online
But that is only reachable after local-fs which in the bad case fails - so it is ok to not be reached if the root is missing.

I also checked if the ordering changed e.g. compared to Bionic?
But it seemed - at least for network init similar in old bad&good cases.

Arr - early boot dependencies are lovely :-)
Following a discussion about fstab modifications next ...

In a discussion with smoser we decided to try just clearing /etc/fstab - it might be sufficient for our test intention and at the same time avoid the issue.
With that done we still have the following failing with timeouts:
  [ TIME ] Timed out waiting for device dev-ttyS0.device.
  [FAILED] Failed to start Journal Service.

Not sure why the first happens, but the second is due to the lack of a root FS.

In my slowed down environments it boots REALLY slow then, like 30 minutes.
Several services are unforgiving of the root not being there.
Especially snap that was getting into a loop of Starting/Stopping/... and by that did not let it complete.

The former try was too hard (intentionally), so I stepped back to just replace the labels with the (in this case) known partition IDs /dev/sda1 and /dev/sda15.
This I first booted non slowed to check if the config works at all.
It works

Note: non slowed the TCG-system is at 4*100% for the vcpus and 400% for TCG. ~800% vs ~98% is a slowdown by 1/8 which matches what I see as effective slowdown.

A few more tries together with Scott found that disabling the fstab entry for EFI, but keeping the main root as-is.
Again we verified this config is valid (boot without CPU limit) and then we set the limit.
With that finally it seems to work.
Lets code that up in open-iscsi and put it to a PPA to test on LP infra against it.

Due to snapd still lopoing in this case (bug 1782602)
The sum of changes is:
$ systemctl disable snapd.seeded.service
$ systemctl disable snapd.service

cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 0
#LABEL=UEFI /boot/efi vfat defaults 0 0

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package open-iscsi - 2.0.874-5ubuntu5

---------------
open-iscsi (2.0.874-5ubuntu5) cosmic; urgency=medium

  * Harden dep8 tests against effects due to slow execution on Launchpad
    infrastructure (LP: #1732028).
    - debian/tests/patch-image: remove problematic fstab entries
    - debian/tests/tgt-boot-test: ran xkvm in verbose mode

 -- Christian Ehrhardt <email address hidden> Thu, 19 Jul 2018 18:22:39 +0200

Changed in open-iscsi (Ubuntu):
status: New → Fix Released

2.0.874-5ubuntu5 migrated and had the first good new run.
Retriggering all those currently blocked on it ...

\o/
All worked on the first run now, this gives some confidence that the flakyness is gone for now.

2.0.874-5ubuntu5 debconf/1.5.68 2018-07-20 06:12:24 pass
2.0.874-5ubuntu5 targetcli-fb/2.1.43-2 2018-07-20 06:05:57 pass
2.0.874-5ubuntu5 qemu/1:2.12+dfsg-3ubuntu3 2018-07-20 06:05:48 pass
2.0.874-5ubuntu5 python3-defaults/3.6.6-1 2018-07-20 06:05:44 pass
2.0.874-5ubuntu5 netifaces/0.10.4-1build1 2018-07-20 06:04:45 pass

If said runtime increase of the test is found to be affecting e.g. Bionic SRUs as well blocking on opne-iscsi more than in the past then we will need to consider backporting the fix.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers