Ubuntu
debian-installer package

Unable to network boot Ubuntu 16.04 installer normally on Briggs

Bug #1615021 reported by bugproxy on 2016-08-19

This bug affects 1 person

Affects	Status	Importance	Assigned to
busybox (Ubuntu)	Fix Released	Undecided	Unassigned
busybox (Ubuntu Xenial)	Won't Fix	Undecided	Unassigned
busybox (Ubuntu Yakkety)	Fix Released	Undecided	Unassigned
debian-installer (Ubuntu)	Invalid	Undecided	Taco Screen team
debian-installer (Ubuntu Xenial)	Invalid	Undecided	Unassigned
debian-installer (Ubuntu Yakkety)	Invalid	Undecided	Taco Screen team
systemd (Ubuntu)	Fix Released	Undecided	Martin Pitt
systemd (Ubuntu Xenial)	Fix Released	Undecided	Martin Pitt
systemd (Ubuntu Yakkety)	Fix Released	Undecided	Martin Pitt

Bug Description

== Comment: #7 - Guilherme Guaglianoni Piccoli <email address hidden> - 2016-08-19 10:08:07 ==
The normal procedure to perform a Netboot installation of Ubuntu 16.04 is to download the latest vmlinux and initrd.gz files available, and kexec them with no parameters (at least in ppc64el).
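For readers unfamiliar with the procedure, here is a minimal sketch of those steps. It only prints the commands instead of running them. The function name netboot_commands and the base URL argument are illustrative assumptions, not part of the report; the kexec options shown (-l, --initrd, --append, -e) are the usual kexec-tools ones.

```shell
# Print (do not run) the commands for a kexec-based netboot.
# $1: URL of the directory holding vmlinux and initrd.gz (assumption: you
#     supply the netboot directory you actually use)
# $2: optional kernel command line; empty means "no parameters"
netboot_commands() {
    base_url=$1
    cmdline=${2:-}
    printf 'wget %s/vmlinux %s/initrd.gz\n' "$base_url" "$base_url"
    printf 'kexec -l vmlinux --initrd=initrd.gz --append="%s"\n' "$cmdline"
    printf 'kexec -e\n'
}
```

Calling it with only the URL reproduces the "no parameters" case described above; a second argument supplies a kernel command line.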

We're experiencing a strange issue in which the installer freezes before the menus are shown. The system hangs at the point shown below, right after the i40e driver initialization:

[ 11.052832] i40e 0002:01:00.0 enP2p1s0f0: renamed from eth0
[ 11.073976] i40e 0002:01:00.1 enP2p1s0f1: renamed from eth1
[ 11.117799] i40e 0002:01:00.2 enP2p1s0f2: renamed from eth2
[ 11.225745] i40e 0002:01:00.3 enP2p1s0f3: renamed from eth3
***HANG***

The most difficult part of this issue is that it seems to be a timing issue/race condition, and many debugging attempts end up preventing the issue from reproducing (heisenbug).

We were successful though in getting logs by booting the kernel with the command-line "BOOT_DEBUG=2" and by changing the initrd in order to enable systemd debug; only the files "init" and "start-udev" were changed in initrd, both attached here.
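A rough sketch of how such an initrd edit is commonly done, assuming the installer initrd is a single gzip-compressed cpio archive in "newc" format (an assumption suggested by the initrd.gz name, not confirmed in this report). The function names are hypothetical.

```shell
# Unpack a gzip-compressed cpio initrd ($1) into directory $2.
# cpio's block-count message on stderr is silenced, which also hides errors.
unpack_initrd() {
    initrd=$1
    dest=$2
    mkdir -p "$dest"
    gzip -dc "$initrd" | (cd "$dest" && cpio -id 2>/dev/null)
}

# Repack directory $1 into a gzip-compressed newc cpio archive at $2.
repack_initrd() {
    src=$1
    out=$2
    (cd "$src" && find . | cpio -o -H newc 2>/dev/null) | gzip -9 > "$out"
}
```

After unpacking, files such as init and start-udev can be edited in place before repacking and booting the new image.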

We've attached here a saved screen session that shows the entire boot process until it gets flooded with lots of messages like:

"starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'

seq 3244 queued, 'add' 'pci_bus'
starting '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'
passed 408 byte device to netlink monitor 0x1003cfe8020seq 3236 running'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
'/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'(err) 'failed to execute '/bin/readlink' '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules': No such file or directory'
Process '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' failed with exit code 2.
PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6
passed device to netlink monitor 0x1003d01f730
"

It then stays hung at this stage. We re-tested by changing the file 73-usb-net-by-mac.rules in the initrd, replacing "/etc/udev/rules.d/80-net-setup-link.rules" with "/lib/udev/rules.d/80-net-setup-link.rules", since the former does not exist whereas the latter does. The same issue was observed!

Notice that if we boot the installer with the command line "net.ifnames=0" or "net.ifnames=1", the problem no longer reproduces.

We want to ask Canonical's help in investigating this issue.
Thanks,

Guilherme

SRU INFORMATION for systemd
===========================

Test case:
* Check what happens for uevents on devices which are not USB network interfaces:
udevadm test /sys/devices/virtual/mem/null
udevadm test /sys/class/net/lo

With the current version these will run

PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules' /lib/udev/rules.d/73-usb-net-by-mac.rules:6

which is pointless. With the proposed version these should be gone.

* Ensure that the rule still works as intended by connecting a USB network device that has a permanent MAC address (e. g. Android tethering uses a temporary MAC): You should get a MAC-based name like "enx12345678" for it. Now disconnect it again, disable ifnames with

sudo ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules

and reconnect the device. You should now get a kernel name like "usb0" for it.

* Regression potential: Errors in the rule could break persistent naming - or its disabling - of USB network interfaces. Running the above test carefully is important to ensure this keeps working. This has little to no actual effect on anything else on the system (aside from a performance impact and spamming logs), so overall the regression potential is low.
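The first test case can be scripted along these lines. This is only a sketch: the helper names are hypothetical, and it assumes the rule still logs the same PROGRAM line quoted above when readlink is invoked.

```shell
# Succeed if the udevadm-test output on stdin shows the readlink PROGRAM call.
runs_readlink() {
    grep -q "PROGRAM '/bin/readlink /etc/udev/rules.d/80-net-setup-link.rules'"
}

# Run "udevadm test" on each given sysfs path and report one line per device.
check_devices() {
    for dev in "$@"; do
        if udevadm test "$dev" 2>&1 | runs_readlink; then
            echo "$dev: readlink still called"
        else
            echo "$dev: ok"
        fi
    done
}
```

For example, `check_devices /sys/devices/virtual/mem/null /sys/class/net/lo` covers the two devices named in the test case.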

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-19: screen session output

screen session output (416.0 KiB, text/plain)

Default Comment by Bridge

tags:

added: architecture-ppc64le bugnameltc-145180 severity-high targetmilestone-inin16041

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-19: init (modified on initrd)

init (modified on initrd) (679 bytes, text/plain)

Default Comment by Bridge

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-19: start-udev (modified on initrd)

start-udev (modified on initrd) (531 bytes, text/plain)

Default Comment by Bridge

Changed in ubuntu:
assignee:	nobody → Taco Screen team (taco-screen-team)
affects:	ubuntu → systemd (Ubuntu)

Breno Leitão (breno-leitao) on 2016-08-23

Changed in systemd (Ubuntu):
status:	New → Confirmed

Revision history for this message

Steve Langasek (vorlon) wrote on 2016-08-23:

Examining the initrd shows that readlink is provided as /usr/bin/readlink -> /bin/busybox, not as /bin/readlink where systemd expects it (and where it's shipped on an installed system). This is a bug in debian-installer's construction of that image - though gee it would be nice if systemd didn't require hard-coded paths to everything.
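A quick way to repeat this check on an unpacked initrd tree is sketched below. The function name readlink_locations is hypothetical; it only inspects the two paths mentioned in this comment.

```shell
# List which of /bin/readlink and /usr/bin/readlink exist under the unpacked
# initrd root $1, showing symlink targets where applicable.
readlink_locations() {
    root=$1
    for p in bin/readlink usr/bin/readlink; do
        if [ -L "$root/$p" ]; then
            printf '/%s -> %s\n' "$p" "$(readlink "$root/$p")"
        elif [ -e "$root/$p" ]; then
            printf '/%s\n' "$p"
        fi
    done
}
```

On a tree laid out as described here, the only line printed would be the /usr/bin/readlink symlink, which makes the missing /bin/readlink obvious.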

There's no guarantee that fixing the bug that's causing this error message will fix the underlying problem preventing your boot, but it will at least fix the message spam.

affects:

systemd (Ubuntu) → debian-installer (Ubuntu)

Revision history for this message

Steve Langasek (vorlon) wrote on 2016-08-23:

I've thought about this some more, and while the /bin/readlink vs. /usr/bin/readlink mismatch in busybox is a bug, fixing it is definitely not going to fix the problem in the installer. In the installer, /etc/udev/rules.d/80-net-setup-link.rules will never exist since this is an admin override; so the readlink command - if it existed - would still return false. I'm reasonably sure the lack of /bin/readlink is not causing the udev rule to behave differently; so it's sufficient to fix this particular issue for 16.10 and later and not SRU it.

What is *more* of an issue is that the structure of /lib/udev/rules.d/73-usb-net-by-mac.rules causes a separate call out to readlink for every single udev event, because the readlink check happens *before* checking the ACTION/SUBSYSTEM/SUBSYSTEMS attributes of the event, unless net.ifnames=0 is set.

So regardless of whether this is the root cause of the install failure, this udev rule is causing hundreds of thousands of extra calls out to /bin/readlink on boot, which should definitely be fixed by reordering these checks.
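To illustrate why the ordering matters, here is a toy model in shell. It is not udev's implementation: expensive_check stands in for forking /bin/readlink, and the event match is simplified to ACTION and SUBSYSTEM only. All names are made up for the sketch.

```shell
# Stand-in for the external readlink call; returns 1, meaning "not disabled".
expensive_check() { forks=$((forks + 1)); return 1; }

# Cheap, in-process match on the event's attributes.
matches_event() { [ "$1" = add ] && [ "$2" = net ]; }

# Old ordering: the expensive check runs for every event.
process_check_first() {
    forks=0; named=0
    while read -r action subsystem; do
        if expensive_check; then continue; fi
        if matches_event "$action" "$subsystem"; then named=$((named + 1)); fi
    done
    echo "$forks $named"
}

# Reordered: match first, so the expensive check only runs for relevant events.
process_match_first() {
    forks=0; named=0
    while read -r action subsystem; do
        if ! matches_event "$action" "$subsystem"; then continue; fi
        if expensive_check; then continue; fi
        named=$((named + 1))
    done
    echo "$forks $named"
}
```

Both functions read "ACTION SUBSYSTEM" lines on stdin and print the number of simulated forks followed by the number of devices that would be named. The second number is the same either way; only the fork count differs.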

Martin, can you please look into fixing this for xenial+yakkety?

Changed in systemd (Ubuntu):
assignee:	nobody → Martin Pitt (pitti)
status:	New → Triaged
Changed in busybox (Ubuntu Xenial):
status:	New → Won't Fix
Changed in busybox (Ubuntu Yakkety):
status:	New → Fix Committed
Changed in debian-installer (Ubuntu Xenial):
status:	New → Triaged
Changed in debian-installer (Ubuntu Yakkety):
status:	Confirmed → Triaged
Changed in systemd (Ubuntu Xenial):
status:	New → Triaged
assignee:	nobody → Martin Pitt (pitti)

Martin Pitt (pitti) on 2016-08-24

description:

updated

Revision history for this message

Martin Pitt (pitti) wrote on 2016-08-24:

Thanks for reporting this! Indeed this is a silly rule construction, *brown paperbag*. I fixed this for the next Debian/Yakkety upload in https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=b42e1f8af2 and backported it to Xenial in https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?h=ubuntu-xenial&id=d244c9acd .

Changed in systemd (Ubuntu Yakkety):
status:	Triaged → Fix Committed
Changed in systemd (Ubuntu Xenial):
status:	Triaged → In Progress

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-24: Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-08-24 12:34 EDT-------
Thanks very much vorlon and pitti. Pretty nice findings!

But...the issue still persists. I'll summarize the tests I made:

1) Firstly, I changed the link /usr/bin/readlink and, as vorlon predicted, this didn't solve the issue.

2) Then, independently of (1), I applied pitti's patch to xenial's "73-usb-net-by-mac.rules" and...unfortunately it also didn't solve the issue.

What surprises me most is how much even the simplest debugging interferes with the issue! After testing pitti's patch, and with the patch still applied, I changed how start-udev launches udevd like this:

(before)
SYSTEMD_LOG_LEVEL=notice /lib/systemd/systemd-udevd --daemon --resolve-names=never
(after my change)
SYSTEMD_LOG_LEVEL=debug /lib/systemd/systemd-udevd --daemon --resolve-names=never --debug

Well, the issue reproduced and I didn't see a single extra log message.

After this, I kept both pitti's patch and this systemd debug parameter, but I booted with command-line "BOOT_DEBUG=1". Guess what? I was flooded by messages but the installer showed up. This is really weird for me...I'll attach a screen session of this last trial.

I appreciate any suggestions you have to debug the issue further. By the way, using "net.ifnames=1" works around the issue too. Basically, any command-line option seems to solve it, even the simplest debug parameter.

Thanks very much for the help and advice,

Guilherme

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-24: NEW screen output

NEW screen output (561.9 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2016-08-24 12:39 EDT-------

Revision history for this message

Steve Langasek (vorlon) wrote on 2016-08-24:

If this screen output is for a case when the installer *did* show up, I don't think it's going to tell us much about where things have hung in the case that it *didn't* show up.

If there's a particular invocation of udev that lets you reproduce the problem, I suggest sticking with that, and capturing the output of 'udevadm info -e' (possibly by using a fixed delay).

Revision history for this message

Martin Pitt (pitti) wrote on 2016-08-25:

#10

I don't actually know what BOOT_DEBUG does -- I've never seen it before, it does not appear anywhere in my yakkety system, and it's for sure not something the kernel, initramfs-tools, or systemd look at. My best guess is that this is a debian-installer specific debug flag.

So from what I can tell, the readlink path issue is merely a red herring -- it's good to fix it of course, but it's unrelated to the boot failure.

Since this is a heisenbug, it rather seems to me that this is some timing issue -- any extra debugging, or time spent with changing boot parameters in the boot loader will change the behaviour (e. g. make the detection of network devices by the hardware finish earlier).

ATM I'm afraid there isn't enough useful information here yet to understand what's going on -- indeed having a screen output where the problem does happen would be helpful. dmesg logs and "udevadm info -e" as well, as Steve says.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-25: Boot log for failed run, no debug options

#11

Boot log for failed run, no debug options (42.8 KiB, text/plain)

------- Comment on attachment From <email address hidden> 2016-08-25 10:51 EDT-------

Added a boot log showing the hang, no boot options (no debug options).

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-08-25:

#12

This bug was fixed in the package busybox - 1:1.22.0-19ubuntu2

---------------
busybox (1:1.22.0-19ubuntu2) yakkety; urgency=medium

* debian/patches/readlink-in-slash-bin.patch: put readlink in /bin/
like coreutils. Closes LP: #1615021.

-- Steve Langasek <email address hidden> Tue, 23 Aug 2016 12:36:39 -0700

Changed in busybox (Ubuntu Yakkety):
status:	Fix Committed → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-08-29:

#13

This bug was fixed in the package systemd - 231-5

---------------
systemd (231-5) unstable; urgency=medium

  [ Iain Lane ]
  * Let graphical-session-pre.target be manually started (LP: #1615341)

  [ Felipe Sateler ]
  * Add basic version of git-cherry-pick
  * Replace Revert-units-add-a-basic-SystemCallFilter-3471.patch with upstream
    patch
  * sysv-generator: better error reporting. (Closes: #830257)

  [ Martin Pitt ]
  * 73-usb-net-by-mac.rules: Test for disabling 80-net-setup-link.rules more
    efficiently. Stop calling readlink at all and just test if
    /etc/udev/rules.d/80-net-setup-link.rules exists -- a common way to
    disable an udev rule is to just "touch" it in /etc/udev/rule.d/ (i. e.
    empty file), and if the rule is customized we cannot really predict anyway
    if the user wants MAC-based USB net names or not. (LP: #1615021)
  * Ship kernel-install (Closes: #744301)
  * Add debian/extra/kernel-install.d/60-initrd.install.
    This kernel-install drop-in copies the initrd of the selected kernel to
    the EFI partition.
  * bootctl: Automatically detect ESP partition.
    This makes bootctl work with Debian's /boot/efi/ mountpoint without having
    to explicitly specify --path.
    Patches cherry-picked from upstream master.
  * systemd.NEWS: Point out that alternatively rcS scripts can be moved to
    rc[2-5]. Thanks to Petter Reinholdtsen for the suggestion!

  [ Michael Biebl ]
  * Enable iptables support (Closes: #787480)
  * Revert "logind: really handle *KeyIgnoreInhibited options in logind.conf"
    The special 'key handling' inhibitors should always work regardless of
    any *IgnoreInhibited settings – otherwise they're nearly useless.
    Update man pages to clarify that *KeyIgnoreInhibited only apply to a
    subset of locks (Closes: #834148)

-- Martin Pitt <email address hidden> Fri, 26 Aug 2016 10:58:07 +0200

Changed in systemd (Ubuntu Yakkety):
status:	Fix Committed → Fix Released

Revision history for this message

bugproxy (bugproxy) wrote on 2016-08-30: Comment bridged from LTC Bugzilla

#14

------- Comment From <email address hidden> 2016-08-30 16:01 EDT-------
pitti/vorlon, thanks for your suggestions. Unfortunately, I wasn't able to get more information by placing a fixed delay in the init script. What I did was run a small script in the background at the beginning of init that waits for 8 seconds and then runs the command "udevadm info -e".

The problem is that init does not seem to be executed at all; the issue happens first. I added a simple "echo" command as the first thing in init, but never saw the message it should print.
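The background helper described here could look roughly like the following. This is a sketch, not the script actually used; the function name and the output path in the usage comment are assumptions.

```shell
# Run a command in the background after a delay, saving its output to a file.
# $1: delay in seconds, $2: output file, remaining arguments: the command
delayed_dump() {
    delay=$1
    out=$2
    shift 2
    ( sleep "$delay"; "$@" > "$out" 2>&1 ) &
}

# Intended use near the top of init (8 s is the delay mentioned above;
# /tmp/udev-db.txt is a hypothetical path):
#   delayed_dump 8 /tmp/udev-db.txt udevadm info -e
```

Because the subshell is backgrounded, init continues immediately and the dump is taken whether or not the rest of the boot has stalled, provided the shell itself keeps running.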

Any more suggestions you have are really appreciated.

Thanks,

Guilherme

Revision history for this message

Andy Whitcroft (apw) wrote on 2016-09-07: Please test proposed package

#15

Hello bugproxy, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Xenial):
status:	In Progress → Fix Committed
tags:	added: verification-needed

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-07: Comment bridged from LTC Bugzilla

#16

------- Comment From <email address hidden> 2016-09-07 10:25 EDT-------
Waiting for xenial-proposed installer to be updated. Currently, still shows 2016-09-02.

------- Comment From <email address hidden> 2016-09-07 10:26 EDT-------
Err, should be "2016-08-02" as current content date on xenial-proposed.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-08:

#17

------- Comment From <email address hidden> 2016-09-08 07:34 EDT-------
Still waiting for updated installer images. http://ports.ubuntu.com/ubuntu-ports/dists/xenial-proposed/main/installer-ppc64el/current/images/netboot/ubuntu-installer/ppc64el/ still showing images from August.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10:

#18

------- Comment From <email address hidden> 2016-09-10 08:29 EDT-------
The installer images for xenial-proposed have not yet been updated.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10: screen session output

#19

screen session output (416.0 KiB, text/plain)

Default Comment by Bridge

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10: init (modified on initrd)

#20

init (modified on initrd) (679 bytes, text/plain)

Default Comment by Bridge

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10: start-udev (modified on initrd)

#21

start-udev (modified on initrd) (531 bytes, text/plain)

Default Comment by Bridge

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10: NEW screen output

#22

NEW screen output (561.9 KiB, text/plain)

------- Comment (attachment only) From <email address hidden> 2016-08-24 12:39 EDT-------

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-10: Boot log for failed run, no debug options

#23

Boot log for failed run, no debug options (42.8 KiB, text/plain)

------- Comment on attachment From <email address hidden> 2016-08-25 10:51 EDT-------

Added a boot log showing the hang, no boot options (no debug options).

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-12: Comment bridged from LTC Bugzilla

#24

------- Comment From <email address hidden> 2016-09-12 08:33 EDT-------
Still no new installer images for xenial-proposed.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-12:

#25

------- Comment From <email address hidden> 2016-09-12 11:09 EDT-------
Reopening: xenial-proposed installer has not been updated, cannot verify fix. Set back to fixed/verify once http://ports.ubuntu.com/ubuntu-ports/dists/xenial-proposed installer has been updated with fixed systemd to be tested.

Revision history for this message

Martin Pitt (pitti) wrote on 2016-09-12:

#26

Note, there hasn't been any debian installer fix yet, as we don't even understand what's actually happening there. There has just been an SRU to systemd/udev to fix the "No such file or directory" error message in udev rules, but apparently that was not the actual problem.

Revision history for this message

Martin Pitt (pitti) wrote on 2016-09-12:

#27

I ran the test case for systemd on a 16.04.1 desktop live system with a USB ethernet device. I confirm that naming still works as intended, MAC naming can be disabled with the /dev/null symlink, and the readlink calls are gone.

(Again, note that this was merely the side issue, not the main boot problem here.)

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Breno Leitão (breno-leitao) wrote on 2016-09-12:

#28

Martin,

Per previous comment, I understand that this bug is still not fixed, correct?

Revision history for this message

Martin Pitt (pitti) wrote on 2016-09-12: Re: [Bug 1615021] Re: Unable to network boot Ubuntu 16.04 installer normally on Briggs

#29

Breno Leitão [2016-09-12 20:53 -0000]:
> Per previous comment, I understand that this bug is still not fixed,
> correct?

Yes, as it isn't even understood yet.

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-13: Comment bridged from LTC Bugzilla

#30

------- Comment From <email address hidden> 2016-09-13 14:16 EDT-------
This bug was opened because of a hang being experienced while booting the Ubuntu 16.04 network installer on Briggs & Stratton machines with their X710 ethernet adapters, using the i40e driver.

During investigation, a problem/mistake was found in systemd, but it is almost certainly not the cause of the hang. The fixed systemd was supposedly being made available in the xenial-proposed repositories, but so far does not seem to have appeared there.

This bug was placed in "verify" state and it started causing email to be sent several times a day reminding me to verify the fix. Since we don't believe that the "fix to systemd" will fix the hang during the installer boot, and since this new systemd has not been pushed out to the xenial-proposed installer after 6 days, I have taken this bug out of "verify" state by re-opening it.

When there actually is something to be tested, and it has made its way into the xenial-proposed installer, then this bug can be set back to "verify" and I will test the fix.

------- Comment From <email address hidden> 2016-09-13 14:18 EDT-------
I should also amend my previous comment by saying that if Canonical has suggestions for how to gather more information to help debug this, they should let us know and we can make test runs for them.

Revision history for this message

Steve Langasek (vorlon) wrote on 2016-09-13: Re: [Bug 1615021] Comment bridged from LTC Bugzilla

#31

On Tue, Sep 13, 2016 at 06:20:49PM -0000, bugproxy wrote:
> During investigation, and problem/mistake was found with systemd but is
> almost-certainly not the cause of the hang.

Agreed.

> This fixed systemd was supposedly being made available in xenial-proposed
> repositories, but so far does not seem to have appeared there.

The systemd package is present in the xenial-proposed repository, but no
updated installer image has yet been produced that includes it.

We have had sufficient verification of the systemd change that it will be
released to xenial users for the general problem; we will also update the
debian-installer images as a matter of course.

Based on the feedback from <email address hidden>, it does not appear that the
buggy udev rule is blocking progress on this bug.

> This bug was placed in "verify" state and it started causing email to be
> sent several times a day reminding me to verify the fix.

I don't know why this would be. Our process generates a single message to
the bug when a package is accepted into the -proposed repository, it does
not send daily reminder messages.

> ------- Comment From <email address hidden> 2016-09-13 14:18 EDT-------
> I should also ammend my previous comment by saying, if Canonical has some
> suggestions of how to gather more information in order to help debug this,
> they should let us know and we can make test runs for them.

My previous suggestion to gpiccoli on IRC was to modify the initrd to dump
the state of the udev database at a point after the hang. I haven't seen
such output attached here; does that mean it's not possible to produce such
results because the kernel hard locks? Currently the only debugging
information I've seen is that the /lib/debian-installer/start-udev script
never returns, but that does not mean the kernel has locked up - it only
shows that udev believes it has not finished processing. I would still like
to see a dump of the udev database at the point of the hang, not just a udev
debug log showing processing up to that point.

Is this problem only reproducible with the X710 ethernet adapter? Is this a
removable ethernet adapter, and have you tested what happens if it's
removed? If it's not removable, have you tested what happens if you
blacklist the i40e driver? The ethernet driver may be a complete red
herring, and the problem may be with something that normally happens after
ethernet driver initialization rather than with the ethernet driver itself.

I would also have asked whether this could be an issue with the console
output being redirected to some different device, but since Guilherme
indicated that the problem appeared to be racy, with boot to the installer
sometimes succeeding, that seems unlikely to be the problem.

If you can reproduce this problem with the cloud image from
<http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-ppc64el-disk1.img>,
that would present additional debugging opportunities since that uses a
standard Ubuntu initramfs instead of the installer initramfs and will
support various 'break=' options to interrupt the boot and introspect the
system state.

Revision history for this message

Martin Pitt (pitti) wrote on 2016-09-14: Update Released

#32

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-09-14:

#33

This bug was fixed in the package systemd - 229-4ubuntu8

---------------
systemd (229-4ubuntu8) xenial-proposed; urgency=medium

  * Queue loading transient units after setting their properties. Fixes
    starting VMs with libvirt. (LP: #1529079)
  * Connect pid1's stdin/out/err fds to /dev/null also for containers. This
    fixes generators which expect a valid stdout/err fd in some container
    technologies. (LP: #1608953)
  * 73-usb-net-by-mac.rules: Do not run readlink for *every* uevent, and
    merely check if /etc/udev/rules.d/80-net-setup-link.rules exists.
    A common way to disable an udev rule is to just "touch" it in
    /etc/udev/rule.d/ (i. e. empty file), and if the rule is customized we
    cannot really predict anyway if the user wants MAC-based USB net names or
    not. (LP: #1615021)
  * systemd-networkd-resolvconf-update.service: Also pick up DNS servers from
    individual link leases, as they sometimes don't appear in the global
    ifstate. (LP: #1620559)

-- Martin Pitt <email address hidden> Tue, 06 Sep 2016 14:16:29 +0200
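The behavioural difference between the old and new checks can be sketched in shell. This assumes the old rule treated the override as "disabled" only when readlink printed /dev/null, which matches the symlink test case in the SRU information but is not spelled out in the changelog. The function names are hypothetical.

```shell
# Old-style check (assumed): disabled only if the file is a symlink to /dev/null.
disabled_by_readlink() {
    [ "$(readlink "$1" 2>/dev/null)" = /dev/null ]
}

# New-style check per the changelog: disabled whenever the override file exists.
disabled_by_existence() {
    [ -e "$1" ]
}
```

With these two helpers, an empty file created with "touch" counts as disabled only under the new check, while a /dev/null symlink counts as disabled under both, and no external process is needed for the existence test.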

Changed in systemd (Ubuntu Xenial):
status:	Fix Committed → Fix Released

Revision history for this message

bugproxy (bugproxy) wrote on 2016-09-15: Comment bridged from LTC Bugzilla

#34

------- Comment From <email address hidden> 2016-09-15 17:13 EDT-------
> On Tue, Sep 13, 2016 at 06:20:49PM -0000, bugproxy wrote:
[...]
> Based on the feedback from <email address hidden>, it does not appear that the
> buggy udev rule is blocking progress on this bug.
>
[...]
> > I should also ammend my previous comment by saying, if Canonical has some
> > suggestions of how to gather more information in order to help debug this,
> > they should let us know and we can make test runs for them.
>
> My previous suggestion to gpiccoli on IRC was to modify the initrd to dump
> the state of the udev database at a point after the hang. I haven't seen
> such output attached here; does that mean it's not possible to produce such
> results because the kernel hard locks? Currently the only debugging
> information I've seen is that the /lib/debian-installer/start-udev script
> never returns, but that does not mean the kernel has locked up - it only
> shows that udev believes it has not finished processing. I would still like
> to see a dump of the udev database at the point of the hang, not just a udev
> debug log showing processing up to that point.
>
> Is this problem only reproducible with the X710 ethernet adapter? Is this a
> removable ethernet adapter, and have you tested what happens if it's
> removed? If it's not removable, have you tested what happens if you
> blacklist the i40e driver? The ethernet driver may be a complete red
> herring, and the problem may be with something that normally happens after
> ethernet driver initialization rather than with the ethernet driver itself.
>
> I would also have asked whether this could be an issue with the console
> output being redirected to some different device, but since Guilherme
> indicated that the problem appeared to be racy, with boot to the installer
> sometimes succeeding, that seems unlikely to be the problem.
>
> If you can reproduce this problem with the cloud image from
> <http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-ppc64el-disk1.img>,
> that would present additional debugging opportunities since that uses a
> standard Ubuntu initramfs instead of the installer initramfs and will
> support various 'break=' options to interrupt the boot and introspect the
> system state.

Vorlon, thanks very much for your assistance. In fact, your ideas were useful and we tried many of them. And finally we seem to have figured out what's going on hehehe

Firstly, our failed attempts:

i) "udevadm info -e" was impossible to run in a bad boot: even when I try to run it as one of the first things in init, the system still seems to hang.

ii) Adding a modprobe blacklist for any driver makes things work. In fact, I added "vorlon" to the command line and it worked too hehehe

iii) I wasn't able to test the cloud image. I have never installed it before; is it a complete, functional image? I wonder if it needs to be written directly to the disk, perhaps...

Remote bug watches

debbugs #819988
[done wishlist d-i patch]

Changed in debian-installer (Ubuntu Yakkety):
status:	Triaged → Invalid
Changed in debian-installer (Ubuntu Xenial):
status:	Triaged → Invalid