systemd kmod builtin uses out of date kmod context

Bug #1714505 reported by Dan Streetman on 2017-09-01
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
systemd
Fix Released
Unknown
debian-installer (Ubuntu)
Undecided
Dimitri John Ledkov
Trusty
Undecided
Unassigned
Xenial
Undecided
Unassigned
Zesty
Undecided
Unassigned
Artful
Undecided
Unassigned
debian-installer-utils (Ubuntu)
Critical
Dimitri John Ledkov
Trusty
Undecided
Unassigned
Xenial
Undecided
Unassigned
Zesty
Undecided
Unassigned
Artful
Undecided
Unassigned

Bug Description

[Impact]

udev's rules use a built-in 'kmod' instead of the system modprobe/insmod, and this built-in kmod only validates/refreshes its kmod 'context' every 3 seconds (or longer) during event processing.

However, because other parts of the system rely on udev to load modules correctly, it is not acceptable for it to use an out of date module context. For example, during a system installation:

-the system boots with kernel and initrd with a reduced set of modules, not including nvme module
-udevd starts, and creates its kmod module context, which does not include nvme module
-system installer adds 'block-modules' udeb, which adds nvme module to system
-system installer immediately calls hw-detect->update-dev->udevadm trigger
-udevd sees its kmod module context is not more than 3 seconds old, and does not update it
-udevd rule 80-drivers.rules finds NVMe pci modalias and asks kmod builtin to load matching driver
-udevd kmod builtin does not find NVMe pci modalias because its context is out of date

this results in the system installer complaining to the user that it found no disks, even though there is a NVMe drive in the system, and the nvme module is installed in the system.

The fix is to reload udevadm rules, as per upstream recommendation.

[Test Case]

This is reproducable when trying to install using debian-installer and a preseed file that skips all questions, although not on all systems, since other events can cause udevd to reload all its builtins, or the installer may take longer than 3 seconds to call udevadm trigger after installing the nvme module udeb.

Stale context in udevd is expected, and one is supposed to reload rules via udevadm, which is the fix proposed in debian-installer-utils. Removing the previous second test case that does not reload via udevadm.

[Regression Potential]

Additional calls to udevadm reload will cause all udev rules to be re-read correctly. This may lead to new devices discovered/configured/symlink during d-i installer, which were previously skipped/missed.

Dan Streetman (ddstreet) on 2017-09-01
Changed in systemd (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet) wrote :

note that the test script in the description must be run as root on a system that does have a nvme drive that isn't in use (because the nvme module can't be removed if any nvme drive is in use).

Dan Streetman (ddstreet) wrote :

slight update to test script, add a check to make sure the nvme module was unloaded - if any nvme partitions are mounted it would cause a false positive result.

#!/bin/bash
MOD_DIR=/lib/modules/$( uname -r )/kernel/drivers/nvme/host
modprobe -rq nvme
mv $MOD_DIR/nvme.ko .
depmod -a
sleep 3
udevadm trigger
sleep 1
mv nvme.ko $MOD_DIR/
depmod -a
grep -q nvme /proc/partitions && echo FAIL nvme driver still loaded unmount all nvme partitions && exit 1
udevadm trigger
sleep 3
grep -q nvme /proc/partitions && echo PASS || echo FAIL

Dan Streetman (ddstreet) wrote :

grr, please note line wrap in above comment; script should be:

#!/bin/bash
MOD_DIR=/lib/modules/$( uname -r )/kernel/drivers/nvme/host
modprobe -rq nvme
mv $MOD_DIR/nvme.ko .
depmod -a
sleep 3
udevadm trigger
sleep 1
mv nvme.ko $MOD_DIR/
depmod -a
grep -q nvme /proc/partitions \
  && echo FAIL nvme driver still loaded unmount all nvme partitions \
  && exit 1
udevadm trigger
sleep 3
grep -q nvme /proc/partitions && echo PASS || echo FAIL

Dimitri John Ledkov (xnox) wrote :

I guess this is related to the fact that udevd main loop re-reads configs at most every 3 seconds. I find that 3 seconds is a really long time these days.

I wonder if the 3s timeout around manager_reload(manager) should be removed in udevd.

Dan Streetman (ddstreet) wrote :

yep, having up to 3 seconds worth of stale system data is too much time. Also, I think the builtins can just validate themselves - for example the kmod builtin can trivially call kmod_validate_resource(ctx) every time it's asked to load a module, and I think the other builtins that implement .validate probably also can self-validate too, and remove the udev_builtin_validate() call entirely.

I have a small draft patch just to the kmod builtin that I'm testing:
https://code.launchpad.net/~ddstreet/+git/systemd/+ref/sf149300

I'll take another look next week to see if it can be expanded to cover all the builtins that do validation, and submit something upstream for discussion.

Changed in systemd:
status: Unknown → New
Dan Streetman (ddstreet) on 2017-09-12
Changed in systemd (Ubuntu Zesty):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Zesty):
status: New → In Progress
Changed in systemd (Ubuntu Xenial):
status: New → In Progress
Changed in systemd (Ubuntu Zesty):
importance: Undecided → Medium
Changed in systemd (Ubuntu Xenial):
importance: Undecided → Medium
Dan Streetman (ddstreet) wrote :

opened upstream merge request 6814:
https://github.com/systemd/systemd/pull/6814

Changed in systemd:
status: New → Fix Released
Dimitri John Ledkov (xnox) wrote :

I think we should fix .udeb installation and the d-i to call `udevadm control --reload` after .udeb installation and before (re-)triggering udev.

Changed in systemd (Ubuntu Artful):
status: In Progress → Won't Fix
no longer affects: systemd (Ubuntu Zesty)
no longer affects: systemd (Ubuntu Trusty)
no longer affects: systemd (Ubuntu Xenial)
Changed in systemd (Ubuntu):
status: In Progress → Won't Fix
no longer affects: systemd (Ubuntu Artful)
Dimitri John Ledkov (xnox) wrote :

Here is a sample patch against debian-installer-utils to try.

Do you need a d-i build for testing? If yes, for which release(s)?

tags: added: patch
Dan Streetman (ddstreet) wrote :

> Changed in systemd:
> status: New → Fix Released

for clarification, this is not Fix Released upstream, it has been rejected as Won't Fix upstream, systemd/udev believes it isn't their problem.

Dan Streetman (ddstreet) wrote :

> Here is a sample patch against debian-installer-utils to try.

yes i think it at least needs to go here, to handle this very specific case. It may be better to put it into the if $TRIGGER section, as I don't think it is needed for only the settle case, but either way should work.

This was reported to me, so I'll ask the reporter to test with this change to verify it fixes for them; I have not been able to reproduce it myself.

Dan Streetman (ddstreet) wrote :

Also, this upstream udev bug has caused trouble before, e.g.:
https://bugs.launchpad.net/ubuntu/+source/hw-detect/+bug/1549456

that does a udev reload too, but only when specific devices are found. Hopefully this change to update-dev will cover more cases where this upstream udev bug will cause failures, but since it's a racy 0-3 second window of stale data that udev uses, and they refuse to fix it upstream, it will be really hard to test for and/or reproduce.

Dan Streetman (ddstreet) wrote :

@xnox, the person that reported this to me and who can reproduce it, tested with a initrd including your update-dev patch and it does fix the problem. Can you go ahead and merge it and SRU please?

Changed in debian-installer-utils (Ubuntu):
status: New → Confirmed
milestone: none → ubuntu-17.09
assignee: nobody → Dimitri John Ledkov (xnox)
importance: Undecided → Critical
Tony (toekneemi) on 2017-09-23
Changed in debian-installer-utils (Ubuntu):
status: Confirmed → New
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer-utils - 1.113ubuntu2

---------------
debian-installer-utils (1.113ubuntu2) artful; urgency=medium

  * Make sure udevd internal state is up-to-date before calling trigger &
    settle. This insures that newly anna-installed udev rules and kernel
    modules are loaded by udevd for correct processing of the udev
    rules. LP: #1714505.

 -- Dimitri John Ledkov <email address hidden> Sun, 17 Sep 2017 00:41:44 +0100

Changed in debian-installer-utils (Ubuntu):
status: New → Fix Released
no longer affects: debian-installer (Ubuntu Artful)
no longer affects: debian-installer (Ubuntu Zesty)
no longer affects: debian-installer (Ubuntu Xenial)
no longer affects: debian-installer (Ubuntu Trusty)
no longer affects: systemd (Ubuntu Trusty)
no longer affects: systemd (Ubuntu Xenial)
no longer affects: systemd (Ubuntu Zesty)
Changed in debian-installer (Ubuntu):
milestone: none → ubuntu-17.10
status: New → Fix Committed
Dan Streetman (ddstreet) on 2017-10-23
tags: added: sts-sponsor-ddstreet
Dan Streetman (ddstreet) on 2017-11-29
Changed in debian-installer-utils (Ubuntu Artful):
status: New → Fix Released
Changed in debian-installer-utils (Ubuntu Zesty):
assignee: nobody → Dan Streetman (ddstreet)
Changed in debian-installer-utils (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in debian-installer-utils (Ubuntu Trusty):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Artful):
status: New → Won't Fix
description: updated
Dimitri John Ledkov (xnox) wrote :

Whilst we can upload fixes for zesty and trusty; neither of those will have a point release anymore to include a fix for this.

What is the rationale behind requesting a fix for trusty? How do you expect it to get onto an .iso? A customized iso?

Changed in debian-installer-utils (Ubuntu Zesty):
status: New → In Progress
Changed in debian-installer-utils (Ubuntu Xenial):
status: New → In Progress
Changed in debian-installer-utils (Ubuntu Trusty):
assignee: Dan Streetman (ddstreet) → nobody
Changed in debian-installer-utils (Ubuntu Xenial):
assignee: Dan Streetman (ddstreet) → nobody
Changed in debian-installer-utils (Ubuntu Zesty):
assignee: Dan Streetman (ddstreet) → nobody
Changed in debian-installer-utils (Ubuntu Trusty):
status: New → Won't Fix
Changed in debian-installer (Ubuntu Trusty):
status: New → Won't Fix
Changed in debian-installer (Ubuntu Artful):
status: New → Fix Released
Changed in debian-installer (Ubuntu Zesty):
status: New → Won't Fix
no longer affects: debian-installer (Ubuntu Bionic)
Changed in debian-installer (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: ubuntu-17.10 → none
status: Fix Committed → Fix Released
no longer affects: debian-installer-utils (Ubuntu Bionic)
no longer affects: systemd (Ubuntu)
no longer affects: systemd (Ubuntu Artful)
no longer affects: systemd (Ubuntu Bionic)
Dan Streetman (ddstreet) wrote :

> What is the rationale behind requesting a fix for trusty? How do you expect it to get onto
> an .iso? A customized iso?

Customers use the initrd provided in our archive, not just isos, e.g.:

http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/initrd.gz

And updated isos can be produced by debian-installer, e.g.:

http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/installer-amd64/current/images/netboot/mini.iso

So are you going to SRU this or should I?

Changed in debian-installer-utils (Ubuntu Trusty):
status: Won't Fix → In Progress
Changed in debian-installer (Ubuntu Trusty):
status: Won't Fix → In Progress
Changed in debian-installer (Ubuntu Zesty):
status: Won't Fix → In Progress
Dan Streetman (ddstreet) wrote :

To clarify - if we don't SRU this, what do we tell Ubuntu users (including Canonical customers) who are encountering this problem on trusty, xenial, or zesty? Sorry, we aren't fixing this, you can't use your automated preseed install. Or, sorry, you'll have to upgrade to artful, or wait for the next xenial point release.

Neither sound very friendly to the Ubuntu user base.

Hello Dan, or anyone else affected,

Accepted debian-installer-utils into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/debian-installer-utils/1.113ubuntu1.17.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in debian-installer-utils (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-zesty
Changed in debian-installer-utils (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Brian Murray (brian-murray) wrote :

Hello Dan, or anyone else affected,

Accepted debian-installer-utils into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/debian-installer-utils/1.113ubuntu1.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Dan Streetman (ddstreet) wrote :

in zesty lxd container built initrd without -proposed di-utils and initrd with -proposed di-utils:

ubuntu@build-zesty:~/debian-installer-20101020ubuntu504/build$ make USE_UDEBS_FROM_EXTRA="zesty-security zesty-updates" build_netboot
...
ubuntu@build-zesty:~/debian-installer-20101020ubuntu504/build$ cp dest/netboot/ubuntu-installer/amd64/initrd.gz ~/initrd-fail.gz

ubuntu@build-zesty:~/debian-installer-20101020ubuntu504/build$ make reallyclean
...
ubuntu@build-zesty:~/debian-installer-20101020ubuntu504/build$ make build_netboot
...
ubuntu@build-zesty:~/debian-installer-20101020ubuntu504/build$ cp dest/netboot/ubuntu-installer/amd64/initrd.gz ~/initrd-fixed.gz

person who reported this to me tested both; automated install using their preseed configuration fails on system with nvme drive using initrd-fail.gz, succeeds using initrd-fixed.gz

tags: added: verification-done-zesty
removed: verification-needed-zesty
Dan Streetman (ddstreet) wrote :

reporter noted xenial fixed initrd also works.

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done
removed: verification-needed
Dimitri John Ledkov (xnox) wrote :

Dan, I'm not talking about hypothetical users who hypothetically are encountering any bugs. Of course, we fix all the things.

But, specifically about this bug and trusty, are there actual users who are encountering this bug on trusty?

Specifically, since the hardware in question was released after trusty. And there will be no more point releases for trusty - meaning users of iso's will not get an update. When I say users of ISOs I mean the server.iso from cdimage.ubuntu.com, not the d-i by-products that are published on archive.ubuntu.com those can be updated. It worries me, that we have users reinstalling trusty today, when xenial has been out for so long, and bionic will be generally available in less than 4 months time.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer-utils - 1.113ubuntu1.17.04.1

---------------
debian-installer-utils (1.113ubuntu1.17.04.1) zesty; urgency=medium

  * Make sure udevd internal state is up-to-date before calling trigger &
    settle. This insures that newly anna-installed udev rules and kernel
    modules are loaded by udevd for correct processing of the udev
    rules. LP: #1714505.

 -- Dimitri John Ledkov <email address hidden> Sun, 17 Sep 2017 00:41:44 +0100

Changed in debian-installer-utils (Ubuntu Zesty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for debian-installer-utils has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Dan Streetman (ddstreet) wrote :

> But, specifically about this bug and trusty, are there actual users who are encountering this
> bug on trusty?

No, the user who reported this does not encounter it with trusty, but that of course doesn't mean it does not exist there.

I am not able to reproduce it on my nvme systems with any release, as this is a specific race condition (with the poorly-designed 3 second window where udev uses stale kernel module info) when using fully automated preseeded install onto a nvme system behind a firewall with a proxy.

> When I say users of ISOs I mean the server.iso from cdimage.ubuntu.com

When installing to multiple systems, manual ISO installation is rarely used (by experienced sysadmins), for obvious reasons.

> It worries me, that we have users reinstalling trusty today

Many users - including large companies - still use trusty. In fact some users/companies still use precise, which is why Precise ESM exists.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer-utils - 1.113ubuntu1.16.04.1

---------------
debian-installer-utils (1.113ubuntu1.16.04.1) xenial; urgency=medium

  * Make sure udevd internal state is up-to-date before calling trigger &
    settle. This insures that newly anna-installed udev rules and kernel
    modules are loaded by udevd for correct processing of the udev
    rules. LP: #1714505.

 -- Dimitri John Ledkov <email address hidden> Sun, 17 Sep 2017 00:41:44 +0100

Changed in debian-installer-utils (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.