NetworkManager-wait-online.service in 1.28.0-2ubuntu1 fails to start in LXC

Bug #1914062 reported by Balint Reczey
This bug affects 1 person
Affects                   Status        Importance  Assigned to    Milestone
network-manager (Ubuntu)  Fix Released  Undecided   Unassigned
systemd (Ubuntu)          Fix Released  High        Balint Reczey

Bug Description

This regresses systemd's autopkgtest because it expects the system in the container to reach running state, but the system ends up in degraded state due to the service failing.

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-hirsute/hirsute/amd64/s/systemd/20210112_185712_ff570@/log.gz
...

======================================================================
FAIL: test_no_failed (__main__.ServicesTest)
No failed units
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/autopkgtest.fFC3Lw/build.xLc/real-tree/debian/tests/boot-and-services", line 68, in test_no_failed
    self.assertEqual(failed, [])
AssertionError: Lists differ: ['● NetworkManager-wait-online.service loa[42 chars]ine'] != []

First list contains 1 additional elements.
First extra element 0:
'● NetworkManager-wait-online.service loaded failed failed Network Manager Wait Online'

+ []
- ['● NetworkManager-wait-online.service loaded failed failed Network Manager '
- 'Wait Online']

----------------------------------------------------------------------
Ran 23 tests in 4.435s
...

Reproducible locally by installing n-m from -proposed, then restarting the system in the LXC container.

Sebastien Bacher (seb128) wrote :
affects: network-manager (Ubuntu) → systemd (Ubuntu)
tags: added: rls-hh-incoming
Sebastien Bacher (seb128) wrote :

The regression isn't due to the network-manager update. Could Foundations help investigate the systemd side of things?

Sebastien Bacher (seb128) wrote :

The log also has this warning, which seems new:

Invalid unit name "●" escaped as "\xe2\x97\x8f" (maybe you should use systemd-escape?).
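
For illustration, here is a minimal Python sketch of how a check like test_no_failed could collect failed unit names while dropping the leading "●" status marker, so that the marker is never passed on as a unit name. This is not the actual debian/tests/boot-and-services code; the helper name parse_failed_units is hypothetical, and the idea that the warning comes from the bullet being treated as a unit name is only an assumption.

```python
def parse_failed_units(systemctl_output):
    """Return unit names from `systemctl --failed`-style output lines.

    Hypothetical helper: assumes each non-empty line looks like
    "[●] UNIT LOAD ACTIVE SUB DESCRIPTION...".
    """
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The output quoted in this report starts each failed unit with "●".
        if fields[0] == "●":
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


sample = (
    "● NetworkManager-wait-online.service loaded failed failed "
    "Network Manager Wait Online\n"
)
print(parse_failed_units(sample))
```

With the sample line from the log above, only the unit name itself is returned and the bullet is discarded.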

Balint Reczey (rbalint) wrote :

From the log you linked:

Get:263 http://ftpmaster.internal/ubuntu hirsute-proposed/main amd64 network-manager amd64 1.28.0-2ubuntu1 [2002 kB]

affects: systemd (Ubuntu) → network-manager (Ubuntu)
Changed in network-manager (Ubuntu):
importance: Undecided → High
status: New → Triaged
Sebastien Bacher (seb128) wrote :
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

There is a change in behavior here in lxc/lxd. Running https://paste.ubuntu.com/p/vz7SXcX3K9/:

On hirsute lxd container:

root@hirsute:~# ./test
access errno 13
path is read only: 0
root@hirsute:~# mount | grep 'sysfs on /sys '
sysfs on /sys type sysfs (rw,relatime)

On focal lxd container:

root@focal:~# ./test
path is read only: 1
root@focal:~# mount | grep 'sysfs on /sys '
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)

(no idea why there are two mounts in focal)

According to https://systemd.io/CONTAINER_INTERFACE/ , /sys should be mounted read-only?
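
The pasted test program is not reproduced in this report. As a rough stand-in, the Python sketch below shows two ways to look at the same question: parsing the options from a line of `mount` output, and asking the kernel directly via os.statvfs whether the filesystem holding a path is read-only. The helper names mount_options and is_read_only are made up for this example, and the regular expression assumes the `mount` line format shown above.

```python
import os
import re

# Matches lines such as: "sysfs on /sys type sysfs (rw,relatime)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def mount_options(line):
    """Parse one line of `mount` output into (mountpoint, set of options)."""
    m = MOUNT_LINE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized mount line: %r" % line)
    return m.group(2), set(m.group(4).split(","))


def is_read_only(path):
    """True if the filesystem containing `path` is mounted read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


hirsute = "sysfs on /sys type sysfs (rw,relatime)"
focal = "sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)"
print("ro" in mount_options(hirsute)[1])
print("ro" in mount_options(focal)[1])
```

On a live system, is_read_only("/sys") reports the state of whichever mount is currently visible at /sys, which is presumably the last of the two mounts listed on focal.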

Sebastien Bacher (seb128) wrote :

While investigating the issue, I found that udev fails to enumerate devices in the lxc environment when udev_enumerate_add_match_is_initialized is called. The same test program works in a focal instance.

While discussing the issue on the LXD channel, it was raised that the systemd udev changes in 247 could be creating problems.

The network-manager 1.26 version works because the udev support was wrongly disabled, which was fixed in this commit:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/78dc57d8
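
To make the "is initialized" match more concrete: roughly speaking, a device counts as initialized once udevd has written a database entry for it under /run/udev. The Python sketch below lists network interfaces and reports whether such an entry exists. It is only an approximation of what libudev does. The function names are invented for this example, and the "n<ifindex>" file naming under /run/udev/data is an assumption about systemd's current layout, not a stable interface.

```python
import os


def udev_db_path(run_udev, ifindex):
    """Path of the udev database entry for a network interface.

    Assumption: network devices are stored as "n<ifindex>" under
    <run_udev>/data.
    """
    return os.path.join(run_udev, "data", "n%d" % ifindex)


def initialized_interfaces(sys_class_net="/sys/class/net", run_udev="/run/udev"):
    """Map interface name -> True if a udev database entry exists for it."""
    result = {}
    if not os.path.isdir(sys_class_net):
        return result
    for name in sorted(os.listdir(sys_class_net)):
        ifindex_file = os.path.join(sys_class_net, name, "ifindex")
        try:
            with open(ifindex_file) as f:
                ifindex = int(f.read().strip())
        except (OSError, ValueError):
            continue  # not an interface directory
        result[name] = os.path.exists(udev_db_path(run_udev, ifindex))
    return result


if __name__ == "__main__":
    for name, ok in initialized_interfaces().items():
        print(name, "initialized" if ok else "no udev db entry")
```

If every interface shows up without a database entry inside the container, an enumeration restricted to initialized devices would be expected to come back empty.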

affects: network-manager (Ubuntu) → systemd (Ubuntu)
Sebastien Bacher (seb128) wrote :

Reassigning to systemd but it could be a lxc issue

Sebastien Bacher (seb128) wrote :

The example works outside of the container or in a focal instance; commenting out l30 makes it work on hirsute.

Sebastien Bacher (seb128) wrote :

l16 is what creates the issue

Sebastien Bacher (seb128) wrote :

downgrading systemd to the focal version in the lxc container fixes the issue

tags: added: fr-1102
tags: removed: rls-hh-incoming
Balint Reczey (rbalint) wrote :

@seb128 I tried remounting /sys in systemd but it created other issues.
I've fixed udevd to not start in lxc in systemd, but this is all systemd can do. Please detect running udev in network-manager without assuming that rw /sys implies running udev.

@LXD devs, please check whether the init to start is systemd and, if so, mount /sys ro to conform to systemd's https://systemd.io/CONTAINER_INTERFACE/, or convince systemd upstream to include LXC in the container interface with rw /sys.
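
As a sketch of the request to NetworkManager above, the Python below contrasts the heuristic being questioned (a writable /sys is taken to mean udev is running) with a direct look for udevd's control socket. The function names and the combined policy are hypothetical and are not NetworkManager's code. The socket path /run/udev/control is an assumption about where systemd-udevd exposes it, and with socket activation the socket may exist before udevd itself has started.

```python
import os


def udev_assumed_from_sysfs(sys_path="/sys"):
    """Heuristic questioned above: writable /sys is taken to mean udev runs."""
    return os.access(sys_path, os.W_OK)


def udev_control_socket_present(run_udev="/run/udev"):
    """Alternative heuristic: look for udevd's control socket.

    Assumption: a running systemd-udevd exposes <run_udev>/control.
    """
    return os.path.exists(os.path.join(run_udev, "control"))


def should_use_udev(sys_path="/sys", run_udev="/run/udev"):
    """Hypothetical policy: require both a writable /sys and the socket."""
    return (udev_assumed_from_sysfs(sys_path)
            and udev_control_socket_present(run_udev))


if __name__ == "__main__":
    print("writable /sys:", udev_assumed_from_sysfs())
    print("control socket:", udev_control_socket_present())
    print("use udev:", should_use_udev())
```

In an LXC container with rw /sys and udevd not started, the first check would say yes while the second says no, which is the mismatch described in this comment.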

Stéphane Graber (stgraber) wrote :

Making /sys read-only will break very very many things that we have no intention of breaking.
LXD is also completely init system agnostic and we have no idea what the init system in a particular container may be.

Changed in lxd (Ubuntu):
status: New → Invalid
Balint Reczey (rbalint) wrote :

@stgraber nothing should break when honoring https://systemd.io/CONTAINER_INTERFACE/, and we are in a constant uphill battle with systemd and other upstreams checking whether /sys is ro.

Detecting the init system is easy, please do it in LXD. It may make sense to allow the user to override that and force /sys readonly in LXC config, but then it is also up to the user to fix the failing services the user cares about.

Changed in lxd (Ubuntu):
status: Invalid → New
Balint Reczey (rbalint) wrote :

@stgraber or work with systemd upstream to extend the container API.

Stéphane Graber (stgraber) wrote :

This is a systemd/udev bug. We're aware of the CONTAINER_INTERFACE and it being wrong doesn't mean we need to change LXD to make it similarly wrong.

LXD containers need to have udevd running to function properly, so you'll need to undo that change.
If there is a bug in how udevd now behaves, that's what needs to be fixed and we can assist you with that.

Changed in lxd (Ubuntu):
status: New → Won't Fix
Changed in network-manager (Ubuntu):
status: New → Invalid
Changed in systemd (Ubuntu):
assignee: nobody → Ubuntu containers team (ubuntu-lxc)
Dimitri John Ledkov (xnox) wrote :

We should probably ignore failure to start NetworkManager-wait-online.service in the autopkgtests for now.

Changed in systemd (Ubuntu):
importance: High → Wishlist
Changed in systemd (Ubuntu):
assignee: Ubuntu containers team (ubuntu-lxc) → Christian Brauner (cbrauner)
Balint Reczey (rbalint) wrote :

@stgraber Thank you for assisting with aligning udevd's behaviour with LXD. I've added the tests-in-lxd autopkgtests to catch issues in LXD earlier. I'd be happy to add more tests in systemd's autopkgtest to not let udevd regress in LXD.

@xnox I don't like the idea of letting services fail in default installs in LXD because this would result in a bad user experience. @seb128 could you please patch network-manager 1.28 to behave in LXC like it did in 1.26 to let it migrate and not fail in LXC with systemd 247 until the udevd behaviour is fixed in LXC?

Sebastien Bacher (seb128) wrote :

> @seb128 could you please patch network-manager 1.28 to behave in LXC like it did in 1.26 to let it migrate and not fail in LXC with systemd 247 until the udevd behaviour is fixed in LXC?

alright, I can temporarily revert the commit that made it use udev under lxc

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

Just FTR, note that the change in NM upstream was needed so that the network-manager snap could correctly detect whether udevd is running. We currently build the snap from the source package plus some patches, including this one. But the long-term target is to be able to create the snap by simply staging the network-manager debs. So please keep this in mind and try to put this change back as soon as it is feasible again, so we do not need to remove the change in the deb and then put it back in the snap.

Balint Reczey (rbalint) wrote :

@seb128 please also include the fix for n-m tests failing due to MAC changes to let systemd 247.1-4ubuntu1 migrate.

Changed in network-manager (Ubuntu):
status: Invalid → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 247.3-1ubuntu2

---------------
systemd (247.3-1ubuntu2) hirsute; urgency=medium

  [ Stéphane Graber ]
  * Revert the change to udevd service and sockets. They must start in LXC.
    LXD containers do have proper uevent handling and actively send
    uevents for udevd to handle.

systemd (247.3-1ubuntu1) hirsute; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1907306/0001-sd-dhcp-client-don-t-log-timeouts-if-already-expired.patch,
    d/p/lp1907306/0002-sd-dhcp-client-track-dhcp4-t1-t2-expire-times.patch,
    d/p/lp1907306/0003-sd-dhcp-client-add-RFC2131-retransmission-details.patch,
    d/p/lp1907306/0004-sd-dhcp-client-simplify-dhcp4-t1-t2-parsing.patch,
    d/p/lp1907306/0005-sd-dhcp-client-correct-dhcpv4-renew-rebind-retransmi.patch,
    d/p/lp1907306/0006-sd-dhcp-client-correct-retransmission-timeout-to-mat.patch,
    d/p/lp1907306/0007-test-network-increase-wait_online-timeout-to-handle-.patch,
    d/p/lp1907306/0008-sd-dhcp-client-fix-renew-rebind-timeout-calculation-.patch:
    Send correct number of dhcpv4 renew and rebind requests
    (LP: #1907306)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=9454c4cb1b85f6f6945a29b6860e0747432318a1
  * d/t/root-unittests:
    Remove any corrupt journal files (LP: #1881947)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=c4f2a65d53972eec7d4cf46facb9f72e989e3af2

  [ Balint Reczey ]
  * Merge to Ubuntu from Debian unstable
    - Dropped changes:
      * test: use modern qemu numa arguments
  * Switch default hierarchy (back) to hybrid again because snapd is not ready
    yet (LP: 1850667)
    Files:
    - debian/rules
    - debian/systemd.NEWS
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=4976b9474aa3b3b2587bb805472b8c37a4574346
  * Drop reverts that used to keep netplan.io's autopkgtest happy
    Files:
    - debian/patches/Revert-network-fix-assertion-when-link-get-carrier.patch
    - debian/patches/Revert-network-prevent-interfaces-to-be-initialized-multi.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=23340d4608eb9f281ecc47f7356b40f2ac8db540
  * Fall back to device name when net_get_name(device) fails again, dropping
    the patch to skip it
    File: debian/patches/Skip-falling-back-to-device-name-when-net_get_name-device.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=34cfe66296463dcc8ad9ebe07add846dd955fedc
  * Don't start udevd service and sockets in LXC.
    LXC mounts /sys in read-write mode unlike other containers. (LP: #1914062)
    File: debian/patches/debian/UBUNTU-Don-t-start-udevd-service-and-sockets-in-LXC.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=fa63ef6891eff6762509b322429687d4d506bdb2

systemd (247.3-1) unstable; urgency=medium

  [ Michael Biebl ]
  * New upstream version 247.3
  * Rebase patches

  [ Ioanna Alifieraki ]
  * systemctl: return error code when scheduled shutdown fails

systemd (247.2-5) unstable; urgency=medium

  [ Matthias Klumpp ]
  * Configure localed to run locale-gen to generate missing local...


Changed in systemd (Ubuntu):
status: Triaged → Fix Released
Stéphane Graber (stgraber) wrote :

Christian submitted https://github.com/systemd/systemd/pull/18559 which got turned into https://github.com/systemd/systemd/pull/18684 and has now been merged in upstream systemd.

We've both tested the resulting systemd and can confirm that /run/udev is now properly populated.
Please cherry-pick those udevd fixes into Ubuntu's systemd.

Changed in systemd (Ubuntu):
assignee: Christian Brauner (cbrauner) → nobody
status: Fix Released → Triaged
importance: Wishlist → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 1.30.0-1ubuntu1

---------------
network-manager (1.30.0-1ubuntu1) hirsute; urgency=medium

  * Resynchronize on Debian, remaining changes:
    - Use systemd-resolved instead of dnsmasq
    - debian/control:
      + Depend on isc-dhcp-client instead of recommends
      + Recommend network-manager-pptp
      + Suggest avahi-autoipd for IPv4LL support
    - debian/rules, debian/network-manager.postinst:
      + Don't restart NetworkManager on upgrade but recommend restarting
        the computer
    - debian/rules, debian/network-manager.postinst:
      + Don't install sysvinit scripts or migrate from sysvinit
    - debian/network-manager.postinst:
      + Don't add the netdev group.
      + drop in an empty override file for NetworkManager to manage all
        devices for upgrade from any version, as long as there is no
        netplan configuration yet.
    - debian/default-wifi-powersave-on.conf, debian/rules:
      + Install a config file to enable WiFi powersave
    - Enable build tests
    - Add autopkgtests
    - debian/source_network-manager.py, debian/network-manager.install,
      debian/network-manager.links: Add apport hook
    - Add network-manager-config-connectivity-ubuntu package
    - NetworkManager.conf: disable MAC randomization feature. There is no
      easy way for desktop users to disable this feature yet. And there are
      reports that it doesn't work well with some systems.
    - Update Vcs links to point to Ubuntu branch
    - Add patches. See patch descriptions for more details:
      + Provide-access-to-some-of-NM-s-interfaces-to-whoopsie.patch
      + Update-dnsmasq-parameters.patch
      + Disable-general-with-expect.patch
      + libnm-Check-self-still-NMManager-or-not.patch
      + dns-manager-don-t-merge-split-DNS-search-domains.patch (but disabled)
      + Read-system-connections-from-run.patch
    - debian/tests/urfkill-integration - don't stop/start network manager
    - debian/patches/ubuntu_revert_systemd.patch:
      + temporarly revert an upstream commit that made udev enabled under lxc,
        the new systemd doesn't work there (lp: #1914062)

 -- Sebastien Bacher <email address hidden> Thu, 25 Feb 2021 15:30:59 +0100

Changed in network-manager (Ubuntu):
status: Confirmed → Fix Released
Balint Reczey (rbalint)
Changed in systemd (Ubuntu):
status: Triaged → In Progress
Balint Reczey (rbalint) wrote :

@stgraber Thank you. I'm including the originally proposed patch in the next systemd upload and will switch to v248 when it is out to include the full fix. The final v248 will most likely be out in a few weeks.

Changed in systemd (Ubuntu):
assignee: nobody → Balint Reczey (rbalint)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 247.3-1ubuntu4

---------------
systemd (247.3-1ubuntu4) hirsute; urgency=medium

  [ Dimitri John Ledkov ]
  * d/p/debian/UBUNTU-resolved-Mitigate-DVE-2018-0001-by-retrying-NXDOMAIN-with.patch:
    Patch updated to reduce log level to debug
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=299002546ec2d62e7f0dd7d614ba958fc9df83c2

  [ Dan Streetman ]
  * d/p/lp1906331-sd-event-ref-event-loop-while-in-sd_event_prepare-ot.patch:
    Take event reference while processing (LP: #1906331)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=1bc38abcd3b62d317fcb62b72e26d9cb2e35ccf9
  * d/p/lp1917458-udev-rules-add-rule-to-create-dev-ptp_hyperv.patch:
    Create symlink for hyperv-provided ptp device (LP: #1917458)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=8f1ee790ad66395457ca64cb5f8a01fdd8aabe47

  [ Balint Reczey ]
  * Pick proposed patch for not returning early in udevadm (LP: #1914062)
    File: debian/patches/lp1914062-udevadm-don-t-return-early.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d8c80751a97b0c6c4df972f6f8325293aa1607c4
  * debian/tests/control: Mark systemd-fsckd flaky again.
    As promised in LP: 1915126, until further investigation.
    File: debian/tests/control
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=68fbaab272af81aab29497f7c6a3e4e6e9aa091b

 -- Balint Reczey <email address hidden> Thu, 04 Mar 2021 12:19:05 +0100

Changed in systemd (Ubuntu):
status: In Progress → Fix Released
Mathew Hodson (mhodson)
no longer affects: lxd (Ubuntu)