ufw 0.36.1-3 introduces ordering cycle, breaking network

Bug #1950039 reported by Julian Andres Klode
This bug affects 1 person

Affects              Status        Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Invalid       Undecided   Unassigned
ufw (Ubuntu)         Fix Released  Critical    Unassigned

Bug Description

[ 2.065178] systemd[1]: systemd-networkd.service: Found ordering cycle on network-pre.target/start
[ 2.065276] systemd[1]: systemd-networkd.service: Found dependency on ufw.service/start
[ 2.065356] systemd[1]: systemd-networkd.service: Found dependency on basic.target/start
[ 2.065422] systemd[1]: systemd-networkd.service: Found dependency on sockets.target/start
[ 2.065487] systemd[1]: systemd-networkd.service: Found dependency on cloud-init-hotplugd.socket/start
[ 2.065561] systemd[1]: systemd-networkd.service: Found dependency on sysinit.target/start
[ 2.065626] systemd[1]: systemd-networkd.service: Found dependency on cloud-init.service/start
[ 2.065700] systemd[1]: systemd-networkd.service: Found dependency on systemd-networkd-wait-online.service/start
[ 2.065795] systemd[1]: systemd-networkd.service: Found dependency on systemd-networkd.service/start
[ 2.065870] systemd[1]: systemd-networkd.service: Job network-pre.target/start deleted to break ordering cycle starting with systemd-networkd.service/start
[ SKIP ] Ordering cycle found, skipping Network (Pre)

Changed in ufw (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Andy Whitcroft (apw)
tags: added: block-proposed
tags: added: block-proposed-jammy
Julian Andres Klode (juliank) wrote :

The cycle occurs in other places too, e.g. basic.target <-> sysinit.target:

[ 12.918119] systemd[1]: basic.target: Found ordering cycle on sysinit.target/start
[ 12.919241] systemd[1]: basic.target: Found dependency on cloud-init.service/start
[ 12.920347] systemd[1]: basic.target: Found dependency on systemd-networkd-wait-online.service/start
[ 12.921679] systemd[1]: basic.target: Found dependency on systemd-networkd.service/start
[ 12.922893] systemd[1]: basic.target: Found dependency on network-pre.target/start
[ 12.924001] systemd[1]: basic.target: Found dependency on ufw.service/start
[ 12.925013] systemd[1]: basic.target: Found dependency on basic.target/start
[ 12.926055] systemd[1]: basic.target: Job cloud-init.service/start deleted to break ordering cycle starting with basic.target/start
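
For readers triaging similar boots, the chain that systemd reports can be pulled out of the journal text mechanically. The following is a minimal sketch, not part of the original report: the regular expressions and the `summarize` helper are illustrative assumptions based only on the message wording shown in the excerpt above, and other systemd versions may word these messages differently.

```python
import re

# Log excerpt copied from the comment above.
LOG = """\
[ 12.918119] systemd[1]: basic.target: Found ordering cycle on sysinit.target/start
[ 12.919241] systemd[1]: basic.target: Found dependency on cloud-init.service/start
[ 12.920347] systemd[1]: basic.target: Found dependency on systemd-networkd-wait-online.service/start
[ 12.921679] systemd[1]: basic.target: Found dependency on systemd-networkd.service/start
[ 12.922893] systemd[1]: basic.target: Found dependency on network-pre.target/start
[ 12.924001] systemd[1]: basic.target: Found dependency on ufw.service/start
[ 12.925013] systemd[1]: basic.target: Found dependency on basic.target/start
[ 12.926055] systemd[1]: basic.target: Job cloud-init.service/start deleted to break ordering cycle starting with basic.target/start
"""

# Hypothetical patterns, written against the wording in the excerpt only.
CYCLE_LINE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Found (?:ordering cycle|dependency) on (?P<dep>\S+)/start"
)
DELETED_LINE = re.compile(r"Job (?P<job>\S+)/start deleted to break ordering cycle")


def summarize(log):
    """Return (units on the reported cycle, job systemd deleted or None)."""
    chain = []
    deleted = None
    for line in log.splitlines():
        match = CYCLE_LINE.search(line)
        if match:
            if not chain:
                # The unit named before "Found ..." is where the report starts.
                chain.append(match.group("unit"))
            chain.append(match.group("dep"))
            continue
        match = DELETED_LINE.search(line)
        if match:
            deleted = match.group("job")
    return chain, deleted


chain, deleted = summarize(LOG)
print(" -> ".join(chain))
print("deleted job:", deleted)
```

The chain starts and ends on the same unit, which is what makes it a cycle; the deleted job is the one systemd sacrificed to break it.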

Julian Andres Klode (juliank) wrote :

The issue may be the removal of DefaultDependencies=no, but I'm not 100% sure - it may also be the additional Before/Wants

Jamie Strandboge (jdstrand) wrote :

FYI, the current configuration is the same as firewalld upstream and what is in Debian. Moreover, it follows the systemd documentation for firewall software, so I wonder if the change simply uncovered a latent bug.

Fyi, I won't be able to look at this for a while so if you need to back it out, please do an ubuntu1 upload (though it would be great if someone more familiar with systemd-networkd thought through my latent bug comment).

Jamie Strandboge (jdstrand) wrote :

I mention firewalld because, while ufw could be reverted, firewalld users would presumably also hit it, as would users of any other software that does the same. If the ufw change is reverted, IMO someone should audit the archive for other occurrences of this pattern and update the units accordingly.

Jamie Strandboge (jdstrand) wrote :

Also, to be clear, when I say I can't look at the ufw portions 'for a while', I mean ~10 days (doing this from my phone).

Thinking about this more, I suspect this is less about the Before/Wants on network-pre and the removal of DefaultDependencies and more about Before=network being removed (with perhaps nothing else doing that?). I.e., I don't think this is a ufw bug; I think the change uncovered something.

Jamie Strandboge (jdstrand) wrote :

@juliank - where did you see these errors? I booted with a freshly created autopkgtest jammy vm, installed the package from proposed and it worked fine.

Please see my previous comments-- this does not seem to be a bug in ufw since it is using the documented unit setup that systemd recommends for firewall software (and that other firewall software use, such as firewalld) and this has been in Debian for some time now with no bug reports (indeed, it solved issues). Your initial report shows that lots of other units have the ordering cycle issue that you mentioned so I'm not sure why ufw would be singled out.

So we're all on the same page, this was the change:

-DefaultDependencies=no
-Before=network.target
+Before=network-pre.target
+Wants=network-pre.target

and I'll add this from debian/changelog:
+ - use Before and Wants on network-pre.target. Per systemd documentation,
+ "network-pre.target is a target that may be used to order services
+ before any network interface is configured. Its primary purpose is for
+ usage with firewall services". Because network-pre.target is a passive
+ unit, "services that want to be run before the network is configured
+ should place Before=network-pre.target and also set
+ Wants=network-pre.target to pull it in"
+ - remove DefaultDependencies=no so that we pull in default dependencies
+ for "basic system initialization". While ufw is meant to come up before
+ networking, there is no reason why it shouldn't come up after sysinit.
+ This should help make ufw startup more robust on systems that need
+ something from sysinit.

The ufw unit itself does very little unless ufw is enabled since /lib/ufw/ufw-init exits very quickly when it is not enabled. As such, it seems to me that the ufw upload may have uncovered a latent issue in our early boot (but that wouldn't be a bug in ufw itself) where Ubuntu may not be supporting the documented behavior for network-pre.target.

Finally, it has been a couple of months since this report; is it possible to rerun wherever this was run to see if it is still an issue? As mentioned, there are no bug reports in Debian, so perhaps things floated in that resolved this (indeed, systemd itself went from 248 to 249). I would rerun autopkgtests, but they all have passed.

Changed in ufw (Ubuntu):
status: Triaged → Incomplete
Julian Andres Klode (juliank) wrote :

This broke the autopkgtest cloud. Networking did not come up and hence no tests passed anymore. We absolutely must not let this pass until we are sure the problem is resolved, as rebuilding images without it is hard (so setting it to Incomplete is fairly dangerous).

Debian does not use cloud-init usually, so it hardly matters if it works there. I don't care who's to blame here, but this needs to be resolved.

Changed in ufw (Ubuntu):
status: Incomplete → New
Julian Andres Klode (juliank) wrote :

I think we can rebuild images on the staging autopkgtest instance with the packages from proposed and run a couple hundred tests there and see if it still breaks.

Jamie Strandboge (jdstrand) wrote :

@juliank - note I wasn't so much talking about 'blame' as much as understanding, so I apologize if it came across that way. Since I wasn't able to reproduce, I was trying to reason through my thoughts to help the discussion go further since I'm not able to diagnose it myself.

In a nutshell, I have concerns that the ufw service has a side effect that somewhere else in the system is dependent on. That other part of the system should be setup to work without ufw in the mix. I'm also concerned that users might face issues if ufw is purged or if other similarly configured software is installed (eg, firewalld).

With that in mind, it seems odd that a service that does nearly nothing by default would affect the system by having a Before/Wants on network-pre.target.

It also seems odd that going from very few dependencies (DefaultDependencies=no) to having only those for 'basic system initialization' would be a problem, since those are not related to networking, etc. E.g., in today's autopkgtest jammy instance that I created with `autopkgtest-buildvm-ubuntu-cloud -r jammy`, after rebooting with the proposed -3 of ufw installed:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Jammy Jellyfish (development branch)
Release: 22.04
Codename: jammy

$ cat /proc/version_signature
Ubuntu 5.13.0-19.19-generic 5.13.14

$ systemctl list-dependencies ufw.service
ufw.service
● ├─system.slice
● ├─network-pre.target
● └─sysinit.target
●   ├─apparmor.service
●   ├─dev-hugepages.mount
●   ├─dev-mqueue.mount
●   ├─keyboard-setup.service
●   ├─kmod-static-nodes.service
●   ├─multipathd.service
●   ├─plymouth-read-write.service
○   ├─plymouth-start.service
●   ├─proc-sys-fs-binfmt_misc.automount
●   ├─setvtrgb.service
●   ├─sys-fs-fuse-connections.mount
●   ├─sys-kernel-config.mount
●   ├─sys-kernel-debug.mount
●   ├─sys-kernel-tracing.mount
●   ├─systemd-ask-password-console.path
○   ├─systemd-binfmt.service
○   ├─systemd-boot-system-token.service
●   ├─systemd-journal-flush.service
●   ├─systemd-journald.service
○   ├─systemd-machine-id-commit.service
●   ├─systemd-modules-load.service
○   ├─systemd-pstore.service
●   ├─systemd-random-seed.service
●   ├─systemd-sysctl.service
●   ├─systemd-sysusers.service
●   ├─systemd-timesyncd.service
●   ├─systemd-tmpfiles-setup-dev.service
●   ├─systemd-tmpfiles-setup.service
●   ├─systemd-udev-trigger.service
●   ├─systemd-udevd.service
●   ├─systemd-update-utmp.service
●   ├─cryptsetup.target
●   ├─local-fs.target
●   │ ├─-.mount
●   │ ├─boot-efi.mount
○   │ ├─systemd-fsck-root.service
●   │ └─systemd-remount-fs.service
●   ├─swap.target
●   └─veritysetup.target

Seeing what depends on ufw, there is very little:
$ systemctl list-dependencies ufw.service --reverse
ufw.service
● └─multi-user.target
●   └─graphical.target

I can also say that nothing in this VM depends on network-pre other than ufw:
$ systemctl list-dependencies --reverse network-pre.target
network-pre.target
● └─ufw.service

and that there is not much depending on network.target:
$ systemctl list-dependencies --reverse network.target
network.target
○ ├─netplan-ovs-cleanup.service
● └─systemd-networkd.service

Rebooting with ...


Jamie Strandboge (jdstrand) wrote :

Attached are two 'systemd-analyze plot's for the autopkgtest jammy system with cloud-init and ufw installed. plot-2.svg is for booting the system with 0.36.1-2 (current jammy) and plot-3.svg is for 0.36.1-3 (proposed jammy). Notice how in plot-2.svg, ufw and systemd-networkd start quite a bit earlier than in plot-3.svg.

Julian Andres Klode (juliank) wrote :

I don't believe your reproducer is valid - cloud-init is not installed anymore, as autopkgtest-buildvm-ubuntu-cloud removes it when building the VM, whereas it remains on the cloud images, as it's needed there to actually get the IP address during boot.

Removing the DefaultDependencies=no means that ufw.service is After=basic.target After=sysinit.target. cloud-init.service has Before=sysinit.target, so would have to run before ufw.service.

It is After=systemd-networkd-wait-online.service though which is After=systemd-networkd which is After=network-pre.target.

So there's the cycle: cloud-init needs to run After=network-pre.target (hence after ufw.service), but we determined above that cloud-init needs to run before sysinit.target, which needs to run before ufw.service.
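
The reasoning above can be checked with a small model. This is only a sketch under stated assumptions: the `AFTER` mapping contains just the ordering edges named in this comment (real units carry many more), `find_ordering_cycle` is a hypothetical helper, and the standard-library `graphlib` module (Python 3.9+) stands in for systemd's own cycle detection, which works differently.

```python
from graphlib import CycleError, TopologicalSorter

# unit -> units that must be started before it (its effective After= set).
# Only the edges described in this comment are modelled.
AFTER = {
    # No DefaultDependencies=no, so ufw is ordered after sysinit.target.
    "ufw.service": {"sysinit.target"},
    # cloud-init.service has Before=sysinit.target.
    "sysinit.target": {"cloud-init.service"},
    "cloud-init.service": {"systemd-networkd-wait-online.service"},
    "systemd-networkd-wait-online.service": {"systemd-networkd.service"},
    "systemd-networkd.service": {"network-pre.target"},
    # ufw 0.36.1-3 has Before=network-pre.target.
    "network-pre.target": {"ufw.service"},
}


def find_ordering_cycle(after):
    """Return the units on an ordering cycle, or None if a start order exists."""
    try:
        # static_order() raises CycleError when no valid order exists.
        list(TopologicalSorter(after).static_order())
    except CycleError as exc:
        # The second argument of CycleError is the list of nodes on the cycle.
        return exc.args[1]
    return None


cycle = find_ordering_cycle(AFTER)
print(" -> ".join(cycle) if cycle else "no cycle")
```

With these six edges there is no valid start order, which matches the "Found ordering cycle" messages in the boot log.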

I can't reproduce the issue in a VM right now, though, because um, cloud-init does not seem to start at all during second boot despite seemingly being enabled.

Julian Andres Klode (juliank) wrote :

I have added a cloud-init task, to draw cloud-init people in to help get this resolved.

Julian Andres Klode (juliank) wrote :

Though arguably I'd expect this to be fixed by removing DefaultDependencies again, if I looked at this correctly.

Jamie Strandboge (jdstrand) wrote :

> I don't believe your reproducer is valid - cloud-init is not installed anymore, as autopkgtest-buildvm-ubuntu-cloud removes it when building the VM, whereas it remains on the cloud images, as it's needed there to actually get the IP address during boot.

Note, in https://bugs.launchpad.net/ubuntu/+source/ufw/+bug/1950039/comments/9 I installed cloud-init and did some analysis also (but see below).

> Though arguably I'd expect this to be fixed by removing DefaultDependencies again, if I looked at this correctly.

Seems likely, though this change was done to fix an issue people were seeing on stack exchange for Debian/Ubuntu systems related to a race between encrypted filesystems and ufw. I guess I could add back DefaultDependencies=no and add After=local-fs.target, but I'm not sure what this would do in practice since local-fs.target is so close to the end of sysinit anyway (but see below).

In 0.36.1-2, ufw has:
DefaultDependencies=no
Before=network.target

In 0.36.1-3, ufw has (no DefaultDependencies=no):
Before=network-pre.target
Wants=network-pre.target

cloud-init has (among other things):
Before=sysinit.target
Before=network-pre.target
Wants=network-pre.target

AIUI, with 0.36.1-2, ufw will tend to start right away due to DefaultDependencies=no, and so will cloud-init, so long as it finishes before sysinit. ufw need only finish before network.target, which is after network-pre.target. I.e., ufw and cloud-init race to complete, but otherwise their dependencies don't directly affect each other.

With 0.36.1-3, cloud-init starts early and before ufw since it must finish before sysinit.target and ufw cannot start until after sysinit.target is done. Because both must finish before network-pre.target, this pushes network-pre.target after sysinit (and of course, ufw), but other than that, there shouldn't be a problem since we have:

 1. cloud-init starts / finishes
 2. sysinit starts / finishes
 3. ufw starts / finishes
 4. network-pre reached
 5. systemd-networkd starts / finishes
 6. network reached

IMO, there is no obvious problem with the dependencies (as they relate to ufw) since cloud-init is allowed to start/finish before sysinit and network-pre just like before. It is just that now network-pre is guaranteed to be after sysinit (which, from cloud-init's point of view, shouldn't necessarily be a concern). It is also guaranteed to be after ufw but, unless cloud-init is doing something with ufw (such as enabling ufw and restarting the ufw service), cloud-init shouldn't care, because the ufw service doesn't do anything unless ufw is enabled (and even when it is enabled, it just loads firewall rules).

This makes me want to understand the cloud-init configuration that is in play. Can you share it?

Jamie Strandboge (jdstrand) wrote :

> This makes me want to understand the cloud-init configuration that is in play. Can you share it?

I'm thinking I should upload:

DefaultDependencies=no
Before=network-pre.target
Wants=network-pre.target local-fs.target
After=local-fs.target

Do you have any objections? This would remove the explicit sysinit from the dependency equation but I think it would otherwise achieve ufw's startup objectives.
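
One rough way to sanity-check a proposal like this is to compare the two ufw variants in a toy ordering graph. The sketch below is illustrative only: `COMMON`, `with_ufw` and `start_order` are hypothetical names, the graph contains only ordering edges discussed in this bug plus the assumption that sysinit.target is ordered after local-fs.target, and Python's `graphlib` (3.9+) is used in place of systemd's real job logic.

```python
from graphlib import CycleError, TopologicalSorter

# unit -> units that must be started before it. Edges taken from this bug,
# except sysinit.target after local-fs.target, which is an assumption here.
COMMON = {
    "sysinit.target": {"cloud-init.service", "local-fs.target"},
    "cloud-init.service": {"systemd-networkd-wait-online.service"},
    "systemd-networkd-wait-online.service": {"systemd-networkd.service"},
    "systemd-networkd.service": {"network-pre.target"},
    # Both ufw variants carry Before=network-pre.target.
    "network-pre.target": {"ufw.service"},
}


def with_ufw(ufw_after):
    """Return a copy of COMMON with the given After= set for ufw.service."""
    graph = {unit: set(deps) for unit, deps in COMMON.items()}
    graph["ufw.service"] = set(ufw_after)
    return graph


def start_order(after):
    """Return one valid start order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(after).static_order())
    except CycleError:
        return None


# 0.36.1-3: default dependencies put ufw after sysinit.target.
print("0.36.1-3:", start_order(with_ufw({"sysinit.target"})))
# Proposal: DefaultDependencies=no plus After=local-fs.target.
print("proposal:", start_order(with_ufw({"local-fs.target"})))
```

In this model the 0.36.1-3 variant has no valid start order, while the proposed variant does, with ufw.service placed after local-fs.target and before network-pre.target. It says nothing about edges that are not modelled.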

Julian Andres Klode (juliank) wrote :

> cloud-init shouldn't care cause the ufw service doesn't do anything unless ufw is enabled (and even when it is enabled, it just loads firewall rules).

It doesn't care about ufw at all; systemd noticed a cycle between the units and deleted the network-pre.target start job (or the cloud-init one, in #1), causing the network to not come up.

> I'm thinking I should upload:

This sounds like it should fix the issue

Jamie Strandboge (jdstrand) wrote :

Oh! I missed from the initial report that network-pre was deleted which clears up things considerably on my end (since I wasn't able to reproduce, I didn't see it locally either). :)

Preparing an upload now.

Changed in ufw (Ubuntu):
status: New → Triaged
Changed in cloud-init (Ubuntu):
status: New → Invalid
Jamie Strandboge (jdstrand) wrote :
tags: removed: block-proposed block-proposed-jammy
tags: added: block-proposed block-proposed-jammy
tags: removed: block-proposed block-proposed-jammy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ufw - 0.36.1-3ubuntu1

---------------
ufw (0.36.1-3ubuntu1) jammy; urgency=medium

  * debian/ufw.service: add back DefaultDependencies=no and instead add
    Wants/After local-fs.target. This will avoid a dependency on sysinit while
    ensuring that filesystems (including cryptsetup) are ready (the reason for
    removing DefaultDependencies=no in the first place). LP: #1950039

 -- Jamie Strandboge <email address hidden> Wed, 05 Jan 2022 15:20:44 +0000

Changed in ufw (Ubuntu):
status: Triaged → Fix Released