So, the root cause is completely clear: dbus.socket starts early, then cloud-init starts and blocks the entire basic.target (early boot) on network operations, so dbus.service cannot start. nss-resolve already sees dbus.socket and can therefore connect (instead of failing fast), and then runs into the 25s D-Bus timeout because D-Bus is blocked.
- Moving dbus.service into early boot would be a bold step, and I don't think such a large change is appropriate two weeks before release.
- Rearranging nsswitch.conf and modifying it on the fly also sounds like a big no.
- I'd also rather not move dbus.socket into late boot in general, as that would break other services which queue up a connection to D-Bus during early boot.
- So far the cleanest way out of this would be to also make dbus.socket wait for cloud-init.service, as that already blocks dbus.service. I verified that name resolution is then fast again. This also doesn't cause dependency loops as both cloud-init.service and sockets.target run in early boot.
Could you try adding "Before=dbus.socket" to /lib/systemd/system/cloud-init.service and confirm that this helps? (It does for me in a container, but I don't have access to GCE or EC2.)
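
For testing without editing the packaged unit file, the same ordering could probably be added through a drop-in instead. This is only a sketch: the drop-in file name below is made up, and I am assuming that a drop-in behaves the same as editing the unit in /lib directly, which I have not verified on GCE or EC2.

```ini
# /etc/systemd/system/cloud-init.service.d/before-dbus-socket.conf
# (hypothetical file name; any *.conf in this drop-in directory is read)
[Unit]
# Order cloud-init.service before dbus.socket, so that the socket only
# appears once cloud-init is done and nss-resolve fails fast until then.
Before=dbus.socket
```

After a "systemctl daemon-reload", "systemctl show -p Before cloud-init.service" should list dbus.socket. The ordering only takes effect on the next boot, so a reboot is needed before checking whether name resolution during cloud-init is fast again.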