runs too early, causes dependency loops

Bug #1576333 reported by Scott Moser on 2016-04-28
This bug affects 1 person
Affects             Importance  Assigned to
pollinate (Ubuntu)  Critical    Martin Pitt
Trusty              Undecided   Unassigned
Xenial              Critical    Martin Pitt
Yakkety             Critical    Martin Pitt

Bug Description

The first half of the original bug report got fixed in bug 1578833, but pollinate still runs too early.

SRU INFORMATION
===============
Impact: Causes service failures during boot when using NFS mounts, regression from bug 1578833
Reproducer:
 - sudo apt-get install -y nfs-common pollinate
 - echo "1.2.3.4:/foo /mnt nfs defaults,nofail 0 0" | sudo tee -a /etc/fstab
   (This will start network-online.target during early boot)
 - sudo reboot
 - Confirm that "sudo journalctl -b -p warning" shows a dependency loop, and that "systemctl status network-online.target" most probably shows the target as inactive.
 - Upgrade to the proposed pollinate update, reboot.
 - Confirm that there is no dependency loop any more and that "systemctl status network-online.target" shows the target as active.
 - Confirm that "systemctl status pollinate" is "enabled" (it will have "start: condition failed", but that is intended).
 - Confirm that /etc/systemd/system/network.target.wants/pollinate.service does not exist any more.
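
The journal check in the steps above could be scripted roughly as follows. This is only a sketch: the helper name has_ordering_cycle is hypothetical, and the matched strings are the systemd messages quoted further down in this report.

```shell
#!/bin/sh
# Report whether journal text on stdin contains a systemd ordering-cycle
# message. has_ordering_cycle is a hypothetical helper name; the patterns
# are copied from the systemd log lines quoted in this report.
has_ordering_cycle() {
    grep -q -e 'Found ordering cycle' -e 'to break ordering cycle'
}

# Possible use on a live system (not run here):
#   sudo journalctl -b -p warning | has_ordering_cycle && echo "loop present"
```

The function exits 0 when a cycle message is present, so it should report a loop before the upgrade and none after it.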

Regression potential: Low. This merely changes when pollinate.service gets activated, and network.target is too early (nothing should actually be started by network.target; it is mostly meant for shutdown ordering). The main thing that can go wrong is that the upgrade still leaves the old /etc/systemd/system/network.target.wants/pollinate.service symlink behind (the maintainer scripts have to clean that up).
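
For illustration, the symlink cleanup mentioned above might look something like the sketch below. It is not the actual debian/pollinate.preinst, which is not quoted in this report; the function name and the optional root-prefix argument are hypothetical, and the prefix exists only so the logic can be tried in a scratch directory.

```shell
#!/bin/sh
# Sketch only: NOT the real debian/pollinate.preinst.
# remove_stale_enablement_link is a hypothetical helper. Its optional first
# argument is a root prefix (empty means the real filesystem root).
remove_stale_enablement_link() {
    root="${1:-}"
    link="$root/etc/systemd/system/network.target.wants/pollinate.service"
    # Remove the old enablement symlink if present (dangling links included);
    # anything that is not a symlink is left untouched.
    if [ -L "$link" ]; then
        rm -f "$link"
    fi
}
```

Calling it a second time is a no-op, which is the behaviour a maintainer script would need across repeated upgrades.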

Original bug report:

pollinate runs too early on some of the instances I launch. Basically nothing guarantees that it will have network access when it attempts to run.

The failure looks something like:
$ lxc launch xenial x1
$ sleep 4
$ lxc exec x1 systemctl status pollinate
● pollinate.service - Seed the pseudo random number generator on first boot
   Loaded: loaded (/lib/systemd/system/pollinate.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2016-04-28 17:16:03 UTC; 1min 17s ago
  Process: 86 ExecStart=/usr/bin/pollinate (code=exited, status=0/SUCCESS)
 Main PID: 86 (code=exited, status=0/SUCCESS)

Apr 28 17:16:03 ubuntu systemd[1]: Starting Seed the pseudo random number generator on first boot...
Apr 28 17:16:03 ubuntu pollinate[106]: client sent challenge to [https://entropy.ubuntu.com/]
Apr 28 17:16:03 ubuntu pollinate[86]: <13>Apr 28 17:16:03 pollinate[86]: client sent challenge to [https://entropy.ubuntu.com/]
Apr 28 17:16:03 ubuntu pollinate[149]: [432B blob data]
Apr 28 17:16:03 ubuntu pollinate[86]: Apr 28 17:16:03 ubuntu <13>Apr 28 17:16:03 pollinate[86]: WARNING: Network communication failed [0]\n % Total % Received % Xferd Average Speed Time Time Time Current
Apr 28 17:16:03 ubuntu pollinate[86]: Dload Upload Total Spent Left Speed
Apr 28 17:16:03 ubuntu pollinate[86]: [139B blob data]
Apr 28 17:16:03 ubuntu pollinate[86]: 17:16:03.859980 * Closing connection 0
Apr 28 17:16:03 ubuntu pollinate[86]: curl: (6) Could not resolve host: entropy.ubuntu.com
Apr 28 17:16:03 ubuntu systemd[1]: Started Seed the pseudo random number generator on first boot.

This seems like it might work:
# diff -u /lib/systemd/system/pollinate.service.dist /lib/systemd/system/pollinate.service
--- /lib/systemd/system/pollinate.service.dist 2016-04-28 17:19:10.807971336 +0000
+++ /lib/systemd/system/pollinate.service 2016-04-28 17:19:17.839874541 +0000
@@ -2,6 +2,7 @@
 Description=Seed the pseudo random number generator on first boot
 DefaultDependencies=no
 After=sysinit.target
+After=network.target
 Before=ssh.service

 [Service]

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: pollinate 4.15-0ubuntu1 [modified: usr/bin/pollinate]
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Thu Apr 28 16:39:17 2016
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: pollinate
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
Changed in pollinate (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Martin Pitt (pitti) wrote :

> +After=network.target

network.target is mostly just relevant for a correct shutdown order, thus it's irrelevant. You want "After=network-online.target" instead (see man systemd.special). But if you do this, you have to give up on running this during early boot (DefaultDependencies=no) as the networking can only be brought up much later.

While we are on cleaning up the unit, pollinate.service has

[Install]
WantedBy=network.target

While this is harmless (as long as you don't also specify Before=network.target), it's misleading to a reader. Better hook it into multi-user.target.
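
Put together, the two suggestions in this comment might result in a unit along the following lines. This is a sketch written as a shell heredoc so it can be dropped into a scratch directory; it is not the unit shipped in any pollinate release. Description=, Before= and ExecStart= are taken from the output quoted in the original report, while Wants=, Type=oneshot and the helper name write_unit_sketch are assumptions.

```shell
#!/bin/sh
# Sketch only: writes an illustrative pollinate.service into the directory
# given as $1. write_unit_sketch is a hypothetical helper name.
write_unit_sketch() {
    dest="$1"
    cat > "$dest/pollinate.service" <<'EOF'
[Unit]
Description=Seed the pseudo random number generator on first boot
# DefaultDependencies=no is dropped: the unit no longer runs in early boot.
# Wants= is an assumption, added so that the target is actually pulled in.
Wants=network-online.target
After=network-online.target
Before=ssh.service

[Service]
# Type=oneshot is an assumption; ExecStart= is from the status output.
Type=oneshot
ExecStart=/usr/bin/pollinate

[Install]
WantedBy=multi-user.target
EOF
}
```

Hooking the unit into multi-user.target instead of network.target means it is no longer started as part of reaching network.target.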

Martin Pitt (pitti) wrote :

A fix for this was actually attempted in https://launchpad.net/ubuntu/+source/pollinate/4.18-0ubuntu1, see commit http://bazaar.launchpad.net/~pollinate/pollinate/trunk/revision/306 .

However, when we did this we overlooked the "WantedBy=network.target", which still attempts to start it early and causes dependency loops when network-online.target gets activated. This can be seen in the recent nfs-utils test regression in https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-yakkety/yakkety/amd64/n/nfs-utils/20160518_225715@/log.gz:

May 18 22:55:25 adt systemd[1]: pollinate.service: Found ordering cycle on pollinate.service/start
May 18 22:55:25 adt systemd[1]: pollinate.service: Found dependency on network-online.target/start
May 18 22:55:25 adt systemd[1]: pollinate.service: Found dependency on network.target/start
May 18 22:55:25 adt systemd[1]: pollinate.service: Found dependency on pollinate.service/start
May 18 22:55:25 adt systemd[1]: pollinate.service: Breaking ordering cycle by deleting job network-online.target/start
May 18 22:55:25 adt systemd[1]: network-online.target: Job network-online.target/start deleted to break ordering cycle starting with pollinate.service/start

We missed this when verifying bug 1578833 because in a cloud image nothing actually pulls in network-online.target. NFS mounts in /etc/fstab, however, do pull it in.
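
The loop in the log excerpt can be reproduced on paper with tsort(1). In the sketch below, each input line "A B" means "A is ordered before B", and the three edges are read off the "Found dependency on ..." lines in that excerpt. It assumes a tsort that exits non-zero on a loop, as GNU coreutils does; has_ordering_loop is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds (exit 0) when the ordering pairs on stdin contain a loop.
# Relies on tsort exiting non-zero for cyclic input (GNU coreutils behaviour).
has_ordering_loop() {
    if tsort >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Edges taken from the log: pollinate.service is ordered after
# network-online.target, which is ordered after network.target, which in
# turn is ordered after pollinate.service.
printf '%s\n' \
    'network.target network-online.target' \
    'network-online.target pollinate.service' \
    'pollinate.service network.target' |
    has_ordering_loop && echo "ordering loop"
```

Removing the third edge, which is what moving the unit out of network.target.wants amounts to, leaves an input that tsort can order without complaint.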

This can be reproduced by installing pollinate and nfs-common, and adding this to fstab:

   echo "1.2.3.4:/foo /mnt nfs defaults,nofail 0 0" | sudo tee -a /etc/fstab

Changed in pollinate (Ubuntu):
importance: Medium → Critical
assignee: nobody → Martin Pitt (pitti)
status: Confirmed → In Progress
Changed in pollinate (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) wrote :

Bumping to critical as this is a regression in the recent Xenial SRU.

summary: - runs to early
+ runs to early, causes dependency loops
Martin Pitt (pitti) on 2016-05-19
description: updated
Martin Pitt (pitti) on 2016-05-19
description: updated
Changed in pollinate (Ubuntu Yakkety):
status: In Progress → Fix Committed
Martin Pitt (pitti) on 2016-05-19
Changed in pollinate (Ubuntu Xenial):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.18-0ubuntu2

---------------
pollinate (4.18-0ubuntu2) yakkety; urgency=medium

  * debian/pollinate.service: Move installation from network.target to
    multi-user.target. network.target is too early and causes dependency loops
    with e. g. NFS. (LP: #1576333)
  * debian/pollinate.preinst: Clean up old enablement symlink on upgrade. This
    needs to be kept until after 18.04 LTS.

 -- Martin Pitt <email address hidden> Thu, 19 May 2016 09:40:15 +0200

Changed in pollinate (Ubuntu Yakkety):
status: Fix Committed → Fix Released

Hello Scott, or anyone else affected,

Accepted pollinate into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pollinate/4.18-0ubuntu2~16.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pollinate (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote :

http://autopkgtest.ubuntu.com/packages/n/nfs-utils/yakkety/amd64/ passes again (same fix in yakkety).

I ran the test case in a VM with the xenial-proposed package and confirm that both network-online.target and pollinate.service run correctly, and there are no dependency loops any more.

tags: added: verification-done
removed: verification-needed

The verification of the Stable Release Update for pollinate has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Martin Pitt (pitti) wrote :

Releasing this early as this is a regression in xenial-updates.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.18-0ubuntu2~16.04

---------------
pollinate (4.18-0ubuntu2~16.04) xenial-proposed; urgency=medium

  * debian/pollinate.service: Move installation from network.target to
    multi-user.target. network.target is too early and causes dependency loops
    with e. g. NFS. (LP: #1576333)
  * debian/pollinate.preinst: Clean up old enablement symlink on upgrade. This
    needs to be kept until after 18.04 LTS.

 -- Martin Pitt <email address hidden> Thu, 19 May 2016 09:45:48 +0200

Changed in pollinate (Ubuntu Xenial):
status: Fix Committed → Fix Released

Hello Scott, or anyone else affected,

Accepted pollinate into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pollinate/4.21-0ubuntu1~14.04 in a few hours, and then in the -proposed repository.

Changed in pollinate (Ubuntu Trusty):
status: New → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Dustin Kirkland (kirkland) wrote :

I've tested this on precise, trusty, and xenial and confirm that it works as designed. Thanks.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.21-0ubuntu1~14.04

---------------
pollinate (4.21-0ubuntu1~14.04) trusty-proposed; urgency=medium

  [ Dustin Kirkland ]
  * pollinate:
    - fix broken printing of binary data, this was breaking check_pollen
      nagios scripts on the server

  [ Junien Fridrick ]
  * entropy.ubuntu.com.pem:
    - simplify CA cert to just the DigiCert chain (drop GoDaddy)

pollinate (4.20-0ubuntu1) yakkety; urgency=medium

  * debian/control:
    - drop the anerd references, hasn't existed in basically forever
    - update description
    - add dummy | dh-apparmor dependency to get this building on precise,
      where dh-systemd doesn't exist
    - drop run-one dependency, no longer needed
    - make the bsdutils dependency (for logger) explicit, add epoch
  * debian/rules:
    - use systemd, when possible
  * pollinate:
    - fix breakage on older (trusty, precise) Ubuntu, where logger does not
      support --id=[ID]; check version of bsdutils (provides logger) to
      ensure that it's at least ubuntu wily
    - cloud-init version string
  * debian/pollinate.service, debian/pollinate.upstart:
    - improve the init messages logged

pollinate (4.19-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * debian/pollinate.service: Move installation from network.target to
    multi-user.target. network.target is too early and causes dependency loops
    with e. g. NFS. (LP: #1576333)
  * debian/pollinate.preinst: Clean up old enablement symlink on upgrade. This
    needs to be kept until after 18.04 LTS.

pollinate (4.18-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - move to later in boot, after network starts, but before ssh starts

pollinate (4.17-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - use the right flag file for LP: #1578833

pollinate (4.16-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * Don't run pollinate.service in containers (as containers can't and should
    not write the host's random pool) and when we already have a saved random
    seeds (i. e. only on first boot). (LP: #1578833)
  * Bump Standards-Version to 3.9.8 (no changes needed).

  [ Dustin Kirkland ]
  * pollinate: use timeout(1) to limit curl, related to LP: #1578833

pollinate (4.15-0ubuntu1) xenial; urgency=medium

  * pollinate: LP: #1555362
    - log the right pid

pollinate (4.14-0ubuntu1) xenial; urgency=medium

  * pollinate, pollinate.1: LP: #1554152
    - change the failure mode of pollinate, so as to more cleanly
      tolerate network failures
    - add a --strict option to re-enable the previous behavior,
      ie, strictly exit non-zero if pollinate fails for any reason
    - we've always promised that pollinate would operate on a best-effort
      basis, improving the prng seeding when possible, but failing
      gracefully when not possible; as such, we've made good on the first
      half of that promise, however, the latter half has proven
      troublesome; this is due to the fact that if pollinate exits
      non-zero, then its callers (cloud-init, maas, etc.) may well
      interpret the behavior strictly as a failure to boot the system,
      when in ...

Changed in pollinate (Ubuntu Trusty):
status: Fix Committed → Fix Released