pollinate fails in many circumstances, cloud-init reports that failure, maas reports node failed deployment

Bug #1554152 reported by Scott Moser on 2016-03-07
This bug affects 3 people
Affects                    Status        Importance  Assigned to
cloud-init                 Fix Released  Medium      Unassigned
cloud-init (Ubuntu)        Fix Released  Medium      Unassigned
pollinate (Ubuntu)         Fix Released  Critical    Dustin Kirkland
pollinate (Ubuntu Trusty)  Fix Released  Undecided   Unassigned

Bug Description

cloud-init runs pollinate via the 'cc_seed_random.py' config job.

Some points:
a.) In addition to seeding via pollinate, seed_random will seed the random device with data from the datasource if it is provided (Azure and OpenStack provide a random seed for this purpose).
b.) We really want seed_random to run before ssh, so that keys are generated with good entropy in place.
c.) seed_random runs early via 'init_modules', mostly to accomplish 'b'. Unfortunately, network is not guaranteed at this point if the datasource is a 'local' datasource (such as config drive).
d.) In many cases pollinate will not have access to https://entropy.ubuntu.com (due to a firewall or a disconnected network).
e.) In xenial, cloud-init reports events to MAAS as they occur, and when this module fails, it reports that.
f.) MAAS marks nodes as failed deployment when cloud-init reports failure.

End result: if you don't have access to entropy.ubuntu.com, then deployment fails.
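
The failure chain in the points above comes down to one optional step being treated as fatal. A minimal Python sketch of the alternative, best-effort seeding, is shown here. It is not cloud-init's actual cc_seed_random code: the function name `seed_random`, its parameters, and the `required` switch are all hypothetical, and the device path is a parameter so the sketch can be tried against an ordinary file.

```python
import subprocess
import sys


def seed_random(seed_data, device_path, command=None, required=False):
    """Best-effort seeding sketch (hypothetical helper, not cloud-init code).

    Returns True if every step succeeded, False if the optional
    external command failed and failures are tolerated.
    """
    # Step 1: write any datasource-provided seed to the random device.
    if seed_data:
        with open(device_path, "ab") as device:
            device.write(seed_data)

    # Step 2: optionally run an external seeder (for example pollinate).
    if command is None:
        return True
    try:
        subprocess.run(command, check=True)
    except (OSError, subprocess.CalledProcessError) as err:
        if required:
            # Caller asked for strict behaviour: propagate the failure.
            raise
        # Otherwise warn and carry on; the boot itself is not broken.
        print(f"warning: seed command {command!r} failed: {err}",
              file=sys.stderr)
        return False
    return True
```

On a real system `device_path` would presumably be something like /dev/urandom. With `required=False`, an unreachable entropy server produces only a warning, so nothing downstream sees a failed module.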

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: cloud-init 0.7.7~bzr1176-0ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-10.25-generic 4.4.3
Uname: Linux 4.4.0-10-generic x86_64
NonfreeKernelModules: ufs qnx4 hfsplus hfs minix ntfs msdos
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Mon Mar 7 17:30:00 2016
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
tags: added: kanban-cross-team landscape
Andreas Hasenack (ahasenack) wrote :

I think the pollinate job is also ignoring https_proxy.

Andreas Hasenack (ahasenack) wrote :

Sorry, pressed enter too quickly:

root@maaslds:/var/log/maas/proxy# grep entropy access.log
root@maaslds:/var/log/maas/proxy# dig +short entropy.ubuntu.com
91.189.94.10
91.189.94.24
root@maaslds:/var/log/maas/proxy# grep 91.189.94.10 access.log
root@maaslds:/var/log/maas/proxy# grep 91.189.94.24 access.log
root@maaslds:/var/log/maas/proxy#

The proxy recorded no access to the entropy site, neither by host name ("entropy.ubuntu.com") nor by either of its IP addresses.
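
The same check can be written as a small helper. This is a sketch only: `find_proxy_hits` is a hypothetical name, and the sample lines are made up, since the real access log format is not shown in this report.

```python
def find_proxy_hits(log_lines, needles):
    """Return the log lines that mention any of the given host names or IPs."""
    return [line for line in log_lines if any(n in line for n in needles)]


# Host name and addresses taken from the dig output above.
NEEDLES = ["entropy.ubuntu.com", "91.189.94.10", "91.189.94.24"]

# Made-up sample lines, purely for illustration.
sample_log = [
    "GET http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease",
    "CONNECT api.example.com:443",
]

hits = find_proxy_hits(sample_log, NEEDLES)
print(len(hits))  # prints 0
```

An empty result is consistent with the proxy being bypassed, but it would look the same if the request was never attempted at all, so it is a hint and not proof.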

tags: removed: kanban-cross-team
Dustin Kirkland (kirkland) wrote :

bzr commit -m '* pollinate, pollinate.1: LP: #1554152
  - change the failure mode of pollinate, so as to more cleanly
    tolerate network failures
  - add a --strict option to re-enable the previous behavior,
    ie, strictly exit non-zero if pollinate fails for any reason
  - we've always promised that pollinate would operate on a best-effort
    basis, improving the prng seeding when possible, but failing
    gracefully when not possible; as such, we've made good on the first
    half of that promise, however, the latter half has proven
    troublesome; this is due to the fact that if pollinate exits
    non-zero, then its callers (cloud-init, maas, etc.) may well
    interpret the behavior strictly as a failure to boot the system,
    when in fact that's not the case; instead, we'll clearly print
    a warning to syslog, and we'll retry the seeding on next pollinate
    service start (e.g. a reboot); moreover, we'll carry a --strict
    flag in the case that users want to opt into the previous behavior' --fixes 'lp:1554152'
Committing to: /srv/media/src/pollinate/pollinate/
modified pollinate
modified pollinate.1
modified debian/changelog
Committed revision 293.
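
The behaviour change described in this commit message can be sketched as an exit-code policy. pollinate itself is a shell script; the Python below is only an illustration, and `build_parser` and `exit_status` are hypothetical names. Only the `--strict` option comes from the commit message.

```python
import argparse
import logging


def build_parser():
    """Command-line parser carrying the opt-in --strict flag."""
    parser = argparse.ArgumentParser(description="best-effort seeding sketch")
    parser.add_argument("--strict", action="store_true",
                        help="exit non-zero if seeding fails for any reason")
    return parser


def exit_status(seeding_succeeded, strict):
    """Map a seeding outcome to a process exit status."""
    if seeding_succeeded:
        return 0
    if strict:
        # Previous behaviour: any failure is reported to the caller.
        return 1
    # New default: warn, exit cleanly, and let a later run retry.
    logging.warning("seeding failed; will retry on next service start")
    return 0
```

Callers that treat a non-zero exit as a failed boot step only see a failure when `--strict` was passed explicitly.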

Changed in pollinate (Ubuntu):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Dustin Kirkland (kirkland)
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.14-0ubuntu1

---------------
pollinate (4.14-0ubuntu1) xenial; urgency=medium

  * pollinate, pollinate.1: LP: #1554152
    - change the failure mode of pollinate, so as to more cleanly
      tolerate network failures
    - add a --strict option to re-enable the previous behavior,
      ie, strictly exit non-zero if pollinate fails for any reason
    - we've always promised that pollinate would operate on a best-effort
      basis, improving the prng seeding when possible, but failing
      gracefully when not possible; as such, we've made good on the first
      half of that promise, however, the latter half has proven
      troublesome; this is due to the fact that if pollinate exits
      non-zero, then its callers (cloud-init, maas, etc.) may well
      interpret the behavior strictly as a failure to boot the system,
      when in fact that's not the case; instead, we'll clearly print
      a warning to syslog, and we'll retry the seeding on next pollinate
      service start (e.g. a reboot); moreover, we'll carry a --strict
      flag in the case that users want to opt into the previous behavior

 -- Dustin Kirkland <email address hidden> Tue, 13 Oct 2015 10:16:12 -0700

Changed in pollinate (Ubuntu):
status: Fix Committed → Fix Released
Scott Moser (smoser) on 2016-03-08
Changed in cloud-init:
status: New → Confirmed
Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in cloud-init:
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
Scott Moser (smoser) on 2016-03-09
Changed in cloud-init:
status: Confirmed → Fix Committed
no longer affects: maas
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.7~bzr1182-0ubuntu1

---------------
cloud-init (0.7.7~bzr1182-0ubuntu1) xenial; urgency=medium

  * New upstream snapshot.
    * systemd changes enforcing intended ordering (cloud-init-local.service
      before networking and cloud-init.service before it comes up).
    * when reading dmidecode data, return found but unset value as "" rather
      than failing to decode that value.
    * add default user to 'lxd' group and create groups when necessary
      (LP: #1539317)
    * No longer run pollinate in seed_random (LP: #1554152)
    * Enable BigStep data source.

 -- Scott Moser <email address hidden> Mon, 14 Mar 2016 09:58:56 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released

Hello Scott, or anyone else affected,

Accepted pollinate into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pollinate/4.21-0ubuntu1~14.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pollinate (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Scott Moser (smoser) wrote :

This is fixed in cloud-init 0.7.7

Changed in cloud-init:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.21-0ubuntu1~14.04

---------------
pollinate (4.21-0ubuntu1~14.04) trusty-proposed; urgency=medium

  [ Dustin Kirkland ]
  * pollinate:
    - fix broken printing of binary data, this was breaking check_pollen
      nagios scripts on the server

  [ Junien Fridrick ]
  * entropy.ubuntu.com.pem:
    - simplify CA cert to just the DigiCert chain (drop GoDaddy)

pollinate (4.20-0ubuntu1) yakkety; urgency=medium

  * debian/control:
    - drop the anerd references, hasn't existed in basically forever
    - update description
    - add dummy | dh-apparmor dependency to get this building on precise,
      where dh-systemd doesn't exist
    - drop run-one dependency, no longer needed
    - make the bsdutils dependency (for logger) explicit, add epoch
  * debian/rules:
    - use systemd, when possible
  * pollinate:
    - fix breakage on older (trusty, precise) Ubuntu, where logger does not
      support --id=[ID]; check version of bsdutils (provides logger) to
      ensure that it's at least ubuntu wily
    - cloud-init version string
  * debian/pollinate.service, debian/pollinate.upstart:
    - improve the init messages logged

pollinate (4.19-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * debian/pollinate.service: Move installation from network.target to
    multi-user.target. network.target is too early and causes dependency loops
    with e. g. NFS. (LP: #1576333)
  * debian/pollinate.preinst: Clean up old enablement symlink on upgrade. This
    needs to be kept until after 18.04 LTS.

pollinate (4.18-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - move to later in boot, after network starts, but before ssh starts

pollinate (4.17-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - use the right flag file for LP: #1578833

pollinate (4.16-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * Don't run pollinate.service in containers (as containers can't and should
    not write the host's random pool) and when we already have a saved random
    seeds (i. e. only on first boot). (LP: #1578833)
  * Bump Standards-Version to 3.9.8 (no changes needed).

  [ Dustin Kirkland ]
  * pollinate: use timeout(1) to limit curl, related to LP: #1578833

pollinate (4.15-0ubuntu1) xenial; urgency=medium

  * pollinate: LP: #1555362
    - log the right pid

pollinate (4.14-0ubuntu1) xenial; urgency=medium

  * pollinate, pollinate.1: LP: #1554152
    - change the failure mode of pollinate, so as to more cleanly
      tolerate network failures
    - add a --strict option to re-enable the previous behavior,
      ie, strictly exit non-zero if pollinate fails for any reason
    - we've always promised that pollinate would operate on a best-effort
      basis, improving the prng seeding when possible, but failing
      gracefully when not possible; as such, we've made good on the first
      half of that promise, however, the latter half has proven
      troublesome; this is due to the fact that if pollinate exits
      non-zero, then its callers (cloud-init, maas, etc.) may well
      interpret the behavior strictly as a failure to boot the system,
      when in ...


Changed in pollinate (Ubuntu Trusty):
status: Fix Committed → Fix Released