pollinate fails in many circumstances, cloud-init reports that failure, maas reports node failed deployment

Bug #1554152 reported by Scott Moser
This bug affects 3 people
Affects                Status        Importance  Assigned to      Milestone
cloud-init             Fix Released  Medium      Unassigned
cloud-init (Ubuntu)    Fix Released  Medium      Unassigned
pollinate (Ubuntu)     Fix Released  Critical    Dustin Kirkland
  Trusty               Fix Released  Undecided   Unassigned

Bug Description

cloud-init runs pollinate via the 'cc_seed_random.py' config job.

Some points:
a.) In addition to seeding via pollinate, seed_random will seed the random device with data from the datasource if it is provided (Azure and OpenStack provide a random seed for this purpose).
b.) We really want seed_random to run before ssh, so that keys are generated with good entropy in place.
c.) seed_random runs early via 'init_modules', mostly to accomplish 'b'. Unfortunately, network is not guaranteed at this point if the datasource is a 'local' datasource (such as config drive).
e.) In many cases pollinate will not have access to https://entropy.ubuntu.com (due to a firewall or a disconnected network).
f.) In xenial, cloud-init reports events to MAAS as they occur, and when this module fails, it reports that.
g.) MAAS marks nodes as failed deployment when cloud-init reports failure.

End result: if you don't have access to entropy.ubuntu.com, then your deployment fails.
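
To make the failure chain concrete, here is a minimal Python sketch of a seeding step that treats the network seeder as best-effort. It is not cloud-init's actual code: the function name seed_random, the command_required parameter, and the logger name are hypothetical, chosen only for illustration. Datasource-provided seed bytes are written to the random device first, and a failing seeder command is downgraded to a warning unless the caller explicitly requires it.

```python
import logging
import subprocess

LOG = logging.getLogger("seed_sketch")  # hypothetical logger name


def seed_random(seed_data, command=None, command_required=False,
                device="/dev/urandom"):
    """Seed the random device, then optionally run an external seeder.

    Returns True if every attempted step succeeded, and False if the
    optional command failed but the failure was tolerated.
    """
    if seed_data:
        # Bytes written to the random device are mixed into the pool.
        with open(device, "wb") as fp:
            fp.write(seed_data)

    if not command:
        return True

    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        if command_required:
            # Strict callers see the failure and can report it upstream.
            raise
        LOG.warning("seed command %s failed, continuing: %s", command, exc)
        return False
    return True
```

With command_required left at False, a machine without access to entropy.ubuntu.com would still be seeded from the datasource and the step would not be reported as failed.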

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: cloud-init 0.7.7~bzr1176-0ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-10.25-generic 4.4.3
Uname: Linux 4.4.0-10-generic x86_64
NonfreeKernelModules: ufs qnx4 hfsplus hfs minix ntfs msdos
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Mon Mar 7 17:30:00 2016
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
tags: added: kanban-cross-team landscape
Andreas Hasenack (ahasenack) wrote :

I think the pollinate job is also ignoring https_proxy.

Andreas Hasenack (ahasenack) wrote :

Sorry, pressed enter too quickly:

root@maaslds:/var/log/maas/proxy# grep entropy access.log
root@maaslds:/var/log/maas/proxy# dig +short entropy.ubuntu.com
91.189.94.10
91.189.94.24
root@maaslds:/var/log/maas/proxy# grep 91.189.94.10 access.log
root@maaslds:/var/log/maas/proxy# grep 91.189.94.24 access.log
root@maaslds:/var/log/maas/proxy#

The proxy recorded no access to the entropy site (entropy.ubuntu.com), either by hostname or by IP address.
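
The same check can be sketched in Python for anyone who wants to repeat it against their own proxy log. This is only an illustration: the function name proxy_hits is hypothetical, and the caller supplies the log lines and the resolved addresses (for example, the output of dig).

```python
def proxy_hits(log_lines, hostname, addresses=()):
    """Return the log lines that mention the hostname or any of its IPs."""
    needles = [hostname, *addresses]
    return [line for line in log_lines
            if any(needle in line for needle in needles)]
```

An empty result for both the hostname and its addresses suggests the request never went through the proxy, which is consistent with https_proxy being ignored.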

tags: removed: kanban-cross-team
Dustin Kirkland (kirkland) wrote :

bzr commit -m '* pollinate, pollinate.1: LP: #1554152
  - change the failure mode of pollinate, so as to more cleanly
    tolerate network failures
  - add a --strict option to re-enable the previous behavior,
    ie, strictly exit non-zero if pollinate fails for any reason
  - we've always promised that pollinate would operate on a best-effort
    basis, improving the prng seeding when possible, but failing
    gracefully when not possible; as such, we've made good on the first
    half of that promise, however, the latter half has proven
    troublesome; this is due to the fact that if pollinate exits
    non-zero, then its callers (cloud-init, maas, etc.) may well
    interpret the behavior strictly as a failure to boot the system,
    when in fact that's not the case; instead, we'll clearly print
    a warning to syslog, and we'll retry the seeding on next pollinate
    service start (e.g. a reboot); moreover, we'll carry a --strict
    flag in the case that users want to opt into the previous behavior' --fixes 'lp:1554152'
Committing to: /srv/media/src/pollinate/pollinate/
modified pollinate
modified pollinate.1
modified debian/changelog
Committed revision 293.
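
The change in failure mode described in this commit message can be sketched as follows. pollinate itself is a shell script, so this Python version is only an illustration of the exit-status policy, not its implementation; the names exit_status and fetch_seed are hypothetical, and the standard logging module stands in for the syslog warning.

```python
import logging

LOG = logging.getLogger("pollinate_sketch")  # hypothetical logger name


def exit_status(fetch_seed, strict=False):
    """Run the seeding callable and return the process exit status.

    Best-effort mode (the default) returns 0 even when seeding fails,
    so callers do not mistake a network problem for a failed boot.
    Strict mode returns 1 on any failure, matching the older behavior.
    """
    try:
        fetch_seed()
    except Exception as exc:
        # The failure is always logged, whichever status is returned.
        LOG.warning("seeding failed: %s", exc)
        return 1 if strict else 0
    return 0
```

The design choice is that a failure remains visible in the log in both modes; only callers that opt into strict mode see it as a non-zero exit.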

Changed in pollinate (Ubuntu):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Dustin Kirkland (kirkland)
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.14-0ubuntu1

---------------
pollinate (4.14-0ubuntu1) xenial; urgency=medium

  * pollinate, pollinate.1: LP: #1554152
    - change the failure mode of pollinate, so as to more cleanly
      tolerate network failures
    - add a --strict option to re-enable the previous behavior,
      ie, strictly exit non-zero if pollinate fails for any reason
    - we've always promised that pollinate would operate on a best-effort
      basis, improving the prng seeding when possible, but failing
      gracefully when not possible; as such, we've made good on the first
      half of that promise, however, the latter half has proven
      troublesome; this is due to the fact that if pollinate exits
      non-zero, then its callers (cloud-init, maas, etc.) may well
      interpret the behavior strictly as a failure to boot the system,
      when in fact that's not the case; instead, we'll clearly print
      a warning to syslog, and we'll retry the seeding on next pollinate
      service start (e.g. a reboot); moreover, we'll carry a --strict
      flag in the case that users want to opt into the previous behavior

 -- Dustin Kirkland <email address hidden> Tue, 13 Oct 2015 10:16:12 -0700

Changed in pollinate (Ubuntu):
status: Fix Committed → Fix Released
Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in cloud-init:
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
Scott Moser (smoser)
Changed in cloud-init:
status: Confirmed → Fix Committed
no longer affects: maas
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.7~bzr1182-0ubuntu1

---------------
cloud-init (0.7.7~bzr1182-0ubuntu1) xenial; urgency=medium

  * New upstream snapshot.
    * systemd changes enforcing intended ordering (cloud-init-local.service
      before networking and cloud-init.service before it comes up).
    * when reading dmidecode data, return found but unset value as "" rather
      than failing to decode that value.
    * add default user to 'lxd' group and create groups when necessary
      (LP: #1539317)
    * No longer run pollinate in seed_random (LP: #1554152)
    * Enable BigStep data source.

 -- Scott Moser <email address hidden> Mon, 14 Mar 2016 09:58:56 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Chris J Arges (arges) wrote : Please test proposed package

Hello Scott, or anyone else affected,

Accepted pollinate into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pollinate/4.21-0ubuntu1~14.04 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in pollinate (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Scott Moser (smoser) wrote :

This is fixed in cloud-init 0.7.7

Changed in cloud-init:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pollinate - 4.21-0ubuntu1~14.04

---------------
pollinate (4.21-0ubuntu1~14.04) trusty-proposed; urgency=medium

  [ Dustin Kirkland ]
  * pollinate:
    - fix broken printing of binary data, this was breaking check_pollen
      nagios scripts on the server

  [ Junien Fridrick ]
  * entropy.ubuntu.com.pem:
    - simplify CA cert to just the DigiCert chain (drop GoDaddy)

pollinate (4.20-0ubuntu1) yakkety; urgency=medium

  * debian/control:
    - drop the anerd references, hasn't existed in basically forever
    - update description
    - add dummy | dh-apparmor dependency to get this building on precise,
      where dh-systemd doesn't exist
    - drop run-one dependency, no longer needed
    - make the bsdutils dependency (for logger) explicit, add epoch
  * debian/rules:
    - use systemd, when possible
  * pollinate:
    - fix breakage on older (trusty, precise) Ubuntu, where logger does not
      support --id=[ID]; check version of bsdutils (provides logger) to
      ensure that it's at least ubuntu wily
    - cloud-init version string
  * debian/pollinate.service, debian/pollinate.upstart:
    - improve the init messages logged

pollinate (4.19-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * debian/pollinate.service: Move installation from network.target to
    multi-user.target. network.target is too early and causes dependency loops
    with e. g. NFS. (LP: #1576333)
  * debian/pollinate.preinst: Clean up old enablement symlink on upgrade. This
    needs to be kept until after 18.04 LTS.

pollinate (4.18-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - move to later in boot, after network starts, but before ssh starts

pollinate (4.17-0ubuntu1) yakkety; urgency=medium

  * debian/pollinate.service:
    - use the right flag file for LP: #1578833

pollinate (4.16-0ubuntu1) yakkety; urgency=medium

  [ Martin Pitt ]
  * Don't run pollinate.service in containers (as containers can't and should
    not write the host's random pool) and when we already have a saved random
    seeds (i. e. only on first boot). (LP: #1578833)
  * Bump Standards-Version to 3.9.8 (no changes needed).

  [ Dustin Kirkland ]
  * pollinate: use timeout(1) to limit curl, related to LP: #1578833

pollinate (4.15-0ubuntu1) xenial; urgency=medium

  * pollinate: LP: #1555362
    - log the right pid

pollinate (4.14-0ubuntu1) xenial; urgency=medium

  * pollinate, pollinate.1: LP: #1554152
    - change the failure mode of pollinate, so as to more cleanly
      tolerate network failures
    - add a --strict option to re-enable the previous behavior,
      ie, strictly exit non-zero if pollinate fails for any reason
    - we've always promised that pollinate would operate on a best-effort
      basis, improving the prng seeding when possible, but failing
      gracefully when not possible; as such, we've made good on the first
      half of that promise, however, the latter half has proven
      troublesome; this is due to the fact that if pollinate exits
      non-zero, then its callers (cloud-init, maas, etc.) may well
      interpret the behavior strictly as a failure to boot the system,
      when in ...

Changed in pollinate (Ubuntu Trusty):
status: Fix Committed → Fix Released
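
The 4.16 changelog entry above mentions using timeout(1) to limit curl. The same idea, bounding how long a network seeder may block boot, can be sketched in Python with the standard subprocess module. The function name run_with_timeout is hypothetical, and this is an illustration of the technique rather than what pollinate does internally.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run a command, giving up after the given number of seconds.

    Returns the command's exit status, or None if it was killed
    because it exceeded the time limit.
    """
    try:
        proc = subprocess.run(argv, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return proc.returncode
```

A caller can treat None the same way as any other tolerated seeding failure: log it and continue booting.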