cloud-init hangs on boot as Python waits for sufficient randomness to start

Bug #1584147 reported by Dan Watkins
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Python
Fix Released
Unknown
cloud-images
Fix Released
High
Unassigned
python3.5 (Debian)
Fix Released
Unknown
python3.5 (Ubuntu)
Fix Released
High
Unassigned

Bug Description

If Python 3.5 starts before the kernel has initialised its RNG, it will hang[0] until there is sufficient entropy for it to randomise dictionary keys.

As cloud-init uses Python 3.5, this means that cloud-init will hang until this happens (meaning that the instance won't be properly initialised).

I have seen this cause a hang of up to 7 minutes on an OpenStack cloud.

Upstream python bug is issue 26839 [1].

[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=822431
[1] https://bugs.python.org/issue26839

Revision history for this message
Dan Watkins (oddbloke) wrote :

A partial workaround for this is to set PYTHONHASHSEED in the environment of the cloud-init systemd units, which means that Python won't block at _startup_ for randomness.

Unfortunately, importing the "random" module (which, e.g., the tempfile module does) still seems to cause some sort of hang.

Revision history for this message
Scott Moser (smoser) wrote :

for some reference, i booted a vm with a modified cloud-init-local.service and an additional file /usr/bin/python-import-logger . see below for those files.

The first line in the file below is the 'cp /proc/uptime' that runs before the python program (meant to show the uptime before python starts at all). The thing to note is that PYTHONHASHSEED does improve immediate startup, but 'import random' basically causes a huge delay even if it is set. The reason to point that out is that cloud-init uses tempfile and tempfile imports random.

after boot, catting /run/python-logger.log shows:
4.74 1.23
[4.84] start sys
[4.84] end sys (0.00)
[4.84] start random
[31.20] end random (26.36)
[31.20] start os
[31.20] end os (0.00)
[31.20] start tempfile
[31.22] end tempfile (0.02)
That was with args: /usr/bin/python-import-logger
  PYTHONHASHSEED=0

and

4.68 1.23
[27.13] start sys
[27.13] end sys (0.00)
[27.13] start random
[27.16] end random (0.03)
[27.16] start os
[27.16] end os (0.00)
[27.16] start tempfile
[27.18] end tempfile (0.02)
That was with args: /usr/bin/python-import-logger
  PYTHONHASHSEED=none-set

$ cat /lib/systemd/system/cloud-init-local.service
[Unit]
Description=Initial cloud-init job (pre-networking)
DefaultDependencies=no
Wants=local-fs.target
Wants=network-pre.target
After=local-fs.target
Conflicts=shutdown.target
Before=network-pre.target
Before=shutdown.target

[Service]
Type=oneshot
#Environment=PYTHONHASHSEED=0
ExecStart=/bin/cp /proc/uptime /run/python-logger.log
ExecStart=/usr/bin/python3 /usr/bin/python-import-logger
ExecStart=/usr/bin/cloud-init init --local
ExecStart=/bin/touch /run/cloud-init/network-config-ready
RemainAfterExit=yes
TimeoutSec=0

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target

$ cat /usr/bin/python-import-logger
#!/usr/bin/python3

LOG="/run/python-logger.log"
MODS=("sys", "random", "os", "tempfile")
start_fmt = "[%3.2f] start %s"
end_fmt = "[%3.2f] end %s (%2.2f)"

def read_up():
    with open("/proc/uptime", "r") as ufp:
        return float(ufp.read().partition(" ")[0])

def msg(comment, fp=None):
    if fp:
        fp.write(comment + "\n")
    print(comment)

with open(LOG, "a") as fp:
    s = read_up()
    msg(start_fmt % (s, "baseline"))
    e = read_up()
    msg(end_fmt % (e, "baseline", e - s))
    for name in MODS:
        s = read_up()
        msg(start_fmt % (read_up(), name), fp)
        __import__(name)
        e = read_up()
        msg(end_fmt % (e, name, e - s), fp)

    import sys
    import os
    msg("That was with args: %s" % ' '.join(sys.argv), fp)
    msg(" PYTHONHASHSEED=%s" % os.environ.get("PYTHONHASHSEED", "none-set"), fp)

Revision history for this message
Scott Moser (smoser) wrote :

Just for reference, the same patching was done to xenial daily image and
$ cat /run/python-logger.log
4.23 1.22
[4.40] start sys
[4.40] end sys (0.00)
[4.40] start random
[4.51] end random (0.11)
[4.51] start os
[4.51] end os (0.00)
[4.51] start tempfile
[4.53] end tempfile (0.02)
That was with args: /usr/bin/python-import-logger
  PYTHONHASHSEED=none-set

So both the basic startup hit and the 'import random' hit are new in yakkety.

Scott Moser (smoser)
Changed in cloud-images:
status: New → Confirmed
importance: Undecided → High
Changed in python3.5 (Debian):
status: Unknown → Confirmed
Scott Moser (smoser)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Changed in python:
status: Unknown → Fix Committed
Revision history for this message
Scott Moser (smoser) wrote :

It appears that python issue 26839 [1] should be fixed in python 3.5.2 under changeset 9de508dc4837 [2] and on tip at f477ce0d2345 [3].

The duplicate issue 26839 [4] I think fixes my issue with 'import random' also.

So, can this be cherry picked to debian in 2.5.1 ? Or would we wait till 2.5.2, its quite painful now in yakkety.

--
[1] https://bugs.python.org/issue25420
[2] https://hg.python.org/cpython/rev/9de508dc4837
[3] https://hg.python.org/cpython/rev/f477ce0d2345
[4] https://bugs.python.org/issue26839

Revision history for this message
Scott Moser (smoser) wrote :

bah. above i clearly meant 3.5.2 rather than 2.5.2.
per https://www.python.org/dev/peps/pep-0478/ 3.5.2 is due
 3.5.2 final: Sunday, June 26, 2016

So i expect that end of June this is fix-released in ubuntu.

Scott Moser (smoser)
Changed in python3.5 (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in python3.5 (Debian):
status: Confirmed → Fix Released
Changed in python:
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

This is fixed in current yakkety at
$ dpkg-query --show python3.5
python3.5 3.5.1-16

I booted a cloud instance (serial: 20160611-164137) and cloud-init was up in 11 seconds, where previously that was much longer.

Changed in python3.5 (Ubuntu):
status: Confirmed → Fix Released
Changed in cloud-images:
status: Confirmed → Fix Released
no longer affects: cloud-init (Ubuntu)
Revision history for this message
Scott Moser (smoser) wrote :

i also marked the cloud-images task as 'fix-released' as it is now fixed in a daily image and was not broken in any released image (only ever broken in yakkety), so its as released as it can be.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.