avoid service start hang due to random changes

Bug #1787366 reported by Christian Ehrhardt  on 2018-08-16
Affects            Status         Importance   Assigned to
chrony (Debian)    Fix Released   Unknown
chrony (Ubuntu)    Fix Released   Medium       Unassigned
  Bionic           Fix Released   Medium       Unassigned

Bug Description

[Impact]

 * Backport the upstream fix to avoid issues with newer kernels' handling
   of getrandom calls

 * The original symptom will appear as a very slow boot

 * The Bionic kernel itself doesn't yet have the changes that made this
   much more common in cloud environments, but once the Cosmic kernel
   (>=4.17) becomes available as HWE it will also affect Bionic.

[Test Case]

 * The actual testcase is just "start the service", but there is more to
   it

 * The more complex part of the test is the condition under which this
   becomes an issue, namely low-entropy environments.
   Simply depleting the pool with things like "cat /dev/random" isn't
   enough. Most reports we saw were from booting in Google Cloud
   environments.
   I had some luck with just a KVM guest on a slow system with the new
   kernels, but it isn't a trivial on/off verification.
   For now the best I can recommend is to use the mainline 4.17 kernels
   from [5], boot them repeatedly, and check the startup times afterwards
   (other entropy-sensitive services might be affected by this as well).
   In my environment other slow jobs got in the way, so I disabled
   those that showed up in "systemd-analyze critical-chain" until
   chrony was the one taking the longest.
   But even that only showed a slow start in 1 of 5 cases; I'm not sure
   yet how to reproduce it more reliably.
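
   To compare many boots without reading the chain by eye, the time a
   unit took to start can be pulled out of saved "systemd-analyze
   critical-chain" output. The following is only a sketch: the function
   names are hypothetical, it assumes plain (uncoloured) output in the
   documented "unit @activated +startup-time" form, and it only
   understands the min, s and ms units.

```python
import re

# Seconds per time unit that this sketch understands.
_UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}
_SPAN_PART = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)")


def parse_span(text):
    """Convert a span such as '1min 30.5s' or '548ms' to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _SPAN_PART.findall(text))


def startup_cost(chain_output, unit="chrony.service"):
    """Return the '+' startup time of `unit` in seconds.

    Returns 0.0 if the unit is listed without a '+' part and None if
    the unit does not appear in the chain at all.
    """
    for line in chain_output.splitlines():
        # Drop indentation and the tree-drawing characters.
        entry = line.lstrip(" └─│├")
        name, sep, rest = entry.partition(" @")
        if not sep or name != unit:
            continue
        _, plus, cost = rest.partition(" +")
        return parse_span(cost) if plus else 0.0
    return None
```

   Saving the output of each boot to a file and feeding it to
   startup_cost() gives one number per boot; a None result simply means
   chrony was not on the critical chain for that boot.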

[Regression Potential]

 * The change itself only adds "one more" case to the conditions that
   let it fall back to urandom. Nevertheless, this can be considered a
   security risk, as discussed in the linked mail threads.
   To be sure about that, I added security as an extra reviewer on the
   first MP for this before pushing it into any release.
   See [4] for the ack by Seth.
   Other than that

[Other Info]

 * This header tries to be comprehensive, but the chrony ML entries and
   the Debian bug link to much more background on this

----

This started in a discussion at [1], was eventually finalized in [2], and resulted in the commit at [3].

We need to avoid systems hanging due to the long delay on start, which IIRC happens especially with kernels >=4.17.
Since these will soon be released with Cosmic and as HWE kernels for Bionic, we don't want cloud instances to suddenly initialize much slower.

TL;DR: The fallback has always been to urandom; it just gained one more case that triggers it, namely when not enough entropy can be delivered.
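
The fallback pattern can be sketched as follows. This is a Python illustration, not the chrony patch itself (chrony is written in C; see [3] for the real change). It assumes the fix amounts to asking getrandom for bytes without blocking and reading /dev/urandom whenever that request cannot be served. The function name get_random_bytes is hypothetical.

```python
import os


def get_random_bytes(size):
    """Return `size` random bytes without ever blocking on entropy.

    Assumption: mirrors the idea of the fix, i.e. try getrandom in
    non-blocking mode and fall back to /dev/urandom on any failure.
    """
    getrandom = getattr(os, "getrandom", None)  # only present on Linux
    if getrandom is not None:
        try:
            data = getrandom(size, os.GRND_NONBLOCK)
            if len(data) == size:
                return data
        except OSError:
            # Includes BlockingIOError, raised when the pool is not yet
            # initialised and the call would otherwise block.
            pass
    # Fallback: /dev/urandom never blocks, even early during boot.
    with open("/dev/urandom", "rb") as urandom:
        return urandom.read(size)
```

The trade-off is the one discussed in [2]: the fallback path may hand out bytes before the kernel considers its pool fully seeded, in exchange for the service never stalling at start.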

Since this has a small but real potential security drawback [2], I will also ping the security people to check and [n]ack this.

[1]: https://listengine.tuxfamily.org/chrony.tuxfamily.org/chrony-users/2018/04/msg00036.html
[2]: https://listengine.tuxfamily.org/chrony.tuxfamily.org/chrony-users/2018/05/msg00060.html
[3]: https://git.tuxfamily.org/chrony/chrony.git/commit/?id=7c5bd948bb7e21fa0ee22f29e97748b2d0360319
[4]: https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+merge/353232/comments/919347
[5]: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17/

This will be in the coming version of chrony, so if we really nack this for security we should ring a bell there and rekindle the discussion.

Changed in chrony (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in chrony (Ubuntu Bionic):
importance: Undecided → Medium

I have added the ubuntu-security team to the MP I just opened, so that the ack/nack discussion can be had there.

https://code.launchpad.net/~paelzer/ubuntu/+source/chrony/+git/chrony/+merge/353232

tags: added: bitesize

Acks are in, pushed for Cosmic

Changed in chrony (Ubuntu Bionic):
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package chrony - 3.3-2ubuntu2

---------------
chrony (3.3-2ubuntu2) cosmic; urgency=medium

  * - d/p/lp-1787366-fall-back-to-urandom.patch: avoid hangs when starting
      the service on newer kernels by falling back to urandom.
      (LP: #1787366, Closes: #906276)

 -- Christian Ehrhardt <email address hidden> Thu, 16 Aug 2018 11:48:38 +0200

Changed in chrony (Ubuntu):
status: Triaged → Fix Released

Complete in Cosmic, preparing Bionic fix to be available before any >=4.17 HWE kernels.

description: updated
description: updated
Changed in chrony (Debian):
status: Unknown → Fix Released

Cosmic has the fix, SRU Template ready, MP reviewed, all complete - uploaded to Bionic for review by the SRU team.

Changed in chrony (Ubuntu Bionic):
status: Triaged → In Progress

Hello , or anyone else affected,

Accepted chrony into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/chrony/3.2-4ubuntu4.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in chrony (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Robie Basak (racb) wrote :

Thank you for checking with the security team. That was going to be my first question :)

I did another install / upgrade test from Proposed to be sure, and as expected there was no issue.
As outlined in the test case, it is hard to get a reliable positive result.
I tried to deplete the entropy of my system and restarted chrony (4/4).
I also installed the mainline kernel and restarted the guest a few times (5/5).

In none of these cases did it start slowly. As outlined, this isn't a 100% confirmation, but I assume it is as good as we can get.

I'd call it verified, but would ask for 14 instead of the usual 7 days of SRU maturing period.
After all, the really critical case will only arise once HWE >=4.17 is released for Bionic, and I think that is still some time away as well.
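
For anyone repeating this, it may help to record the kernel's entropy estimate next to each restart so the runs can be compared. A minimal sketch, assuming a Linux system that exposes the estimate in /proc/sys/kernel/random/entropy_avail; the helper name is hypothetical and the path is a parameter only so the function can be tried against a plain file.

```python
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate in bits."""
    with open(path) as entropy_file:
        return int(entropy_file.read().strip())
```

Calling read_entropy_avail() right before each "systemctl restart chrony" and noting the value next to the observed start time shows whether a given run actually happened under low entropy.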

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic

Since this is a bit uncertain due to its nature, some log of the testing.

# INSTALL
# apt install chrony
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  chrony
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 203 kB of archives.
After this operation, 509 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 chrony amd64 3.2-4ubuntu4.2 [203 kB]
Fetched 203 kB in 0s (1422 kB/s)
Selecting previously unselected package chrony.
(Reading database ... 39456 files and directories currently installed.)
Preparing to unpack .../chrony_3.2-4ubuntu4.2_amd64.deb ...
Unpacking chrony (3.2-4ubuntu4.2) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up chrony (3.2-4ubuntu4.2) ...
Creating '_chrony' system user/group for the chronyd daemon…

Creating config file /etc/chrony/chrony.conf with new version

Creating config file /etc/chrony/chrony.keys with new version
Created symlink /etc/systemd/system/chronyd.service → /lib/systemd/system/chrony.service.
Created symlink /etc/systemd/system/multi-user.target.wants/chrony.service → /lib/systemd/system/chrony.service.
Processing triggers for systemd (237-3ubuntu10.3) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for ureadahead (0.100.0-20) ...

# Upgrade:
The following packages will be upgraded:
  chrony
1 upgraded, 0 newly installed, 0 to remove and 28 not upgraded.
Need to get 203 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 chrony amd64 3.2-4ubuntu4.2 [203 kB]
Fetched 203 kB in 0s (620 kB/s)
(Reading database ... 60023 files and directories currently installed.)
Preparing to unpack .../chrony_3.2-4ubuntu4.2_amd64.deb ...
Unpacking chrony (3.2-4ubuntu4.2) over (3.2-4ubuntu4.1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up chrony (3.2-4ubuntu4.2) ...
locale: Cannot set LC_ALL to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
Processing triggers for systemd (237-3ubuntu10.3) ...
Processing triggers for man-db (2.8.3-2) ...

# The boots I checked were all OK: "systemctl status chronyd" was fine, and "systemd-analyze critical-chain" did not show chrony at the top.
All looked more or less like this example:
systemctl status chrony
● chrony.service - chrony, an NTP client/server
   Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-08-23 07:19:51 UTC; 10s ago
     Docs: man:chronyd(8)
           man:chronyc(1)
           man:chrony.conf(5)
  Process: 694 ExecStartPost=/usr/lib/chrony/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 637 ExecStart=/usr/lib/systemd/scripts/chronyd-starter.sh $DAEMON_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 693 (chronyd)
    Tasks: 1 (limit: 547)
   CGroup: /system.slice/chrony.service
        ...

Łukasz Zemczak (sil2100) wrote :

I think it should be safe to land it now.

The verification of the Stable Release Update for chrony has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package chrony - 3.2-4ubuntu4.2

---------------
chrony (3.2-4ubuntu4.2) bionic; urgency=medium

  * d/p/lp-1787366-fall-back-to-urandom.patch: avoid hangs when starting
    the service on newer kernels by falling back to urandom.
    (LP: #1787366, Closes: #906276)

 -- Christian Ehrhardt <email address hidden> Mon, 20 Aug 2018 11:36:18 +0200

Changed in chrony (Ubuntu Bionic):
status: Fix Committed → Fix Released