avoid service start hang due to random changes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
chrony (Debian) | Fix Released | Unknown | |
chrony (Ubuntu) | Fix Released | Medium | Unassigned |
Bionic | Fix Released | Medium | Unassigned |
Bug Description
[Impact]
* Backport the upstream fix to avoid issues with newer kernels' handling of
getrandom calls
* The original symptom appears as a very slow boot
* The Bionic kernel itself doesn't yet have the changes that made this
much more common in cloud environments, but once the Cosmic
kernel (>=4.17) is available as HWE it will also affect Bionic.
[Test Case]
* The actual test case is just "start the service", but there is more to
it
* The more complex part of the test is the condition under which this
becomes an issue, namely low entropy environments.
Simply depleting the pool with things like "cat /dev/random" isn't
enough. Most reports we saw were from booting in Google Cloud environments.
I had some luck with just a KVM guest on a slow system with the new
kernels, but it isn't a trivial on/off verification.
For now the best I can recommend is to use the mainline 4.17 kernels
from [5] and boot on them repeatedly, checking the startup
times afterwards (other entropy sensitive cases might be affected by
this as well).
In doing so I had some issues with other slow jobs in my env, so I disabled
the others that showed up in "systemd-analyze critical-chain" until I found
chrony to be the one that takes long.
But even that only showed a slow start in 1 of 5 cases; I am not sure yet
how to reproduce it more reliably.
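To make the "check the startup times" step less manual, the per-unit times can be compared across boots. Below is a minimal sketch, assuming the usual `systemd-analyze blame` output of one "time unit-name" pair per line; the helper name `parse_blame` and the sample text are made up for illustration and are not measurements from this bug.

```python
import re

# Seconds per time suffix as printed by systemd-analyze (e.g. "1min 2.500s", "850ms").
_UNITS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)")


def parse_blame(text):
    """Map unit name -> startup time in seconds (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        total = 0.0
        for token in parts[:-1]:
            match = _TOKEN.fullmatch(token)
            if match is None:
                break  # not a time token, skip this line
            total += float(match.group(1)) * _UNITS[match.group(2)]
        else:
            # Only reached when every token before the unit name parsed as a time.
            result[parts[-1]] = total
    return result


# Illustrative input only, not real data from an affected system.
sample = "1min 2.500s chrony.service\n      850ms ssh.service\n"
times = parse_blame(sample)
```

On a real system the input would come from running `systemd-analyze blame` after each boot; a chrony.service entry that is much larger on some boots than on others would match the intermittent behaviour described above.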
[Regression Potential]
* The change itself only adds "one more" case to the conditions that
let it fall back to urandom. Nevertheless this can be considered a
security risk, as discussed in the linked mail threads.
To be sure about that I added security as an extra reviewer on the first
MP for this before pushing it into any release.
See [4] for the ack by Seth.
Other than that
[Other Info]
* This header tries to be comprehensive, but many further links on the
background of this are available from the chrony ML entries and the
Debian bug.
----
Started in a discussion at [1], eventually finalized in [2], with a commit at [3].
We need to avoid systems hanging due to the long delay on start, especially with kernel >=4.17 IIRC.
Since this will soon be released with Cosmic and HWE kernels for Bionic, we don't want cloud instances to suddenly initialize much slower.
TL;DR: The fallback has always been to urandom; it just got one new case that triggers it, namely when not enough entropy can be delivered.
Since this has a rather low but potential security drawback [2], I will also ping the security people to check and [n]ack this.
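The fallback idea can be illustrated with a short sketch. This is not the chrony patch (chrony is written in C and the patch text is not reproduced here); it is a Python illustration that assumes a non-blocking getrandom request, with the function name `random_bytes` chosen only for this example.

```python
import os


def random_bytes(n):
    """Return n random bytes without blocking on an uninitialized pool.

    Illustrative helper, not chrony's code.
    """
    try:
        # With GRND_NONBLOCK, getrandom() raises BlockingIOError instead of
        # blocking while the kernel's entropy pool is not yet initialized.
        data = os.getrandom(n, os.GRND_NONBLOCK)
        if len(data) == n:
            return data
    except (AttributeError, OSError):
        # AttributeError: os.getrandom is unavailable on this platform.
        # OSError (including BlockingIOError): the call failed or would block.
        pass
    # Fallback: /dev/urandom does not block, at the cost of possibly weaker
    # randomness early in boot (the security trade-off discussed in [2]).
    with open("/dev/urandom", "rb") as source:
        return source.read(n)


token = random_bytes(16)
```

The trade-off is the one raised in the thread: the service starts promptly, but bytes read very early in boot may come from a pool that is not fully seeded.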
[1]: https:/
[2]: https:/
[3]: https:/
[4]: https:/
[5]: http://
Related branches
- Andreas Hasenack: Approve
- Canonical Server packageset reviewers: Pending requested
- git-ubuntu developers: Pending requested
Diff: 73 lines (+51/-0), 3 files modified
debian/changelog (+8/-0)
debian/patches/lp-1787366-fall-back-to-urandom.patch (+42/-0)
debian/patches/series (+1/-0)
- Andreas Hasenack: Approve
- Seth Arnold (community): Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
Diff: 69 lines (+49/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/lp-1787366-fall-back-to-urandom.patch (+41/-0)
debian/patches/series (+1/-0)
tags: | added: bitesize |
description: | updated |
description: | updated |
Changed in chrony (Debian):
status: | Unknown → Fix Released |
This will be in the upcoming version of chrony, so if we really nack this for security reasons we should ring a bell there and rekindle the discussion.