NetworkManager takes very long to start, or times out, blocked on RNG

Bug #1622893 reported by Martin Pitt on 2016-09-13
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Auto Package Testing
Undecided
Martin Pitt
gnutls28 (Ubuntu)
Undecided
Martin Pitt
network-manager (Debian)
Fix Released
Unknown
network-manager (Ubuntu)
Undecided
Unassigned

Bug Description

Since a few days ago, NetworkManager.service takes awfully long to start, or even times out on failure and then gets restarted. This happens in a 16.10 desktop amd64 installation in QEMU, or e. g. in the systemd "boot-smoke" autopkgtest where every boot takes > 1:30 minutes due to the NM timeout.

In the journal there is no actual logging from /usr/sbin/NetworkManager yet, just the start timeout.

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: network-manager 1.2.2-0ubuntu8
ProcVersionSignature: Ubuntu 4.4.0-9136.55-generic 4.4.16
Uname: Linux 4.4.0-9136-generic x86_64
ApportVersion: 2.20.3-0ubuntu7
Architecture: amd64
CurrentDesktop: i3
Date: Tue Sep 13 10:05:05 2016
EcryptfsInUse: Yes
IfupdownConfig:
 # interfaces(5) file used by ifup(8) and ifdown(8)
 auto lo
 iface lo inet loopback

 source-directory interfaces.d
NetworkManager.conf:
 [main]
 plugins=ifupdown,keyfile,ofono

 [ifupdown]
 managed=false
NetworkManager.state:
 [main]
 NetworkingEnabled=true
 WirelessEnabled=true
 WWANEnabled=true
 WimaxEnabled=true
SourcePackage: network-manager
UpgradeStatus: No upgrade log present (probably fresh install)
nmcli-nm:
 RUNNING VERSION STATE STARTUP CONNECTIVITY NETWORKING WIFI-HW WIFI WWAN-HW WWAN
 running 1.2.2 connected started full enabled enabled enabled enabled enabled

Martin Pitt (pitti) wrote :
Martin Pitt (pitti) wrote :

Same often happens on shutdown.

strace shows what happens:

14:04:49.135589 getrandom(0x7ffdc3857ecf, 1, GRND_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
14:04:49.135620 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 0}, ru_stime={0, 8000}, ru_maxrss=8344, ru_ixrss=0, ru_idrss=0, ru_isrss=0, ru_minflt=509, ru_majflt=17, ru_nswap=0, ru_inblock=4048, ru_oublock=0, ru_msgsnd=0, ru_msgrcv=0, ru_nsignals=0, ru_nvcsw=969, ru_nivcsw=20}) = 0
14:04:49.135640 getrandom("\20@\251\341k\216T!\215\230\23\5\236\r\v~\253\360\321d\370_\263\227\364\376\20;}\314\271\217", 32, 0) = 32
14:05:16.540470 getrandom("FQ\212\356\17\17\17\323\316\230\7\275X1\3557\226\373U4F\\ sHOz\3\244$\226\270", 32, 0) = 32
14:05:16.540565 getrandom("+\262\232k\304\234\375R", 8, 0) = 8

So it's blocking on the getrandom() call. dmesg shows that the RNG pool only gets initialized 30 seconds into the boot.

This happens even before NM gets to write its first "NetworkManager .. is starting" log message.

summary: - NetworkManager takes very log to start, or times out
+ NetworkManager takes very log to start, or times out, blocked on RNG
Martin Pitt (pitti) wrote :

Downgrading libgnutls30 from 3.5.3 (current yakkety) to the previous upstream version 3.5.2 (https://launchpad.net/ubuntu/+source/gnutls28/3.5.2-2ubuntu4/) fixes this. So apparently libgnutls now does a blocking getrandom() on startup, which is rather unfriendly as you cannot use the library during early boot any more.

Changed in gnutls28 (Ubuntu):
status: New → Triaged
Martin Pitt (pitti) wrote :

I tried various workarounds for autopkgtest testbeds -- we *really* don't care for good random numbers there, and this causes awful delays in various tests.

https://github.com/openstack-infra/project-config/blob/master/nodepool/elements/initialize-urandom/static/usr/local/bin/initialize-urandom.py has a very good explanation of how the urandom pool is initialized. Based on that, this unit drop-in speeds up urandom initialization by fake entropy:

# /etc/systemd/system/systemd-random-seed.service.d/fakeentropy.conf
[Service]
ExecStart=/usr/bin/perl -E 'open $$f, "/bin/bash" or die; open $$rnd, ">/dev/random" or die; for ($$i = 0; $$i < 10; ++$$i) {read $$f, $$d, 64; ioctl $$rnd, 0x40085203, pack("ii", 64*8, 64) . $$d}'

Martin Pitt (pitti) on 2016-09-13
Changed in auto-package-testing:
status: New → In Progress
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Critical
importance: Critical → High
Martin Pitt (pitti) wrote :
Changed in auto-package-testing:
importance: High → Undecided
status: In Progress → Fix Released
summary: - NetworkManager takes very log to start, or times out, blocked on RNG
+ NetworkManager takes very long to start, or times out, blocked on RNG
Martin Pitt (pitti) on 2016-09-16
tags: added: bootspeed regression-release

Another possibility is to split the gnutls random generator initialization to preinitialization (open fds, check for getrandom()), and initialization (read from the previous detected sources). That would allow gnutls to be used on early boot without causing any delay. The drawback would be some performance penalty on multi-threaded environments on the usage of the random generator.

I have patch at: https://gitlab.com/gnutls/gnutls/merge_requests/111/
if you are interested for a review or comments.

Martin Pitt (pitti) wrote :

Nice, thanks Nikos!

Hi Martin,
 Have you tried if the patch resolves the issue?

Changed in network-manager (Debian):
status: Unknown → New
Florian W. (florian-will) wrote :

Concerning the "bootspeed" tag, the NM issue would be less noticable if "rc-local.service" was removed from "After=" in plymouth-quit.service. That way, my boot-to-graphical-login is a few seconds faster because plymouth is only visible for a very very short time instead of waiting up to 8 seconds for the network to come up. I did that using systemd override, and now my boot on kernel 4.4 is almost as fast as it was in 16.04.

Of course, technically the full boot including nmbd etc. still takes the same time because NM is still slow, but I don't really care if nmbd is up when I type my password into lightdm. The only thing that matters to me is that I can type my password and then fire up Chromium, by which time the network generally is ready to use in my case.

But maybe there's a good reason for some people to have rc.local executed (and therefor network ready to use) before starting lightdm.

Changed in network-manager (Debian):
status: New → Fix Released
Martin Pitt (pitti) on 2016-11-14
Changed in network-manager (Ubuntu):
status: New → Invalid
Changed in gnutls28 (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) on 2016-11-14
Changed in gnutls28 (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :
Download full text (3.1 KiB)

This bug was fixed in the package gnutls28 - 3.5.6-4ubuntu1

---------------
gnutls28 (3.5.6-4ubuntu1) zesty; urgency=medium

  * Merge with Debian. Remaining changes:
    - debian/patches/disable_global_init_override_test.patch: disable failing
      test.
    - debian/patches/add-openssl-test-link.patch: add link for libssl
  * New upstream version avoids getrandom() at initialization which caused
    NetworkManager to hang at boot. (LP: #1622893)

gnutls28 (3.5.6-4) unstable; urgency=medium

  * Pull 40_01_sockets-only-use-gnutls_bye-on-a-valid-socket-sessio.patch
   40_02_gnutls-cli-debug-terminate-sessions-which-cannot-be-.patch from
   upstream git master. The latter fixes a gnutls-cli-debug segfault.
   Closes: #844061

gnutls28 (3.5.6-2) unstable; urgency=low

  * Upload to unstable.
  * Bump libtasn1-6-dev b-d to >= 4.9 to support OIDs with elements that are
    longer than 32-bits. (Upstream GIT commit
    fcdb461e935dbdc0892241a35be7499116f22a67).

gnutls28 (3.5.6-1) experimental; urgency=low

  * New upstream version.
    + Drop superfluous patches (40_gnutls_certificate_set_key_apifixup.diff
      41_Reverted-the-behavior-of-sending-a-status-request-ex.patch).
    + Update symbol file.

gnutls28 (3.5.5-6) unstable; urgency=medium

  * Upload to unstable.

gnutls28 (3.5.5-5) experimental; urgency=medium

  * 41_Reverted-the-behavior-of-sending-a-status-request-ex.patch from
    https://gitlab.com/gnutls/gnutls/merge_requests/128 - Fix compatibility
    issue with GnuTLS 3.3 clients. Closes: #841723
  * Bump symbol dependency info for multiple
    gnutls_certificate_(set|get)_*_key* functions. If
    %GNUTLS_CERTIFICATE_API_V2 is set these functions will return a
    non-negative return code on success instead of 0 for success and negative
    numbers for failure.
  * Add b-d on openssl (for testsuite).

gnutls28 (3.5.5-4) unstable; urgency=medium

  * Upload to unstable.
  * Refresh 40_gnutls_certificate_set_key_apifixup.diff from master branch.

gnutls28 (3.5.5-3) experimental; urgency=medium

  * 40_gnutls_certificate_set_key_apifixup.diff: Fix ABI breakage introduced
    in 3.5.5.

gnutls28 (3.5.5-2) unstable; urgency=medium

  * Upload to unstable.

gnutls28 (3.5.5-1) experimental; urgency=medium

  * New upstream version.
    + Update symbol file.

gnutls28 (3.5.4-2) unstable; urgency=medium

  * Upload to unstable.

gnutls28 (3.5.4-1) experimental; urgency=medium

  * New upstream version.
    + Drop superfluous patches:
      35_gnutls-cli-print-Handshake-was-completed.patch
      36_gnutls-cli-fixed-the-behavior-when-starttls-or-start.patch
      37_openssl-format-fix-from-openconnect.patch
      39_ocsptool-corrected-bug-in-session-establishment.patch
      40_ocsp-corrected-the-comparison-of-the-serial-size-in-.patch
      45_01-tests-enhance-the-DTLS-window-unit-test-to-account-f.patch
      45_02-dtls-ensure-that-the-DTLS-window-doesn-t-get-stalled.patch
      45_03-tests-mini-dtls-record-modified-expected-order-to-ac.patch
      45_04-Import-DTLS-sliding-window-validation-from-OpenConne.patch
    + Update symbol file.
  * Add b-d on softhsm2 for pkcs11 tests.

 -- Martin Pitt <email address hidden> Mon, 14 Nov 201...

Read more...

Changed in gnutls28 (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.