nginx service fails after libssl update due to low entropy at boot

Bug #1835464 reported by Dietmar May
Affects            Status    Importance  Assigned to  Milestone
nginx (Ubuntu)     Opinion   Undecided   Unassigned
  Bionic           Opinion   Undecided   Unassigned
openssl (Ubuntu)   Invalid   Undecided   Unassigned
  Bionic           Invalid   Undecided   Unassigned

Bug Description

After updating libssl and related packages, nginx will no longer autostart at system boot.

Immediately after boot, nginx.service is in a failed state.

# service nginx status
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Fri 2018-08-24 21:27:51 UTC; 32min ago
     Docs: man:nginx(8)

systemd[1]: Starting A high performance web server and a reverse proxy server...
systemd[1]: nginx.service: Start-pre operation timed out. Terminating.
systemd[1]: nginx.service: Failed with result 'timeout'.
systemd[1]: Failed to start A high performance web server and a reverse proxy server.

The service can be manually started after boot.

# service nginx start
# service nginx status
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-08-24 22:02:06 UTC; 2s ago
     Docs: man:nginx(8)
  Process: 2704 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 2703 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 2705 (nginx)
   CGroup: /system.slice/nginx.service
           ├─2705 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           └─2706 nginx: worker process

systemd[1]: Starting A high performance web server and a reverse proxy server...
systemd[1]: nginx.service: Failed to parse PID from file /run/nginx.pid: Invalid argument
systemd[1]: Started A high performance web server and a reverse proxy server.

This happens on an armhf-based microcontroller running the Ubuntu 18.04.2 raspi server distribution with a stock kernel.org 4.9-181 kernel.

Ubuntu repositories are not accessible from the device, so packages are copied to the device, and apt install is used to upgrade them:

apt install --no-install-recommends $dir/updates/system/*.deb | logger 2>&1

The following is a list of packages that, when upgraded, cause the nginx systemd service to fail to autostart at boot.

201,205c201,205
< ii libpython2.7:armhf 2.7.15-4ubuntu4~18.04 armhf Shared Python runtime library (version 2.7)
< ii libpython2.7-minimal:armhf 2.7.15-4ubuntu4~18.04 armhf Minimal subset of the Python language (version 2.7)
< ii libpython2.7-stdlib:armhf 2.7.15-4ubuntu4~18.04 armhf Interactive high-level object-oriented language (standard library, version 2.7)
< ii libpython3.6-minimal:armhf 3.6.8-1~18.04.1 armhf Minimal subset of the Python language (version 3.6)
< ii libpython3.6-stdlib:armhf 3.6.8-1~18.04.1 armhf Interactive high-level object-oriented language (standard library, version 3.6)
---
> ii libpython2.7:armhf 2.7.15~rc1-1ubuntu0.1 armhf Shared Python runtime library (version 2.7)
> ii libpython2.7-minimal:armhf 2.7.15~rc1-1ubuntu0.1 armhf Minimal subset of the Python language (version 2.7)
> ii libpython2.7-stdlib:armhf 2.7.15~rc1-1ubuntu0.1 armhf Interactive high-level object-oriented language (standard library, version 2.7)
> ii libpython3.6-minimal:armhf 3.6.7-1~18.04 armhf Minimal subset of the Python language (version 3.6)
> ii libpython3.6-stdlib:armhf 3.6.7-1~18.04 armhf Interactive high-level object-oriented language (standard library, version 3.6)
225c225
< ii libssl1.1:armhf 1.1.1-1ubuntu2.1~18.04.2 armhf Secure Sockets Layer toolkit - shared libraries
---
> ii libssl1.1:armhf 1.1.0g-2ubuntu4.3 armhf Secure Sockets Layer toolkit - shared libraries
272c272
< ii openssl 1.1.1-1ubuntu2.1~18.04.2 armhf Secure Sockets Layer toolkit - cryptographic utility
---
> ii openssl 1.1.0g-2ubuntu4.3 armhf Secure Sockets Layer toolkit - cryptographic utility
282,283c282,283
< ii python3.6 3.6.8-1~18.04.1 armhf Interactive high-level object-oriented language (version 3.6)
< ii python3.6-minimal 3.6.8-1~18.04.1 armhf Minimal subset of the Python language (version 3.6)
---
> ii python3.6 3.6.7-1~18.04 armhf Interactive high-level object-oriented language (version 3.6)
> ii python3.6-minimal 3.6.7-1~18.04 armhf Minimal subset of the Python language (version 3.6)

nginx is used primarily as an https front-end for web services on the device.

libssl is the core dependency for all of the packages in the group that, when upgraded, cause nginx to fail.

The nginx configuration includes the following SSL settings:

http {
        ##
        # SSL Settings
        ##

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;
}

server {
  listen 443 ssl;
  ssl_certificate /etc/certs/cert.crt;
  ssl_certificate_key /etc/certs/cert.key;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_ciphers HIGH:!aNULL:!MD5;
}

Dietmar May (dietmar.may) wrote :

This appears to be due to openssl requests blocking or failing until sufficient entropy is available for random number generation.

The target device is based on the TI AM335X (Sitara) ARM Cortex A8 SOC. The SOC (system on a chip) has a hardware random number generator, which requires a kernel driver to be built.

Though the kernel driver was being loaded, that's not enough for the hardware RNG to be used by the OS.

After installing the rng-tools package, which feeds the output of the hardware RNG (via the kernel driver) into the kernel's entropy pool, entropy at boot went up 100-fold, and nginx started normally at boot.
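
For anyone wanting to compare their own device, the entropy estimate and the active hardware RNG can be inspected roughly as follows. This is a minimal sketch, not the exact commands used on the device: the function names are made up for illustration, and the paths are the standard Linux procfs/sysfs locations, which may be absent if the hw_random core is not loaded.

```shell
#!/bin/sh
# Sketch: report the kernel's entropy estimate and the active hardware RNG.
# Function names are illustrative; paths can be overridden for testing.

entropy_avail() {
    # Prints the kernel's current entropy estimate, in bits.
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

hwrng_current() {
    # Prints the hardware RNG the kernel is currently using,
    # or "none" if the sysfs entry cannot be read.
    file="${1:-/sys/class/misc/hw_random/rng_current}"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "none"
    fi
}

# Example: run shortly after boot, before and after installing rng-tools.
#   echo "entropy: $(entropy_avail) bits, hwrng: $(hwrng_current)"
```

Comparing the numbers from two boots, one with and one without rng-tools, should show whether the hardware RNG is actually feeding the pool.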

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1835464/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Paul White (paulw2u)
affects: ubuntu → nginx (Ubuntu)
tags: added: bionic
Dimitri John Ledkov (xnox) wrote :

@dietmar.may yes, having entropy is needed. I understand you are not using the Ubuntu kernel, but can we double check that the Ubuntu kernel configs do build the driver for the random number generator that you need? What is the config option for it?

(such that you could, in theory, switch to an Ubuntu kernel)

Thomas Ward (teward) wrote :

Are we 100% certain this is an NGINX bug and not a kernel or OpenSSL bug? If these issues are entirely OpenSSL entropy-based, nginx isn't necessarily going to be where this fix needs to be...

Thomas Ward (teward)
Changed in nginx (Ubuntu):
status: New → Incomplete
Robie Basak (racb) wrote :

Tagging regression-update since the claim here is it was as a consequence of the OpenSSL SRU (regardless of where we determine the bug actually is, it still got exposed by that update).

tags: added: regression-update
Thomas Ward (teward)
Changed in nginx (Ubuntu Bionic):
status: New → Incomplete
Dietmar May (dietmar.may) wrote :

@xnox

In my case, this is on a TI AM3352 processor. The key config item is:

CONFIG_HW_RANDOM_OMAP=m

TI's docs indicate that the following is important:

CONFIG_CRYPTO_DEV_OMAP_SHAM=y

And these may be related:

CONFIG_CRYPTO_DEV_OMAP_AES=y
CONFIG_CRYPTO_SHA256_ARM=y
CONFIG_CRYPTO_SHA512_ARM=y

In general, for devices having a hardware random number generator, I believe the following are needed:

CONFIG_HW_RANDOM=m
CONFIG_HW_RANDOM_TPM=m
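
To check how these options are set in a given kernel build, something like the following could be used. It is only a sketch: `kconfig_value` is an illustrative helper rather than an existing tool, and the location of the config file is an assumption that varies between kernels.

```shell
#!/bin/sh
# Sketch: report how a kernel config option is set (y, m, or unset).
# kconfig_value is an illustrative helper, not an existing tool.

kconfig_value() {
    # $1: option name, e.g. CONFIG_HW_RANDOM_OMAP
    # $2: path to a plain-text kernel config file
    value=$(sed -n "s/^$1=//p" "$2" | head -n 1)
    echo "${value:-unset}"
}

# Example, assuming the config is installed under /boot (an assumption;
# some kernels expose it as /proc/config.gz instead):
#   for opt in CONFIG_HW_RANDOM CONFIG_HW_RANDOM_OMAP CONFIG_CRYPTO_DEV_OMAP_SHAM; do
#       echo "$opt=$(kconfig_value "$opt" "/boot/config-$(uname -r)")"
#   done
```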

I started by building an Ubuntu kernel for this ARM processor; but after some backported kernel patches broke the Ubuntu kernel for my device, I switched to the kernel.org stock 4.9 LTS kernel. Incidentally, that's made it easier to get support from driver developers.

Dietmar May (dietmar.may) wrote :

@teward

No, I'm not sure whether it's an nginx bug.

The openssl packages were updated; the nginx package is at the same version.

Basically, it looks like an openssl call that previously succeeded (and probably gave questionable responses) now has become a blocking call that doesn't return until sufficient entropy is available to ensure a reasonably secure random result.
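
One way to test that theory would be to probe, early in boot, whether the relevant commands return within a time limit. The sketch below assumes the stall is in OpenSSL's random seeding, which has not been confirmed here; `finishes_within` is an illustrative helper built on coreutils timeout(1), and the 10 second limit is arbitrary.

```shell
#!/bin/sh
# Sketch: report whether a command finishes within a time limit.
# finishes_within is an illustrative helper built on coreutils timeout(1).

finishes_within() {
    # $1: limit in seconds; remaining arguments: the command to run.
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    # timeout(1) exits with 124 when it had to kill the command.
    # Any other exit status, including a failure, counts as "finished".
    [ "$status" -ne 124 ]
}

# Example: probe early in boot whether OpenSSL and the nginx config test return.
#   finishes_within 10 openssl rand -hex 16 || echo "openssl rand blocked"
#   finishes_within 10 nginx -t -q || echo "nginx -t blocked"
```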

Where before nginx completed in a timely manner, it now appears to block and to fail to start within the systemd timeout period.

If that's the case (which looks likely), then other services which depend on openssl may time out as well. (tomcat with APR comes to mind as one possibility.)

Robie Basak (racb) wrote :

I think I understand the problem here, but it isn't clear to me that it's a bug in the openssl update either. It is surely normal and expected that regular updates (including security updates) might result in a greater entropy requirement.

It would be nice if we could arrange things to block for longer without failing when blocked on entropy. I'm not sure that increasing timeouts is really going to help though, if the system fundamentally doesn't have a good entropy source.

So it isn't obvious to me where this needs to be fixed, if anywhere at all, other than by the sysadmin, who is currently providing the system with no entropy.

Further discussion welcome.

Thomas Ward (teward)
Changed in nginx (Ubuntu):
status: Incomplete → Opinion
Changed in nginx (Ubuntu Bionic):
status: Incomplete → Opinion
Dietmar May (dietmar.may) wrote :

@racb

I'm not sure that I would consider it normal or expected, though, for system services to suddenly stop working due to regular updates, and for a server to suddenly become unreachable and unresponsive just because it was updated.

On the other hand, it's certainly not desirable for a system to silently operate with poor entropy and poor encryption quality.

In my case, this is easily resolved due to the hardware RNG on the TI AM335X chip.

However, AFAIK a Raspberry Pi does not have a hardware RNG, nor do many embedded processors / systems - meaning they would have low entropy at boot, and rng-tools most likely won't help.

Without looking at any code, here are a few observations.

Does nginx really need to make this blocking call to openssl when the service starts? Or only when the first https request is made to the service? That is, if no https request comes in for 2 min, or 10 min, maybe there would be sufficient entropy by then due to system activity.

Does openssl really need to block on initialization until sufficient entropy exists? Or could it defer that until some subsequent call that does actually need adequate entropy? In other words, would moving this blocking behavior to a different function satisfy the security need that led to its implementation, without potentially blocking systemd services at boot time?

Finally, I have a couple of the same devices that do not exhibit this blocking behavior. I'm not sure exactly why, but the difference appears somehow related to the way updates are applied. I've noticed a file '/.rnd' (from memory) which is used and/or generated by openssl. It looks like this file is used as an entropy seed. Once it is deleted (and with the hardware RNG not in use), the nginx systemd service starts blocking and timing out. Attempts to create this file manually using openssl do not allow the nginx service to start successfully at boot.

Maybe the simple fix is to find the right way to create and manage the /.rnd file on devices with low entropy?

Seth Arnold (seth-arnold) wrote :

I read through Bionic's systemd-random-seed.service source (src/random-seed/random-seed.c) and didn't see any references to RNDADDTOENTCNT or RNDADDENTROPY, the ioctl(2)s that are used to indicate to the kernel that added entropy should be used for the random(4) device. Maybe they're hidden behind some abstraction layers, but if so, I didn't spot them.

Does anyone know if this is intentional? Or what reasoning might have led to this decision?

Thanks

Robie Basak (racb)
tags: added: bionic-openssl-1.1
Dimitri John Ledkov (xnox) wrote :

@seth this was only added very recently

https://github.com/systemd/systemd/commit/26ded55709947d936634f1de0f43dcf88f594621

It is not on by default, and services need to be ordered After=systemd-random-seed.service to guarantee an initialized random pool.
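
That ordering could be expressed as a drop-in for the nginx unit, along these lines. This is a sketch only: the helper name and the drop-in file name are made up, and it presumably has no useful effect on systemd versions that predate the commit linked above (which would include Bionic's).

```shell
#!/bin/sh
# Sketch: print a drop-in that orders nginx after systemd-random-seed.service.
# The helper name and drop-in file name are illustrative.

nginx_order_dropin() {
    cat <<'EOF'
[Unit]
After=systemd-random-seed.service
EOF
}

# To apply (as root), using the standard drop-in directory for the unit:
#   mkdir -p /etc/systemd/system/nginx.service.d
#   nginx_order_dropin > /etc/systemd/system/nginx.service.d/10-after-random-seed.conf
#   systemctl daemon-reload
```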

Low entropy is an issue, excessive entropy usage is bad, and failing units due to low entropy is also bad.

I don't know if the nginx unit needs a retry & backoff to attempt starting later, if and when entropy is there.
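
As a rough idea of what a retry could look like, a drop-in such as the following would make systemd retry a failed or timed-out start. It is a sketch under assumptions: the helper name and all values are placeholders, the delay is fixed rather than a true backoff, and whether this is the right fix for the nginx unit is exactly the open question.

```shell
#!/bin/sh
# Sketch: print a drop-in that retries nginx after a failed or timed-out start.
# The helper name and all values are placeholders, not tested recommendations.

nginx_retry_dropin() {
    cat <<'EOF'
[Unit]
# Stop retrying once 3 starts have been attempted within 10 minutes.
StartLimitIntervalSec=600
StartLimitBurst=3

[Service]
# Retry after a failure (including a start timeout), waiting 30 s each time.
Restart=on-failure
RestartSec=30
EOF
}

# To apply (as root), using the standard drop-in directory for the unit:
#   mkdir -p /etc/systemd/system/nginx.service.d
#   nginx_retry_dropin > /etc/systemd/system/nginx.service.d/20-retry.conf
#   systemctl daemon-reload
```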

Dimitri John Ledkov (xnox) wrote :

@Dietmar May (dietmar.may)
All the kernel config options mentioned are enabled, at least in the Ubuntu 19.10 kernel. I would have expected them to be on in previous releases too, but didn't check.

I do wonder if ubuntu-drivers-common should detect that a hw rng device is available and offer to install rng-tools.

Changed in openssl (Ubuntu):
status: New → Invalid
Changed in openssl (Ubuntu Bionic):
status: New → Invalid