nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument

Bug #1581864 reported by Simon Déziel
92
This bug affects 17 people
Affects Status Importance Assigned to Milestone
nginx (Debian)
Fix Released
Unknown
nginx (Ubuntu)
Fix Released
Low
Thomas Ward
Xenial
Won't Fix
Low
Unassigned
Bionic
Won't Fix
Low
Unassigned
Cosmic
Won't Fix
Low
Unassigned
Disco
Won't Fix
Low
Unassigned
Eoan
Fix Released
Low
Thomas Ward

Bug Description

[Description]

Nginx logs an error when started on a machine with a single CPU:

systemctl start nginx
systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2016-05-14 19:35:03 UTC; 2s ago
  Process: 1303 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=0/SUCCESS)
  Process: 1307 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 1306 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 1308 (nginx)
    Tasks: 5 (limit: 512)
   Memory: 3.1M
      CPU: 81ms
   CGroup: /system.slice/nginx.service
           ├─1308 nginx: master process /usr/sbin/nginx -g daemon on; master_process on
           ├─1309 nginx: worker process
           ├─1310 nginx: worker process
           ├─1311 nginx: worker process
           └─1312 nginx: worker process

May 14 19:35:03 ngx systemd[1]: Starting A high performance web server and a reverse proxy server...
May 14 19:35:03 ngx systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument
May 14 19:35:03 ngx systemd[1]: Started A high performance web server and a reverse proxy server.

Bumping the number of CPU available makes the error disappear. This is reproducible on VMs and containers.

Lastly, the PID file is properly created and matches the PID of the master process.

[Workaround]

  sudo mkdir /etc/systemd/system/nginx.service.d
  printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" | \
    sudo tee /etc/systemd/system/nginx.service.d/override.conf
  sudo systemctl daemon-reload
  sudo systemctl restart nginx

[Additional information]

# lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

# apt-cache policy nginx-core
nginx-core:
  Installed: 1.10.0-0ubuntu0.16.04.1
  Candidate: 1.10.0-0ubuntu0.16.04.1
  Version table:
 *** 1.10.0-0ubuntu0.16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.9.15-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: nginx-core 1.10.0-0ubuntu0.16.04.1
ProcVersionSignature: Ubuntu 4.4.0-22.39-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Sat May 14 19:35:21 2016
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
SourcePackage: nginx
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Simon Déziel (sdeziel) wrote :
Revision history for this message
Simon Déziel (sdeziel) wrote :

It seems to be a race between systemd and nginx. As if systemd was expecting the PID file to be populated before nginx had the time to create it.

Workaround:

 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload

Revision history for this message
Simon Déziel (sdeziel) wrote :

In the above workaround, I forgot the first step (mkdir) so here it is again:

Workaround:

 mkdir /etc/systemd/system/nginx.service.d
 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload

Revision history for this message
Thomas Ward (teward) wrote :

Confirmed on Xenial, with 1-CPU systems, that there may be such a race condition.

Given that the process still starts, I'm marking this as Low.

Changed in nginx (Ubuntu):
assignee: nobody → Thomas Ward (teward)
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Hristian (hristian-carabulea) wrote :

Oct 16 12:11:48 xxx systemd[1]: Starting A high performance web server and a reverse proxy server...
Oct 16 12:11:48 xxx systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument
Oct 16 12:11:48 xxx systemd[1]: Started A high performance web server and a reverse proxy server.

A proposed solution did not work:

sudo mkdir /etc/systemd/system/nginx.service.d

printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/
-bash: /etc/systemd/system/: Is a directory

nginx.service.d/override.conf
-bash: nginx.service.d/override.conf: No such file or directory

Revision history for this message
Simon Déziel (sdeziel) wrote :

Hi Hristian, the string after the ">" char should be on a single line so that printf's output is directed to the file /etc/systemd/system/nginx.service.d/override.conf.

Revision history for this message
Alex Badics (alex.badics.bb) wrote :

Hi,

We are also affected, and I've looked into the source. In main(), the ngx_create_pidfile() gets called right after the ngx_daemon() function. ngx_daemon of course forks, and the parent process exits immediately with exit(0). This is a pure race condition between nginx and systemd.

This is not a problem (other than the scary log), because the PIDFile option is not needed for determining the main PID of nginx (systemd can correctly "guess" the main pid), it's only for removing the file when the service exits.

As far as I can tell, this daemonization is not needed for the systemd service use-case. The service should be Type=simple, and 'daemon off'. The standard file handles get redirected by systemd anyway, and non-stop upgrade cannot be used in this case either. (See: http://nginx.org/en/docs/faq/daemon_master_process_off.html )

If this is too much of a change, another workaround is removing the PIDFile option, and adding the line:
ExecStopPost=/bin/rm -f /run/nginx.pid

Regards,
Alex

Revision history for this message
Thomas Ward (teward) wrote :

FYI: If we don't have this being a confirmed issue in Debian, and we don't have a bug report up in Debian, ***please do not add a Debian task for this***. I just checked the Debian bug tracker and this is not reported up in Debian.

Has anyone confirmed this is also an issue in Debian?

Revision history for this message
John Navratil (john-navratil) wrote :

I'm seeing this on a Raspberry Pi Model 3 running Raspbian

uname -a:Linux www 4.9.24+ #993 Wed Apr 26 17:56:54 BST 2017 armv6l GNU/Linux

Revision history for this message
John Navratil (john-navratil) wrote :

Edit to previous comment. It's an R-Pi 2 B+

Changed in nginx (Debian):
importance: Undecided → Unknown
status: New → Unknown
Changed in nginx (Debian):
status: Unknown → New
Revision history for this message
Julian (estrotococoa) wrote :

Thank you!!! Simon Déziel (sdeziel) wrote on 2016-05-14: #3

"In the above workaround, I forgot the first step (mkdir) so here it is again:

Workaround:

 mkdir /etc/systemd/system/nginx.service.d
 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload"

This worked for me in Debian 9, on a Linode's VPS

Revision history for this message
Karl König (kallekoenig) wrote :

Thank you, Simon! The workaround worked for me as well.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I checked and this is still happening across the board: xenial, bionic, disco. Didn't check eoan yet, which has a new major version (1.16.0), my eoan cloud image is still downloading.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

and same in eoan, just confirmed.

Revision history for this message
James (jbarfieldsss) wrote :

Because sudo can be different in some systems, this worked best for me:

# Fix for Nginx PID problem in Ubuntu 16.04 in an EC2 instance
sudo systemctl stop nginx
sudo mkdir /etc/systemd/system/nginx.service.d
# build the file in root where no sudo is needed
printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > override.conf
sudo cp override.conf /etc/systemd/system/nginx.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl start nginx

Simon Déziel (sdeziel)
description: updated
description: updated
Revision history for this message
Thomas Ward (teward) wrote :

Can anyone confirm this affects the other LTSes and releases as well? It's in my list of things to 'triage' currently and address but I need confirmations on which releases it works in (and I have not been able to replicate as-is even in containers or VMs).

Revision history for this message
James (jbarfieldsss) wrote :

From a replicate perspective:

This was on a EC2 t2.micro with AWS 21.2 Ubuntu Deep Learning AMI that is 16.04, then installing nginx. The error only occurred when running a python+flask+uwsgi service. e.g. a simple https/http server

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    listen 80;
    server_name SERVERNAME;
    ssl_certificate /etc/ssl/certs/mykey.crt;
    ssl_certificate_key /etc/ssl/private/mykey.key;
    ssl_protocols TLSv1.2 TLSv1.1 TLSv1;
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/home/ubuntu/mywsgi_sock.sock;
    }
}

I haven't tried different releases though.

Revision history for this message
TJ (tj) wrote :

This may be a viable workaround.

For a systemd Type=fork unit 'success' is determined when the parent process exits. At this time systemd expects the PIDFile to exist. nginx actually creates the PID file in the forked process.

Presumably on single CPU systems there is a sufficient delay in scheduling for the forked process not to have created the PID file when systemd checks for it.

Bring the creation of the PID file forward into the parent process.

Thomas Ward (teward)
Changed in nginx (Ubuntu Disco):
status: New → Confirmed
Changed in nginx (Ubuntu Cosmic):
status: New → Confirmed
Changed in nginx (Ubuntu Bionic):
status: New → Confirmed
Changed in nginx (Ubuntu Xenial):
status: New → Confirmed
Changed in nginx (Ubuntu Disco):
importance: Undecided → Low
Changed in nginx (Ubuntu Cosmic):
importance: Undecided → Low
Changed in nginx (Ubuntu Bionic):
importance: Undecided → Low
Changed in nginx (Ubuntu Xenial):
importance: Undecided → Low
Revision history for this message
Thomas Ward (teward) wrote :

Thanks for the patch, TJ.

Builds including the patch are currently being run in a testing PPA, located here:

https://launchpad.net/~teward/+archive/ubuntu/nginx-lp1581864

I do not have an ETA on build completion at this time, but feel free to check that PPA for builds. Once builds are complete and available, you may test them.

If tests seemed to work on those packages, please let me know and I will work on getting this into the respective affected releases. Note that the affected releases currently include: Xenial, Bionic, Cosmic, Disco, and Eoan.

Both Ubuntu Precise and Ubuntu Trusty are in ESM and cannot receive SRU(s) to this via the standard SRU process, and are not going to be patched by me.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Bring PID creation into parent process" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Simon Déziel (sdeziel) wrote :

I tested the PPA build for Bionic (1.14.0-0ubuntu1.3) and it does not work:

# start nginx in background (as it hangs):
$ sudo systemctl start nginx &

# the parent PID is written to the PIDFile:
$ cat /run/nginx.pid
807

# eventually systemctl start fails and status:
$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/nginx.service.d
           └─nginx-hardening.conf
   Active: failed (Result: timeout) since Fri 2019-05-31 17:04:59 EDT; 3min 14s ago
     Docs: man:nginx(8)
 Main PID: 710 (code=exited, status=0/SUCCESS)

May 31 17:03:29 rproxy systemd[1]: Starting A high performance web server and a reverse proxy server...
May 31 17:03:29 rproxy systemd[1]: nginx.service: New main PID 807 does not exist or is a zombie.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Start operation timed out. Terminating.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Killing process 808 (nginx) with signal SIGKILL.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Killing process 809 (nginx) with signal SIGKILL.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Failed with result 'timeout'.
May 31 17:04:59 rproxy systemd[1]: Failed to start A high performance web server and a reverse proxy server.

Revision history for this message
Thomas Ward (teward) wrote :

Due to failure, the patch has been removed from the bug, but remains in the PPA.

tags: removed: patch
Revision history for this message
TJ (tj) wrote :

I've developed and tested a different approach. More invasive and probably not something upstream will like the look of!

The approach is to have ngx_daemon() not do the parent process exit() after fork() but to return the child PID. The parent process detects that (a value > 0), sets ngx_pid = child_pid, calls ngx_crate_pidfile(), then does the exit().

I tested with extensive log messages to trace behaviour and confirmed it works.

Jun 01 03:01:51 u1910 systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Initialise child_pid=-3
Jun 01 03:01:51 u1910 nginx[27728]: nginx: ngx_daemon() in PID 27728 returned 27729
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Process PID 27728 calling ngx_create_pidfile() with ngx_pid=27729
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Process PID 27728 exiting
Jun 01 03:01:51 u1910 nginx[27728]: nginx: ngx_daemon() in PID 27729 returned 0
Jun 01 03:01:51 u1910 nginx[27728]: nginx: This should be child process PID 27729, parent PID 27728
Jun 01 03:01:51 u1910 systemd[1]: Started A high performance web server and a reverse proxy server.

Revision history for this message
TJ (tj) wrote :

The previous patch (v2) had a problem when the server reloaded. Because fork() wasn't done the tell-tale child_pid (-3) got passed to the PID file! This version should avoid that.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

If this is just about the error message in the logs, such an intrusive patch sounds overkill unless included upstream. Can you perhaps propose it there?

Revision history for this message
Simon Déziel (sdeziel) wrote :

@ahasenack, yes the only problem is the error message due to the bad ordering in PID handling. I think the plan is to test TJ's patch via PPA build to get the green light for upstream submission.

Revision history for this message
Simon Déziel (sdeziel) wrote :

I tested the PPA build for Bionic (1.14.0-0ubuntu1.3p1) and it works! systemd never looses track of the main daemon even through 'service nginx upgrade' cycles.

Revision history for this message
Simon Déziel (sdeziel) wrote :

I tested the PPA build for Eoan (1.16.0-0ubuntu2p1) and it works as well.

Revision history for this message
Thomas Ward (teward) wrote :

Everyone affected: Please test the PPA builds, they contain fixes for Xenial, Bionic, Cosmic, Disco, and Eoan.

If this works for all of the releases, I can consider SRU this specific patch as a distro-specific patch for NGINX here.

Revision history for this message
Thomas Ward (teward) wrote :

I've uploaded this as a distro-patch for Eoan currently, because I haven't heard any complaints from those who've been testing this.

Thomas Ward (teward)
Changed in nginx (Ubuntu Eoan):
status: Confirmed → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nginx - 1.16.0-0ubuntu2

---------------
nginx (1.16.0-0ubuntu2) eoan; urgency=medium

  * d/patches/nginx-fix-pidfile.patch: Fix NGINX PIDfile handling to avoid
    SystemD race condition - thanks to Tj for the patch. (LP: #1581864)

 -- Thomas Ward <email address hidden> Tue, 04 Jun 2019 11:43:30 -0400

Changed in nginx (Ubuntu Eoan):
status: Fix Committed → Fix Released
Revision history for this message
Erik Quaeghebeur (equaeghe) wrote :

Has the patch been proposed upstream? If so, do you have a link to the report? If not, is that still the plan?

Revision history for this message
Alex (quicktrick) wrote :

I've just encountered the same issue in CentOS 8 and nginx/1.19.0. The workaround helped, though it's a placebo.

Steve Langasek (vorlon)
Changed in nginx (Ubuntu Disco):
status: Confirmed → Won't Fix
Revision history for this message
Bryce Harrington (bryce) wrote :

Cosmic is out of support, closing the task.

Changed in nginx (Ubuntu Cosmic):
status: Confirmed → Won't Fix
Changed in nginx (Ubuntu Xenial):
status: Confirmed → Won't Fix
Revision history for this message
Miriam España Acebal (mirespace) wrote :

Hi,

I'm closing the Xenial task because it reached EOSS (End of Standard Support) so I'm afraid it can't be fixed (and this is not a security bug).

The bug is still present in Bionic because it is the only supported series without the patch (nginx-fix-pidfile.patch).

I found a discussion here [1] in a report filed upstream where they decide don't apply it (for anyone interested).

[1] https://trac.nginx.org/nginx/ticket/1897

tags: added: bionic server-todo
removed: xenial
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Planning to have a look as part of next cycles merge.

tags: removed: server-todo
Revision history for this message
Simon Déziel (sdeziel) wrote :

@paelzer, you made me revisit this and I can confirm that Focal+ is fixed, only Bionic is still affected.

Bryce Harrington (bryce)
tags: added: packaging
Changed in nginx (Debian):
status: New → Fix Released
tags: added: server-triage-discuss
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Athos for flagging this,
you are right if we do not fix it now for Bionic it won't be fixed at all.
But OTOH no one except internal people have revisited this for ages, and being only cosmetic (the service is up and running fine) this won't make the last minute (in the release lifecycle) SRU acceptance especially given the discussions around how it was resolved.

Upstream isn't very happy with the resolution [1] and I'd more lean to even challenge/reconsider if we might drop this in new versions - this is 3 years old maybe it is a non issue nowadays.

If someone thinks this is a wrong assumptions please speak up, outline why it is important to be fixed in Bionic in a way that will most likely also convince the SRU team to accept the risk.
Otherwise I'd call this a win't fix for Bionic too.

[1]: https://trac.nginx.org/nginx/ticket/1897

tags: removed: server-triage-discuss
Changed in nginx (Ubuntu Bionic):
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.