nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument

Bug #1581864 reported by Simon Déziel on 2016-05-14
64
This bug affects 12 people
Affects Status Importance Assigned to Milestone
nginx (Debian)
New
Unknown
nginx (Ubuntu)
Status tracked in Eoan
Xenial
Low
Unassigned
Bionic
Low
Unassigned
Cosmic
Low
Unassigned
Disco
Low
Unassigned
Eoan
Low
Thomas Ward

Bug Description

[Description]

Nginx logs an error when started on a machine with a single CPU:

systemctl start nginx
systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2016-05-14 19:35:03 UTC; 2s ago
  Process: 1303 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=0/SUCCESS)
  Process: 1307 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 1306 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 1308 (nginx)
    Tasks: 5 (limit: 512)
   Memory: 3.1M
      CPU: 81ms
   CGroup: /system.slice/nginx.service
           ├─1308 nginx: master process /usr/sbin/nginx -g daemon on; master_process on
           ├─1309 nginx: worker process
           ├─1310 nginx: worker process
           ├─1311 nginx: worker process
           └─1312 nginx: worker process

May 14 19:35:03 ngx systemd[1]: Starting A high performance web server and a reverse proxy server...
May 14 19:35:03 ngx systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument
May 14 19:35:03 ngx systemd[1]: Started A high performance web server and a reverse proxy server.

Bumping the number of CPU available makes the error disappear. This is reproducible on VMs and containers.

Lastly, the PID file is properly created and matches the PID of the master process.

[Workaround]

  sudo mkdir /etc/systemd/system/nginx.service.d
  printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" | \
    sudo tee /etc/systemd/system/nginx.service.d/override.conf
  sudo systemctl daemon-reload
  sudo systemctl restart nginx

[Additional information]

# lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

# apt-cache policy nginx-core
nginx-core:
  Installed: 1.10.0-0ubuntu0.16.04.1
  Candidate: 1.10.0-0ubuntu0.16.04.1
  Version table:
 *** 1.10.0-0ubuntu0.16.04.1 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.9.15-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: nginx-core 1.10.0-0ubuntu0.16.04.1
ProcVersionSignature: Ubuntu 4.4.0-22.39-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Sat May 14 19:35:21 2016
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
SourcePackage: nginx
UpgradeStatus: No upgrade log present (probably fresh install)

Simon Déziel (sdeziel) wrote :
Simon Déziel (sdeziel) wrote :

It seems to be a race between systemd and nginx. As if systemd was expecting the PID file to be populated before nginx had the time to create it.

Workaround:

 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload

Simon Déziel (sdeziel) wrote :

In the above workaround, I forgot the first step (mkdir) so here it is again:

Workaround:

 mkdir /etc/systemd/system/nginx.service.d
 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload

Thomas Ward (teward) wrote :

Confirmed on Xenial, with 1-CPU systems, that there may be such a race condition.

Given that the process still starts, I'm marking this as Low.

Changed in nginx (Ubuntu):
assignee: nobody → Thomas Ward (teward)
importance: Undecided → Low
status: New → Confirmed
Hristian (hristian-carabulea) wrote :

Oct 16 12:11:48 xxx systemd[1]: Starting A high performance web server and a reverse proxy server...
Oct 16 12:11:48 xxx systemd[1]: nginx.service: Failed to read PID from file /run/nginx.pid: Invalid argument
Oct 16 12:11:48 xxx systemd[1]: Started A high performance web server and a reverse proxy server.

A proposed solution did not work:

sudo mkdir /etc/systemd/system/nginx.service.d

printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/
-bash: /etc/systemd/system/: Is a directory

nginx.service.d/override.conf
-bash: nginx.service.d/override.conf: No such file or directory

Simon Déziel (sdeziel) wrote :

Hi Hristian, the string after the ">" char should be on a single line so that printf's output is directed to the file /etc/systemd/system/nginx.service.d/override.conf.

Alex Badics (alex.badics.bb) wrote :

Hi,

We are also affected, and I've looked into the source. In main(), the ngx_create_pidfile() gets called right after the ngx_daemon() function. ngx_daemon of course forks, and the parent process exits immediately with exit(0). This is a pure race condition between nginx and systemd.

This is not a problem (other than the scary log), because the PIDFile option is not needed for determining the main PID of nginx (systemd can correctly "guess" the main pid), it's only for removing the file when the service exits.

As far as I can tell, this daemonization is not needed for the systemd service use-case. The service should be Type=simple, and 'daemon off'. The standard file handles get redirected by systemd anyway, and non-stop upgrade cannot be used in this case either. (See: http://nginx.org/en/docs/faq/daemon_master_process_off.html )

If this is too much of a change, another workaround is removing the PIDFile option, and adding the line:
ExecStopPost=/bin/rm -f /run/nginx.pid

Regards,
Alex

Thomas Ward (teward) wrote :

FYI: If we don't have this being a confirmed issue in Debian, and we don't have a bug report up in Debian, ***please do not add a Debian task for this***. I just checked the Debian bug tracker and this is not reported up in Debian.

Has anyone confirmed this is also an issue in Debian?

John Navratil (john-navratil) wrote :

I'm seeing this on a Raspberry Pi Model 3 running Raspbian

uname -a:Linux www 4.9.24+ #993 Wed Apr 26 17:56:54 BST 2017 armv6l GNU/Linux

John Navratil (john-navratil) wrote :

Edit to previous comment. It's an R-Pi 2 B+

Changed in nginx (Debian):
importance: Undecided → Unknown
status: New → Unknown
Changed in nginx (Debian):
status: Unknown → New
Julian (estrotococoa) wrote :

Thank you!!! Simon Déziel (sdeziel) wrote on 2016-05-14: #3

"In the above workaround, I forgot the first step (mkdir) so here it is again:

Workaround:

 mkdir /etc/systemd/system/nginx.service.d
 printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > /etc/systemd/system/nginx.service.d/override.conf
 systemctl daemon-reload"

This worked for me in Debian 9, on a Linode's VPS

Karl König (kallekoenig) wrote :

Thank you, Simon! The workaround worked for me as well.

Andreas Hasenack (ahasenack) wrote :

I checked and this is still happening across the board: xenial, bionic, disco. Didn't check eoan yet, which has a new major version (1.16.0), my eoan cloud image is still downloading.

Andreas Hasenack (ahasenack) wrote :

and same in eoan, just confirmed.

James (jbarfieldsss) wrote :

Because sudo can be different in some systems, this worked best for me:

# Fix for Nginx PID problem in Ubuntu 16.04 in an EC2 instance
sudo systemctl stop nginx
sudo mkdir /etc/systemd/system/nginx.service.d
# build the file in root where no sudo is needed
printf "[Service]\nExecStartPost=/bin/sleep 0.1\n" > override.conf
sudo cp override.conf /etc/systemd/system/nginx.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl start nginx

Simon Déziel (sdeziel) on 2019-05-31
description: updated
description: updated
Thomas Ward (teward) wrote :

Can anyone confirm this affects the other LTSes and releases as well? It's in my list of things to 'triage' currently and address but I need confirmations on which releases it works in (and I have not been able to replicate as-is even in containers or VMs).

James (jbarfieldsss) wrote :

From a replicate perspective:

This was on a EC2 t2.micro with AWS 21.2 Ubuntu Deep Learning AMI that is 16.04, then installing nginx. The error only occurred when running a python+flask+uwsgi service. e.g. a simple https/http server

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    listen 80;
    server_name SERVERNAME;
    ssl_certificate /etc/ssl/certs/mykey.crt;
    ssl_certificate_key /etc/ssl/private/mykey.key;
    ssl_protocols TLSv1.2 TLSv1.1 TLSv1;
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/home/ubuntu/mywsgi_sock.sock;
    }
}

I haven't tried different releases though.

TJ (tj) wrote :

This may be a viable workaround.

For a systemd Type=fork unit 'success' is determined when the parent process exits. At this time systemd expects the PIDFile to exist. nginx actually creates the PID file in the forked process.

Presumably on single CPU systems there is a sufficient delay in scheduling for the forked process not to have created the PID file when systemd checks for it.

Bring the creation of the PID file forward into the parent process.

Thomas Ward (teward) on 2019-05-31
Changed in nginx (Ubuntu Disco):
status: New → Confirmed
Changed in nginx (Ubuntu Cosmic):
status: New → Confirmed
Changed in nginx (Ubuntu Bionic):
status: New → Confirmed
Changed in nginx (Ubuntu Xenial):
status: New → Confirmed
Changed in nginx (Ubuntu Disco):
importance: Undecided → Low
Changed in nginx (Ubuntu Cosmic):
importance: Undecided → Low
Changed in nginx (Ubuntu Bionic):
importance: Undecided → Low
Changed in nginx (Ubuntu Xenial):
importance: Undecided → Low
Thomas Ward (teward) wrote :

Thanks for the patch, TJ.

Builds including the patch are currently being run in a testing PPA, located here:

https://launchpad.net/~teward/+archive/ubuntu/nginx-lp1581864

I do not have an ETA on build completion at this time, but feel free to check that PPA for builds. Once builds are complete and available, you may test them.

If tests seemed to work on those packages, please let me know and I will work on getting this into the respective affected releases. Note that the affected releases currently include: Xenial, Bionic, Cosmic, Disco, and Eoan.

Both Ubuntu Precise and Ubuntu Trusty are in ESM and cannot receive SRU(s) to this via the standard SRU process, and are not going to be patched by me.

The attachment "Bring PID creation into parent process" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Simon Déziel (sdeziel) wrote :

I tested the PPA build for Bionic (1.14.0-0ubuntu1.3) and it does not work:

# start nginx in background (as it hangs):
$ sudo systemctl start nginx &

# the parent PID is written to the PIDFile:
$ cat /run/nginx.pid
807

# eventually systemctl start fails and status:
$ systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/nginx.service.d
           └─nginx-hardening.conf
   Active: failed (Result: timeout) since Fri 2019-05-31 17:04:59 EDT; 3min 14s ago
     Docs: man:nginx(8)
 Main PID: 710 (code=exited, status=0/SUCCESS)

May 31 17:03:29 rproxy systemd[1]: Starting A high performance web server and a reverse proxy server...
May 31 17:03:29 rproxy systemd[1]: nginx.service: New main PID 807 does not exist or is a zombie.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Start operation timed out. Terminating.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Killing process 808 (nginx) with signal SIGKILL.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Killing process 809 (nginx) with signal SIGKILL.
May 31 17:04:59 rproxy systemd[1]: nginx.service: Failed with result 'timeout'.
May 31 17:04:59 rproxy systemd[1]: Failed to start A high performance web server and a reverse proxy server.

Thomas Ward (teward) wrote :

Due to failure, the patch has been removed from the bug, but remains in the PPA.

tags: removed: patch
TJ (tj) wrote :

I've developed and tested a different approach. More invasive and probably not something upstream will like the look of!

The approach is to have ngx_daemon() not do the parent process exit() after fork() but to return the child PID. The parent process detects that (a value > 0), sets ngx_pid = child_pid, calls ngx_crate_pidfile(), then does the exit().

I tested with extensive log messages to trace behaviour and confirmed it works.

Jun 01 03:01:51 u1910 systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Initialise child_pid=-3
Jun 01 03:01:51 u1910 nginx[27728]: nginx: ngx_daemon() in PID 27728 returned 27729
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Process PID 27728 calling ngx_create_pidfile() with ngx_pid=27729
Jun 01 03:01:51 u1910 nginx[27728]: nginx: Process PID 27728 exiting
Jun 01 03:01:51 u1910 nginx[27728]: nginx: ngx_daemon() in PID 27729 returned 0
Jun 01 03:01:51 u1910 nginx[27728]: nginx: This should be child process PID 27729, parent PID 27728
Jun 01 03:01:51 u1910 systemd[1]: Started A high performance web server and a reverse proxy server.

TJ (tj) wrote :

The previous patch (v2) had a problem when the server reloaded. Because fork() wasn't done the tell-tale child_pid (-3) got passed to the PID file! This version should avoid that.

Andreas Hasenack (ahasenack) wrote :

If this is just about the error message in the logs, such an intrusive patch sounds overkill unless included upstream. Can you perhaps propose it there?

Simon Déziel (sdeziel) wrote :

@ahasenack, yes the only problem is the error message due to the bad ordering in PID handling. I think the plan is to test TJ's patch via PPA build to get the green light for upstream submission.

Simon Déziel (sdeziel) wrote :

I tested the PPA build for Bionic (1.14.0-0ubuntu1.3p1) and it works! systemd never looses track of the main daemon even through 'service nginx upgrade' cycles.

Simon Déziel (sdeziel) wrote :

I tested the PPA build for Eoan (1.16.0-0ubuntu2p1) and it works as well.

Thomas Ward (teward) wrote :

Everyone affected: Please test the PPA builds, they contain fixes for Xenial, Bionic, Cosmic, Disco, and Eoan.

If this works for all of the releases, I can consider SRU this specific patch as a distro-specific patch for NGINX here.

Thomas Ward (teward) wrote :

I've uploaded this as a distro-patch for Eoan currently, because I haven't heard any complaints from those who've been testing this.

Thomas Ward (teward) on 2019-06-21
Changed in nginx (Ubuntu Eoan):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nginx - 1.16.0-0ubuntu2

---------------
nginx (1.16.0-0ubuntu2) eoan; urgency=medium

  * d/patches/nginx-fix-pidfile.patch: Fix NGINX PIDfile handling to avoid
    SystemD race condition - thanks to Tj for the patch. (LP: #1581864)

 -- Thomas Ward <email address hidden> Tue, 04 Jun 2019 11:43:30 -0400

Changed in nginx (Ubuntu Eoan):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.