Cannot restart nginx when listening on UNIX domain sockets

Bug #1957320 reported by Athos Ribeiro
This bug affects 1 person
Affects            Status         Importance   Assigned to          Milestone
nginx (Ubuntu)     Fix Released   Low          Unassigned
  Jammy            Fix Released   Low          Michał Małoszewski

Bug Description

[Impact]

* Restarting nginx on Jammy fails when it is configured to listen on UNIX domain sockets. The failure is logged to /var/log/nginx/error.log.

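* The issue is caused by the custom socket closing code in
  ngx_master_process_cycle(): on a graceful (SIGQUIT) shutdown it closes the
  listening sockets but leaves the UNIX domain socket files on disk, so the
  next start fails to bind() to the existing path with error 98 (EADDRINUSE
  on Linux).

* The fix replaces the custom socket closing code in
  ngx_master_process_cycle() with a call to ngx_close_listening_sockets(),
  which also removes the UNIX domain socket files.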

[Test Plan]

Create a container for testing:

$ lxc launch ubuntu-daily:jammy jammy-test
$ lxc shell jammy-test

Install nginx and stop the service:

# apt install -y nginx

# systemctl stop nginx

Then add a site that listens on a UNIX domain socket:

# cat << EOF > /etc/nginx/sites-enabled/serve-files
server {
    listen unix:/run/serve-files.socket;
    root /var/www/files;
    location / {
        try_files \$uri =404;
    }
}
EOF

# service nginx start

# service nginx stop

# service nginx restart

Check the contents of /var/log/nginx/error.log.

Example of failed output (the errors are written to the error.log file):

2023/06/01 10:39:02 [emerg] 895#895: bind() to unix:/run/serve-files.socket failed (98: Unknown error)
2023/06/01 10:39:02 [emerg] 895#895: bind() to unix:/run/serve-files.socket failed (98: Unknown error)
2023/06/01 10:39:02 [emerg] 895#895: bind() to unix:/run/serve-files.socket failed (98: Unknown error)
2023/06/01 10:39:02 [emerg] 895#895: bind() to unix:/run/serve-files.socket failed (98: Unknown error)
2023/06/01 10:39:02 [emerg] 895#895: bind() to unix:/run/serve-files.socket failed (98: Unknown error)
2023/06/01 10:39:02 [emerg] 895#895: still could not bind()

Moreover, systemctl status nginx shows:

× nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: failed (Result: exit-code) since Thu 2023-06-01 10:39:05 UTC; 56s ago
       Docs: man:nginx(8)
    Process: 894 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 895 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)

Jun 01 10:39:02 jammy-test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 01 10:39:02 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 01 10:39:03 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 01 10:39:03 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 01 10:39:04 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 01 10:39:04 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 01 10:39:05 jammy-test nginx[895]: nginx: [emerg] still could not bind()
Jun 01 10:39:05 jammy-test systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 01 10:39:05 jammy-test systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 01 10:39:05 jammy-test systemd[1]: Failed to start A high performance web server and a reverse proxy server.

Example of successful output:

No errors in the error.log file.

Nginx can be restarted without any problems.
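The pass/fail criteria above can be scripted. This is a sketch, not part of the official Test Plan: the helper names log_has_bind_failure and check_restart are hypothetical, and it assumes the default log path and the "still could not bind()" message seen in the failing output. Run it in a fresh container, so that older failures already in the log do not produce a false FAIL.

```shell
# Hypothetical helpers for the Test Plan; not shipped with nginx.

# Succeeds if the given log (default: nginx's error log) contains the
# bind() failure signature shown in the failing output.
log_has_bind_failure() {
    log="${1:-/var/log/nginx/error.log}"
    # -F: match the text literally; a missing log counts as "no failure".
    grep -qF 'still could not bind()' "$log" 2>/dev/null
}

# Restart nginx and report PASS only if the unit is active afterwards
# and the error log shows no bind() failure.
check_restart() {
    systemctl restart nginx
    if systemctl is-active --quiet nginx && ! log_has_bind_failure; then
        echo PASS
    else
        echo FAIL
        return 1
    fi
}
```

For example, after "service nginx start" and "service nginx stop", running check_restart should print PASS with the fixed package and FAIL without it.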

[Where problems could occur]

* If the issue has already been triggered (a stale socket file was left
  behind) before nginx is upgraded to the fixed version, or when the same
  version is simply reinstalled, the upgrade fails: postinst restarts the
  service, that restart fails because of the leftover socket file, and so
  the package cannot be configured.
  If you are affected, remove the offending socket file and restart the
  service; with the fix installed, you will not be affected by this again.
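A minimal sketch of that workaround, assuming a POSIX shell and that pgrep (from procps) is available. remove_stale_socket is a hypothetical name, not something provided by the nginx package; it deliberately refuses to delete anything that is not a socket, or while any nginx process is still running.

```shell
# Hypothetical helper (not shipped by the nginx package): remove a leftover
# UNIX domain socket file so that nginx can bind() to the path again.
# Run it only after nginx has been stopped.
remove_stale_socket() {
    sock="$1"

    # Nothing to do if the path does not exist.
    [ -e "$sock" ] || return 0

    # Never delete something that is not a UNIX domain socket.
    if [ ! -S "$sock" ]; then
        echo "refusing: $sock is not a socket" >&2
        return 1
    fi

    # Without pgrep we cannot tell whether nginx still owns the socket.
    if ! command -v pgrep >/dev/null 2>&1; then
        echo "refusing: pgrep not found, cannot check for nginx" >&2
        return 1
    fi

    # A running nginx process may still be using the socket.
    if pgrep -x nginx >/dev/null; then
        echo "refusing: nginx is still running" >&2
        return 1
    fi

    rm -f -- "$sock"
}
```

With the path from the Test Plan: stop nginx with systemctl stop nginx, then run remove_stale_socket /run/serve-files.socket && systemctl start nginx.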

---------------------------original bug report--------------------------------------------

As discussed in [1], there is a bug in nginx [2] which makes Ubuntu's systemd unit restarts fail when nginx is listening on a UNIX domain socket. This happens because nginx fails to remove the socket file during its shutdown process.

This issue has been reported in Debian in [3], and has been fixed upstream in 1.19.1 with the following patch: [4].

To reproduce the issue, run the following commands on a Jammy machine:

# apt install -y nginx
# systemctl stop nginx
# mkdir -p /var/www/files
# echo hello > /var/www/files/hello
# cat << EOF > /etc/nginx/sites-enabled/serve-files
server {
    listen unix:/run/serve-files.socket;
    root /var/www/files;
    location / {
        try_files \$uri =404;
    }
}
EOF
# systemctl start nginx

Verify it works with:

# echo -e "GET /hello HTTP/1.0\r\n" | netcat -U /run/serve-files.socket

And restart the service:

# systemctl restart nginx

This will throw an error and the service will end in a failed state.

Verify that the socket file in /run/serve-files.socket was not removed.

Removing the socket file should allow you to restart the service.

[1] https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1919965
[2] https://trac.nginx.org/nginx/ticket/753
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=821111
[4] http://hg.nginx.org/nginx/rev/7cbf6389194b


Athos Ribeiro (athos-ribeiro) wrote :

Adding this to the Ubuntu Server backlog to match LP: #1919965

Changed in nginx (Ubuntu):
importance: Undecided → Low
Robie Basak (racb) wrote :

This is presumed fixed in Kinetic (nginx 1.22.0-1ubuntu1) since the upstream release for 1.22.0 (May 2022) was after it was reported fixed upstream in 1.19.1. So I'll swap the task to a Jammy one only.

Changed in nginx (Ubuntu):
status: New → Fix Released
Changed in nginx (Ubuntu Jammy):
importance: Undecided → Low
Changed in nginx (Ubuntu Jammy):
status: New → Triaged
tags: added: bitesize
Michał Małoszewski (michal-maloszewski99) wrote :

I am going to take care of that.

Changed in nginx (Ubuntu Jammy):
assignee: nobody → Michał Małoszewski (michal-maloszewski99)
Changed in nginx (Ubuntu Jammy):
status: Triaged → In Progress
description: updated
tags: added: server-todo
Michał Małoszewski (michal-maloszewski99) wrote :

If someone tests the fix, please:
use one Jammy container/VM to reproduce the failure and a separate one to test the successful behavior.
If both are done in the same container, i.e. the failure is reproduced and the fixed version is installed right away, the installation will break.
So reproduce the failure, then shell into a new VM or container, install the fixed version of nginx right away, and follow the steps from the Test Plan.

description: updated
Athos Ribeiro (athos-ribeiro) wrote :

> If we do that in the same container, I mean, recreate failure and then install right away the version where the fix exists, it will break.

Understanding why this happens seems to be an important step before we provide a fix for this bug. Will this impact the upgrade path for users?

Michał Małoszewski (michal-maloszewski99) wrote :

After installing the version of nginx with the fix, there is an error (FYI, the nginx service was stopped before installing):

dpkg: error processing package nginx-core (--configure):
 installed nginx-core package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of nginx:
 nginx depends on nginx-core (<< 1.18.0-6ubuntu14.4~ppa1.1~) | nginx-full (<< 1.18.0-6ubuntu14.4~ppa1.1~) | nginx-light (<< 1.18.0-6ubuntu14.4~ppa1.1~) | nginx-extras (<< 1.18.0-6ubuntu14.4~ppa1.1~); however:
  Package nginx-core is not configured yet.
  Package nginx-full is not installed.
  Package nginx-light is not installed.
  Package nginx-extras is not installed.
 nginx depends on nginx-core (>= 1.18.0-6ubuntu14.4~ppa1) | nginx-full (>= 1.18.0-6ubuntu14.4~ppa1) | nginx-light (>= 1.18.0-6ubuntu14.4~ppa1) | nginx-extras (>= 1.18.0-6ubuntu14.4~ppa1); however:
  Package nginx-core is not configured yet.
  Package nginx-full is not installed.
  Package nginx-light is not installed.
  Package nginx-extras is not installed.

dpkg: error processing package nginx (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 nginx-core
 nginx
E: Sub-process /usr/bin/dpkg returned an error code (1)

Michał Małoszewski (michal-maloszewski99) wrote :

So there is also a problem with dependencies. I think a new bug report should be filed for it.

Michał Małoszewski (michal-maloszewski99) wrote :

To sum up, the old version of nginx needs to be uninstalled and the new one installed.

Athos Ribeiro (athos-ribeiro) wrote :

You got an error in the nginx-core postinst script. See:

dpkg: error processing package nginx-core (--configure):
 installed nginx-core package post-installation script subprocess returned error exit status 1.

You should investigate what is going on there and ensure this is not a regression caused by the proposed changes. For example, is it a configuration issue with the sample config files that is preventing a service restart? Is it something else?

I don't believe that suggesting package re-installation would be an acceptable SRU outcome.

Michał Małoszewski (michal-maloszewski99) wrote :

sure Athos, I will investigate it

description: updated
tags: removed: bitesize
description: updated
Christian Ehrhardt  (paelzer) wrote :

We have run some tests:
- 1 - nginx -> nginx+fix
- 2 - nginx -> cause issue -> nginx+fix
- 3 - nginx -> cause issue -> nginx (re-install same)
- 4 - install nginx+fix -> cause issue

#1
- works just fine
- there is no upgrade issue with the proposed fix

#2
- fails to upgrade, because the former issue we caused makes the service non-startable and the upgrade will trigger a service restart in postinst. Since that fails, it cannot upgrade.

2b) If we remove the offending socket file, then service restarts and package upgrades work.

#3
- behaves just like #2, so it really isn't a "new" issue by the fix.
- It is instead a consequence of the original issue

#4 confirms that, with the fix applied, start, stop and restart work fine -> verified

In theory one could start to think about pre/postinst magic.
That magic would do the rm on the socket that is breaking the (re)start of the service.
But the problem is that the path in question is user-defined; we would need to grow the ability to correctly parse an nginx configuration, only to then remove a file we extracted from that config.

This is dangerous at best and should not be a path forward.

Outcomes from here:
- not providing that fix to users -> people will stay affected and without a fix :-/
- providing that fix to users:
  a) not affected -> it will upgrade and never occur
  b) already affected and aware (like on this bug) -> the upgrade will trigger, the issue will manually be resolved once and then things are good
  c) already affected (and potentially not knowing as it only occurs on restart) -> the issue will trigger on restart and the upgrade will disable the webserver

So this is really tricky: we need to provide a fix to keep this from affecting people.
But providing the fix might cause people who are affected, without knowing it, to lose their service.

BUT - we are NOT making it worse.
Any update to nginx will trigger this problem IF the user has configured it to use sockets.
So even if this update triggers it, it would at least prevent it from happening again and again, e.g. on security updates, which are even applied unattended in the background (and would still take the server down).

Sadly, I cannot think of any great magic that is also safe, as deleting files is never nice.
A potential way could be:
1. Do a simple check through the config to get candidates:
$ grep -v '^\s*#' /etc/nginx/sites-enabled/* | grep -o 'unix:.*;'
unix:/run/serve-files.socket;
2. To ensure this isn't someone playing tricks, check whether the file is really open by nginx:
$ lsof -a -p $(systemctl show --property MainPID --value nginx) -U /run/serve-files.socket
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 3709 root 8u unix 0x0000000000000000 0t0 1845694 /run/serve-files.socket type=STREA
3. Then, only for that versioned upgrade, remove that file after stop and before start.
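Step 1 above can be made somewhat more selective. The sketch below is a hypothetical helper (nginx_unix_listen_paths is not part of any package). It assumes each listen directive sits on a single line, skips commented-out lines, keeps only "listen unix:" directives (so other unix: references such as upstream addresses are ignored), drops trailing parameters such as default_server, and prints each path once. It is a candidate list only; the lsof check from step 2 would still be needed before removing anything.

```shell
# Hypothetical helper: print candidate UNIX socket paths from
# "listen unix:..." directives. Reads the given files, or stdin if none.
# Assumes one directive per line; nginx itself allows freer formatting.
nginx_unix_listen_paths() {
    cat -- "$@" 2>/dev/null \
        | grep -v '^[[:space:]]*#' \
        | grep -E '^[[:space:]]*listen[[:space:]]+unix:' \
        | sed -E 's/^[[:space:]]*listen[[:space:]]+unix:([^[:space:];]+).*/\1/' \
        | LC_ALL=C sort -u
}
```

For example, nginx_unix_listen_paths /etc/nginx/sites-enabled/* reads the site files directly, while nginx -T 2>/dev/null | nginx_unix_listen_paths covers the full configuration that nginx -T dumps (its "# configuration file ..." header lines are filtered out as comments).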

@SRU team, with that explained how do you feel about providing the upgrade
a) as-is (as any update would cause the issue on systems configured for it, we are biting the bullet and are over it)
b) as-is + magic code (not biting the bullet but potential for a new grenade)
c) any other alternative you'd prefer?

Robie Basak (racb) wrote :

Thank you for the excellent investigation and analysis!

Staging the fix would definitely not make it worse for any user, right? However, staging it would mean that new production users won't get the fix and would break on next upgrade.

nginx has already had three updates in Jammy so it seems likely that it will have many more. So not fixing this would be worse.

As you've considered, it would be better if we could have it not break an affected user even the first time, but I agree that does not seem practical.

I wondered about trying to flag the situation via debconf in the preinst, or even blocking the upgrade in case of non-attended installation, but that also seems worse. For example a security update might never land via unattended-upgrades in this case.

Unless someone comes up with something better, I think releasing the SRU as normal while acknowledging the breakage for affected users here is the least worst option.

description: updated
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Athos, or anyone else affected,

Accepted nginx into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nginx/1.18.0-6ubuntu14.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nginx (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Michał Małoszewski (michal-maloszewski99) wrote :

The fix works, 1.18.0-6ubuntu14.4 in Jammy fixes the bug.

I've created the jammy container using steps from the [Test Plan] section listed above in the Bug Description.

Then I installed nginx:

$ apt install -y nginx

and inside that container I typed in:

$ apt policy nginx

The output:

nginx:
  Installed: 1.18.0-6ubuntu14.3
  Candidate: 1.18.0-6ubuntu14.3
  Version table:
 *** 1.18.0-6ubuntu14.3 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1.18.0-6ubuntu14 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

Then I repeated the steps from the [Test Plan] section.

I've noticed that nothing has changed there, so the problem still exists:

× nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /run/systemd/system/service.d
             └─zzz-lxc-service.conf
     Active: failed (Result: exit-code) since Wed 2023-06-28 19:03:05 UTC; 56s ago
       Docs: man:nginx(8)
    Process: 894 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 895 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)

Jun 28 19:03:05 jammy-test systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] bind() to unix:/run/serve-files.socket failed (98: Unknown error)
Jun 28 19:03:05 jammy-test nginx[895]: nginx: [emerg] still could not bind()
Jun 28 19:03:05 jammy-test systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 28 19:03:05 jammy-test systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 28 19:03:05 jammy-test systemd[1]: Failed to start A high performance web server and a reverse proxy server.

nginx fails to upgrade because the issue triggered earlier makes the service non-startable, and the upgrade triggers a service restart in postinst. Since that restart fails, the upgrade cannot complete, as expected.
This has already been discussed in the comments above.

Then I removed the offending socket file and restarted the service. After that, I upgraded nginx using:

$ apt install nginx=1.18.0-6ubuntu14.4

Later I typed in:

$ apt policy nginx

to check whether the installed version had changed; it shows that the new version (with the fix) is installed:

nginx:
  Installed: 1.18.0-6ubuntu14.4
  Candidate: 1.18.0-6ubuntu14.4
  Version table:
 *** 1.18.0-6ubuntu14.4 500
        500 http:/...
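Instead of reading the apt policy output by eye, the installed version can be compared against the fixed one using dpkg's own version ordering. This is a sketch assuming a Debian/Ubuntu system; has_fix and installed_version are hypothetical helper names, and nginx-core is assumed to be the installed flavour (nginx-full, nginx-light or nginx-extras would be checked the same way).

```shell
# Version that carries the fix, per the changelog entry for this bug.
FIXED_VERSION="1.18.0-6ubuntu14.4"

# Succeeds if version $1 sorts at or after FIXED_VERSION in Debian ordering
# (so e.g. a ~ppa pre-release of the same version does NOT count).
has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Print the installed version of a package (default: nginx-core).
installed_version() {
    dpkg-query -W -f='${Version}' "${1:-nginx-core}" 2>/dev/null
}

# Example (not run here):
#   v=$(installed_version) && has_fix "$v" && echo "fix installed"
```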


tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nginx - 1.18.0-6ubuntu14.4

---------------
nginx (1.18.0-6ubuntu14.4) jammy; urgency=medium

  * d/p/lp1957320-jammy-fixed-sigquit-issue-with-unix-sockets.patch:
    Fix SIGQUIT by replacing the custom socket closing code in the
    ngx_process_cycle.c file by calling another function.
    (LP: #1957320)

 -- Michal Maloszewski <email address hidden> Tue, 30 May 2023 19:31:46 +0200

Changed in nginx (Ubuntu Jammy):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for nginx has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
