needrestart should prioritize and wait for network and DNS when restarting services

Bug #2051766 reported by NetVicious
This bug affects 1 person
Affects: needrestart (Ubuntu), status tracked in Oracular
  Noble:    Status: New | Importance: Undecided | Assigned to: Simon Chopin
  Oracular: Status: New | Importance: Undecided | Assigned to: Simon Chopin

Bug Description

Today I ran into the same problem on 2 servers.

On one of them the problem was with nginx, and on the other it was with proftpd.

I hadn't touched any config on either server; I simply started the services again and they came up fine, so I think it's a network- or DNS-related problem.

I think needrestart should have a list of services (let's call them 'first to restart' services) that can affect the restart of other services. It should prioritize restarting those 'first to restart' services and wait for network and DNS before restarting the other services that need to be restarted.

IMHO, in the scenario of this report, those 'first to restart' services would be systemd-networkd.service and systemd-resolved.service.
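
Roughly, I'm thinking of something like this (just a sketch, not tested; the wait step assumes systemd-networkd-wait-online is available under /usr/lib/systemd/ as on my boxes, the timeout is arbitrary, and nginx/proftpd stand in for the rest of the restart list):

---------------------------------
# 1) restart the 'first to restart' services
systemctl restart systemd-networkd.service systemd-resolved.service

# 2) wait until the network is actually configured again
/usr/lib/systemd/systemd-networkd-wait-online --timeout=30

# 3) only then restart the services that need network/DNS
systemctl restart nginx.service proftpd.service
---------------------------------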

Let's check the logs:

These are the services needrestart restarted; I got an error on nginx.

---------------------------------
Restarting services...
 /etc/needrestart/restart.d/systemd-manager
 systemctl restart nginx.service packagekit.service ssh.service systemd-journald.service systemd-networkd.service systemd-resolved.service systemd-udevd.service udisks2.service
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xeu nginx.service" for details.
Service restarts being deferred:
 systemctl restart systemd-logind.service
 systemctl restart user@1000.service
---------------------------------

Checking the journal I can see the failure when needrestart tried to do this job, and the next block of the log shows my manual start of the service right after the failure.

---------------------------------
Jan 31 08:12:39 ip-172-31-14-249 systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has finished with a failure.
░░
░░ The job identifier is 9587 and the job result is failed.

Jan 31 08:12:50 ip-172-31-14-249 systemd[1]: Starting A high performance web server and a reverse proxy server...
░░ Subject: A start job for unit nginx.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has begun execution.
---------------------------------

These are the logs from the other server, where the problem occurred when restarting proftpd. 08:09:06 = failure, 08:09:33 = OK when restarted manually.

---------------------------------
Jan 31 08:09:06 ip-172-31-22-67 systemd[1]: Stopping ProFTPD FTP Server...
░░ Subject: A stop job for unit proftpd.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A stop job for unit proftpd.service has begun execution.
░░
░░ The job identifier is 26554.
Jan 31 08:09:06 ip-172-31-22-67 proftpd[62832]: Checking syntax of configuration file
Jan 31 08:09:06 ip-172-31-22-67 proftpd[62832]: 2024-01-31 08:09:06,971 ip-172-31-22-67 proftpd[62832]: mod_memcache/0.1: compiled using libmemcached-1.0.18 headers, but linke>
Jan 31 08:09:06 ip-172-31-22-67 proftpd[62832]: 2024-01-31 08:09:06,986 ip-172-31-22-67 proftpd[62832]: warning: unable to determine IP address of 'ip-172-31-22-67'
Jan 31 08:09:06 ip-172-31-22-67 proftpd[62832]: 2024-01-31 08:09:06,986 ip-172-31-22-67 proftpd[62832]: error: no valid servers configured
Jan 31 08:09:06 ip-172-31-22-67 proftpd[62832]: 2024-01-31 08:09:06,986 ip-172-31-22-67 proftpd[62832]: fatal: error processing configuration file '/etc/proftpd/proftpd.conf'
Jan 31 08:09:07 ip-172-31-22-67 systemd[1]: proftpd.service: Control process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ An ExecStartPre= process belonging to unit proftpd.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Jan 31 08:09:07 ip-172-31-22-67 systemd[1]: proftpd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit proftpd.service has entered the 'failed' state with result 'exit-code'.
Jan 31 08:09:07 ip-172-31-22-67 systemd[1]: Failed to start ProFTPD FTP Server.
░░ Subject: A start job for unit proftpd.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit proftpd.service has finished with a failure.
░░
░░ The job identifier is 26554 and the job result is failed.
Jan 31 08:09:33 ip-172-31-22-67 systemd[1]: Starting ProFTPD FTP Server...
░░ Subject: A start job for unit proftpd.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit proftpd.service has begun execution.
░░
░░ The job identifier is 26875.
Jan 31 08:09:33 ip-172-31-22-67 proftpd[63122]: Checking syntax of configuration file
Jan 31 08:09:33 ip-172-31-22-67 proftpd[63122]: 2024-01-31 08:09:33,532 ip-172-31-22-67 proftpd[63122]: mod_memcache/0.1: compiled using libmemcached-1.0.18 headers, but linke>
Jan 31 08:09:33 ip-172-31-22-67 proftpd[63122]: 2024-01-31 08:09:33,541 ip-172-31-22-67 proftpd[63122] ip-172-31-22-67.eu-west-1.compute.internal: 172.31.22.67:21 masquerading>
Jan 31 08:09:33 ip-172-31-22-67 proftpd[63123]: 2024-01-31 08:09:33,591 ip-172-31-22-67 proftpd[63123]: mod_memcache/0.1: compiled using libmemcached-1.0.18 headers, but linke>
Jan 31 08:09:33 ip-172-31-22-67 proftpd[63123]: 2024-01-31 08:09:33,595 ip-172-31-22-67 proftpd[63123] ip-172-31-22-67.eu-west-1.compute.internal: 172.31.22.67:21 masquerading>
Jan 31 08:09:33 ip-172-31-22-67 systemd[1]: Started ProFTPD FTP Server.
░░ Subject: A start job for unit proftpd.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit proftpd.service has finished successfully.
░░
░░ The job identifier is 26875.

Revision history for this message
Simon Chopin (schopin) wrote :

Hi, thanks for reporting this issue.

Could you please specify which version of Ubuntu you're using?

Revision history for this message
NetVicious (netvicious) wrote (last edit ):

Sorry for the late reply. I ran into the same problem today with nginx.

On this box I'm using "Ubuntu 22.04.4 LTS" fully updated.

Log for today's problem:

Restarting services...
 systemctl restart acpid.service atd.service chrony.service irqbalance.service multipathd.service nginx.service packagekit.service polkit.service rsyslog.service <email address hidden> snapd.service ssh.service systemd-journald.service systemd-networkd.service systemd-resolved.service systemd-udevd.service udisks2.service
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xeu nginx.service" for details.
Service restarts being deferred:
 systemctl restart ModemManager.service
 /etc/needrestart/restart.d/dbus.service
 systemctl restart <email address hidden>
 systemctl restart networkd-dispatcher.service
 systemctl restart systemd-logind.service
 systemctl restart unattended-upgrades.service

After that failure I restarted nginx manually and all went ok.

Journal:

Jun 03 09:32:16 ip-172-31-14-249 systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has finished with a failure.
░░
░░ The job identifier is 35305 and the job result is failed.
Jun 03 09:32:37 ip-172-31-14-249 systemd[1]: Starting A high performance web server and a reverse proxy server...
░░ Subject: A start job for unit nginx.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has begun execution.
░░
░░ The job identifier is 36224.

tail -f /var/log/nginx/error.log
(name of the upstream server changed ;-)

2024/06/03 00:54:37 [notice] 136414#136414: signal process started
2024/06/03 09:32:16 [emerg] 140604#140604: host not found in upstream "myupstreamserver.com" in /etc/nginx/sites-enabled/upstreams.conf:14
2024/06/03 09:32:37 [info] 140749#140749: Using 131072KiB of shared memory for nchan in /etc/nginx/nginx.conf:62
2024/06/03 09:37:06 [info] 140769#140769: Using 131072KiB of shared memory for nchan in /etc/nginx/nginx.conf:62
2024/06/03 09:37:09 [notice] 140772#140772: signal process started

As you can see in my previous message, proftpd said it couldn't determine the IP address of a DNS name, and in this case nginx says something similar.

So I think needrestart doesn't wait until the network and DNS are up before restarting other services that need a working network and DNS.
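
If it helps, next time it happens I'll run these right after the failure to confirm the theory (assuming systemd-networkd/systemd-resolved are in use; the hostname is the placeholder from above, not the real one):

---------------------------------
# is the link configured again after the networkd restart?
networkctl status

# can the upstream host nginx complains about be resolved at that moment?
resolvectl query myupstreamserver.com
---------------------------------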

Revision history for this message
Simon Chopin (schopin) wrote :

I'm guessing the glibc security update is the one that caused all those services to restart.

Is that something you can reproduce reliably, e.g. using `apt reinstall libc6`?
I might have an idea that could solve this problem without having to special-case networking services, but that's just a theory and I'd need to actually test it out.

The gist of it would be to replace the single `systemctl restart` call with multiple ones. My underlying assumption is that systemd batches all those services up and restarts them in parallel (unless there is explicit ordering in the unit files), whereas issuing dedicated restart calls means that, assuming the services are properly written to signal readiness only when they actually are ready, any service should always have a stable environment to use during its startup.
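
Purely as an illustration of the idea (not what the code does today; the unit list and order are simply taken from the first log in this report), the batched call would become a sequence of blocking calls, so no unit starts up while networkd or resolved is mid-restart:

---------------------------------
# current: one transaction, units restarted in parallel
systemctl restart nginx.service packagekit.service ssh.service systemd-journald.service systemd-networkd.service systemd-resolved.service systemd-udevd.service udisks2.service

# idea: one synchronous call per unit; each restart finishes before the next starts
systemctl restart nginx.service
systemctl restart packagekit.service
systemctl restart ssh.service
systemctl restart systemd-journald.service
systemctl restart systemd-networkd.service
systemctl restart systemd-resolved.service
systemctl restart systemd-udevd.service
systemctl restart udisks2.service
---------------------------------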

Changed in needrestart (Ubuntu Noble):
assignee: nobody → Simon Chopin (schopin)
Changed in needrestart (Ubuntu Oracular):
assignee: nobody → Simon Chopin (schopin)