systemd RestartSec does not seem to work consistently
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
systemd (Ubuntu) |
Expired
|
Undecided
|
Unassigned |
Bug Description
The unit file for my service xxxx-33005.service:
[Unit]
Description=product xxxx 33005 service
Wants=some.mount
After=some.mount
PartOf=
PartOf=
[Service]
Type=simple
ExecStart=
Restart=on-failure
User=product
Group=product
LimitCORE=infinity
LimitNOFILE=10000
Slice=xxxx.slice
WatchdogSec=120
RestartSec=10
StartLimitInter
StartLimitBurst=10
[Install]
WantedBy=
specifies RestartSec=10, which is also what systemd sees:
$ systemctl show xxxx-33005.service
Type=simple
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
...
However, today I saw in the journalctl log:
$ journalctl -u xxxx-33005.service
...
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Watchdog timeout (limit 2min)!
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:28:29 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:28:39 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:28:39 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:28:39 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:28:45 a-hostname systemd[1]: Stopping product xxxx 33005 service...
Aug 15 23:28:45 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:33:51 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:33:51 a-hostname xxxx[3629]: Get http://
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Main process exited, code=exited, status=
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Unit entered failed state.
Aug 15 23:33:51 a-hostname systemd[1]: xxxx-33005.service: Failed with result 'exit-code'.
Aug 15 23:34:01 a-hostname systemd[1]: xxxx-33005.service: Service hold-off time over, scheduling restart.
Aug 15 23:34:02 a-hostname systemd[1]: Stopped product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: Started product xxxx 33005 service.
Aug 15 23:34:02 a-hostname systemd[1]: xxxx-33005.service: Start request repeated too quickly.
Aug 15 23:34:02 a-hostname systemd[1]: Failed to start product xxxx 33005 service.
Everything looks to me that systemd is behaving as expected until Aug 15 23:34:02 when it attempts to start xxxx-33005.service not at the rate I specified in RestartSec, but possibly at the default 100ms rate. This excerpt shows both the expected 10s restart rate as well as the unexpected restart rate.
$ lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04
$ uname -a
Linux a-hostname 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ apt-cache policy systemd
systemd:
Installed: 229-4ubuntu16
Candidate: 229-4ubuntu19
Version table:
229-4ubuntu19 500
500 http://
*** 229-4ubuntu16 100
100 /var/lib/
229-4ubuntu10 500
500 http://
229-4ubuntu4 500
500 http://
A minimal working reproducer with e.g. ExecStart= /bin/false would be useful. To remove the uncertainties w.r.t. executables involved. As I do not have access to the masked product.
Is this reproducible in Ubuntu Artful which has much newer systemd?