systemctl reload unbound command times out

Bug #1788622 reported by Alexey Dushechkin
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
unbound (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * Due to a bug in the upstream code on a reload command all worker
   processes will exit and the reload will fail. After a while a kill-
   detection kicks and restarts things, but the reload was intended to not
   shred the main process after all.

 * Backport upstream fix to Bionic to unbreak reloading the service

[Test Case]

 Note: can be tested in a LXD container

 * Install unbound
   $ apt get install unbound
 * Reload it
   $ systemctl reload unbound
     Job for unbound.service failed because a timeout was exceeded.
     See "systemctl status unbound.service" and "journalctl -xe" for
     details.
 * You might also see the timeout detection kick in and restarting the service now.
   systemd[1]: unbound.service: State 'stop-sigterm' timed out. Killing.
   systemd[1]: unbound.service: Killing process 13713 (unbound) with signal SIGKILL.
   systemd[1]: unbound.service: Main process exited, code=killed, status=9/KILL
   systemd[1]: unbound.service: Failed with result 'timeout'.
   systemd[1]: Reload failed for Unbound DNS server.

 * This takes a while and can break the service, to reset for a new test
   you might need to start the service again.

[Regression Potential]

 * The code changed is rather small, which is good for a review. I'd think
   if there is a regression it might be around unexpected conditions
   leading to not all workers shutting down in cases one wants to shut
   them down. I wasn't able to trigger such in my tests, but that is what
   would come to my mind.

[Other Info]

 * Test PPA at https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3382

--

Unbound 1.6.7-1ubuntu2.1 on 18.04 (bionic):

$ sudo systemctl reload unbound
Job for unbound.service failed because a timeout was exceeded.
See "systemctl status unbound.service" and "journalctl -xe" for details.

Looks like https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892914 (Debian) and https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4108 (upstream) to me, with patch available.

Related branches

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (6.5 KiB)

Hi Alexey,
I agree - I can reproduce the error and see just the same.

Bionic - reload hangs and then fails, eventually the service is fully restarted due to the kill after timeout
Cosmic - working fine and reports:

The logs contain a few unrelated warnings as I run it in a container, but one can see the difference. Both logs show
1. install (and start of service)
2. calling systemctl reload

Bionic:
$ journalctl -u unbound
-- Logs begin at Fri 2018-08-10 09:34:15 UTC, end at Fri 2018-08-24 05:21:24 UTC. --
Aug 24 05:19:42 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 24 05:19:42 b systemd[1]: Starting Unbound DNS server...
Aug 24 05:19:42 b package-helper[13708]: /var/lib/unbound/root.key does not exist, copying from /usr/share/dns/root.key
Aug 24 05:19:42 b package-helper[13708]: /var/lib/unbound/root.key has content
Aug 24 05:19:42 b package-helper[13708]: success: the anchor is ok
Aug 24 05:19:42 b unbound[13713]: [13713:0] notice: init module 0: subnet
Aug 24 05:19:42 b unbound[13713]: [13713:0] notice: init module 1: validator
Aug 24 05:19:42 b unbound[13713]: [13713:0] notice: init module 2: iterator
Aug 24 05:19:42 b unbound[13713]: [13713:0] info: start of service (unbound 1.6.7).
Aug 24 05:19:42 b systemd[1]: Started Unbound DNS server.
Aug 24 05:19:42 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 24 05:19:43 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 24 05:19:43 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 24 05:19:54 b systemd[1]: Reloading Unbound DNS server.
Aug 24 05:19:54 b unbound[13713]: [13713:0] info: service stopped (unbound 1.6.7).
Aug 24 05:19:54 b unbound-control[13840]: ok
Aug 24 05:19:54 b unbound[13713]: [13713:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Aug 24 05:19:54 b unbound[13713]: [13713:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Aug 24 05:19:54 b unbound[13713]: [13713:0] notice: Restart of unbound 1.6.7.
Aug 24 05:19:54 b unbound[13713]: [13713:0] notice: init module 0: subnet
Aug 24 05:19:54 b unbound[13713]: [13713:0] notice: init module 1: validator
Aug 24 05:19:54 b unbound[13713]: [13713:0] notice: init module 2: iterator
Aug 24 05:19:54 b unbound[13713]: [13713:0] info: start of service (unbound 1.6.7).
Aug 24 05:21:24 b systemd[1]: unbound.service: State 'stop-sigterm' timed out. Killing.
Aug 24 05:21:24 b systemd[1]: unbound.service: Killing process 13713 (unbound) with signal SIGKILL.
Aug 24 05:21:24 b systemd[1]: unbound.service: Main process exited, code=killed, status=9/KILL
Aug 24 05:21:24 b systemd[1]: unbound.service: Failed with result 'timeout'.
Aug 24 05:21:24 b systemd[1]: Reload failed for Unbound DNS server.
Aug 24 05:21:24 b systemd[1]: unbound.service: Service hold-off time over, scheduling restart.
Aug 24 05:21:24 b systemd[1]: unbound.service: Scheduled restart job, restart counter is at 1.
Aug 24 05:21:24 b systemd[1]: Stopped Unbound DNS server.
Aug 24 05:21:24 b systemd[1]: unbound.service...

Read more...

tags: added: server-next
Changed in unbound (Ubuntu):
status: New → Triaged
Changed in unbound (Ubuntu):
status: Triaged → Fix Released
Changed in unbound (Ubuntu Bionic):
status: New → Triaged
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Tested from PPA and as expected the reload works, is fast and keeps the PID of the main process alive.

root@b:~# systemctl status unbound
● unbound.service - Unbound DNS server
   Loaded: loaded (/lib/systemd/system/unbound.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-08-27 13:26:49 UTC; 41s ago
     Docs: man:unbound(8)
  Process: 20105 ExecStartPre=/usr/lib/unbound/package-helper root_trust_anchor_update (code=exited, status=0/SUCCESS)
  Process: 20101 ExecStartPre=/usr/lib/unbound/package-helper chroot_setup (code=exited, status=0/SUCCESS)
 Main PID: 20109 (unbound)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/unbound.service
           └─20109 /usr/sbin/unbound -d

Aug 27 13:26:49 b systemd[1]: Stopped Unbound DNS server.
Aug 27 13:26:49 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 27 13:26:49 b systemd[1]: Starting Unbound DNS server...
Aug 27 13:26:49 b package-helper[20105]: /var/lib/unbound/root.key has content
Aug 27 13:26:49 b package-helper[20105]: success: the anchor is ok
Aug 27 13:26:49 b unbound[20109]: [20109:0] notice: init module 0: subnet
Aug 27 13:26:49 b unbound[20109]: [20109:0] notice: init module 1: validator
Aug 27 13:26:49 b unbound[20109]: [20109:0] notice: init module 2: iterator
Aug 27 13:26:49 b unbound[20109]: [20109:0] info: start of service (unbound 1.6.7).
Aug 27 13:26:49 b systemd[1]: Started Unbound DNS server.
root@b:~# systemctl reload unbound
root@b:~# systemctl status unbound
● unbound.service - Unbound DNS server
   Loaded: loaded (/lib/systemd/system/unbound.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-08-27 13:26:49 UTC; 47s ago
     Docs: man:unbound(8)
  Process: 20154 ExecReload=/usr/sbin/unbound-control reload (code=exited, status=0/SUCCESS)
  Process: 20105 ExecStartPre=/usr/lib/unbound/package-helper root_trust_anchor_update (code=exited, status=0/SUCCESS)
  Process: 20101 ExecStartPre=/usr/lib/unbound/package-helper chroot_setup (code=exited, status=0/SUCCESS)
 Main PID: 20109 (unbound)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/unbound.service
           └─20109 /usr/sbin/unbound -d

Aug 27 13:27:34 b unbound-control[20154]: ok
Aug 27 13:27:34 b unbound[20109]: [20109:0] info: service stopped (unbound 1.6.7).
Aug 27 13:27:34 b unbound[20109]: [20109:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Aug 27 13:27:34 b unbound[20109]: [20109:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Aug 27 13:27:34 b unbound[20109]: [20109:0] notice: Restart of unbound 1.6.7.
Aug 27 13:27:34 b unbound[20109]: [20109:0] notice: init module 0: subnet
Aug 27 13:27:34 b unbound[20109]: [20109:0] notice: init module 1: validator
Aug 27 13:27:34 b unbound[20109]: [20109:0] notice: init module 2: iterator
Aug 27 13:27:34 b unbound[20109]: [20109:0] info: start of service (unbound 1.6.7).
Aug 27 13:27:34 b systemd[1]: Reloaded Unbound DNS server.

description: updated
tags: added: bitesize
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

MP: https://code.launchpad.net/~paelzer/ubuntu/+source/unbound/+git/unbound/+merge/353786 is up for review to have a 2nd opinion before moving to SRU processing.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

unbound_1.6.7-1ubuntu2.2_source.changes is in B-unapproved for the SRU Team to review.

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Alexey, or anyone else affected,

Accepted unbound into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/unbound/1.6.7-1ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in unbound (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (7.7 KiB)

Reproducing issue in Bionic as-is:
$ systemctl reload unbound
Job for unbound.service failed because a timeout was exceeded.
See "systemctl status unbound.service" and "journalctl -xe" for details.

Log:
Aug 31 05:36:18 b systemd[1]: Reloading Unbound DNS server.
Aug 31 05:36:18 b unbound[827]: [827:0] info: service stopped (unbound 1.6.7).
Aug 31 05:36:18 b unbound[827]: [827:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Aug 31 05:36:18 b unbound-control[1027]: ok
Aug 31 05:36:18 b unbound[827]: [827:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Aug 31 05:36:18 b unbound[827]: [827:0] notice: Restart of unbound 1.6.7.
Aug 31 05:36:18 b unbound[827]: [827:0] notice: init module 0: subnet
Aug 31 05:36:18 b unbound[827]: [827:0] notice: init module 1: validator
Aug 31 05:36:18 b unbound[827]: [827:0] notice: init module 2: iterator
Aug 31 05:36:18 b unbound[827]: [827:0] info: start of service (unbound 1.6.7).
Aug 31 05:37:01 b systemd-journald[57]: Forwarding to syslog missed 1 messages.
Aug 31 05:37:48 b systemd[1]: unbound.service: State 'stop-sigterm' timed out. Killing.
Aug 31 05:37:48 b systemd[1]: unbound.service: Killing process 827 (unbound) with signal SIGKILL.
Aug 31 05:37:48 b systemd[1]: unbound.service: Main process exited, code=killed, status=9/KILL
Aug 31 05:37:48 b systemd[1]: unbound.service: Failed with result 'timeout'.
Aug 31 05:37:48 b systemd[1]: Reload failed for Unbound DNS server.
Aug 31 05:37:48 b systemd[1]: unbound.service: Service hold-off time over, scheduling restart.
Aug 31 05:37:48 b systemd[1]: unbound.service: Scheduled restart job, restart counter is at 1.
Aug 31 05:37:48 b systemd[1]: Stopped Unbound DNS server.
Aug 31 05:37:48 b systemd[1]: unbound.service: Failed to reset devices.list: Operation not permitted
Aug 31 05:37:48 b systemd[1]: Starting Unbound DNS server...
Aug 31 05:37:48 b package-helper[1031]: /var/lib/unbound/root.key has content
Aug 31 05:37:48 b package-helper[1031]: success: the anchor is ok
Aug 31 05:37:48 b unbound[1035]: [1035:0] notice: init module 0: subnet
Aug 31 05:37:48 b unbound[1035]: [1035:0] notice: init module 1: validator
Aug 31 05:37:48 b unbound[1035]: [1035:0] notice: init module 2: iterator
Aug 31 05:37:49 b unbound[1035]: [1035:0] info: start of service (unbound 1.6.7).
Aug 31 05:37:49 b systemd[1]: Started Unbound DNS server.

Upgrade to Proposed

# apt install unbound unbound-anchor
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following packages will be upgraded:
  unbound unbound-anchor
2 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Need to get 603 kB of archives.
After this operation, 2048 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/universe amd64 unbound-anchor amd64 1.6.7-1ubuntu2.2 [25.0 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-proposed/universe amd64 unbound amd64 1.6.7-1ubuntu2.2 [578 kB]
Fe...

Read more...

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unbound - 1.6.7-1ubuntu2.2

---------------
unbound (1.6.7-1ubuntu2.2) bionic; urgency=medium

  * d/p/lp-1788622-fix-systemd-reload.patch: Fix hang due to all worker threads
    stopping on reload (LP: #1788622)

 -- Christian Ehrhardt <email address hidden> Mon, 27 Aug 2018 14:12:29 +0200

Changed in unbound (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for unbound has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.