ntpd deadlocks while exiting and never stops

Bug #1063806 reported by PierreF on 2012-10-08
This bug affects 2 people
Affects: ntp (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Software version:

* Description: Ubuntu 12.04.1 LTS
* arch : x86_64
* ntp 1:4.2.6.p3+dfsg-1ubuntu3.1
* libc6 2.15-0ubuntu10.2

We hit a strange behaviour on some machines, where ntpd refuses to quit if it is stopped right after startup:

When running the following command: /etc/init.d/ntp start && /etc/init.d/ntp stop
ntpd is still running afterwards and ignores any further kill.
In syslog we can see that the stop request was indeed received:

Oct 8 12:45:36 app01 ntpd[16713]: ntpd exiting on signal 15

Sending further "kill 16713" commands does nothing.

We see this "start, then stop right away" pattern on our servers because of https://bugs.launchpad.net/nova/+bug/887162.
Short summary: the DHCP server does not reply until the DHCP client loses its IP and requests a new one using a broadcast request:

* (AFAIK) when the DHCP client gets no answer from the DHCP server and removes the IP, it calls a hook which restarts ntp (/etc/dhcp/dhclient-exit-hooks.d/ntp)
* the DHCP client retries right afterwards and obtains an IP. The interface comes up, and a hook restarts ntpd once more (/etc/network/if-up.d/ntpdate)

After digging further, I reproduced the issue on my local machine with a simple script that starts and kills ntpd until the issue occurs. Once it was reproduced, I attached gdb to the process and dumped the stack (see gdb-stack-dbg.txt). Judging from this stack, the issue seems to be that the syslog function is not re-entrant. When SIGTERM arrives while the "main thread" is inside syslog, the signal handler ends up calling syslog again, which causes a deadlock.
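
The suspected mechanism can be illustrated with a minimal sketch. This is not ntpd or glibc code: `_log_lock` is a hypothetical stand-in for whatever internal lock syslog holds, and the assumption is that the hang comes from the signal handler trying to take a lock that the interrupted code already owns. The handler uses a timeout only so that the demo terminates; with a plain blocking acquire, as in a real handler, the process would hang forever.

```python
import signal
import threading
import time

# Hypothetical stand-in for the internal lock of a non-reentrant logging routine.
_log_lock = threading.Lock()
handler_got_lock = []

def on_sigterm(signum, frame):
    # The handler "logs" too, so it needs the same lock. A blocking acquire
    # here would never return; the timeout is only there to end the demo.
    acquired = _log_lock.acquire(timeout=0.2)
    handler_got_lock.append(acquired)
    if acquired:
        _log_lock.release()

def log_interrupted_by_signal():
    # Main code is in the middle of logging (lock held) when SIGTERM arrives.
    with _log_lock:
        signal.raise_signal(signal.SIGTERM)
        # Stay inside the critical section until the handler has run.
        deadline = time.monotonic() + 2.0
        while not handler_got_lock and time.monotonic() < deadline:
            time.sleep(0.01)

def demo():
    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        handler_got_lock.clear()
        log_interrupted_by_signal()
    finally:
        signal.signal(signal.SIGTERM, previous)
    return list(handler_got_lock)

if __name__ == "__main__":
    print("handler could take the lock:", demo())
```

The handler never gets the lock because its owner is the very code it interrupted, which cannot resume until the handler returns. The usual way around this is to have the handler only set a flag and let the main loop do the logging.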

I also attached gdb-stack.txt, taken with the packaged ntpd (so without debugging symbols for the ntpd functions).
And I attached ntpd_loop.py: the basic script which starts and kills ntpd until it hangs. You may need to change the short sleep between start and stop and/or run another resource-consuming process in another terminal (I usually run find /).
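
For readers without access to the attachment, a start/kill loop of this kind might look roughly like the sketch below. It is not the attached ntpd_loop.py; the function name `run_until_hang`, its parameters, and the default delays are all assumptions made for illustration.

```python
import subprocess
import time

def run_until_hang(cmd, delay=0.01, wait_timeout=5.0, max_iterations=1000):
    """Start cmd, send SIGTERM after `delay` seconds, and repeat.

    Returns the PID of the first process that is still alive
    `wait_timeout` seconds after SIGTERM, or None if every run exited.
    """
    for _ in range(max_iterations):
        proc = subprocess.Popen(cmd)
        time.sleep(delay)      # the short sleep between start and stop
        proc.terminate()       # sends SIGTERM on POSIX
        try:
            proc.wait(timeout=wait_timeout)
        except subprocess.TimeoutExpired:
            # The process ignored SIGTERM; leave it running so that
            # gdb can be attached to it.
            return proc.pid
    return None

if __name__ == "__main__":
    # Assumed invocation: the binary path may differ, -n keeps ntpd in the
    # foreground so the PID we signal is the daemon itself. Needs root.
    pid = run_until_hang(["/usr/sbin/ntpd", "-n"])
    print("hung PID:", pid)
```

Varying `delay` changes where in ntpd's startup the signal lands, which is why tuning the sleep or loading the machine can make the hang easier to hit.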

Changed in ntp (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ntp (Ubuntu):
status: New → Confirmed