ntp deadlocks while exiting and never stops

Bug #1063806 reported by PierreF
This bug affects 2 people
Affects: ntp (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Software version:

* Description: Ubuntu 12.04.1 LTS
* arch: x86_64
* ntp 1:4.2.6.p3+dfsg-1ubuntu3.1
* libc6 2.15-0ubuntu10.2

We hit a strange behaviour on some machines, where ntpd refuses to quit if it is stopped right after startup:

When running the following command: /etc/init.d/ntp start && /etc/init.d/ntp stop
ntpd is still running and ignores further kill attempts.
In syslog we can see that the stop request was effectively taken into account:

Oct 8 12:45:36 app01 ntpd[16713]: ntpd exiting on signal 15

Sending further "kill 16713" commands does nothing.

We see this "start, then stop immediately after" pattern on our servers because of https://bugs.launchpad.net/nova/+bug/887162.
Short summary: the DHCP server doesn't reply until the DHCP client loses its IP and requests a new one using a broadcast request:

* (AFAIK) when the DHCP client gets no answer from the DHCP server and removes the IP, it calls a hook which restarts ntp (/etc/dhcp/dhclient-exit-hooks.d/ntp)
* the DHCP client retries right afterwards and obtains a new IP. The interface comes up, and a hook restarts ntpd once more (/etc/network/if-up.d/ntpdate)

After digging further, I reproduced the issue on my local machine using a simple script which starts and kills ntpd until the issue occurs. Once the issue was reproduced, I attached gdb to the process and dumped the stack (see gdb-stack-dbg.txt). Looking at this stack, it seems that the cause is the syslog function, which is not re-entrant. When SIGTERM arrives while the "main thread" is inside syslog, the signal handling logs through syslog again (the "exiting on signal 15" message above), and this causes a deadlock.
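
To illustrate the suspected mechanism, here is a minimal Python analogy. It is not ntpd or glibc code, only a sketch under the assumption that the logging routine holds a non-re-entrant lock while it runs. The names log_lock and handler_can_log_while_main_is_logging are hypothetical, and the handler uses a timeout so the demonstration reports the problem instead of hanging.

```python
import signal
import threading
import time

# Hypothetical stand-in for the internal lock a non-re-entrant
# logging routine might hold while it is running.
log_lock = threading.Lock()


def handler_can_log_while_main_is_logging(timeout=0.2):
    """Return True if a signal handler can take log_lock while the
    interrupted code already holds it."""
    outcome = []

    def on_signal(signum, frame):
        # A handler doing a plain blocking acquire would wait forever here,
        # because the code it interrupted holds the lock and cannot resume.
        got_it = log_lock.acquire(timeout=timeout)
        if got_it:
            log_lock.release()
        outcome.append(got_it)

    previous = signal.signal(signal.SIGUSR1, on_signal)
    try:
        with log_lock:  # the "main thread" is inside the logging routine
            signal.raise_signal(signal.SIGUSR1)
            while not outcome:  # wait until the handler has run
                time.sleep(0.01)
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return outcome[0]
```

The function returns False: the handler cannot obtain the lock because the interrupted code already holds it. Without the timeout, the handler would block indefinitely, which matches the hang seen in the gdb stack.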

I also attached gdb-stack.txt, taken with the packaged ntpd (so no debugging symbols for the ntpd functions).
And I attached ntpd_loop.py: the basic script which starts and kills ntpd until it hangs. You may need to change the short sleep between start and stop and/or run another resource-consuming process in another terminal (I usually run find /).
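
For readers without access to the attachment, a loop of this kind could look roughly like the sketch below. This is not the attached ntpd_loop.py. The function name, parameters and default values are assumptions chosen for illustration.

```python
import signal
import subprocess
import time


def start_stop_until_hang(cmd, delay=0.05, stop_timeout=10.0, max_attempts=1000):
    """Start cmd, send SIGTERM after `delay` seconds, repeat until it hangs.

    Returns the PID of the first process still alive `stop_timeout` seconds
    after SIGTERM (it is left running so a debugger can be attached),
    or None if every attempt exited.
    """
    for _ in range(max_attempts):
        proc = subprocess.Popen(cmd)
        time.sleep(delay)  # the short sleep between start and stop
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            return proc.pid  # process ignored SIGTERM: likely the hang
    return None
```

One possible invocation, run as root, would be start_stop_until_hang(["ntpd", "-n"]), where -n keeps ntpd in the foreground so that the started process is the one receiving the signal. The delay probably needs tuning per machine, as noted above.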

Changed in ntp (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ntp (Ubuntu):
status: New → Confirmed