ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Bug #1567540 reported by Pavlushka
This bug affects 17 people
Affects       Status        Importance  Assigned to         Milestone
NTP           Fix Released  High
ntp (Ubuntu)  Fix Released  High        Unassigned
Xenial        Fix Released  High        Christian Ehrhardt

Bug Description

[Impact]

 * In NTP 4.2.8p4 there are several races that can cause a crash at
   startup, or slightly later while still starting up, when ntpd
   resolves a peer name through DNS.

 * The crash obviously affects users. Because of its racy nature it
   does not appear on most systems, but it severely hamstrings some
   other users.

 * The details are a bit blurred, but overall there were four fixes
   upstream that address just this "kind of issue" that seemed to
   surface post 4.2.8p4.

[Test Case]

 * Start NTP (service)

 * Expectation: work

 * Failure: Crash

 * Constraints: this is a race. On all systems I have it triggers in
   fewer than 0.1% of starts (possibly far fewer; all I can say is that
   it did not trigger in 1000 tests), which matches other reports. OTOH
   on some systems it seems to trigger in more than 50% of starts,
   which also matches the high number of crash reports (close to 20k
   now) referred to in comment 43.

[Regression Potential]

 * The change is rather invasive, as it changes the locking scheme of
   parts of the code, so there surely is some regression potential.

 * Fortunately all of this change is upstream and has been tested there
   quite heavily, most of it for a few months already.

 * I tested as well as I could and found no obvious weakness in either
   the code or the tests, and given all the crash reports it is about
   time.

[Other Info]

 * While all study of bugs, upstream changes and tests suggest we
   haven't broken anything, still I have to admit that "on my own" I
   can't confirm that it fixed the bug. So we are really dependent on
   the reporters here that seem to have the kind of hardware where it
   "crashes reliably".

--------

ntp crashes every time the network goes up or down while the system is running and also crashes after booting up without network.
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (26 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu4
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-17-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
ProcVersionSignature: Ubuntu 4.4.0-17.33-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-17-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (31 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=306314bc-efcb-4c2d-b0e9-e05ec92ed0f0 ro
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-13 (0 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=13f57794-2e19-4a56-836a-94185bba5ec5 ro quiet splash
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (3 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-20-generic root=UUID=b9c0528f-e81f-4b08-9b31-032f14f72ccd ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.4.0-20.36-generic 4.4.6
Tags: xenial
Uname: Linux 4.4.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (63 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+dfsg-3ubuntu5
PackageArchitecture: amd64
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-4.4.0-25-generic root=UUID=3aea4570-4011-4247-9636-68317385324d ro
ProcVersionSignature: Ubuntu 4.4.0-25.44-generic 4.4.13
Tags: xenial third-party-packages
Uname: Linux 4.4.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dialout dip lpadmin mail netdev plugdev sambashare sudo
_MarkForUpload: True

Revision history for this message
In , H-murray (h-murray) wrote :

It's not solid, but I've seen three of these so far.
It crashes ballpark of 1 in 5 tries.

FreeBSD 10.1-RELEASE amd64

I haven't seen any troubles like this before 4.3.33
It crashes before it writes anything to the post-switching log file.

May 14 01:32:20 ted3 ntpd[79529]: switching logging to file /var/log/ntp/ntpd.log
May 14 01:32:20 ted3 kernel: pid 79529 (ntpd), uid 0: exited on signal 11 (core dumped)

Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libmd.so.6...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x000000080119db43 in sbrk () from /lib/libc.so.7
[New Thread 801c06c00 (LWP 100287/ntpd)]
[New Thread 801c06400 (LWP 100169/ntpd)]
(gdb) bt
#0 0x000000080119db43 in sbrk () from /lib/libc.so.7
#1 0x0000000801199aaf in sbrk () from /lib/libc.so.7
#2 0x0000000801184593 in syscall () from /lib/libc.so.7
#3 0x00000008011a5283 in realloc () from /lib/libc.so.7
#4 0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0,
    zero_init=1) at ../../libntp/emalloc.c:43
#5 0x00000000004399c7 in get_worker_context (c=0x801c42100, idx=0)
    at ../../libntp/ntp_intres.c:982
#6 0x0000000000439665 in blocking_getaddrinfo (c=0x801c42100, req=0x801c1b0c0)
    at ../../libntp/ntp_intres.c:327
#7 0x000000000043a5d0 in blocking_child_common (c=<value optimized out>)
    at ../../libntp/ntp_worker.c:288
#8 0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140)
    at ../../libntp/work_thread.c:663
#9 0x0000000800ed74f5 in pthread_create () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
(gdb)

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

This should also duplicate using -stable, I hope...

Anyway, if you could nose around in the stack frames to try and home in on this, that would be great. You might need to compile without optimization, not sure.

I haven't seen this on my freebsd boxes...

Revision history for this message
In , H-murray (h-murray) wrote :

I'm up to 6 core dumps now. All identical.

I've poked around. It doesn't fail in gdb. (or maybe I just haven't
figured out how to make it fail)

I don't have any good ideas. It could be:
  a bug in ntpd that just happens to get triggered in this case
  a bug in the hardware
  a bug in the OS
  a bug in the tool chain
  an operator error

I recompiled things. It gets the same error and objdump of both
versions is identical.

Here is something fishy:
#4 0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0,
    zero_init=1) at ../../libntp/emalloc.c:43
get_worker_context is growing the array of pointers to worker contexts.
I think it's growing it from empty. If so, ptr should be NULL.
The version in memory is NULL.

That address comes from several layers back the call stack:
#8 0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140)
    at ../../libntp/work_thread.c:663

I'll look carefully at the compiled code after some sleep.

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

Have you learned anything new about this?

Revision history for this message
In , H-murray (h-murray) wrote :

> Hal,
> Have you learned anything new about this?

Nope. I'm stumped.

Revision history for this message
In , john.marshall@riverwillow.com.au (john.marshallriverwillow.com.au) wrote :

FreeBSD 10.2-RC3
   ntpd 4.3.68

Hal, thanks for mentioning this on the mailing list. I should have spoken up sooner. I've been seeing this for a LONG time (2-3 years?) but I work around it by replacing hostnames with IP addresses in the config file 'server' statements and then forget about it. Every several months, I look at the config, scratch my head, put the domain names back in, and then remember!

I have been seeing this ONLY on an Intel Xeon E5-2603 (the biggest of our machines). It has two CPUs, each with 4 cores, and, to me, this smells like a thread problem. This server is now running FreeBSD 10.2-RC3 but I have seen this same problem on this server on earlier versions as well (definitely FreeBSD 10.1 and 9, not sure about 8).

Just now I edited the config file to use domain names for server config and produced this dump. Like Hal, this doesn't happen EVERY time ntpd starts but, for me, it is the rule rather than the exception.

rwsrv08# gdb /usr/sbin/ntpd /ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
...
Core was generated by `ntpd'.
...
#0 0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
[New Thread 801c07400 (LWP 101340/<unknown>)]
[New Thread 801c06400 (LWP 100229/<unknown>)]
(gdb) bt
#0 0x0000000800678fe2 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#1 0x0000000800679349 in _rtld_atfork_post () from /libexec/ld-elf.so.1
#2 0x00000008006749f4 in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#3 0x0000000800673e3a in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#4 0x0000000800670ea0 in dlopen () from /libexec/ld-elf.so.1
#5 0x00000008013e9025 in _nsdbtaddsrc () from /lib/libc.so.7
#6 0x00000008013e37e4 in _nsyyparse () from /lib/libc.so.7
#7 0x00000008013e96a1 in nsdispatch () from /lib/libc.so.7
#8 0x00000008013cd011 in getservbyname () from /lib/libc.so.7
#9 0x00000008013ccf19 in getservbyname () from /lib/libc.so.7
#10 0x00000008013c9a33 in getaddrinfo () from /lib/libc.so.7
#11 0x00000008013c7358 in getaddrinfo () from /lib/libc.so.7
#12 0x00000000004382c6 in blocking_getaddrinfo ()
#13 0x0000000000439190 in blocking_child_common ()
#14 0x000000000043b3b9 in blocking_thread ()
#15 0x00000008010bd7d5 in pthread_create () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
(gdb) q

When this happens, syslog shows...

Aug 18 09:30:24 rwsrv08 ntpd[17587]: ntpd 4.3.68@1.2483-o Fri Aug 7 02:03:11 UTC 2015 (1): Starting
Aug 18 09:30:24 rwsrv08 ntpd[17587]: Command line: /usr/sbin/ntpd -g -w 120 -N -c /data/ntpd/ntp.conf -p /var/run/ntpd.pid
Aug 18 09:30:25 rwsrv08 ntpd[17588]: proto: precision = 1.118 usec (-20)
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 0 v6wildcard [::]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 2 GFNX 203.58.93.40:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 3 GFNX [2001:8000:1000:1801::5001]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 4 lo0 [::1]:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listen normally on 5 lo0 127.0.0.1:123
Aug 18 09:30:25 rwsrv08 ntpd[17588]: Listening on routing socket on fd #26 for interface...


Revision history for this message
In , Burnicki (burnicki) wrote :

I've just set up a machine with FreeBSD 10.2-RELEASE from scratch, and I don't see any problem with the version of ntpd shipped with FreeBSD.

It's labelled "4.2.8p3-a", but I don't know what the "-a" stands for.

The only problem I encountered is that I had to remove "nopeer" from the "restrict" lines in the shipped ntp.conf file if I wanted to use the "pool" directive, since otherwise no pool servers were added. However, the comments in bug 2152 say this is OK.

Digging through bugzilla I found a few issues where ntpd didn't work correctly due to memory restrictions:

Bug 2362 - mlockall() breaks DNS resolution when using the "files" service in nsswitch.conf
http://bugs.ntp.org/show_bug.cgi?id=2362

Bug 2643 - Server crash with pool directive
http://bugs.ntp.org/show_bug.cgi?id=2643

Bug 2817 - Stop locking ntpd into memory by default
http://bugs.ntp.org/show_bug.cgi?id=2817

For me it smells like all this is somehow related. Can you try if the problem still persists if you add an "rlimit memlock 128" line (or even a higher number) to ntp.conf?

If I find some time I'll try to build ntp-dev on my FreeBSD machine and see if I can duplicate the problem.

On the other hand, Hal reported on the hackers@ list that he also saw this on Fedora 22. I've also installed that Linux version a few days ago and played a bit with it, but didn't encounter any problems with the shipped ntpd, either.

Revision history for this message
In , john.marshall@riverwillow.com.au (john.marshallriverwillow.com.au) wrote :

Martin,

Thanks for looking at this. I'd like to stress that I'm only seeing this on a system with more memory (16GB) and more cores (8) than we have anywhere else.

As you suggested, I tried adding "rlimit memlock 128" to ntp.conf but it made no difference. I then tried "rlimit memlock 256" and it also made no difference.

I am now using:
  FreeBSD 10.2-RELEASE-p1
     ntpd 4.3.70

When ntpd fails, the dump backtrace looks like what I pasted in Comment #5 or like the following. The three backtraces (Hal's + my two) diverge after the blocking_getaddrinfo().

(gdb) bt
#0 0x00000008013ed631 in __h_errno_set () from /lib/libc.so.7
#1 0x00000008013bf90e in __res_vinit () from /lib/libc.so.7
#2 0x00000008013c33b0 in getaddrinfo () from /lib/libc.so.7
#3 0x00000008013e39ef in nsdispatch () from /lib/libc.so.7
#4 0x00000008013c20ec in getaddrinfo () from /lib/libc.so.7
#5 0x000000000043435a in blocking_getaddrinfo ()
#6 0x00000000004352f0 in blocking_child_common ()
#7 0x0000000000437159 in blocking_thread ()
#8 0x00000008010b77d5 in pthread_create () from /lib/libthr.so.3
#9 0x0000000000000000 in ?? ()

Since you mentioned nsswitch.conf in Comment #6, I note that all our servers have "hosts: dns" in nsswitch.conf.

Revision history for this message
In , H-murray (h-murray) wrote :

It (or something very similar) also happens on Linux.

I tried mail to hackers, but the discussion ended up here, so I'll copy the data from that message.
  http://lists.ntp.org/pipermail/hackers/2015-August/007156.html

A few days ago, I tried to add a pool line to a server and got a strange
error message.

14 Aug 13:58:07 ntpd[12618]: error resolving pool 0.fedora.pool.ntp.org:
System error (-11)

It tries again in a few minutes and gets the same error. ...

EAI_SYSTEM (System Error) says look in errno. I added some debugging
printout. errno is always EAGAIN. More printout says it's taking ~15 ms
which is reasonable for a packet exchange over my DSL line. I added a loop
to try a few times. It always gets the same error.
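
The check described above (EAI_SYSTEM means "look in errno", then retry a few times) could be sketched roughly as follows. This is not the debugging code that was actually added to ntpd; `resolve_with_retry` and its parameters are hypothetical names used only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() under strict -std modes */
#endif

#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of ntpd): resolve `host`, retrying up to
 * `max_tries` times while getaddrinfo() fails with EAI_SYSTEM and errno
 * is EAGAIN. Returns the last getaddrinfo() result code. On success the
 * caller releases *res with freeaddrinfo().
 */
static int resolve_with_retry(const char *host, const char *service,
                              int max_tries, struct addrinfo **res)
{
    struct addrinfo hints;
    int rc = EAI_FAIL;
    int attempt;

    assert(host != NULL && res != NULL && max_tries > 0);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;          /* NTP runs over UDP */

    for (attempt = 1; attempt <= max_tries; attempt++) {
        int saved_errno;

        *res = NULL;
        errno = 0;
        rc = getaddrinfo(host, service, &hints, res);
        saved_errno = errno;                 /* capture before any other call */
        if (rc == 0)
            return 0;

        if (rc == EAI_SYSTEM) {
            /* EAI_SYSTEM carries no detail of its own: report errno too. */
            fprintf(stderr, "try %d: %s: %s, errno %d (%s)\n", attempt, host,
                    gai_strerror(rc), saved_errno, strerror(saved_errno));
            if (saved_errno == EAGAIN)
                continue;                    /* EAGAIN: try again */
        } else {
            fprintf(stderr, "try %d: %s: %s\n", attempt, host,
                    gai_strerror(rc));
        }
        break;                               /* any other failure is final */
    }
    return rc;
}
```

If every retry reports the same errno, as in the report above, the lookup itself is probably not the transient part, which fits the later finding that the failure came from memory allocation rather than from the network.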

I changed the server lines of local systems from names to IP Addresses. Now
I get:
  16 Aug 02:37:41 ntpd[21377]: fatal out of memory (32 bytes)
That's from the DNS thread creation code getting ready to look up the pool
info.

That's on a 64 bit Fedora 22 system. I got the same sort of thing on another
Fedora box and a Debian box so I'm pretty sure it isn't a simple flaky
hardware box. (But all the problems have been on the same type of hardware,
so it might be a design bug. Dell Optiplex FX 160, Intel Atom 330.)

Revision history for this message
In , H-murray (h-murray) wrote :

Harlan pointed me at a wonderful blog post:
  https://blog.crashed.org/dont-backout
Thanks.

Quick summary: Bug in FreeBSD page fault handler

That solves the FreeBSD half of this bug. I'll submit a new one
for the Linux variant.

Harlan:
  I don't see anything like UPSTREAM in the resolved-at options.
  I'll let you sort out how to mark this as no-longer-open.

Revision history for this message
In , john.marshall@riverwillow.com.au (john.marshallriverwillow.com.au) wrote :

(In reply to comment #9)
> Quick summary: Bug in FreeBSD page fault handler
>
> That solves the FreeBSD half of this bug.

Thanks for posting this Hal but you don't reference a patch, I can't find any reference in that blog post to a patch, and my attempts at trawling FreeBSD commit logs have yielded no results (my fault, no doubt). It would be great to close off this bug with a pointer to the FreeBSD pager patch that fixes this. Any clues?

Revision history for this message
In , H-murray (h-murray) wrote :

> Thanks for posting this Hal but you don't reference a patch,
> I can't find any reference in that blog post to a patch ...
> Any clues?

Nope. I'm not plugged into the FreeBSD ecosystem.

I expect there would be something in their bug database
or mailing lists.

Revision history for this message
In , john.marshall@riverwillow.com.au (john.marshallriverwillow.com.au) wrote :

(In reply to comment #11)
Hal, I've sent email to the author of the blog post to which you referred in Comment #9 and plan to post details of any response here. If I can get a pointer to a FreeBSD patch, I'll apply that, test and report.

I think it's premature to suggest that this bug be closed without seeing if there is, actually, a fix for this problem. Peter may even have hit a different crash to the one we are seeing.

Revision history for this message
In , john.marshall@riverwillow.com.au (john.marshallriverwillow.com.au) wrote :

Created attachment 1325
Do mlockall before threads

I exchanged email with the author of the blog post referred to in Comment #9. He suggested that I build ntpd with HAVE_MLOCKALL disabled and test. I had no problem at all with mlockall() disabled.

He also suggested that, notwithstanding potential problems with FreeBSD's mlockall(), running mlockall() in one thread while allocating memory in another thread is probably unwise anyway; and that calling mlockall() before starting any threads may be preferable.

In the attached patch (against 4.3.70), I moved the "if (do_memlock)" block in ntpd.c up to an earlier point, after the fork() and just after the RLIMITs are set.

WARNING: I do not *know* ntpd.c, so this needs careful scrutiny by someone who does but..."It works for me"! (on FreeBSD 10.2-RELEASE with a patched 4.3.70)
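
The ordering idea (lock memory first, start threads afterwards) can be sketched on its own, without reference to the real ntpd.c layout. `worker_main` and `start_locked_then_threaded` are hypothetical names, and the worker only stands in for ntpd's blocking resolver thread.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical worker standing in for the blocking resolver thread. */
static void *worker_main(void *arg)
{
    int *ran = arg;

    *ran = 1;
    return NULL;
}

/*
 * Lock memory (if requested) while the process is still single-threaded,
 * and only then create the worker. Returns 0 if the worker was created
 * and joined, otherwise the pthread error code. A failing mlockall()
 * (for example because RLIMIT_MEMLOCK is too low) is reported but is
 * not treated as fatal in this sketch.
 */
static int start_locked_then_threaded(int do_memlock, int *worker_ran)
{
    pthread_t tid;
    int rc;

    assert(worker_ran != NULL);

    if (do_memlock && mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        fprintf(stderr, "mlockall() failed: %s\n", strerror(errno));

    /* No other thread exists yet, so nothing allocates concurrently. */
    rc = pthread_create(&tid, NULL, worker_main, worker_ran);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

One caveat, as far as I know: if mlockall() succeeds with MCL_FUTURE under a small RLIMIT_MEMLOCK, later allocations such as the new thread's stack can fail, so pthread_create() may return an error here. That looks like the family of problems the memlock bugs referenced earlier in this thread are about.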

Revision history for this message
In , H-max-3 (h-max-3) wrote :

I think I ran into the same issue (realloc() returning an error when being asked for 32 bytes somewhere down the callstack from blocking_getaddrinfo()) with 4.2.8p3 on SUSE, but with a slightly different behaviour:

The error in realloc() only happens when using ntpq to add a server to a running ntpd that does not have any servers yet. When a server is given on the command line or in ntp.conf, ntpd starts fine and more servers can be added at runtime.

I can confirm that disabling mlockall() as suggested in comment 13 prevents the call.

Applying the patch from comment 13 makes it even worse: Now the error also happens when a server is specified at startup on the command line or in ntp.conf.

Revision history for this message
In , H-max-3 (h-max-3) wrote :

It looks like I am rather suffering from bug 2817. Sorry for the noise here.

But while being there, I found that the proposed patch from comment 13 is at least incomplete, because it places the block that depends on do_memlock above getconfig(), which is the only place where it can get changed from 1 to 0, so at the new location it will always be 1.

So, if the do_memlock block needs to be moved up, at least the getconfig() line should be moved with it, but I have not checked whether there are other cross-dependencies to all the init_* stuff that happens between those two locations.
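
The ordering problem can be shown with a small stand-alone sketch. Only `do_memlock` and the role of getconfig() are taken from the discussion above; the function names, the counter and the `config_disables_lock` parameter are hypothetical, and the real mlockall() call is replaced by a counter so the effect is visible.

```c
#include <assert.h>

static int do_memlock = 1;      /* default: lock memory */
static int memlock_calls = 0;   /* counts how often the stand-in lock ran */

/* Stand-in for getconfig(): the only place the flag goes from 1 to 0. */
static void getconfig_sketch(int config_disables_lock)
{
    if (config_disables_lock)
        do_memlock = 0;
}

/* Stand-in for the "if (do_memlock)" block; real code calls mlockall(). */
static void maybe_lock_memory(void)
{
    if (do_memlock)
        memlock_calls++;
}

/* Order in the proposed patch: the lock block runs before the config. */
static int startup_lock_first(int config_disables_lock)
{
    do_memlock = 1;
    memlock_calls = 0;
    maybe_lock_memory();                    /* flag is still the default */
    getconfig_sketch(config_disables_lock); /* too late to matter */
    return memlock_calls;
}

/* Order with getconfig() moved up together with the lock block. */
static int startup_config_first(int config_disables_lock)
{
    do_memlock = 1;
    memlock_calls = 0;
    getconfig_sketch(config_disables_lock);
    maybe_lock_memory();                    /* honours the configuration */
    return memlock_calls;
}
```

With the lock block first, memory is locked even when the configuration asks for it not to be; with the config read first, the setting is respected. Whether getconfig() can safely run that early in the real ntpd.c is exactly the open question raised above.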

Revision history for this message
Pavlushka (pavelsayekat) wrote : .etc.apparmor.d.usr.sbin.ntpd.txt

apport information

tags: added: apport-collected xenial
description: updated
Revision history for this message
Pavlushka (pavelsayekat) wrote : Dependencies.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : JournalErrors.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : KernLog.txt

apport information

Revision history for this message
Pavlushka (pavelsayekat) wrote : ProcEnviron.txt

apport information

Revision history for this message
Nish Aravamudan (nacc) wrote : Re: ntp crashes everytime the network goes up or down.

Thank you for filing this bug report! None of the logs provide any of the crashing output, would it be possible to provide a bit more detail as to the crash?

Changed in ntp (Ubuntu):
status: New → Incomplete
Revision history for this message
Pavlushka (pavelsayekat) wrote :
Pavlushka (pavelsayekat)
Changed in ntp (Ubuntu):
status: Incomplete → New
Revision history for this message
In , Smallm (smallm) wrote :

Created attachment 1399
protect dnsworker_contexts with a mutex

Revision history for this message
In , Smallm (smallm) wrote :

I believe this (at least the original problem as described) is caused by the dynamic array pointed to by dnsworker_contexts in ntp_intres.c being potentially realloced from multiple threads with no synchronization objects used.

I encounter an almost identical stack trace from a core file created by a segmentation fault when running ntps (actually, ntpdig from the ntpsec fork but your code here has not diverged). I was able to get the seg fault twice in 40 runs passing ntpdate two server names on the command line. Here was my stack trace:

#0 alloc_dnsworker_context (idx=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string3.h:85
#1 get_worker_context (c=0x11a0750, idx=2) at ../../libntp/ntp_intres.c:911
#2 0x000000000040987d in blocking_getaddrinfo (c=0x11a0750, req=0x11a0ae0)
    at ../../libntp/ntp_intres.c:286
#3 0x000000000040a413 in blocking_child_common (c=0x11a0750)
    at ../../libntp/ntp_worker.c:283
#4 0x000000000040b319 in blocking_thread (ThreadArg=<optimized out>)
    at ../../libntp/work_thread.c:667
#5 0x00007fcedfe99e9a in ?? ()
#6 0x0000000000000000 in ?? ()

Looking at the instructions in frame #0 I saw that the register representing dnsworker_contexts had a 0 (NULL) value.

883 dnsworker_contexts[idx] = emalloc_zero(worker_context_sz);
   0x0000000000408d13 <+67>: mov $0x1,%ecx
   0x0000000000408d18 <+72>: xor %edx,%edx
   0x0000000000408d1a <+74>: mov $0x18,%esi
   0x0000000000408d1f <+79>: xor %edi,%edi
   0x0000000000408d21 <+81>: callq 0x4079e0 <ereallocz>
   0x0000000000408d26 <+86>: mov %rax,(%r12)
   0x0000000000408d2a <+90>: mov 0x20ec17(%rip),%rax # 0x617948 <dnsworker_contexts>
   0x0000000000408d31 <+97>: mov (%rax,%rbx,8),%rax
(gdb) p $rbx
$16 = 2
(gdb) p $rax
$17 = 0

Couldn't figure out how that could come about but noticed that I got here from a worker thread and that dnsworker_contexts is realloced in get_worker_context, so potentially pointed somewhere else. That should have some kind of lock shouldn't it?

When I run with the attached patch protecting that path with a mutex I no longer see the seg faults. Sorry, I did my testing with ntpsec because of work but I think it applies equally to you. I redid the patch off your master branch so it would apply cleanly for you in case you want to test with this.

Revision history for this message
In , Stenn (stenn) wrote :

Comment on attachment 1325
Do mlockall before threads

Pearly, thoughts?

Revision history for this message
In , Stenn (stenn) wrote :

Comment on attachment 1399
protect dnsworker_contexts with a mutex

Pearly, thoughts?

Revision history for this message
In , Stenn (stenn) wrote :

Mike,

Thanks for the patch - I hope we can get it reviewed soon.

Revision history for this message
In , H-murray (h-murray) wrote :

Mike: Thanks for tracking this down. I think this explains all the problems.

I don't think the patch is good enough.

You also need a lock on read references to dnsworker_contexts. The only
other reference is a few lines below and in a subroutine called from there.
I suggest moving the lock to the top of get_worker_context and the
unlock to the bottom (and adding an assumes-lock comment to the top of
alloc_dnsworker_context).
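
A minimal sketch of that suggestion, with the lock taken at the top of the lookup and released at the bottom so that both the grow path and the read path are covered. This is not the patch from the attachment and not the fix that was eventually staged; `worker_ctx`, `ctx_table`, `get_ctx_locked` and `ctx_thread` are hypothetical stand-ins for the ntp_intres.c names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int idx; } worker_ctx;     /* hypothetical payload */

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static worker_ctx **ctx_table = NULL;       /* array of pointers, grown on demand */
static size_t ctx_table_len = 0;

/* Return the context for slot idx, creating it if needed; NULL on OOM. */
static worker_ctx *get_ctx_locked(size_t idx)
{
    worker_ctx *ctx = NULL;

    pthread_mutex_lock(&ctx_lock);          /* covers growth AND reads */

    if (idx >= ctx_table_len) {
        size_t new_len = idx + 1;
        worker_ctx **grown = realloc(ctx_table, new_len * sizeof(*grown));

        if (grown == NULL)
            goto out;
        /* Zero only the newly added slots. */
        memset(grown + ctx_table_len, 0,
               (new_len - ctx_table_len) * sizeof(*grown));
        ctx_table = grown;
        ctx_table_len = new_len;
    }
    if (ctx_table[idx] == NULL) {
        ctx_table[idx] = calloc(1, sizeof(worker_ctx));
        if (ctx_table[idx] != NULL)
            ctx_table[idx]->idx = (int)idx;
    }
    ctx = ctx_table[idx];

out:
    pthread_mutex_unlock(&ctx_lock);
    return ctx;
}

/* Thread entry point for trying the lookup from several threads. */
static void *ctx_thread(void *arg)
{
    return get_ctx_locked((size_t)(uintptr_t)arg);
}
```

Starting several threads on `ctx_thread` with different slot numbers should hand each of them its own context, and a later lookup of the same slot returns the same pointer.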

Revision history for this message
In , Smallm (smallm) wrote :

Ack, that was careless of me. Started fixing it the way you suggested, but I'm wondering if something more major is needed. Even if I spread out the locks within get_worker_context() to top and bottom, it would still be giving out a pointer into the array that realloc can relocate. Return a copy of the struct? Does anyone else have ideas for this module (I thought I saw a comment in another CR to that effect)? I'm really very bad at multi-threaded coding.

Also, I guess a real patch needs to consider Windows.

Revision history for this message
In , H-murray (h-murray) wrote :

There is probably another copy of this problem in the other direction.

There are two places where info gets queued up and passed from thread
to thread. One is when the main thread tells the worker thread(s) what
to do. The other is when a worker thread is telling the main thread
an answer.

Looks like the other one is reserve_dnschild_ctx

I suggest folding alloc_dnsworker_context into get_worker_context
It's only called from one place and it will be easier to make
sure the locks are right without that extra layer. It's only a few
lines of code. The abstraction layer isn't helping anything.

It might be cleaner to move the definition of dnsworker_contexts
and dnsworker_contexts_alloc into get_worker_context. The idea
is to make sure the lock covers all uses.
  static xxx
I think all c compilers support that.

> Even if I spread out the locks within get_worker_context()
> to top and bottom, it would still be giving out a pointer
> into the array that realloc can relocate.

The thing that is getting realloc-ed is the array holding pointers
to blocks. The individual block never gets realloc-ed. The lock
only needs to protect the array. It's only referenced within
that routine. (aside from the alloc which I suggested moving)
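
The point about what moves and what does not can be checked with a small single-threaded sketch: the array of pointers is realloc-ed many times, while each block is allocated separately and keeps its address. `block_t`, `ptr_table` and `table_add` are hypothetical names, not taken from ntp_intres.c.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; } block_t;          /* hypothetical per-worker block */

typedef struct {
    block_t **slots;                         /* this array may be relocated */
    size_t    len;
} ptr_table;

/* Append one separately allocated block; returns it, or NULL on failure. */
static block_t *table_add(ptr_table *t, int id)
{
    block_t **grown;
    block_t *b;

    assert(t != NULL);

    grown = realloc(t->slots, (t->len + 1) * sizeof(*grown));
    if (grown == NULL)
        return NULL;
    t->slots = grown;                        /* the pointer array may have moved */

    b = malloc(sizeof(*b));                  /* the block itself never moves */
    if (b == NULL)
        return NULL;
    b->id = id;
    t->slots[t->len++] = b;
    return b;
}
```

A pointer returned by `table_add` stays valid however often the table grows afterwards, so only accesses to `slots` itself need the lock.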

Revision history for this message
In , Stenn (stenn) wrote :

*** This bug has been marked as a duplicate of bug 2954 ***

Revision history for this message
In , Perlinger (perlinger) wrote :

(In reply to comment #24)
>
> *** This bug has been marked as a duplicate of bug 2954 ***

That was a bit early -- my fault. It is *not* exactly a dup of 2954, but related -- that is, it is also a race condition in the async/threaded resolver code.

I think the lock in the latest patch does not protect all data races here, but I'm still digging.

Revision history for this message
In , Perlinger (perlinger) wrote :

Harlan, the repo is in

  psp.ntp.org:~perlinger/ntp-stable-2831

compiled and run with

  linux/x64 --with-threads (threading resolver)
  linux/x64 --without-threads (forking resolver)
  Windows7/x64/VS2008 (threading resolver)

Hal, Mike, good catch. Only the proposed lock falls a bit short. You have to interlock all access to the global table, not just the realloc() call.

And using pthread_mutex_t is not so easy with Windows, but we all knew that ;) I used a semaphore (again) since there is already a suitable wrapper.

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

Thanks for the report. John, Pearly, et al, thanks for your work on this.

Pearly's fix is STAGED for 4.2.8p7.

Revision history for this message
In , Stenn (stenn) wrote :

Hal,

Thanks - please mark this bug as VERIFIED or IN_PROGRESS, as appropriate.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: ntp crashes everytime the network goes up or down.

From the dumps

 Apr 07 15:51:02 nowhere-6 ntpd[6197]: work_thread.c:271: INSIST(((void *)0) != req) failed
 Apr 07 15:51:02 nowhere-6 ntpd[6197]: exiting (due to assertion failure)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The system also appears to be having various other problems: the kernel often reports vblank issues and dumps traces, and the network in general, or at least the path to the configured NTP and DNS servers, seems to be down.

Could you share a bit about the configuration related to that (networking/ntp/dns) - as this might help to recreate the case.

Changed in ntp (Ubuntu):
status: New → Incomplete
Revision history for this message
Pavlushka (pavelsayekat) wrote : .etc.apparmor.d.usr.sbin.ntpd.txt

apport information

tags: added: third-party-packages
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: ntp crashes everytime the network goes up or down.

Hi,
thanks for the extra info.
But that is just "more of the same".

We will have to pass the stage from "agreeing something is wrong" to "understanding or at least reproducing what is wrong".

To help us get closer to recreating the issue, it would help if you could share:
1. your /etc/ntp.conf
2. Output of "route -n" and "ifconfig -a"
3. A description of your network setup and what might be special - like:
  - are you heavily firewalled
  - often disconnected
  - or anything else?

BTW - the likely reason it crashes every time your network goes up/down is that there is a hook to refresh NTP time once a new connection is available (ifup/ifdown). But that is just what triggers it - we need to understand why it crashes. Another possibility - as seen in your log - is that there are many issues resolving DNS names and other networking-related things - maybe after "too much" of that ntp runs into a bug.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I have found many similar bugs, so I am setting this to Confirmed.
I am also trying to gather the active reporters on this one bug, so there is one place to share setup info and make it reproducible.

summary: - ntp crashes everytime the network goes up or down.
+ ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes
+ up or down.)
Changed in ntp (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Debugged and searched with that extra info - found the (very likely) cause and fix:
  http://bugs.ntp.org/show_bug.cgi?id=2954
  http://bugs.ntp.org/show_bug.cgi?id=2831

It is essentially a race around locking, which is why at the same time it can:
- occur often in the field
- never show up in testing

We need to merge the newest version from Debian to fix in Yakkety - yet we have quite a few dependencies that we wanted to put into the next merge so this needs some evaluation.

For Xenial we need to discuss whether we can take the stable branch or have to backport just the fix (complex, with a lot of dependencies).

Changed in ntp (Ubuntu):
status: Confirmed → Triaged
Changed in ntp:
importance: Unknown → High
status: Unknown → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

TL;DR

Simplified, the NTP code around this does the following:
1. Complex thing eventually setting ret to next element
2. assert ret!=NULL <-- this is what our bug reports hit

"Complex thing" was completely rewritten in the referred patch set to close the race window that it had.

So even though these NTP bugs describe slightly different ways for the issue to surface, it is very likely the same "race".
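
To make the pattern concrete, here is a small generic sketch of a request queue. It is not the code from work_thread.c; the type and function names are made up for illustration. It shows the property a race-free "complex thing" needs: the "is there a request?" check and the slot read happen under the same lock the producer holds while publishing, so a consumer can never see a counted but not yet written slot, and the non-NULL assertion cannot fire.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 8

/* Hypothetical request queue shared by a producer and a worker thread. */
typedef struct {
    void *slot[QUEUE_SLOTS];
    size_t head;            /* next slot to read */
    size_t tail;            /* next slot to write; tail - head = queued */
    pthread_mutex_t lock;
} req_queue;

void queue_init(req_queue *q)
{
    for (size_t i = 0; i < QUEUE_SLOTS; i++)
        q->slot[i] = NULL;
    q->head = 0;
    q->tail = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Returns 1 on success, 0 if the queue is full or req is NULL. */
int enqueue(req_queue *q, void *req)
{
    int ok = 0;

    pthread_mutex_lock(&q->lock);
    if (req != NULL && q->tail - q->head < QUEUE_SLOTS) {
        /* Slot write and counter update are published together. */
        q->slot[q->tail % QUEUE_SLOTS] = req;
        q->tail++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Returns the next request, or NULL if the queue is empty. */
void *dequeue(req_queue *q)
{
    void *req = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->head != q->tail) {
        req = q->slot[q->head % QUEUE_SLOTS];
        q->slot[q->head % QUEUE_SLOTS] = NULL;
        q->head++;
        /* Same kind of check as the failing INSIST(NULL != req). */
        assert(req != NULL);
    }
    pthread_mutex_unlock(&q->lock);
    return req;
}
```

If the counter were updated outside the lock, or the slot read without it, a consumer could observe "one request queued" before the pointer is stored and trip the assertion only on some machines and timings, which would match the pattern seen in the reports.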

Revision history for this message
Pavlushka (pavelsayekat) wrote :
Revision history for this message
Pavlushka (pavelsayekat) wrote :
Revision history for this message
Pavlushka (pavelsayekat) wrote :
Revision history for this message
Pavlushka (pavelsayekat) wrote :

I used to use a PPPoE connection through an ethernet/ADSL router, and there the issue was more frequent. And no, I am not heavily firewalled; the D-Link 2750U is supposed to have its own firewall, and on my system I use ufw.

Revision history for this message
Pavlushka (pavelsayekat) wrote :

And unluckily, when I tried mtrace on ntp it didn't crash, but it crashes on boot or when the trace is not running. Sounds funny.

Revision history for this message
Pavlushka (pavelsayekat) wrote :

And now I am using a 3G modem for internet, but ntp still crashes sometimes.

Revision history for this message
Sudhir Reddy (t-sudhirkumar) wrote : Re: [Bug 1567540] Re: ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)

Yes.. this issue is seen once in a while. The issue has been occurring more often since I
upgraded..

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ntp (Ubuntu Xenial):
status: New → Confirmed
Changed in ntp (Ubuntu Xenial):
importance: Undecided → High
Changed in ntp (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - the merge that will fix this in the development release has been created and has entered review.
Once that is completed we have to think about an SRU into Xenial (needs a rather complex backport from upstream).

Revision history for this message
Bob Jones (r-a-n-d-o-m-d-e-v-4+ubuntu) wrote :

Have you finished "thinking about" fixing the broken NTP in Xenial yet? It's becoming a bit of a joke to be without such a key package (let alone the hacks that have had to be put in place to keep server time accurate!).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
the fix has to go to the development release first (Yakkety).
The merge has been waiting for review for quite a while already.

Thank you for the push on this, I'll take this as an opportunity to raise
the priority of the merge.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntp - 1:4.2.8p8+dfsg-1ubuntu1

---------------
ntp (1:4.2.8p8+dfsg-1ubuntu1) yakkety; urgency=medium

  [ Christian Ehrhardt ]
  * Merge from Debian testing. Remaining changes:
    + debian/rules: enable debugging. Asked debian to add this in bug #643954.
    + debian/rules, debian/ntp.dirs, debian/source_ntp.py: Add apport hook.
    + debian/control: Add Suggests on apparmor.
    + debian/source_ntp.py: Add filter on AppArmor profile names to prevent
      false positives from denials originating in other packages
    + debian/ntpdate.if-up: Fix interaction with openntpd. Stop ntp before
      running ntpdate when an interface comes up, then start again afterwards.
    + debian/ntp.init, debian/rules: Only stop when entering single user mode,
      don't use /var/lib/ntp/ntp.conf.dhcp if /etc/ntp.conf is newer - it can
      get stale. Patch by Simon Déziel.
    + debian/ntp.conf, debian/ntpdate.default: Change default server to
      ntp.ubuntu.com.
    + debian/control: Add bison to Build-Depends (for ntpd/ntp_parser.y).
    + Extend PPS support
      - debian/README.Debian: Add a PPS section to the README.Debian
      - debian/ntp.conf: Add some configuration examples from the offical
        documentation.
    + SECURITY UPDATE: NTP statsdir cleanup cronjob insecure (LP: #1528050)
      - debian/ntp.cron.daily: fix security issues, patch thanks to halfdog!
      - CVE-2016-0727
    + Merge also contains an upstream fix that solves (LP: #1567540)
  * Added changes
    + match Ubuntu packages now that Debian has ntp apparmor accepted in
      d/control for Apparmor conflicts/replaces
    + d/apparmor-profile add samba winbindd pipe (LP: #1582767)
  * Drop Changes:
    + Add enforcing AppArmor profile (accepted in Debian):
      - debian/control: Add Conflicts/Replaces on apparmor-profiles.
      - debian/control: Add Suggests on apparmor.
      - debian/control: Build-Depends on dh-apparmor.
      - add debian/apparmor-profile*.
      - debian/ntp.dirs: Add apparmor directories.
      - debian/rules: Install apparmor-profile and apparmor-profile.tunable.
      - debian/source_ntp.py: Add filter on AppArmor profile names to prevent
        false positives from denials originating in other packages.
      - debian/README.Debian: Add note on AppArmor.
    + Add PPS support (accepted in Debian)
      - debian/control: Add Build-Depends on pps-tools
    + debian/apparmor-profile: allow 'rw' access to /dev/pps[0-9]* devices.
    + d/p/fix_local_sync.patch: fix local clock sync (fixed upstream)
    + debian/patches/ntpdate-fix-lp1526264.patch (fixed upstream):
      - Add Alfonso Sanchez-Beato's patch for fixing the cannot correct dates in
        the future bug
    + debian/apparmor-profile: adjust to handle AF_UNSPEC with dgram and stream
    + dropping previous ubuntu security patches/fixes that have been upstreamed
      in 4.2.8p6: CVE-2015-7973, CVE-2015-7975, CVE-2015-7976, CVE-2015-7977,
      CVE-2015-7978, CVE-2015-7979, CVE-2015-8138, CVE-2015-8158
    + dropping previous ubuntu security patches/fixes that have been upstreamed
      in 4.2.8p7: CVE-2016-1548, CVE-2016-1550, CVE-2016-2516, CVE-2016-2518...

Changed in ntp (Ubuntu):
status: Triaged → Fix Released
Changed in ntp (Ubuntu):
assignee: ChristianEhrhardt (paelzer) → nobody
Changed in ntp (Ubuntu Xenial):
assignee: nobody → ChristianEhrhardt (paelzer)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Looking through the crashes, this usually seems to be one of these two:
http://bugs.ntp.org/show_bug.cgi?id=2831 (as mentioned before)
http://bugs.ntp.org/show_bug.cgi?id=2954 (related)

There was a bit of back and forth on which of the two is a dup of the other.
Eventually 2954 got the fix, which I should be able to backport rather easily, but 2831 was later reopened and became an extension of the old fix.

This is a rather invasive patch, but fortunately at least 2954 came right after p4 got released, so it still matches the code. Not so sure about 2831.

A few, but really only a few crashes I looked at could have been http://bugs.ntp.org/show_bug.cgi?id=2969
Backporting that is rather easy so I'm going to include it (low risk of changing anything else).

Getting recent patches is hard atm due to https://github.com/ntp-project/ntp/issues/15.
Looking for alternatives right now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Found an alternative, thanks to Unit193 in #ubuntu-devel!

Backported four fixes overall to address the crashes I've seen in the reports.
Currently building and testing.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Backports and Testing done, coordinated with mdeslaur to not conflict with Security updates staged in https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa/+packages.

The plan is to push the SRU asap (which is also favorable looking at all the crash reports), and the Security Team will continue on top of that (there were some extra CVEs that need to be included anyway).

Adding patch and SRU template now...

tags: removed: third-party-packages
description: updated
Changed in ntp (Ubuntu Xenial):
status: Confirmed → Fix Committed
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Refined for SRU upload:
- clarified SRU template in this bug
- reduced the number of patches by one to follow the "minimal change possible" guidance for SRUs as closely
  as possible
- fixed the reference in d/p/* files to the related upstream commits

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Built and tested again, ready for sponsoring review.

Revision history for this message
Robie Basak (racb) wrote :

Looks good, thanks! I dropped debian/patches/ntp-4.2.8p4-segfaults-3-4.patch, added an apostrophe to the changelog and uploaded.

Changed in ntp (Ubuntu Xenial):
status: Fix Committed → In Progress
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Pavlushka, or anyone else affected,

Accepted ntp into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ntp/1:4.2.8p4+dfsg-3ubuntu5.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ntp (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Martin for verifying and passing this SRU on to proposed!

To Pavlushka, Bob Jones, Sudhir Reddy - due to the raciness of this bug it is hard for anyone else to "really" verify. While I'm quite confident, given the upstream discussions and having studied the fix and the crashes, I'd really like your help testing the package in proposed to confirm that it fixes your issue as intended.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you very much in advance!

P.S. I beg your pardon, but to make sure this message and status change reach you I took the initiative and subscribed some of you to the bug.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
since there has been no update, I'd once more ask anyone out there who has a system affected by this to verify the new ntp package available in proposed.

Revision history for this message
Alexis Huxley (alexishuxley) wrote :

I am using ntp 1:4.2.8p4+dfsg-3ubuntu5 (i.e. the currently available package) and it crashes consistently and silently at, or shortly after, system boot up. If I restart it using systemctl then it crashes. If I start it manually from the command line just running 'ntpd' then it seems to keep running for longer, but even then it eventually silently crashes.

This is on a KVM-based VM running as a mail server with stock Postfix and Dovecot. I mention these packages because all other VMs (and PMs) have no issues, despite identical underlying installations.

I've installed 1:4.2.8p4+dfsg-3ubuntu5.1 and this seems to have fixed the problem.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank you so much Alexis!

Due to the raciness even the slightest setup change can open/close the race-window - so it is "expected" that "... because all other VMs (and PMs) have no issues, despite identical underlying installations." can happen.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ntp - 1:4.2.8p4+dfsg-3ubuntu5.1

---------------
ntp (1:4.2.8p4+dfsg-3ubuntu5.1) xenial; urgency=medium

  * d/p/ntp-4.2.8p4-segfaults-[1-3]-3.patch fix startup crashes by
    including Juergen Perlinger's work on upstream bugs 2954 and 2831 to
    fix those (LP: #1567540).

 -- Christian Ehrhardt <email address hidden> Mon, 01 Aug 2016 10:50:52 +0200

Changed in ntp (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for ntp has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
