ntpd crashed with SIGABRT (was: ntp crashes everytime the network goes up or down.)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
NTP | Fix Released | High | |
ntp (Ubuntu) | Fix Released | High | Unassigned |
Xenial | Fix Released | High | Christian Ehrhardt |
Bug Description
[Impact]
* In NTP 4.2.8p4 there are several races that can cause a crash at
startup, or slightly later (but still during startup) when a peer is
resolved via DNS.
* The crash obviously affects users: due to its racy nature it does
not appear for most of them, but it severely hamstrings some
others.
* The details are a bit blurred, but overall there were four fixes
upstream that address just this "kind of issue" that seemed to
surface post 4.2.8p4.
[Test Case]
* Start NTP (service)
* Expectation: work
* Failure: Crash
* Constraints: this is a race. On all systems I have, it seems to
appear with <0.1% chance (or lower; all I can say is that it didn't
trigger in 1000 tests). That matches other reports. OTOH on some
systems it seems to trigger >50% of the time, which also matches the
high number of crash reports (close to 20k now) referred to in comment 43.
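The test case above is easy to automate. The following is a minimal Python sketch, not something taken from this report. It assumes the service is named `ntp` under systemd, that a crashing ntpd dies within a short settle delay, and that `pgrep -x ntpd` is a fair liveness check. The helper names (`restart_and_check`, `count_crashes`, `zero_failure_upper_bound`) are made up for illustration. The last helper puts a number on what a run of clean trials can show: with zero crashes in n starts, the per-start crash probability is only bounded by 1 - (1 - confidence)^(1/n).

```python
import subprocess
import time
from typing import Callable


def restart_and_check(service: str = "ntp", settle_seconds: float = 5.0) -> bool:
    """Restart the service once and report whether ntpd is still running.

    Assumptions: the systemd unit is called `ntp`, and a crash caused by
    the startup race would happen within `settle_seconds`.
    """
    subprocess.run(["systemctl", "restart", service], check=False)
    time.sleep(settle_seconds)  # let the startup DNS lookups run
    # pgrep exits with 0 only if a process named exactly "ntpd" exists.
    alive = subprocess.run(["pgrep", "-x", "ntpd"], check=False,
                           stdout=subprocess.DEVNULL)
    return alive.returncode == 0


def count_crashes(trials: int,
                  attempt: Callable[[], bool] = restart_and_check) -> int:
    """Run `attempt` `trials` times and count the runs that returned False."""
    return sum(1 for _ in range(trials) if not attempt())


def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-start crash probability after `trials`
    starts without a single crash, at the given confidence level.

    Solves (1 - p) ** trials == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)
```

Running `count_crashes(1000)` as root would repeat the test case 1000 times. For 1000 clean starts, `zero_failure_upper_bound(1000)` comes out at roughly 0.003, so a clean run of that size is still consistent with a crash rate of up to about 0.3% per start at 95% confidence.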
[Regression Potential]
* The change is rather invasive, as it changes the locking
scheme of parts of the code, so there surely is some regression
potential.
* Fortunately all of this change is upstream and has been tested there
quite heavily, most of it for a few months already.
* I tested as well as I could and found no obvious weakness in either
the code or the tests, and looking at all the crash reports it is
about time.
[Other Info]
* While all study of bugs, upstream changes and tests suggests we
haven't broken anything, I still have to admit that "on my own" I
can't confirm that it fixed the bug. So we are really dependent on
the reporters here who seem to have the kind of hardware where it
"crashes reliably".
--------
ntp crashes every time the network goes up or down while the system is running and also crashes after booting up without network.
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (26 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+
PackageArchitec
ProcCmdline: BOOT_IMAGE=
ProcVersionSign
Tags: xenial
Uname: Linux 4.4.0-17-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-03-12 (31 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+
PackageArchitec
ProcCmdline: BOOT_IMAGE=
ProcVersionSign
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-13 (0 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+
PackageArchitec
ProcCmdline: BOOT_IMAGE=
ProcVersionSign
Tags: xenial
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (3 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+
PackageArchitec
ProcCmdline: BOOT_IMAGE=
ProcVersionSign
Tags: xenial
Uname: Linux 4.4.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
CurrentDesktop: XFCE
DistroRelease: Ubuntu 16.04
InstallationDate: Installed on 2016-04-14 (63 days ago)
InstallationMedia: Xubuntu 16.04 LTS "Xenial Xerus" - Beta amd64 (20160412)
NtpStatus: ntpq: read: Connection refused
Package: ntp 1:4.2.8p4+
PackageArchitec
ProcCmdline: BOOT_IMAGE=
ProcVersionSign
Tags: xenial third-party-
Uname: Linux 4.4.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dialout dip lpadmin mail netdev plugdev sambashare sudo
_MarkForUpload: True
Changed in ntp (Ubuntu):
status: Incomplete → New
Changed in ntp:
importance: Unknown → High
status: Unknown → Fix Released
Changed in ntp (Ubuntu Xenial):
importance: Undecided → High
Changed in ntp (Ubuntu):
assignee: nobody → ChristianEhrhardt (paelzer)
Changed in ntp (Ubuntu):
assignee: ChristianEhrhardt (paelzer) → nobody
Changed in ntp (Ubuntu Xenial):
assignee: nobody → ChristianEhrhardt (paelzer)
description: updated
Changed in ntp (Ubuntu Xenial):
status: Confirmed → Fix Committed
description: updated
It's not solid, but I've seen three of these so far.
It crashes in roughly 1 in 5 tries.
FreeBSD 10.1-RELEASE amd64
I haven't seen any trouble like this before 4.3.33.
It crashes before it writes anything to the post-switching log file.
May 14 01:32:20 ted3 ntpd[79529]: switching logging to file /var/log/ntp/ntpd.log
May 14 01:32:20 ted3 kernel: pid 79529 (ntpd), uid 0: exited on signal 11 (core dumped)
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libmd.so.6...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x000000080119db43 in sbrk () from /lib/libc.so.7
[New Thread 801c06c00 (LWP 100287/ntpd)]
[New Thread 801c06400 (LWP 100169/ntpd)]
(gdb) bt
#0 0x000000080119db43 in sbrk () from /lib/libc.so.7
#1 0x0000000801199aaf in sbrk () from /lib/libc.so.7
#2 0x0000000801184593 in syscall () from /lib/libc.so.7
#3 0x00000008011a5283 in realloc () from /lib/libc.so.7
#4 0x0000000000437285 in ereallocz (ptr=0x80180a140, newsz=32, priorsz=0,
zero_init=1) at ../../libntp/emalloc.c:43
#5 0x00000000004399c7 in get_worker_context (c=0x801c42100, idx=0)
at ../../libntp/ntp_intres.c:982
#6 0x0000000000439665 in blocking_getaddrinfo (c=0x801c42100, req=0x801c1b0c0)
at ../../libntp/ntp_intres.c:327
#7 0x000000000043a5d0 in blocking_child_common (c=<value optimized out>)
at ../../libntp/ntp_worker.c:288
#8 0x000000000043c619 in blocking_thread (ThreadArg=0x80180a140)
at ../../libntp/work_thread.c:663
#9 0x0000000800ed74f5 in pthread_create () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
(gdb)