ntpd interface listing is bugged
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
NTP Charm | Invalid | Undecided | Unassigned |
ntp (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
This charm sets up ntpmon and nagios checks to alert when ntp is not able to select a sync peer.
On a server without a routable IPv6 address configured, ntpq -p fails with:
$ ntpq -p
localhost: timed out, nothing received
***Request timed out
$ /opt/ntpmon-
CRITICAL: No sync peer selected | frequency= offset=nan peers=0 reach=nan result=2 rootdelay= rootdisp= runtime= stratum= sync=0.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=
This results in a nagios alert complaining about the problem.
Although:
$ ntpq -p -4
remote refid st t when poll reach delay offset jitter
=======
*hostname1 xxx.xxx.xxx.x 2 u 210 256 377 0.842 0.031 0.050
+hostname2 xxx.xxx.xxx.x 2 u 88 256 377 0.327 0.062 0.107
-hostname3 xxx.xxx.xxx.x 2 u 210 256 377 75.810 -1.198 1.035
+hostname4 xxx.xxx.xxx.x 2 u 68 256 377 0.751 0.078 0.193
$ ntpq -p -4 | /opt/ntpmon-
OK: Time is in sync with hostname1 | frequency= offset=0.000057 peers=4 reach=100.000000 result=0 rootdelay= rootdisp= runtime= stratum= sync=1.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=
Maybe this is a bug to file against ntp itself? Or could some configuration allow ntpq -p and check_ntpmon.py to succeed? I've tested running ntpd with -4 (using the defaults file), but with no luck.
Let us know if you need more information.
Thank you,
Loïc
summary:
- ntp sync checks fail when server as no IPv6 connectivity
+ ntpd interface listing is bugged
I investigated this quite a bit, and this appears to be an ntp bug and not a charm bug.
This host is a trusty host, running ntp version 1:4.2.6.p5+dfsg-3ubuntu2.14.04.13. We have other hosts running the same version that don't have the problem described above.
I spent quite some time investigating this, comparing the hosts, running strace etc., and I noticed a subtle difference in /etc/hosts: on the working host, the ::1 entry doesn't have "localhost", but it does on the failing host. When I removed "localhost" from the ::1 entry on the failing host, "ntpq -pn" started working.
Investigating a bit more, I found that on the working host ntpd was listening on ::1, but on the failing host it wasn't (checked with the "ss -anupe" output as well as the ntpd startup logs).
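The /etc/hosts difference can be checked mechanically. The following is a minimal sketch, not part of the charm or of ntp: `names_for_address` is a hypothetical helper, and the two sample files are illustrative stand-ins for the working and failing hosts, not their real contents.

```python
def names_for_address(hosts_text, address):
    """Return every hostname that a hosts-file text maps to `address`."""
    names = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == address:
            names.extend(fields[1:])
    return names


# Illustrative samples only (assumed, not copied from the real hosts).
WORKING_HOSTS = """\
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
"""

FAILING_HOSTS = """\
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
"""

for label, text in (("working", WORKING_HOSTS), ("failing", FAILING_HOSTS)):
    v6_names = names_for_address(text, "::1")
    print(label, "-> ::1 maps to localhost:", "localhost" in v6_names)
```

When "localhost" also resolves to ::1, a client such as ntpq may try the IPv6 loopback first, which only works if ntpd is actually listening there.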
Comparing straces of starting ntpd, I think I was able to find what's going on. On the working host it gives (only relevant output is posted here):
3973 19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
[... the same for a bunch of interfaces - this is a nova compute node so this is expected ...]
3973 19:41:32 close(5) = 0
But on the failing host, it checks a single interface:
56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5) = 0
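To compare the two traces without reading them by eye, the interface names probed with SIOCGIFFLAGS can be pulled out with a regular expression. This is a rough sketch: `probed_interfaces` is a hypothetical helper, and the sample text reuses only the ioctl lines quoted in this report.

```python
import re

# Matches lines such as:
#   3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ...}) = 0
IFFLAGS_RE = re.compile(r'ioctl\(\d+, SIOCGIFFLAGS, \{ifr_name="([^"]+)"')


def probed_interfaces(strace_text):
    """Return the interface names seen in SIOCGIFFLAGS ioctls, in order."""
    return IFFLAGS_RE.findall(strace_text)


WORKING_TRACE = '''\
3973 19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973 19:41:32 close(5) = 0
'''

FAILING_TRACE = '''\
56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5) = 0
'''

print("working:", probed_interfaces(WORKING_TRACE))
print("failing:", probed_interfaces(FAILING_TRACE))
```

On real traces the same function could be applied to the full strace output of each host and the two lists compared.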
So I thought this interface was a bit special:
$ ip li sh dev qvbba244f00-69
67772: qvbba244f00-69@qvoba244f00-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbrba244f00-69 state UP mode DEFAULT group default qlen 1000
link/ether 0e:ac:86:b1:c8:24 brd ff:ff:ff:ff:ff:ff
It appears completely normal, except that it has an unusually high ifindex (67772). Could that be the cause of the problem? Looking at the source code at https://git.launchpad.net/ubuntu/+source/ntp/tree/?h=ubuntu/trusty-updates: interfaces are parsed by reading the /proc/net/if_inet6 file (https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n54), which strace confirms:
3973 19:41:32 open("/proc/net/if_inet6", O_RDONLY) = 6
Each line is parsed using fgets:
fgets(iter->entry, sizeof(iter->entry), iter->proc) != NULL)
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n181
What's sizeof(iter->entry)? Well, "entry" is defined like this:
char entry[ISC_IF_INET6_SZ];
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n48
And ISC_IF_INET6_SZ is:
#define ISC_IF_INET6_SZ \
sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXX...
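One way to probe the high-ifindex hypothesis is to simulate the read loop outside ntpd. The sketch below makes several assumptions that are not confirmed by this report: that each /proc/net/if_inet6 line is laid out as the address, four hex fields printed with at least two digits each, and the device name right-aligned in 8 columns; and that the buffer is sized for a line with two-digit fields. The real template is truncated above, so the buffer size used here is hypothetical. `if_inet6_line` and `read_entries` are illustrative helpers, with `readline(n - 1)` standing in for C's `fgets(buf, n, fp)`.

```python
import io


def if_inet6_line(addr_hex, ifindex, prefixlen, scope, flags, name):
    """Format one line in the assumed /proc/net/if_inet6 layout."""
    return f"{addr_hex} {ifindex:02x} {prefixlen:02x} {scope:02x} {flags:02x} {name:>8}\n"


def read_entries(text, bufsize):
    """Mimic repeated fgets(buf, bufsize, fp): at most bufsize - 1 chars per call."""
    stream = io.StringIO(text)
    entries = []
    while True:
        chunk = stream.readline(bufsize - 1)
        if not chunk:
            break
        entries.append(chunk)
    return entries


LOOPBACK = "00000000000000000000000000000001"

# Same address and name, only the ifindex differs.
small = if_inet6_line(LOOPBACK, 1, 0x80, 0x10, 0x80, "lo")
large = if_inet6_line(LOOPBACK, 67772, 0x80, 0x10, 0x80, "lo")

# Hypothetical buffer: exactly large enough for the two-digit line plus a NUL.
BUFSIZE = len(small) + 1

print("ifindex 67772 in hex:", format(67772, "02x"))
print("extra characters:", len(large) - len(small))
print("entries for small line:", read_entries(small, BUFSIZE))
print("entries for large line:", read_entries(large, BUFSIZE))
```

Under these assumptions, the five-digit ifindex makes the line three characters longer than the buffer expects, so a single line comes back as two reads and the second read is a fragment rather than a valid entry. Whether ntpd's parser reacts to that by dropping the remaining interfaces is something the code itself would have to confirm.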