ntpd interface listing is bugged

Bug #1952264 reported by Loïc Gomez
This bug affects 2 people
Affects       Status     Importance  Assigned to  Milestone
NTP Charm     Invalid    Undecided   Unassigned
ntp (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

This charm sets up ntpmon and Nagios checks that alert when ntp is unable to select a sync peer.

On a server without a routable IPv6 address configured, ntpq -p fails with:
$ ntpq -p
localhost: timed out, nothing received
***Request timed out

$ /opt/ntpmon-ntp-charm/check_ntpmon.py --check sync
CRITICAL: No sync peer selected | frequency= offset=nan peers=0 reach=nan result=2 rootdelay= rootdisp= runtime= stratum= sync=0.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=

This triggers a Nagios alert.
However, forcing IPv4 works:

$ ntpq -p -4
     remote refid st t when poll reach delay offset jitter
==============================================================================
*hostname1 xxx.xxx.xxx.x 2 u 210 256 377 0.842 0.031 0.050
+hostname2 xxx.xxx.xxx.x 2 u 88 256 377 0.327 0.062 0.107
-hostname3 xxx.xxx.xxx.x 2 u 210 256 377 75.810 -1.198 1.035
+hostname4 xxx.xxx.xxx.x 2 u 68 256 377 0.751 0.078 0.193

$ ntpq -p -4 | /opt/ntpmon-ntp-charm/check_ntpmon.py --check sync --test
OK: Time is in sync with hostname1 | frequency= offset=0.000057 peers=4 reach=100.000000 result=0 rootdelay= rootdisp= runtime= stratum= sync=1.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=
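
To make the difference between the two outputs concrete, here is a minimal Python sketch of how a sync check could read `ntpq -p` output. It is not ntpmon's actual code: the names `parse_peers` and `sync_peer` are made up for illustration. It relies only on the ntpq conventions that a leading `*` marks the selected system peer and that the reach column is an octal bitmask of the last eight polls.

```python
TALLY_CODES = "*+-#xo. "  # characters ntpq may print in the first column

def parse_peers(text):
    """Return one dict per peer line found in `ntpq -p` output."""
    peers = []
    for line in text.splitlines():
        if not line or line[0] not in TALLY_CODES:
            continue  # separator line, or an error such as "localhost: timed out"
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row
        try:
            reach = int(fields[6], 8)  # reach is printed in octal
        except ValueError:
            continue  # header row ("reach" is not a number)
        peers.append({
            "tally": line[0],
            "remote": fields[0],
            "reach_pct": bin(reach).count("1") / 8 * 100,
        })
    return peers

def sync_peer(text):
    """Return the name of the peer marked '*', or None if there is none."""
    for peer in parse_peers(text):
        if peer["tally"] == "*":
            return peer["remote"]
    return None

# Samples copied from this report.
SAMPLE_FAIL = "localhost: timed out, nothing received\n***Request timed out\n"
SAMPLE_OK = (
    "     remote refid st t when poll reach delay offset jitter\n"
    "==============================================================================\n"
    "*hostname1 xxx.xxx.xxx.x 2 u 210 256 377 0.842 0.031 0.050\n"
    "+hostname2 xxx.xxx.xxx.x 2 u 88 256 377 0.327 0.062 0.107\n"
    "-hostname3 xxx.xxx.xxx.x 2 u 210 256 377 75.810 -1.198 1.035\n"
    "+hostname4 xxx.xxx.xxx.x 2 u 68 256 377 0.751 0.078 0.193\n"
)

print(sync_peer(SAMPLE_FAIL))
print(sync_peer(SAMPLE_OK))
```

With the timed-out output there are no peer rows at all, so no sync peer can be reported, which matches the CRITICAL result with peers=0 above.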

Maybe this is a bug to file against ntp itself? Or is there some configuration that would allow ntpq -p and check_ntpmon.py to succeed? I've tested running ntpd with -4 (set in the defaults file), but with no luck.

Let us know if you need more information.

Thank you,
Loïc

Junien F (axino) wrote :

I investigated this quite a bit, and this appears to be an ntp bug and not a charm bug.

This host is a trusty host, running ntp version 1:4.2.6.p5+dfsg-3ubuntu2.14.04.13. We have other hosts running the same version that don't have the problem described above.

I spent quite some time comparing the hosts, running strace, etc., and I noticed a subtle difference in /etc/hosts: on the working host, the ::1 entry doesn't list "localhost", but on the failing host it does. When I removed "localhost" from the ::1 entry on the failing host, "ntpq -pn" started working.
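
The /etc/hosts difference can be checked mechanically. The following Python sketch lists the names attached to a given address in hosts-file text. The two sample files are hypothetical (the real file contents are not part of this report), and `names_for` is an illustrative helper, not part of any tool mentioned here.

```python
def names_for(hosts_text, address):
    """Return every hostname listed for `address` in hosts-file text."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if fields and fields[0] == address:
            names.extend(fields[1:])
    return names

# Hypothetical file contents, for illustration only.
FAILING_HOSTS = "127.0.0.1 localhost\n::1 localhost ip6-localhost ip6-loopback\n"
WORKING_HOSTS = "127.0.0.1 localhost\n::1 ip6-localhost ip6-loopback\n"

for label, text in (("failing", FAILING_HOSTS), ("working", WORKING_HOSTS)):
    has_localhost = "localhost" in names_for(text, "::1")
    print(label, "maps localhost to ::1:", has_localhost)
```

On a real host the same function could be fed the contents of /etc/hosts instead of the sample strings.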

Digging further, I found that ntpd was listening on ::1 on the working host but not on the failing host (confirmed by "ss -anupe" output and the ntpd startup logs).

Comparing straces of ntpd starting up, I think I found what's going on. On the working host it gives (only the relevant output is shown):

3973 19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
[... the same for a bunch of interfaces - this is a nova compute node so this is expected ...]
3973 19:41:32 close(5) = 0

But on the failing host, it checks a single interface:
56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5) = 0
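
Comparing the two traces by eye gets tedious on a compute node with many interfaces. A small Python sketch like the one below can pull the interface names out of the SIOCGIFFLAGS calls so the two hosts can be diffed. The regular expression is written against the strace lines shown here and is an assumption about their format; `interfaces_queried` is an illustrative name.

```python
import re

# Matches lines such as:
#   3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ...}) = 0
IFFLAGS_RE = re.compile(r'ioctl\(\d+, SIOCGIFFLAGS, \{ifr_name="([^"]+)"')

def interfaces_queried(strace_text):
    """Return the interface names seen in SIOCGIFFLAGS ioctl calls, in order."""
    return IFFLAGS_RE.findall(strace_text)

# Lines copied from the traces in this comment.
WORKING_TRACE = (
    '3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0\n'
    '3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0\n'
    '3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0\n'
)
FAILING_TRACE = (
    '56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0\n'
)

print(interfaces_queried(WORKING_TRACE))
print(interfaces_queried(FAILING_TRACE))
```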

So I thought this interface might be a bit special:
$ ip li sh dev qvbba244f00-69
67772: qvbba244f00-69@qvoba244f00-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbrba244f00-69 state UP mode DEFAULT group default qlen 1000
    link/ether 0e:ac:86:b1:c8:24 brd ff:ff:ff:ff:ff:ff

It appears completely normal, except that it has an unusually high ifindex (67772). Could that be the cause of the problem? Looking at the source code at https://git.launchpad.net/ubuntu/+source/ntp/tree/?h=ubuntu/trusty-updates: interfaces are enumerated by reading the /proc/net/if_inet6 file (https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n54), which strace confirms:

3973 19:41:32 open("/proc/net/if_inet6", O_RDONLY) = 6

Each line is read using fgets:

fgets(iter->entry, sizeof(iter->entry), iter->proc) != NULL)

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n181

What's sizeof(iter->entry)? Well, "entry" is defined like this:

 char entry[ISC_IF_INET6_SZ];

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n48

And ISC_IF_INET6_SZ is:
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXX...
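
The macro text above is cut off, so the real buffer size is not reproduced here. As a sketch of why a large ifindex could matter at all, the following Python code mimics C's fgets on one /proc/net/if_inet6 line. Everything in it is an assumption made for illustration: the field layout is modelled on the sample string in the macro, the buffer size is a hypothetical one that just fits a line with a two-digit hexadecimal ifindex, and the address and the device name "eth0" are invented.

```python
import io

def c_fgets(stream, size):
    """Mimic C fgets(buf, size, fp): at most size-1 characters, stopping after a newline."""
    out = []
    while len(out) < size - 1:
        ch = stream.read(1)
        if ch == "":
            break
        out.append(ch)
        if ch == "\n":
            break
    return "".join(out)

def if_inet6_line(addr_hex, ifindex, prefix_len, scope, flags, name):
    # Assumed layout: address, four hex fields, device name.
    return "%s %02x %02x %02x %02x %8s\n" % (addr_hex, ifindex, prefix_len, scope, flags, name)

def read_chunks(line, size):
    """Read `line` the way a loop of fgets calls with a `size`-byte buffer would."""
    stream = io.StringIO(line)
    chunks = []
    while True:
        chunk = c_fgets(stream, size)
        if not chunk:
            return chunks
        chunks.append(chunk)

LOOPBACK = "0" * 31 + "1"
LINK_LOCAL = "fe80" + "0" * 24 + "0001"  # hypothetical address

# Hypothetical buffer: fits a line with a two-digit ifindex, plus the NUL byte.
# This is NOT the real value of ISC_IF_INET6_SZ.
BUF_SIZE = len(if_inet6_line(LOOPBACK, 1, 0x80, 0x10, 0x80, "lo")) + 1

small = if_inet6_line(LINK_LOCAL, 2, 0x40, 0x20, 0x80, "eth0")
large = if_inet6_line(LINK_LOCAL, 67772, 0x40, 0x20, 0x80, "eth0")

print("%02x" % 67772)                # the ifindex needs five hex digits, not two
print(read_chunks(small, BUF_SIZE))  # one complete line
print(read_chunks(large, BUF_SIZE))  # the line is split across two reads
```

Under these assumptions, the line for ifindex 67772 no longer fits in a single read, so a parser that expects one interface per fgets call would see a truncated entry followed by a stray fragment. Whether ntp's real buffer is actually exceeded depends on the full definition of ISC_IF_INET6_SZ, which is truncated above.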


Changed in ntp-charm:
status: New → Invalid
Junien F (axino)
summary: - ntp sync checks fail when server as no IPv6 connectivity
+ ntpd interface listing is bugged
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ntp (Ubuntu):
status: New → Confirmed