Comment 1 for bug 1952264

Junien F (axino) wrote : Re: ntp sync checks fail when server has no IPv6 connectivity

I investigated this quite a bit, and this appears to be an ntp bug and not a charm bug.

This host is a trusty host, running ntp version 1:4.2.6.p5+dfsg-3ubuntu2.14.04.13. We have other hosts running the same version that don't have the problem described above.

Comparing the hosts, running strace and so on, I noticed a subtle difference in /etc/hosts: on the working host, the ::1 entry doesn't have "localhost", but it does on the failing host. When I removed "localhost" from the ::1 entry on the failing host, "ntpq -pn" started working.

Investigating a bit more, I found that ntpd was listening on ::1 on the working host but not on the failing host (checked with "ss -anupe" output as well as the ntpd startup logs).

Comparing straces of ntpd startup, I think I was able to find what's going on. On the working host it gives (only relevant output is posted here):

3973 19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
[... the same for a bunch of interfaces - this is a nova compute node so this is expected ...]
3973 19:41:32 close(5) = 0

But on the failing host, it checks a single interface :
56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5) = 0

So I thought this interface was a bit special :
$ ip li sh dev qvbba244f00-69
67772: qvbba244f00-69@qvoba244f00-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbrba244f00-69 state UP mode DEFAULT group default qlen 1000
    link/ether 0e:ac:86:b1:c8:24 brd ff:ff:ff:ff:ff:ff

It appears completely normal, except that it has an unusually high ifindex (67772). Could that be the cause of the problem? Looking at the source code at https://git.launchpad.net/ubuntu/+source/ntp/tree/?h=ubuntu/trusty-updates, interfaces are enumerated by reading the /proc/net/if_inet6 file (https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n54), which strace confirms:

3973 19:41:32 open("/proc/net/if_inet6", O_RDONLY) = 6

Each line is read using fgets:

fgets(iter->entry, sizeof(iter->entry), iter->proc) != NULL

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n181

What's sizeof(iter->entry)? Well, "entry" is defined like this:

 char entry[ISC_IF_INET6_SZ];

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n48

And ISC_IF_INET6_SZ is :
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n153

And this is where the problem is. The computation of ISC_IF_INET6_SZ assumes that the ifindex takes 2 chars (in hex), i.e. that the ifindex is < 256. However, ifindexes higher than that are likely common, so why don't we see this bug elsewhere? Because the computation of ISC_IF_INET6_SZ also assumes that the interface name is 16 chars long, and shorter names leave spare room for a longer ifindex.
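To make those two assumptions concrete, here is a minimal sketch that measures the field widths baked into the template. The template string is copied from the macro quoted above; IF_INET6_TEMPLATE and the three helper functions are hypothetical names used only for illustration, they are not part of the ntp source.

```c
#include <stddef.h>
#include <string.h>

/* Template string copied from the ISC_IF_INET6_SZ definition quoted above.
 * IF_INET6_TEMPLATE is a hypothetical name, introduced for this sketch. */
#define IF_INET6_TEMPLATE \
    "00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n"
#define ISC_IF_INET6_SZ sizeof(IF_INET6_TEMPLATE)

/* Most characters a single fgets(buf, ISC_IF_INET6_SZ, fp) call can return:
 * fgets() stores at most size - 1 characters, then the terminating NUL. */
size_t fgets_capacity(void)
{
    return ISC_IF_INET6_SZ - 1;
}

/* Width of the ifindex field: the second space-separated token ("01"). */
size_t template_ifindex_width(void)
{
    const char *field = strchr(IF_INET6_TEMPLATE, ' ') + 1;
    return strcspn(field, " ");
}

/* Width of the interface name field: the last token, newline excluded. */
size_t template_name_width(void)
{
    const char *field = strrchr(IF_INET6_TEMPLATE, ' ') + 1;
    return strcspn(field, "\n");
}
```

With this template the buffer is 63 bytes, one fgets() call returns at most 62 chars, the ifindex field is 2 chars wide and the name field is 16 chars wide.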

In our example, the interface name is "only" 14 chars, so we have 2 spare chars for the ifindex, i.e. room for 4 in total. But our ifindex (67772, 108bc in hex) needs 5, so that's not enough: it's off by 1, in fact!
"00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n" is 62 chars long, so the buffer is 63 bytes with the trailing NUL, and fgets() reads at most 62 chars per call.
The first line of if_inet6 on our machine is:
fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69
and that's 62 chars long... but without the \n!
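The character counting above can be double-checked with a small sketch. The format string is an assumption that mirrors the layout of the if_inet6 line shown in this comment (address, then four hex fields printed with at least 2 digits, then the name padded to at least 8 chars); hex_width() and if_inet6_line_len() are hypothetical helpers, not ntp or kernel functions.

```c
#include <stdio.h>

/* Number of characters needed to print `value` the way the ifindex field
 * appears in if_inet6 (hex, at least 2 digits). */
int hex_width(unsigned int value)
{
    return snprintf(NULL, 0, "%02x", value);
}

/* Length, trailing newline included, of an if_inet6-style line.
 * Assumption: the layout matches the line quoted above. */
int if_inet6_line_len(const char *addr, unsigned int ifindex,
                      unsigned int prefix_len, unsigned int scope,
                      unsigned int flags, const char *name)
{
    return snprintf(NULL, 0, "%s %02x %02x %02x %02x %8s\n",
                    addr, ifindex, prefix_len, scope, flags, name);
}
```

For example, hex_width(67772) is 5, and if_inet6_line_len("fe800000000000000cac86fffeb1c824", 0x108bc, 0x40, 0x20, 0x80, "qvbba244f00-69") is 63: 62 chars plus the newline, one more than a single fgets() call on a 63-byte buffer can return.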

So what might be happening here is that the first iteration of the loop properly reads the whole line except the \n, and the next iteration resumes at that location. Because fgets() stops at EOF or newline, that second call just returns a newline, which makes the whole iteration stop.
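This fgets() behaviour can be reproduced outside ntpd. The following is a minimal sketch, assuming a temporary file as a stand-in for /proc/net/if_inet6; read_if_inet6_chunks() is a hypothetical helper, and only the buffer size is taken from the ntp source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buffer size copied from the ISC_IF_INET6_SZ definition quoted above. */
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

/* Hypothetical helper: write `contents` to a temporary file, then read it
 * back with fgets() into ISC_IF_INET6_SZ-byte buffers, the way the iterator
 * reads /proc/net/if_inet6. At most `max` chunks are stored in `chunks`.
 * Returns the number of successful fgets() calls, or -1 if the temporary
 * file could not be created. */
int read_if_inet6_chunks(const char *contents,
                         char chunks[][ISC_IF_INET6_SZ], int max)
{
    FILE *fp;
    int n = 0;

    assert(contents != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(contents, fp);
    rewind(fp);
    /* Each call returns at most ISC_IF_INET6_SZ - 1 characters. */
    while (n < max && fgets(chunks[n], (int)ISC_IF_INET6_SZ, fp) != NULL)
        n++;
    fclose(fp);
    return n;
}
```

Feeding it the line from the failing host followed by "\n" yields two chunks instead of one: the first holds the 62 chars of the line without its newline, the second holds only "\n".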

The fix here is pretty simple: the computation of ISC_IF_INET6_SZ should assume an ifindex of UINT_MAX, i.e. ffffffff (or any 8-char number). If I can trust https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=applied/ubuntu/jammy, this is still present in Jammy.
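As a rough illustration of that idea (this is a sketch of one possible shape of the fix, not an actual patch to ntp), the template could be widened so that the ifindex field holds 8 hex chars. ISC_IF_INET6_SZ_OLD, ISC_IF_INET6_SZ_WIDE and line_fits() are hypothetical names used only here.

```c
#include <stddef.h>
#include <string.h>

/* Current template, as quoted above: 2 hex chars for the ifindex. */
#define ISC_IF_INET6_SZ_OLD \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

/* Hypothetical widened template: 8 hex chars for the ifindex (ffffffff). */
#define ISC_IF_INET6_SZ_WIDE \
    sizeof("00000000000000000000000000000001 ffffffff 80 10 80 XXXXXXloXXXXXXXX\n")

/* Returns 1 if a single fgets(buf, bufsz, fp) call can return `line` whole
 * (newline included), 0 otherwise. fgets() stores at most bufsz - 1 chars. */
int line_fits(const char *line, size_t bufsz)
{
    return strlen(line) <= bufsz - 1;
}
```

With the widened template the buffer grows from 63 to 69 bytes, and the 63-char line from the failing host (newline included) fits in a single read.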

Redirecting the bug to the "ntp" package.