Segmentation fault in ntpd when system has more than 1134 interface addresses

Bug #235793 reported by Simon Boggis
Affects        Status         Importance   Assigned to   Milestone
NTP            Fix Released   High
ntp (Ubuntu)   Fix Released   Medium       Unassigned

Bug Description

Binary package hint: ntp

$ uname -a
Linux xxxxx 2.6.24-17-generic #1 SMP Thu May 1 13:57:17 UTC 2008 x86_64 GNU/Linux

$ lsb_release -rd
Description: Ubuntu 8.04
Release: 8.04

$ apt-cache policy ntp
ntp:
  Installed: 1:4.2.4p4+dfsg-3ubuntu2
  Candidate: 1:4.2.4p4+dfsg-3ubuntu2
  Version table:
 *** 1:4.2.4p4+dfsg-3ubuntu2 0
        500 http://gb.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status

On a system with more than 1134 interface addresses (counting both IPv4 and IPv6 addresses with 'ip addr ls | grep inet | wc -l'), ntpd fails with a segmentation fault.

I run a large network for a university (20,000 people), and we routinely operate open-source routers with several thousand interface addresses (every client machine lives in its own /30 subnet).

Whilst examining the problem I used this script to add interface addresses:

$ cat breakit.sh
#!/bin/sh
set -e
dev="vlan252"
sudo ifdown ${dev} # clears all old addr
naddr=`ip add ls | grep inet | wc -l` # number of addr already on system
echo "Pre-existing addresses on system: ${naddr}"
: ${nbreakit:=1217} # number of addr required to break ntp
sudo ifup ${dev}
total=${naddr}
for I in `seq 1 5` ; do
  for J in `seq 0 255` ; do
    sudo ip addr add 10.0.${I}.${J}/32 dev ${dev}
    total=$((${total}+1))
    if [ ${total} -ge ${nbreakit} ]; then break ; fi
  done
  if [ ${total} -ge ${nbreakit} ]; then break ; fi
done
echo "Added extra addresses: $((${total}-${naddr}))"
echo "Total addresses on system now: $(ip add ls | grep inet | wc -l)"
exit 0

$ grep -B1 -A2 vlan252 /etc/network/interfaces

#auto vlan252
iface vlan252 inet manual
  vlan-raw-device eth0

I could then add the desired number of addresses by running:
$ nbreakit=1135 ../breakit.sh

and then run ntpd [1][2] under gdb by doing:

sudo sh -c "ulimit -n 8192 ; gdb --args /usr/sbin/ntpd -n -d -D3 -p /var/run/ntpd.pid -u 115:126 -g"

[1] You can't use '-d -D3' with the standard Ubuntu package, because it is built with debugging disabled.
[2] The open-files limit (ulimit -n), which is 1024 here, must be raised in order to run ntpd with this many addresses.

In order to help me understand what was going wrong with the standard package, I rebuilt it with debugging enabled and symbols not stripped:

$ mkdir work
$ cd work/
$ apt-get source ntp
Reading package lists... Done
Building dependency tree
Reading state information... Done
NOTICE: 'ntp' packaging is maintained in the 'Svn' version control system at:
svn://svn.debian.org/pkg-ntp/ntp/
Need to get 3120kB of source archives.
Get: 1 http://gb.archive.ubuntu.com hardy/main ntp 1:4.2.4p4+dfsg-3ubuntu2 (dsc) [1034B]
Get: 2 http://gb.archive.ubuntu.com hardy/main ntp 1:4.2.4p4+dfsg-3ubuntu2 (tar) [2835kB]
Get: 3 http://gb.archive.ubuntu.com hardy/main ntp 1:4.2.4p4+dfsg-3ubuntu2 (diff) [284kB]
Fetched 3120kB in 0s (7433kB/s)
dpkg-source: extracting ntp in ntp-4.2.4p4+dfsg
dpkg-source: unpacking ntp_4.2.4p4+dfsg.orig.tar.gz
dpkg-source: applying ./ntp_4.2.4p4+dfsg-3ubuntu2.diff.gz
$ cd ntp-4.2.4p4+dfsg/
$ cp -p debian/rules{,.orig}
$ vi debian/rules
$ diff -u debian/rules{.orig,}
--- debian/rules.orig 2008-05-28 17:52:21.000000000 +0100
+++ debian/rules 2008-05-28 17:53:21.000000000 +0100
@@ -21,7 +21,7 @@
        ./configure CFLAGS='$(CFLAGS)' \
                --prefix=/usr \
                --enable-all-clocks --enable-parse-clocks --enable-SHM \
-               --disable-debugging --sysconfdir=/var/lib/ntp \
+               --enable-debugging --sysconfdir=/var/lib/ntp \
                --with-sntp=no \
                --enable-linuxcaps \
                --disable-dependency-tracking
@@ -104,7 +104,7 @@
        dh_installlogcheck -a
        dh_installchangelogs -a
        dh_perl -a
-       dh_strip -a
+       #dh_strip -a
        dh_compress -a
        dh_fixperms -a
        dh_installdeb -a
$ dpkg-buildpackage -us -uc
...
dpkg-buildpackage: binary and diff upload (original source NOT included)
$ sudo dpkg -i ../ntp_4.2.4p4+dfsg-3ubuntu2_amd64.deb

With 1135 interface addresses on the system, ntpd fails with this segmentation fault:

0x000000000040c9d2 in update_interfaces (port=<value optimized out>,
    receiver=0, data=0x0) at ntp_io.c:769
769 ISC_LIST_UNLINK_TYPE(inter_list, interface, link, struct interface);

If one increases the number of interface addresses to 1215 you get a different segmentation fault:

update_interfaces (port=<value optimized out>, receiver=0, data=0x0)
    at ntp_io.c:1325
1325 if (!(interf->flags & (INT_WILDCARD|INT_MCASTIF))) {

and if one increases it to 1216 or more you finally get this one:

0x000000000040ba7f in add_interface (interface=0x7d13c0) at ntp_io.c:756
756 ISC_LIST_APPEND(inter_list, interface, link);

I had hoped to work around the problem by renaming the devices to 'vlan252:foo', since in ntpd/ntp_io.c:address_okay() the -L flag causes ntpd to ignore "virtual addresses", which turns out to mean addresses on interfaces whose names contain a ':'. Unfortunately, since the segmentation fault occurs during interface enumeration (while the linked list of all interface+address pairs is being built), this doesn't help.

Earlier versions of ntpd did not use a linked list for this purpose, but a fixed-size array of 512 entries. By simply increasing the size of that array I was able to run with large numbers of addresses. It seems to me that, whilst the linked-list code ought to be made to work correctly, binding every address on a Linux system is in fact an unnecessary precaution (and overhead): only the root user could bind a more specific address than * on port 123, and if an attacker has root the game is over anyway.

I haven't submitted this bug to the upstream package maintainers (http://www.ntp.org/bugs.html), partly because Firefox won't let me see their bugs site (invalid SSL certificate), and partly because it might be better coming via the distribution. I note that Ubuntu ships the current upstream version of ntp (4.2.4p4).

I include a full transcript of my tests attached as ntp_bugreport_v2.txt

In , Jw-ntp (jw-ntp) wrote :

Hi:

We're running into the same problem on a redhat system that was previously
reported against ubuntu.

I don't think I could write a better bug report, so I'll just include/reference
it here:

http://<email address hidden>/msg858457.html

In , Mayer-r (mayer-r) wrote :

Can you say why you need that many addresses on one system?

I haven't looked recently at what's going on, but I would recommend that you turn
off rescanning by setting -u 0 on the command line. The only thing I can think
of that might cause a linked list to fail is running out of memory, but then I
don't see why the fault would happen where you say, since it would be in the
malloc() call.

Does BIND9 run okay on this system?

Danny

In , Jw-ntp (jw-ntp) wrote :

Hi Danny:

The systems in question are part of an outbound mail farm running eCelerity (
http://messagesystems.com/ ). Clients get one or more ip addresses depending on
their volume/rate.

I'll see if -u 0 helps.

We don't run bind on those boxes, but I can try and see if it starts...

Thanks!

In , Stenn (stenn) wrote :

Subject: Segmentation fault in ntpd when system has more than 1134 interface addresses

While I can appreciate that this is a rare case, the bottom line is we
seem to have a bug and it should be fixed.

We've been meaning to upgrade and use a more "stock" libisc/, perhaps
reporting some bugs along the way.

This looks like an opportunity...

--
Harlan Stenn <email address hidden>
http://ntpforum.isc.org - be a member!

In , Jw-ntp (jw-ntp) wrote :

-u 0 didn't make any difference. We'll try to see if bind starts.

We're happy to test a patch, if one becomes available.

In , Norbert-eder (norbert-eder) wrote :

It is clear that your additional options don't solve the problem.

You must use a capital U.

I used -U 0 and it solved this problem.
I opened bug ID 1102 for it.

bye,
Nobsi

In , Dave Hart (hart-ntp) wrote :

Thanks to a few hours of cooperative debugging with Sandu Adrian
<email address hidden> today, the faulty code has been identified. In
ntpd/ntp_io.c the last line of add_fd_to_list is FD_SET(fd, &activefds). This
(and the preceding maxactivefd maintenance) should be protected by a check that
fd < FD_SETSIZE. If you add code to check for that and msyslog a message and
then exit, you will no longer crash ntpd starting with many interfaces.

Fixing the actual crash is easy. The tougher part is figuring out how we are
going to enumerate more interfaces than FD_SETSIZE, in other words, how to
defer opening sockets for each until after enumeration, so that some mechanism
can be used to select a subset of interfaces less numerous than FD_SETSIZE that
ntpd can use.

I suggest we immediately add code to terminate with an error if we are about to
try to add an fd >= FD_SETSIZE with FD_SET. We could also add code to
add_interface to catch corruption of inter_list.head sooner, as it helped to
track this issue down. That depends on the first entry in inter_list never
being removed but I think that's a safe assumption.

In , Dave Hart (hart-ntp) wrote :

I have the two fixes described in Comment #6 ready; together they fix the
reported crash. Note that ntpd will simply refuse to start with more than about
FD_SETSIZE interfaces (1024 on Linux), so this is not a solution to the overall
problem, but it does fix the bug that caused corruption and the fault.

pogo:~hart/ntp-stable-784-jjy

In , Stenn (stenn) wrote :

Jeff,

Please check 4.2.4p7-RC6 (or later) and mark this bug as VERIFIED or
REOPENED, as appropriate.

Dave Hart and Sandu, thanks for your work in getting the root cause
identified and resolved.

In , Dave Hart (hart-ntp) wrote :

See Bug #1180 regarding a fix for starting ntpd with more than 1000 interfaces.

Dave Hart (hart-ntp) wrote :

This bug report was refiled in the NTP bug database nearly verbatim as bug #1071, which was just resolved with a bounds-check. If you have cacert.org's root certificate loaded in your browser, or are willing to click past possibly dire warnings about an invalid certificate, you're welcome to check it out directly at http://bugs.ntp.org/1071

In , Mayer-r (mayer-r) wrote :

Just note that FD_SETSIZE can be any value: it may default to 1024 on SOME
flavors of Linux, be something totally different on other operating systems,
and be anything at all if overridden, so this check looks right.

Danny

Simon Boggis (s-a-boggis) wrote : Re: [Bug 235793] Re: Segmentation fault in ntpd when system has more than 1134 interface addresses

Dave Hart wrote:
> This bug report was refiled in the NTP bug database nearly verbatim as
> bug #1071, which was just resolved with a bounds-check. If you have
> cacert.org's root certificate loaded in your browser, or are willing to
> click past possibly dire warnings about an invalid certificate, you're
> welcome to check it out directly at http://bugs.ntp.org/1071
>

Thanks very much for letting me know - I'll have a look and see if it
resolves the problem for me, and report back.

Simon

Chuck Short (zulcss) wrote :

Looks like karmic is still affected by this according to the bug report.

Regards
chuck

Changed in ntp (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Changed in ntp:
status: Unknown → Fix Released

Chuck Short (zulcss) wrote :

Hi Simon,

I was not able to reproduce this on karmic. If possible can you re-do your tests on karmic?

Thanks
chuck

Simon Boggis (s-a-boggis) wrote :

Chuck Short wrote:
> Hi Simon,
>
> I was not able to reproduce this on karmic. If possible can you re-do
> your tests on karmic?
>
> Thanks
> chuck
>

Someone who was working on ntpd told me that they thought that
4.2.4-p7-rc6 and higher might fix my problem - I haven't had time to
retest yet though.

I'll try to grab your package and give it a whirl again.

Simon

--
Dr Simon A. Boggis Senior Network Analyst
Computing Services, Tel. 020 7882 7078
Queen Mary, University of London, London E1 4NS UK.

In , Kostecke-8 (kostecke-8) wrote :

Please mark this bug as VERIFIED if you agree that it is fixed.

Or reopen it if further work is required.

In , Stenn (stenn) wrote :

*** Bug 1102 has been marked as a duplicate of this bug. ***

Changed in ntp:
importance: Unknown → High

Christian Ehrhardt (paelzer) wrote :

I know it's been a long time, but I'm cleaning up old NTP bugs at the moment.

Every release still in service other than Precise has the upstream fix (>=4.2.4p4).
After this much time it is not worth considering an SRU for the remaining lifetime of Precise.
Setting Fix Released for the devel task of this bug.

Changed in ntp (Ubuntu):
status: Triaged → Fix Released