ntp host name not found error

Bug #548885 reported by Barry Fishman
This bug affects 2 people
Affects        Status         Importance   Assigned to   Milestone
NTP            Fix Released   Medium
ntp (Ubuntu)   Fix Released   Low          Unassigned

Bug Description

Binary package hint: ntp

When ntp is started (or restarted), it writes messages of this form to syslog:
Mar 26 11:03:26 ecube ntpd_initres[7854]: host name not found: pool.ntp.org

for all of the servers I have put in the /etc/ntp.conf file.

A workaround is to use "dig" to get internet addresses for these servers and put them in
my /etc/hosts file. When ntp is then restarted, a time adjustment is finally made.

Other programs like Firefox and dig seem to have no problems finding
network addresses for hostnames. I tried substituting the 'karmic' executable for /usr/sbin/ntpd but
it showed the same behavior as the lucid version.

apt-cache policy ntp
returns:
ntp:
  Installed: 1:4.2.4p8+dfsg-1ubuntu1
  Candidate: 1:4.2.4p8+dfsg-1ubuntu1
  Version table:
 *** 1:4.2.4p8+dfsg-1ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

lsb_release -rd
returns:
Description: Ubuntu lucid (development branch)
Release: 10.04

ProblemType: Bug
Architecture: amd64
Date: Fri Mar 26 11:09:07 2010
DistroRelease: Ubuntu 10.04
ExecutablePath: /usr/sbin/ntpd
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100310)
NonfreeKernelModules: nvidia
NtpStatus:
  remote          refid           st t when poll reach    delay   offset  jitter
 ==============================================================================
 *ntp.ubuntu.com  193.79.237.14    2 u   54   64   377  112.184  -71.271  36.530
 +0.pool.ntp.org  128.4.1.1        2 u    9   64   377   52.150  -63.289  26.678
 +1.pool.ntp.org  128.32.206.55    3 u   42   64   377   89.701  -69.962  30.392
Package: ntp 1:4.2.4p8+dfsg-1ubuntu1
ProcAttrCurrent: /usr/sbin/ntpd (enforce)
ProcEnviron:
 LC_COLLATE=C
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-17.26-generic 2.6.32.10+drm33.1
SourcePackage: ntp
Uname: Linux 2.6.32-17-generic x86_64
mtime.conffile..etc.ntp.conf: 2010-03-24T17:10:08

In the upstream NTP bug tracker, H-murray (h-murray) wrote :

If DNS isn't working when ntpd starts, the lookup is deferred.

For the server command, that works correctly. For the pool command, it only
gets 1 IP address.
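
The deferred lookup described above can be pictured with a small sketch. This is not ntpd's code: `resolve_deferred`, the injected `lookup` callable, and the retry parameters are all hypothetical names chosen for illustration. It only shows that a retry loop can hand back every address the resolver returns, which is what a pool-style entry needs, rather than just the first one.

```python
import time
from typing import Callable, List


def resolve_deferred(
    name: str,
    lookup: Callable[[str], List[str]],
    attempts: int = 5,
    delay: float = 0.0,
) -> List[str]:
    """Retry `lookup(name)` until it succeeds or `attempts` is exhausted.

    Hypothetical helper, not part of ntpd. `lookup` is any callable that
    returns a list of address strings or raises OSError while DNS is down.
    """
    last_error = None
    for _ in range(attempts):
        try:
            # Keep the whole list: a pool entry wants every address,
            # while a plain server entry could take addresses[0].
            return list(lookup(name))
        except OSError as exc:
            last_error = exc
            time.sleep(delay)  # wait before the next deferred attempt
    raise last_error if last_error else OSError("no lookup attempts were made")
```

Passing the resolver in as a parameter keeps the sketch testable without a network; in real use `lookup` would wrap a call such as `socket.getaddrinfo`.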

In the upstream NTP bug tracker, Stenn (stenn) wrote :

Bug 761 is kinda related to this.

If one is fixed, the other could be fairly easily fixed at the same time.

In the upstream NTP bug tracker, Mayer-r (mayer-r) wrote :

Well no. That bug doesn't deal with multiple addresses at all. The problem here
is that the pool option expects multiple addresses to be returned by DNS and
used to set up associations. That bug won't solve this one.

Danny

In the upstream NTP bug tracker, Stenn (stenn) wrote :

Subject: Deferred DNS lookup on pool command only gets 1 server

Danny,

Well, yes. If the response needs to be tailored, it should be pretty
trivial to knock these things out together.

I'm talking about the work that needs to be done in the forked resolver
process to get the information sent back to the main process.

--
Harlan Stenn <email address hidden>

In the upstream NTP bug tracker, H-murray (h-murray) wrote :

Fix is in pogo:/usa/murray/bug-975

This also fixes bug-761.

I've tested on Linux, NetBSD and FreeBSD.

In the upstream NTP bug tracker, Dave Hart (hart-ntp) wrote :

I've spent a bit of time reviewing Hal's bug-975 repo.

First, above all else, Hal is cleaning house in ntp_intres.c while I'm stacking
kindling under the corners to burn it to the ground once 4.2.6 is out.

Second, after 6 weeks in a RC cycle, is it really important to fix the bugs this
fixes and provide the improvements this provides for 4.2.6? A bunch of new code
which we get to debug in -stable is unappealing to me.

Assuming you get past those concerns and proceed, other items I note:

A) ntpd.h reverts a recent change adding const to extern char *chrootdir;
B) ntp_intres.c has hints.ai_protocol = 17; and a comment about the header that
   value was snipped from. That should simply come out; setting ai_protocol
   portably is nontrivial and we don't need it, do we?
C) getaddrinfo() returning EAI_SYSTEM was retried before; now it's considered
   permanent. If I were writing from scratch I'd do the same, but given it has
   behaved that way for years, was there a good reason to change?
D) There are a dozen or so places where the body of an if statement continues
   on the same line as the conditional. NTP style is to put the body on the
   following line, indented.
E) In doconfigure() there's an apparently overlooked block of test code in
   column 1 under if (0) that should come out or be cleaned up to a #ifdef
   SOME_TEST_MACRO.
F) A few lines later in doconfigure(), there's a questionable use of
   in6addr_any when returning a v4 result, which depends on whether the system
   was built with IPv6 support. Why not use NULL as is done for the v4 address
   when returning a v6 result?

Hal, I'm troubled posting these comments. On the one hand, I'm thrilled you've
taken an interest in getting ntpd DNS resolution right, and you've put a lot of
effort into developing and testing your changes. On the other hand, well, what
I said up top. The timing is just horrible. I really do want to replace
ntp_intres wholesale with a callback-based getaddrinfo() clone that will call
back a given function when the results are available, and that will mean most of
your intres work will be tossed.
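
Item C turns on which getaddrinfo() failures are worth retrying. Below is a minimal sketch of that decision, written in Python rather than ntpd's C. The function name `should_retry` and the `retry_system_errors` flag are assumptions made for illustration. EAI_AGAIN is the standard "temporary failure" code; whether EAI_SYSTEM belongs with it is exactly the open question in the review, so the sketch leaves that choice to the caller.

```python
import socket


def should_retry(error: socket.gaierror, retry_system_errors: bool = True) -> bool:
    """Decide whether a failed lookup should be queued for another try.

    Hypothetical policy helper, not ntpd's implementation.
    """
    if error.errno == socket.EAI_AGAIN:
        # Temporary failure in name resolution: worth retrying.
        return True
    if error.errno == socket.EAI_SYSTEM:
        # The disputed case: the old code retried, the patch treats it
        # as permanent. The caller picks the policy.
        return retry_system_errors
    # Everything else (for example EAI_NONAME) is treated as permanent.
    return False
```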

In the upstream NTP bug tracker, H-murray (h-murray) wrote :

Subject: Deferred DNS lookup on pool command only gets 1 server

> I've spent a bit of time reviewing Hal's bug-975 repo.

Thanks.

> First, above all else, Hal is cleaning house in ntp_intres.c while I'm
> stacking kindling under the corners to burn it to the ground once
> 4.2.6 is out.

I'm OK if we dump my stuff. If nothing else, I think some of the quirks I've
sorted out might be helpful.

> Second, after 6 weeks in a RC cycle, is it really important to fix the
> bugs this fixes and provide the improvements this provides for 4.2.6?
> A bunch of new code which we get to debug in -stable is unappealing
> to me.

I don't know how to call this. Perhaps we should move the discussion to the
hackers list.

> Hal, I'm troubled posting these comments.

No problem from my end. I thought they were all constructive. Thanks.

> On the one hand, I'm
> thrilled you've taken an interest in getting ntpd DNS resolution
> right, and you've put a lot of effort into developing and testing
> your changes. On the other hand, well, what I said up top. The
> timing is just horrible. I really do want to replace ntp_intres
> wholesale with a callback-based getaddrinfo() clone that will call
> back a given function when the results are available, and that will
> mean most of your intres work will be tossed.

Yup, the timing sucks. I did it in case Harlan (and/or others) thought
getting those bugs fixed was important enough to delay the release for more
testing and/or take the risk.

Another possibility would be to release what we have now and plan to have
another release as soon as some collection of changes can be tested.

> A) ntpd.h reverts a recent change adding const to extern char
> *chrootdir;

I think that was because I hadn't done a recent enough bk pull. Fixed now.

> B) ntp_intres.c has hints.ai_protocol = 17; and a comment about the
> header that value was snipped from. That should simply come out;
> setting ai_protocol portably is nontrivial and we don't need it, do
> we?

Without that, I got 3 copies of each IP Address. One was TCP, one was UDP, I
don't remember the 3rd. There may be a cleaner fix, but something is needed
in that area.

> E) In doconfigure() there's an apparently overlooked block of test
> code in column 1 under if (0) that should come out or be cleaned up
> to a #ifdef SOME_TEST_MACRO.

That's what I used to debug the above stuff.

I'm not sure how to handle code like that. Perhaps throwing it away is best.
(Then we don't have to discuss it.) That would mean somebody would have to
type it in again if they were ever chasing the same sort of bug. I don't
think it's worth a macro name. It's very localized.

Often, when I'm debugging with printfs, after I've solved the problem, rather
than discard the debugging code, I just comment it out. It's reasonably
likely I'm going to want it, or something close to it, when chasing the next
bug. Sometimes it doubles as documentation.

Perhaps we should make OLD_DEBUG or something like that for this use.

> C) getaddrinfo() returning EAI_SYSTEM was retried before, now it's
> considered permanent. If I were writing from scratch I...


In the upstream NTP bug tracker, Dave Hart (hart-ntp) wrote :

(In reply to comment #6)
> > B) ntp_intres.c has hints.ai_protocol = 17; and a comment about the
> > header that value was snipped from. That should simply come out;
> > setting ai_protocol portably is nontrivial and we don't need it, do
> > we?
>
> Without that, I got 3 copies of each IP Address. One was TCP, one was UDP, I
> don't remember the 3rd. There may be a cleaner fix, but something is needed
> in that area.

I'm a bit surprised, since I'd expect ai_socktype == SOCK_DGRAM would have
excluded TCP. Still, I was apparently wrong to say it is nontrivial to set
ai_protocol portably, as I see ntpq and ntpdc use:

hints.ai_protocol = IPPROTO_UDP;

So that's presumably the safe choice here as well.
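
The effect of these hints can be reproduced from Python, whose `socket.getaddrinfo` wraps the same C library call. This is a sketch, not the ntpd patch, and `udp_addresses` is a hypothetical helper name. With no socket type or protocol in the hints, resolvers commonly return one entry per socket type for each address, which would fit the duplicates Hal describes. Asking for SOCK_DGRAM and IPPROTO_UDP and then de-duplicating leaves one entry per address.

```python
import socket
from typing import List


def udp_addresses(host: str, port: int = 123, flags: int = 0) -> List[str]:
    """Return the unique addresses for `host`, restricted to UDP.

    Hypothetical helper for illustration; port 123 is the NTP port.
    """
    results = socket.getaddrinfo(
        host,
        port,
        type=socket.SOCK_DGRAM,      # datagram sockets only
        proto=socket.IPPROTO_UDP,    # the hint ntpq and ntpdc are said to use
        flags=flags,
    )
    unique: List[str] = []
    for _family, _socktype, _proto, _canonname, sockaddr in results:
        address = sockaddr[0]
        if address not in unique:    # drop repeats, keep resolver order
            unique.append(address)
    return unique
```

On a machine with working DNS, `udp_addresses("pool.ntp.org")` should list each returned pool address once.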

In the upstream NTP bug tracker, Stenn (stenn) wrote :

Hal,

Would you please re-check this repo against ntp-dev, and discuss
this with Dave Hart (if you think that would be useful)?

In the upstream NTP bug tracker, Stenn (stenn) wrote :

Hal,

After more reflection, I am hesitant to see a patch for this go in to 4.2.6.

I think it will be better to rewrite this code completely, after 4.2.6 is
released.

Do you have strong feelings about getting this patch in to 4.2.6?

In the upstream NTP bug tracker, H-murray (h-murray) wrote :

Subject: Deferred DNS lookup on pool command only gets 1 server

Harlan says:
> After more reflection, I am hesitant to see a patch for this go in to
> 4.2.6.

> I think it will be better to rewrite this code completely, after
> 4.2.6 is released.

> Do you have strong feelings about getting this patch in to 4.2.6?

Nope.

I could make good arguments either way.

--
These are my opinions, not necessarily my employer's. I hate spam.

Barry Fishman (barry-fishman) wrote : Re: [Bug 548885] [NEW] ntp host name not found error

Barry Fishman <email address hidden> writes:
> NtpStatus:
>  remote          refid           st t when poll reach    delay   offset  jitter
> ==============================================================================
> *ntp.ubuntu.com  193.79.237.14    2 u   54   64   377  112.184  -71.271  36.530
> +0.pool.ntp.org  128.4.1.1        2 u    9   64   377   52.150  -63.289  26.678
> +1.pool.ntp.org  128.32.206.55    3 u   42   64   377   89.701  -69.962  30.392

Note that these are the servers I put in my /etc/hosts file. The
other servers 2.pool.ntp.org and pool.ntp.org (not in the host file) are
still not found.

--
Barry Fishman

C de-Avillez (hggdh2) wrote :

Thank you for opening this bug and helping make Ubuntu better. If I understand you correctly, this is not an NTP problem -- you seem to be having an issue with name resolution (DNS), not NTP.

Please check your DNS setup.

Changed in ntp (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Barry Fishman (barry-fishman) wrote : Re: [Bug 548885] Re: ntp host name not found error

C de-Avillez <email address hidden> writes:

> Thank you for opening this bug and helping make Ubuntu better. If I
> understand you correctly, this is not an NTP problem -- you seem to be
> having an issue with name resolution (DNS), not NTP.
>
> Please check your DNS setup.
>
> ** Changed in: ntp (Ubuntu)
> Importance: Undecided => Low
>
> ** Changed in: ntp (Ubuntu)
> Status: New => Incomplete

DNS seems to work with everything else I've tried. Firefox,
chromium-browser, dig, gethostip, emacs/gnus, even ntpdate.
If I could find another program that failed I would use that for
testing.

With ntpd stopped:

$ ntpdate ntp.ubuntu.com
27 Mar 14:16:46 ntpdate[22200]: adjust time server 91.189.94.4 offset 0.000613 sec
$ ntpdate 0.pool.ntp.org
27 Mar 14:16:59 ntpdate[22201]: adjust time server 69.94.105.81 offset -0.000226 sec
$ ntpdate 1.pool.ntp.org
27 Mar 14:17:06 ntpdate[22206]: adjust time server 4.79.132.217 offset -0.001346 sec
$ ntpdate 2.pool.ntp.org
27 Mar 14:17:12 ntpdate[22207]: adjust time server 208.53.158.34 offset 0.009339 sec
$ ntpdate pool.ntp.org
27 Mar 14:17:20 ntpdate[22208]: adjust time server 72.167.54.201 offset 0.006475 sec

With ntpd started:

$ ntpq
ntpq> peers
No association ID's returned
ntpq> host ntp.ubuntu.com
current host set to ntp.ubuntu.com
ntpq> peers
ntp.ubuntu.com: timed out, nothing received
***Request timed out
ntpq> host 0.pool.ntp.org
current host set to 0.pool.ntp.org
ntpq> peers
 remote          refid           st t when poll reach    delay   offset  jitter
==============================================================================
*clock.fmt.he.ne .PPS.            1 u  226 1024   377    0.282    0.138   0.125
+clock.sjc.he.ne .CDMA.           1 u  572 1024   377    1.793    0.117   0.878
+clepsydra.dec.c .GPS.            1 u  246 1024   377    0.998    0.106   0.331
-nist1.symmetric .ACTS.           1 u  207 1024   377    5.771    2.650   0.198
-time-A.timefreq .ACTS.           1 u  557 1024   377   43.445   -2.132   0.232
-clock.xmission. .GPS.            1 u  227 1024   377   18.184   -0.026   0.003
ntpq> host 1.pool.ntp.org
current host set to 1.pool.ntp.org
ntpq> peers
 remote          refid           st t when poll reach    delay   offset  jitter
==============================================================================
 ntp1.csl.tjhsst 192.5.41.40      2 u  318 1024   377    0.170   -0.586   1.770
*ntp0.usno.navy. .USNO.           1 u  363 1024   377   20.175   -2.152   0.252
+ntp.alaska.edu  .GPS.            1 u  283 1024   377  123.193    2.198   0.957
+clock.isc.org   .GPS.            1 u  368 1024   377   72.917    0.836   0.592
 18.18.1.95      .STEP.          16 u    - 1024     0    0.000    0.000   0.000
 LOCAL(1)        .LOCL.          10 l   64   64   377    0.000    0.000   0.001
ntpq> host 2.pool.ntp.org
current host set to 2.pool.ntp.org
ntpq> peers
2.pool.ntp.org: timed out, nothing received
***Request timed out
ntpq> host pool.ntp.org
current host set to pool.ntp.org
ntpq> peers
 remote          refid           st t when poll reach    delay   offset  jitter
==============================================================================
*clock.fmt.he.ne .PPS.            1 u  343 1024 3...


C de-Avillez (hggdh2) wrote :

OK. I found the upstream bug on this. Please note that -- per upstream -- this is still open. The issue stems from a *non-working* DNS at the time NTP is triggered (most probably on boot).

This is why, when you are up & running, you do not see this error. Please see the upstream bug for details.

So. Right now you have the following options:
* move to IP addresses instead of FQDN
* have a working DNS at the time NTP is triggered

Changed in ntp (Ubuntu):
status: Incomplete → Triaged
Barry Fishman (barry-fishman) wrote :

C de-Avillez <email address hidden> writes:

> OK. I found the upstream bug on this. Please note that -- per upstream
> -- this is still open. The issue stems from a *non-working* DNS at the
> time NTP is triggered (most probably on boot).

The boot time issue is not what is happening for me. Restarting the ntpd
daemon using the /etc/init.d/ntp script when DNS is known to be working
still fails. Running it from the command line as previously mentioned
still fails.

Since I can get the upstream ntp-4.2.6 release to fail, I am going to
try to debug that rather than the 4.2.4p8@1.1620-0 lucid version, unless
you see some need otherwise. My guess would be it involves a memory leak,
although I would assume the code in daemon programs like ntpd is very
closely examined. I can't otherwise explain why a program that works
from /home/util64/ntp-4.2.6/bin would fail when moved to /usr/sbin.

Nobody else seems to have the problem with Lucid, and there is a simple
fix to get it working, if someone else reports the problem.
--
Barry Fishman

Barry Fishman (barry-fishman) wrote :

My problem is fixed.

When I installed 10.10, I saw a similar problem, but noticed in syslog the message:

Oct 18 19:49:07 ecube kernel: [209271.514454] type=1400 audit(1287445747.259:19): apparmor="DENIED" operation="open" parent=1 profile="/usr/sbin/ntpd" name="/etc/resolv.conf-google" pid=3036 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

My /etc/resolv.conf was a soft link to /etc/resolv.conf-google. Evidently AppArmor does not allow
/usr/sbin/ntpd to read its resolver configuration from /etc/resolv.conf-google. When I had made /etc/ntpd a soft
link elsewhere there was no problem.

I now have a /etc/resolv.conf file and not a soft link, so things work fine.
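
A quick way to spot this configuration is to check whether /etc/resolv.conf is a symlink and where it leads. The following is a minimal sketch; the helper name `resolv_conf_target` is made up for this example. It only inspects the filesystem, so whether the resolved path is readable under the ntpd AppArmor profile still has to be checked against the profile itself.

```python
import os
from typing import Optional, Tuple


def resolv_conf_target(path: str = "/etc/resolv.conf") -> Tuple[bool, Optional[str]]:
    """Report whether `path` is a symlink and, if so, where it resolves.

    Hypothetical diagnostic helper; it knows nothing about AppArmor policy.
    """
    if not os.path.islink(path):
        return (False, None)
    # realpath follows the whole chain of links to the final file.
    return (True, os.path.realpath(path))
```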

--
Barry Fishman

Mike Kupfer (mkupfer37) wrote :

After enabling NTP this morning, I noticed very similar messages to the above in my syslog.

Mar 13 09:16:23 assam kernel: [ 927.068998] type=1503 audit(1300032983.633:25): operation="open" pid=2184 parent=1 profile="/usr/sbin/ntpd" requested_mask="r::" denied_mask="r::" fsuid=0 ouid=0 name="/etc/resolv.conf.rawbw"
Mar 13 09:16:23 assam kernel: [ 927.069297] type=1503 audit(1300032983.633:26): operation="open" pid=2184 parent=1 profile="/usr/sbin/ntpd" requested_mask="r::" denied_mask="r::" fsuid=0 ouid=0 name="/etc/resolv.conf.rawbw"
Mar 13 09:16:25 assam ntpd_initres[2186]: host name not found: ntp.rawbw.com
Mar 13 09:16:25 assam ntpd_initres[2186]: host name not found: ntp.ubuntu.com

I had /etc/resolv.conf as a symlink pointing to resolv.conf.rawbw. After deleting the symlink and copying /etc/resolv.conf.rawbw to /etc/resolv.conf, I restarted ntpd (using "/etc/init.d/ntp stop" and "/etc/init.d/ntp start"). This time I didn't get the error messages.

The permissions on /etc/resolv.conf.rawbw are

  -rw-r--r-- 1 root root 80 Mar 13 09:01 resolv.conf.rawbw

so it's not a problem with the file permissions. dig(1) and everything else worked just fine when /etc/resolv.conf was a symlink. Why is it a problem for ntpd?
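
The audit lines themselves name the confined program (`profile`) and the file it was refused (`name`), which points at the AppArmor profile for ntpd rather than at file permissions. A small sketch for pulling those fields out of such a line follows; `parse_audit_fields` is a hypothetical helper, and it only handles the quoted key="value" pairs seen in the messages above.

```python
import re
from typing import Dict

# key="value" pairs as they appear in the kernel audit messages quoted above.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_audit_fields(line: str) -> Dict[str, str]:
    """Extract the quoted key="value" fields from one audit log line.

    Hypothetical helper; unquoted fields such as pid=2184 are ignored.
    """
    return dict(_PAIR.findall(line))
```

Comparing the `name` field with the target of the /etc/resolv.conf symlink shows at a glance whether the denied file is the resolver configuration.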

In the upstream NTP bug tracker, Dave Hart (hart-ntp) wrote :

I believe this bug was fixed by 4.2.7p22 and subsequent cleanup, which introduced both a new ntp_intres.c implementation with a callback-based interface, and a new implementation of the pool command modelled on manycast.

Thanks for your work on the alternative 4.2.6 patch, Hal.

Changed in ntp:
importance: Unknown → Medium
status: Unknown → Fix Released
Changed in ntp (Ubuntu):
status: Triaged → Fix Released