FQDN written to /etc/hosts causes problems for clustering systems

Bug #871966 reported by Clint Byrum on 2011-10-10
This bug affects 4 people
Affects                              Importance  Assigned to
Release Notes for Ubuntu             Undecided   Unassigned
cloud-init                           Medium      Scott Moser
cassandra (Juju Charms Collection)   Medium      James Page
cloud-init (Ubuntu)                  Low         Scott Moser
cloud-init (Ubuntu Precise)          Low         Scott Moser

Bug Description

*** Ubuntu 11.10 Release Note ***

Cloud instances and servers pre-seeded with cloud-init will have their FQDN written to /etc/hosts and pointed to the IP 127.0.1.1. This may cause issues for daemons which try to listen on their hostname, rather than 0.0.0.0, as they will now only be reachable locally, rather than on the network address that their FQDN resolves to.

***

By writing the FQDN to /etc/hosts as resolving to 127.0.1.1, systems like Cassandra have a much harder time determining their address to communicate to other cluster members.

While some might see communicating your IP to others as a bug, being able to use gethostname() and then resolving it to get the actual IP address of one's machine is fairly important.
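
To illustrate the lookup described above, here is a minimal Python sketch of what a clustering daemon might do at startup. The function names `is_loopback` and `advertised_address` are hypothetical and are not taken from Cassandra or cloud-init. On a system where /etc/hosts maps the FQDN to 127.0.1.1 and nsswitch.conf consults files before DNS, the lookup would be expected to return that loopback address.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the address lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def advertised_address(hostname=None):
    """Resolve this machine's hostname the way a cluster daemon might.

    Returns (address, usable). usable is False when the name resolves to a
    loopback address, which other cluster members cannot reach.
    """
    name = hostname or socket.gethostname()
    # gethostbyname follows the system resolver order, so an /etc/hosts
    # entry typically wins over DNS.
    address = socket.gethostbyname(name)
    return address, not is_loopback(address)


if __name__ == "__main__":
    addr, usable = advertised_address()
    print(addr, "usable for clustering" if usable else "loopback only")
```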

It's my understanding that in resolving bug #802637, the Debian networking docs were used as a guide:

http://www.debian.org/doc/manuals/debian-reference/ch05.en.html
Point 5.1.2 specifically.

It does suggest that one needs an FQDN in /etc/hosts.

However, cloud-init should only write the address if the FQDN cannot otherwise be resolved.

cloud-init should first try gethostbyname() on the FQDN. If it resolves, *do not write FQDN to /etc/hosts*. This ensures that if the FQDN has been made resolvable by some method in nsswitch.conf, such as DNS or NIS, it will not be overridden by /etc/hosts.
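
The check proposed above could be sketched roughly as follows. This illustrates the reporter's suggestion only and is not cloud-init's actual code. The names `fqdn_resolves` and `should_write_hosts_entry` are hypothetical, and the resolver is passed in as a parameter so the decision can be exercised without network access.

```python
import socket


def fqdn_resolves(fqdn, resolver=socket.gethostbyname_ex):
    """Return True if the resolver yields at least one address for fqdn."""
    try:
        _name, _aliases, addresses = resolver(fqdn)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    return len(addresses) > 0


def should_write_hosts_entry(fqdn, resolver=socket.gethostbyname_ex):
    """Only fall back to an /etc/hosts entry when nothing else resolves the FQDN."""
    return not fqdn_resolves(fqdn, resolver)
```

Only the presence of at least one answer matters here; the number and order of the returned addresses are ignored.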

related bugs:
 bug 890501: EC2 cloud-init overwrites 127.0.1.1 in /etc/hosts on every reboot

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Scott Moser (smoser) wrote :

gethostbyname(hostname) is non-deterministic.
There could be multiple responses, in an indeterminate order.

James Page (james-page) wrote :

As a work-around until this is resolved in one way or another in juju I'm using:

  dig +short `unit-get private-address`

dig deals with being passed an IP address nicely and will resolve a hostname correctly to the internal IP address of the instance in ec2 and openstack environments.

James Page (james-page) wrote :

A bit more testing reveals that this solution does not work so well in a local juju environment. I'm adapting it with a check to see whether private-address already returns an IP address before trying to resolve it using dig.
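
A rough Python sketch of that adapted workaround, assuming the output of `unit-get private-address` has already been captured as a string. The helper names `dig_short` and `private_ip` are hypothetical and do not come from the charm itself. The DNS lookup shells out to `dig +short`, as in the original one-liner, so that /etc/hosts is not consulted.

```python
import ipaddress
import subprocess


def dig_short(name):
    """Hypothetical helper: query DNS directly with `dig +short`.

    Returns the first line of output that parses as an IP address, or None.
    """
    result = subprocess.run(
        ["dig", "+short", name], capture_output=True, text=True, check=True
    )
    for token in result.stdout.split():
        try:
            ipaddress.ip_address(token)
        except ValueError:
            continue  # skip CNAME targets and other non-address lines
        return token
    return None


def private_ip(private_address, resolve=dig_short):
    """Return private_address unchanged if it is already an IP, else resolve it."""
    try:
        return str(ipaddress.ip_address(private_address))
    except ValueError:
        return resolve(private_address)
```

Passing the resolver as a parameter keeps the IP check separate from the lookup, so the local-environment case never reaches dig at all.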

Clint Byrum (clint-fewbar) wrote :

Excerpts from Scott Moser's message of Tue Oct 11 00:18:27 UTC 2011:
> gethostbyname(hostname) is non-determinable.
> there could be multiple responses. and indeterminable order.
>

Agreed, which is precisely why I believe cloud-init must leave the FQDN
out of /etc/hosts if there is any response to this call.

Scott Moser (smoser) wrote :

On Tue, 11 Oct 2011, Clint Byrum wrote:

> Excerpts from Scott Moser's message of Tue Oct 11 00:18:27 UTC 2011:
> > gethostbyname(hostname) is non-determinable.
> > there could be multiple responses. and indeterminable order.
> >
>
> Agreed, which is precisely why I believe cloud-init must leave the FQDN
> out of /etc/hosts if there is any response to this call.

That doesn't make any sense. It's non-determinable, and it's non-determinable
when cloud-init runs as well as later on.
When cloud-init runs to decide whether there is a response to this call, it may
get a single value, or 4 values, or no response. That
may change later on when some other utility asks.

Clint Byrum (clint-fewbar) wrote :

Excerpts from Scott Moser's message of Tue Oct 11 16:57:27 UTC 2011:
> On Tue, 11 Oct 2011, Clint Byrum wrote:
>
> > Excerpts from Scott Moser's message of Tue Oct 11 00:18:27 UTC 2011:
> > > gethostbyname(hostname) is non-determinable.
> > > there could be multiple responses. and indeterminable order.
> > >
> >
> > Agreed, which is precisely why I believe cloud-init must leave the FQDN
> > out of /etc/hosts if there is any response to this call.
>
> That doesn't make any sense. Its non-determinable, its non-determinable
> when cloud-init runs as well as later on.
> When cloud-init ran to decide if there was a response to this call, it may
> give a single value, or may give 4 values, or may give no response. That
> may change later on when some other utility was asking.
>

The number and content of responses is irrelevant, only that the count
of positive responses is at least one.

The idea isn't to cover the case for all time; it's to cover the case
where users have not asked to have the fqdn "fixed", yet we need to make a
best-effort attempt to make sure it is resolvable during bootup.

If it can be resolved through the normal means, what reason do we have
to override that with a value like 127.0.1.1?

Clint Byrum (clint-fewbar) wrote :

After discussing with Scott Moser, it's agreed that this may cause issues, but that it is not so much a "bug" as a change in behavior that needs documenting. Adding an ubuntu-release-notes task with a suggested release note.

Changed in cassandra (juju Charms Collection):
status: New → In Progress
assignee: nobody → James Page (james-page)
Changed in cloud-init (Ubuntu):
importance: Undecided → Low
description: updated
Changed in ubuntu-release-notes:
status: New → Incomplete
status: Incomplete → Fix Committed
James Page (james-page) on 2011-10-13
Changed in cassandra (juju Charms Collection):
status: In Progress → Fix Released
Changed in ubuntu-release-notes:
status: Fix Committed → Fix Released
Scott Moser (smoser) on 2011-11-15
description: updated
Scott Moser (smoser) on 2011-12-20
Changed in cloud-init:
status: New → Triaged
importance: Undecided → Medium
Changed in cassandra (juju Charms Collection):
importance: Undecided → Medium
Scott Moser (smoser) wrote :

This is fix-committed in cloud-init in revision 491 (http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/revision/491).

See the commit message there for more information.

Changed in cloud-init (Ubuntu Precise):
status: Confirmed → Fix Committed
status: Fix Committed → Triaged
Changed in cloud-init:
status: Triaged → Fix Committed
assignee: nobody → Scott Moser (smoser)
Changed in cloud-init (Ubuntu Precise):
assignee: nobody → Scott Moser (smoser)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.6.3~bzr497-0ubuntu1

---------------
cloud-init (0.6.3~bzr497-0ubuntu1) precise; urgency=low

  * New upstream snapshot.
    - cloud-config support for configuring apt-proxy
    - selection of local mirror based on presense of 'ubuntu-mirror' dns
      entry in local domain. (LP: #897688)
    - DataSourceEc2: more resilliant to slow metadata service (LP: #894279)
    - close stdin in all programs launched by cloud-init (LP: #903993)
    - revert management of /etc/hosts to 0.6.1 style (LP: #890501, LP: #871966)
    - write full ssh keys to console for easy machine consumption (LP: #893400)
    - put INSTANCE_ID environment variable in bootcmd scripts
    - add 'cloud-init-per' script for easily running things with a given freq
      (this replaced cloud-init-run-module)
    - support configuration of landscape-client via cloud-config (LP: #857366)
    - part-handlers now get base64 decoded content rather than 2xbase64 encoded
      in the payload parameter. (LP: #874342)
 -- Scott Moser <email address hidden> Thu, 22 Dec 2011 04:07:38 -0500

Changed in cloud-init (Ubuntu Precise):
status: Triaged → Fix Released
Jason X. (jasxun) wrote :

Will the fix also be available for Oneiric?

Scott Moser (smoser) wrote :

Jason,
  I would actually prefer not to pull the fix back to oneiric, as doing so would break people using oneiric who are expecting the behavior that is present there now. At the very least we have to think seriously about it and come up with a list of what types of users would be affected. Do you have any thoughts?

Eric Hammond (esh) wrote :

Though I don't like the current way Oneiric manages /etc/hosts (and submitted related bug #890501) I agree with Scott that it is how Oneiric works on EC2 and changes could cause existing installations to break. In fact, I have automated system code that works around the "bug" which would break if the behavior were fixed. Not horrible for me personally, but I don't know how many might be in a similar situation.

Scott Moser (smoser) on 2012-04-11
Changed in cloud-init:
status: Fix Committed → Fix Released