UEC NC failed to fetch preseed.conf from CC using lucid-server-amd64-20100218

Bug #524147 reported by Mathias Gug
This bug affects 1 person
Affects: eucalyptus (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

While trying to run an automated install from an ISO, the NC fails to fetch preseed.conf from the CLC+Walrus+CC+SC system:

   ┌────────────────┤ [!!] Select cloud installation mode ├────────────────┐
   │                                                                       │
   │ Failed to retrieve the preconfiguration file                          │
   │ The file needed for preconfiguration could not be retrieved from      │
   │ http://#2 10.55.55.2:8774/preseed/preseed.conf. The installation will │
   │ proceed in non-automated mode.                                        │
   │                                                                       │
   │                              <Continue>                               │
   │                                                                       │
   └───────────────────────────────────────────────────────────────────────┘

Note the strange URL of the CC: #2 10.55.55.2:8774

Mathias Gug (mathiaz) wrote :

On the CC:

ubuntu@cempedak:~$ ps -ef | grep avahi
avahi 991 1 0 18:07 ? 00:00:00 avahi-daemon: running [cempedak.local]
avahi 992 991 0 18:07 ? 00:00:00 avahi-daemon: chroot helper
root 1411 1 0 18:08 ? 00:00:00 avahi-publish -s Walrus _eucalyptus._tcp 8773 txtvers=1 protovers=1.5.0 type=walrus ipaddr=10.55.55.2
root 1412 1 0 18:08 ? 00:00:00 avahi-publish -s UEC-TEST1 _eucalyptus._tcp 8774 txtvers=1 protovers=1.5.0 type=cluster ipaddr=10.55.55.2
root 1413 1 0 18:08 ? 00:00:00 avahi-publish -s UEC-TEST1 storage _eucalyptus._tcp 8773 txtvers=1 protovers=1.5.0 type=storage ipaddr=10.55.55.2
root 2058 1 0 18:09 ? 00:00:00 avahi-publish -s CLC _eucalyptus._tcp 8773 txtvers=1 protovers=1.5.0 type=cloud ipaddr=10.55.55.2
ubuntu 23237 21328 0 19:17 pts/0 00:00:00 grep --color=auto avahi

Mathias Gug (mathiaz) wrote :

This is with eucalyptus 1.6.2-0ubuntu1.

Mathias Gug (mathiaz) wrote :

Installing directly from the archive using the netboot installer works correctly with 1.6.2-0ubuntu2.

Thierry Carrez (ttx) wrote :

It sounds like something was already advertising itself as "UEC-TEST1", so when your CC advertised itself under the same name it got published as "UEC-TEST1 #2". That confused the logic in Colin's get_component, which failed to parse the IP from the announcement.

So there may be a bug in that last part, but I suspect the root cause was a duplicate avahi announcement for the same cluster name. Can you reproduce it?

Changed in eucalyptus (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu):
importance: High → Medium
Dustin Kirkland (kirkland) wrote :

Marking confirmed, as I hit this on the test rig on all of my nodes against Alpha3 ISOs.

Changed in eucalyptus (Ubuntu):
status: Incomplete → Confirmed
Thierry Carrez (ttx) wrote :

@Dustin: see my comment 4 above: I think this bug is linked to the reinstall of CCs under the same cluster name (avahi will call it #2 only if there is another one already published under that name), so it might be an artifact of the test environment.
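The renaming described here can be modelled with a small sketch. This is a simplified, hypothetical model (publish_name is a name made up for illustration): real avahi detects the collision on the network rather than consulting a local list, but the fallback names it picks have the form "NAME #2", "NAME #3", and so on, which is where the "#2" in this report comes from.

```python
def publish_name(requested: str, already_published: set[str]) -> str:
    """Return the name a newly published service ends up with.

    Hypothetical, simplified model of avahi's collision handling: if the
    requested name is taken, append " #2", " #3", ... until a free one is found.
    """
    if requested not in already_published:
        return requested
    n = 2
    while f"{requested} #{n}" in already_published:
        n += 1
    return f"{requested} #{n}"


# A stale CC (e.g. an old install still running) already publishes UEC-TEST1,
# so the freshly installed CC is announced as "UEC-TEST1 #2".
print(publish_name("UEC-TEST1", {"UEC-TEST1"}))  # UEC-TEST1 #2
```

With nothing else on the network, publish_name("UEC-TEST1", set()) simply returns "UEC-TEST1", which is why starting from a clean network avoids the suffix altogether.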

To work around it, make sure nothing is already advertising itself as the UEC-TEST1 CC on the test rig, or use another cluster name.

As a permanent fix, euca_find_component should take the last word as the IP:PORT, rather than every word except the first one, so that it supports both "UEC-TEST1 10.0.0.1:8774" and "UEC-TEST1 #2 10.0.0.1:8774".
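The difference between the two parsing rules can be sketched as follows. This assumes, as in the examples given here, that each discovered component reaches the parser as one whitespace-separated line of the form "NAME IP:PORT". The real euca_find_component is not shown in this report, so find_address_current, find_address_fixed and preseed_url are hypothetical names; the URL layout is copied from the installer error in the bug description.

```python
def find_address_current(announcement: str) -> str:
    # Rule described as the current behaviour: every word except the first.
    return " ".join(announcement.split()[1:])


def find_address_fixed(announcement: str) -> str:
    # Proposed fix: only the last word is the IP:PORT.
    return announcement.split()[-1]


def preseed_url(address: str) -> str:
    # URL layout taken from the installer error message in the bug description.
    return f"http://{address}/preseed/preseed.conf"


for line in ("UEC-TEST1 10.55.55.2:8774", "UEC-TEST1 #2 10.55.55.2:8774"):
    print("current:", preseed_url(find_address_current(line)))
    print("fixed:  ", preseed_url(find_address_fixed(line)))
```

For the "UEC-TEST1 #2" announcement, the current rule yields http://#2 10.55.55.2:8774/preseed/preseed.conf, the same malformed URL as in the installer dialog, while the last-word rule yields a usable address. Under the same line-format assumption, taking the last word would also tolerate service names containing spaces, such as the "UEC-TEST1 storage" announcement in the ps output.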

Dustin Kirkland (kirkland) wrote :

Okay, I found a suitable workaround, at least for the lab testing:

Simply ensure that all systems are powered off before starting a new run.

I suspect I hit this because I had deployed a topo2 setup (with a standalone CC). Then I set up a topo1 installation, which has a CLC+WC+CC+SC on cempedak. But mabolo was still running a CC and broadcasting its cluster name, which confused things.

Clearly we need to fix this bug.

But I'm unblocked on my testing, as long as I start with all physical systems powered off.

Thierry Carrez (ttx) wrote :

So euca_find_component could be fixed to handle "#2"-style names; however, I'm unsure we should support multiple CCs with the same cluster name running on the same network. That opens the door to lots of bugs, I think. From which CC would the NC download its preseed?

At least the current situation lets us spot when you try to autodeploy a broken topology... because it *is* a broken topology. Note that a manual ISO install would tell you that a CC already exists when you install the new one, and a manual NC install is supposed to ask you to select your CC when it detects more than one.

Thierry Carrez (ttx) wrote :

Thinking about it, this should be prevented earlier, at CC install time, when the cluster name you select for the CC already exists on the network.

Dustin Kirkland (kirkland) wrote : Re: [Bug 524147] Re: UEC NC failed to fetch preseed.conf from CC using lucid-server-amd64-20100218

Yeah, I agree with your last comment.

A cluster with that name already exists on this network. Pick another.
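The install-time check proposed here, ending in the message above, might look roughly like the following sketch. It assumes the installer has already collected the _eucalyptus._tcp announcements (for example by browsing avahi) into simple records that mirror the avahi-publish arguments in the ps output: service name, port and TXT fields. Announcement and cluster_name_taken are hypothetical names, and how the records are gathered is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Announcement:
    # Hypothetical record mirroring one avahi-publish call from the ps output:
    # service name, port and the key=value TXT fields (type=..., ipaddr=...).
    name: str
    port: int
    txt: dict[str, str] = field(default_factory=dict)


def cluster_name_taken(announcements: list[Announcement], wanted: str) -> bool:
    """Return True if some CC already advertises `wanted` as its cluster name.

    A trailing " #N" is stripped first: a "NAME #2" announcement means NAME
    was already in use when that CC was published.
    """
    for ann in announcements:
        if ann.txt.get("type") != "cluster":
            continue  # ignore Walrus, storage and CLC announcements
        if re.sub(r" #\d+$", "", ann.name) == wanted:
            return True
    return False


seen = [
    Announcement("Walrus", 8773, {"type": "walrus", "ipaddr": "10.55.55.2"}),
    Announcement("UEC-TEST1", 8774, {"type": "cluster", "ipaddr": "10.55.55.2"}),
]
if cluster_name_taken(seen, "UEC-TEST1"):
    print("A cluster with that name already exists on this network. Pick another.")
```

Only announcements whose TXT type is cluster are considered, so the Walrus, storage and CLC services published from the same host do not trigger a false conflict.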

Changed in eucalyptus (Ubuntu):
status: Confirmed → Triaged
Dave Walker (davewalker) wrote :

Marking Invalid, as the comments suggest it's actually a bug in the deployment environment and not one that should be encountered in the wild.

Changed in eucalyptus (Ubuntu):
status: Triaged → Invalid