[alpha3] Re-registered NC fails to be detected.

Bug #530091 reported by Torsten Spindler on 2010-03-01
This bug affects 1 person
Affects: eucalyptus (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Thierry Carrez

Bug Description

I installed a frontend and node with the alpha3 server CD. Frontend works fine, but
$ euca-describe-availability-zones verbose
does not list any node. I tried
$ euca_conf --discover-nodes --no-rsync
but the node was not discovered. I then added it by hand to /etc/eucalyptus/eucalyptus.conf and copied the keys manually. However, the node is still not part of the cloud. On the node I see the following eucalyptus processes:

ubuntu@node01:~$ ps aux | grep eucalyptus
root 1211 0.0 0.0 4972 2176 ? Ss 15:52 0:00 apache2 -f /var/run/eucalyptus/httpd-nc.conf -D FOREGROUND
108 1260 0.0 0.0 45288 3988 ? Sl 15:52 0:00 apache2 -f /var/run/eucalyptus/httpd-nc.conf -D FOREGROUND
root 1444 0.0 0.0 2232 1004 ? Ss 15:52 0:00 avahi-publish -s torstentest node _eucalyptus._tcp 8775 txtvers=1 protovers=1.5.0 type=node

Thierry Carrez (ttx) wrote :

The node registration process has changed. It should now work automatically and not require "discover-nodes". You can check the CC's /var/log/eucalyptus/registration.log to see whether it detected the NC and, if so, what the euca_conf --register-nodes command returned.

From your last attempts, it looks like the key from the CC was not distributed to the /var/lib/eucalyptus/.ssh/authorized_keys on the NC during the install process. Could you check the contents of that file on the NC ?

Was the rest of the NC install preseeded ? Or did you have to manually enter username/password and other install details ? I suspect you started the NC install too early and no preseed was yet available from the CC.

Changed in eucalyptus (Ubuntu):
status: New → Incomplete
Torsten Spindler (tspindler) wrote :

I'm pretty sure I waited for the front-end to be ready, e.g. euca-describe-availability-zones verbose returns a good output. I attach the registration log from the front-end.

On the node, the authorized_keys file contains a good key for the frontend. As user eucalyptus on the front-end I can do a passwordless login to node01 with ssh.

I can re-install the node and see if it gets any better.

$ sudo cat authorized_keys
[sudo] password for ubuntu:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAt9C17TNGYpqIyCT74LtzXE1fpVluKGCIql8HBufmux7a5/AdVqa+b4OMs+bkNbQRPJiaUmGKyRioHX9vZwngN8FlDxI35QG5keEd/flI0ltXghnOVBoHXh9QVc2ux78GzAu+u0bxI9En4DfETvidgTcVmHNSUJlT270oQX7JiXj0bfK87S/d5vzA4pZInODFYilX+RHCgaZocgYkYP2cGqH2hFR2KSmbzVgkeV0Axk9FQAMSmrrPwrnenYmC9oobo0LQ8ZUp/1STruQVxWkLph8wfPY0JRrw5PYmsYIf8hys9t5vqhNCKUs29tCJDaSy/HdcXYB/GwgFgSjkSaPE5Q== eucalyptus@frontend
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAt9C17TNGYpqIyCT74LtzXE1fpVluKGCIql8HBufmux7a5/AdVqa+b4OMs+bkNbQRPJiaUmGKyRioHX9vZwngN8FlDxI35QG5keEd/flI0ltXghnOVBoHXh9QVc2ux78GzAu+u0bxI9En4DfETvidgTcVmHNSUJlT270oQX7JiXj0bfK87S/d5vzA4pZInODFYilX+RHCgaZocgYkYP2cGqH2hFR2KSmbzVgkeV0Axk9FQAMSmrrPwrnenYmC9oobo0LQ8ZUp/1STruQVxWkLph8wfPY0JRrw5PYmsYIf8hys9t5vqhNCKUs29tCJDaSy/HdcXYB/GwgFgSjkSaPE5Q== eucalyptus@frontend

Thierry Carrez (ttx) wrote :

Everything looks good on the install side... Maybe try sudo euca_conf --deregister-nodes "IP" && sudo euca_conf --register-nodes "IP" and see if you get any error ?

If it succeeds, it's probably an issue on the NC side, do you get anything in the NC logs ?

I tried the deregister and register, but there was no change in the overall
situation.

For the logs on the node controller, I don't see nc.log:

ubuntu@node01:/var/log/eucalyptus$ ls
axis2c.log euca_test_nc.log httpd-nc_error_log

Anything in euca_test_nc.log or httpd-nc_error_log ?

Nothing of interest in there, I attach the two

I'm confirming this.

I have a new UEC setup this morning from the current archive. CLC+WC+CC+SC, and 4xNC.

The CLC was definitely up and running and serving the preseed.conf before the NCs were installed.

The NCs installed correctly, and they have the CLC's ssh key in /var/lib/eucalyptus/.ssh/authorized_keys.

But none of them are registering automatically.

I think I've seen this quite a bit around Alpha3, and when I mentioned it, we chalked it up to preseed/netboot/timing issues.

In any case, this is non-ideal. And it looks like a regression to me, as this was working very, very well in Portland.

Changed in eucalyptus (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → High
Changed in eucalyptus (Ubuntu):
assignee: nobody → Dustin Kirkland (kirkland)
Dustin Kirkland (kirkland) wrote :

I believe I've tracked down where this problem was introduced:

http://bazaar.launchpad.net/~ubuntu-core-dev/eucalyptus/ubuntu/revision/909

My nodes don't have [ -f "/etc/eucalyptus/eucalyptus-nc.conf" ], so the publication job isn't starting.

This is because the debconf key/value for eucalyptus/cluster-name does not exist on my NC.
  $ echo GET eucalyptus/cluster-name | sudo debconf-communicate

And thus the eucalyptus-nc.postinst isn't able to populate that file.

I'm working on a fix.

Thierry Carrez (ttx) wrote :

@Dustin: In Torsten's logs, the publication is working alright, since the node is detected on the CC's registration logs:

2010-02-26 15:10:29+01:00 | 2758 -> Calling node torstentest node 192.168.1.106
2010-02-26 15:10:30+01:00 | 2758 -> euca_conf --register-nodes returned 0

Also eucalyptus/cluster-name is normally present in the CC preseed ?

Dustin Kirkland (kirkland) wrote :

Thierry,

Agreed. My issue was actually different. I filed this under: Bug #530937, and I have committed a fix for my issue to the tree.

I'm not sure what's going on with Torsten's issue. I'm going to unassign myself from this bug for now.

Changed in eucalyptus (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody
Torsten Spindler (tspindler) wrote :

I confirm that on my cloud there is an eucalyptus-nc.conf on the node and it reads
CC_NAME="torstentest"

I will reinstall the node controller next and see if the problem persists.

Torsten Spindler (tspindler) wrote :

After reinstallation nothing changes. I removed quiet and splash from the kernel boot command line and see the following report:
init: eucalyptus-network (lo) main process (704) killed by TERM signal

Thierry Carrez (ttx) wrote :

Looking at the authorized_keys, might be a node re-registration issue. Did you register the same Node IP with the CC in the past ? When you deregistered/registered manually the node, did you get any error ? After deregister, what do you have in /var/lib/eucalyptus/nodes.list ? Could you reproduce on a full new setup (reinstall the CC)? (I can't)
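
The authorized_keys listing posted earlier in this report shows the same key line twice, which may be what prompted this remark (an assumption; the comment does not say so). A minimal sketch for spotting repeated lines in such a file follows. The function name duplicate_keys is made up for illustration and is not part of eucalyptus.

```shell
# Print every line that occurs more than once in an authorized_keys file.
# $1 is the path of the file (or of a copy of it) to inspect.
duplicate_keys() {
    sort -- "$1" | uniq -d
}

# Example (reading the real file will likely need root):
#   duplicate_keys /var/lib/eucalyptus/.ssh/authorized_keys
```

An empty result means no line is repeated; it says nothing about whether the keys themselves are the right ones.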

The process is:
0/ NC Installer copies CC key to NC authorized_keys
1/ NC publishes its existence
2/ CC picks up the publication
3/ CC runs euca_conf --register-nodes
4/ euca_conf --register-nodes syncs up the eucalyptus CC keys to the NC /var/lib/eucalyptus/keys
5/ euca_conf --register-nodes adds IP to the CC nodes.list
6/ NC picks up keys and starts up, starts writing up messages to nc.log

From your logs it looks like 0-3 is working alright, and that 6 never happens. Could you tell where it stops ?
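
To narrow down where the sequence above stops, the artifacts each step leaves behind can be checked one by one. The following is only a rough sketch: the function names are hypothetical, the paths are the ones mentioned in this report, and it assumes that nodes.list holds plain IP addresses and that registration.log mentions the node IP. The first argument is a root prefix (empty for the live system), which also makes it possible to run against a copied tree.

```shell
# Run on the CC. $1 = root prefix ("" for the live system), $2 = NC IP.
check_cc_side() {
    root="$1"; ip="$2"
    if grep -qwF -- "$ip" "$root/var/log/eucalyptus/registration.log" 2>/dev/null; then
        echo "steps 1-3: $ip seen in registration.log"
    else
        echo "steps 1-3: $ip not seen in registration.log"
    fi
    if grep -qwF -- "$ip" "$root/var/lib/eucalyptus/nodes.list" 2>/dev/null; then
        echo "step 5: $ip listed in nodes.list"
    else
        echo "step 5: $ip missing from nodes.list"
    fi
}

# Run on the NC. $1 = root prefix ("" for the live system).
check_nc_side() {
    root="$1"
    if [ -s "$root/var/lib/eucalyptus/.ssh/authorized_keys" ]; then
        echo "step 0: authorized_keys is populated"
    else
        echo "step 0: authorized_keys is empty or missing"
    fi
    if [ -n "$(ls -A "$root/var/lib/eucalyptus/keys" 2>/dev/null)" ]; then
        echo "step 4: keys directory has content"
    else
        echo "step 4: keys directory is empty or missing"
    fi
    if [ -s "$root/var/log/eucalyptus/nc.log" ]; then
        echo "step 6: nc.log is being written"
    else
        echo "step 6: no nc.log yet"
    fi
}
```

On a live system these would likely need to run as root, for example check_cc_side "" 192.168.1.106 on the front-end and check_nc_side "" on the node. A populated keys directory is only a hint, not proof that the sync in step 4 succeeded.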

Changed in eucalyptus (Ubuntu):
importance: High → Medium
status: Confirmed → Incomplete

On Wed, 2010-03-03 at 09:48 +0000, Thierry Carrez wrote:
> Looking at the authorized_keys, might be a node re-registration issue.
> Did you register the same Node IP with the CC in the past ?

Yes, it was registered before.

> When you
> deregistered/registered manually the node, did you get any error ?

Nope. But when I register the node it is not listed anywhere, e.g.

$ sudo euca_conf --register-nodes 192.168.1.106

INFO: We expect all nodes to have eucalyptus installed
in //var/lib/eucalyptus/keys for key synchronization.

Trying rsync to sync keys with "192.168.1.106"...The authenticity of
host '192.168.1.106 (192.168.1.106)' can't be established.
RSA key fingerprint is 5e:3f:83:e2:83:18:c5:f9:04:3f:a1:f2:9a:94:30:4e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.106' (RSA) to the list of known
hosts.
eucalyptus@192.168.1.106's password:
done.

ubuntu@frontend:/etc/eucalyptus$ grep -ri 106 *
grep: preseed/preseed.conf: Permission denied

> After
> deregister, what do you have in /var/lib/eucalyptus/nodes.list ?

sudo euca_conf --deregister-nodes 192.168.1.106
[sudo] password for ubuntu:
SUCCESS: removed node '192.168.1.106'
ubuntu@frontend:~$ cd /etc/eucalyptus/
ubuntu@frontend:/etc/eucalyptus$ grep 106 *
eucalyptus.conf:NODES="192.168.1.106"

I added it manually to eucalyptus.conf in the past, will remove it.

> Could
> you reproduce on a full new setup (reinstall the CC)? (I can't)

I will do so, first reinstall the CC, then the NC and report back if the
situation changes.

I think there is still an issue around re-registration of nodes, I did fall into that hole once. We just need to reproduce it to pinpoint where it comes from. Please confirm that it works for you on a fully-new install, so that we can rename that bug "Re-registered NC fails to be detected" :)

Torsten Spindler (tspindler) wrote :

On a fresh install of front-end and node controller the cloud works as expected:

ubuntu@frontend:/var/log/eucalyptus$ euca-describe-availability-zones verbose
AVAILABILITYZONE torsten 192.168.1.103
AVAILABILITYZONE |- vm types free / max cpu ram disk
AVAILABILITYZONE |- m1.small 0002 / 0002 1 128 2
AVAILABILITYZONE |- c1.medium 0002 / 0002 1 256 5
AVAILABILITYZONE |- m1.large 0001 / 0001 2 512 10
AVAILABILITYZONE |- m1.xlarge 0001 / 0001 2 1024 20
AVAILABILITYZONE |- c1.xlarge 0000 / 0000 4 2048 20

Thierry Carrez (ttx) on 2010-03-03
summary: - [alpha3] NC fails to be detected.
+ [alpha3] Re-registered NC fails to be detected.
Changed in eucalyptus (Ubuntu):
status: Incomplete → Confirmed
Dustin Kirkland (kirkland) wrote :

One thing we noticed in debian/registration/node:

# Check if node isn't already registered
. /etc/eucalyptus/eucalyptus.conf
for nip in "$NODES"; do
  if [ "${nip# }" == "${IP}" ]; then
    reglog "Node $IP is already registered."
    exit 1
  fi
done

This code doesn't support /var/lib/eucalyptus/nodes.list, but it seems like it should ...
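
As a rough illustration of what such support could look like (this is not the fix that was committed), the check might consult both sources. The function name is_registered is hypothetical, and the sketch assumes nodes.list contains whitespace-separated IP addresses. Note also that in the snippet above the quoted "$NODES" makes the loop run once over the whole string, so the sketch iterates over the unquoted value instead.

```shell
# is_registered IP CONF LIST
# Succeeds if IP appears in NODES (as set by CONF) or in LIST.
# The body runs in a subshell so that sourcing CONF does not leak
# variables into the caller. Pass CONF as an absolute path.
is_registered() (
    ip="$1"; conf="$2"; list="$3"
    NODES=""
    [ -r "$conf" ] && . "$conf"
    extra=""
    [ -r "$list" ] && extra=$(cat "$list")
    # Deliberately unquoted: word splitting yields one IP per iteration.
    for nip in $NODES $extra; do
        [ "$nip" = "$ip" ] && exit 0
    done
    exit 1
)

# Example:
#   is_registered 192.168.1.106 /etc/eucalyptus/eucalyptus.conf \
#       /var/lib/eucalyptus/nodes.list && echo "already registered"
```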

Changed in eucalyptus (Ubuntu):
assignee: nobody → Thierry Carrez (ttx)
Daniel Nurmi (nurmi) wrote :

It looks like the problem is related to the fact that euca_conf --deregister will de-register a node, but the uec_component_listener is not informed when de-registration happens. Thus, if the component listener is restarted, it will re-register the node as soon as it sees the avahi publication of the node.

One possible avenue here would be to modify euca_conf in the UEC to send a signal of some sort (perhaps, by putting a message into registration.log, which uec_component_listener reads periodically?), to inform the listener that a node has been de-registered and should not be re-registered.
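
A very rough sketch of that idea follows. The marker text, the function names and the REGLOG variable are all invented for illustration; nothing here reflects what euca_conf or uec_component_listener actually do.

```shell
# Log file shared by the two sides; override REGLOG to experiment safely.
REGLOG="${REGLOG:-/var/log/eucalyptus/registration.log}"

# Deregistration side: leave a marker line for the given node IP.
note_deregistered() {
    printf '%s | %s -> Node %s deregistered by admin\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$1" >> "$REGLOG"
}

# Listener side: succeed if a marker exists for the given node IP,
# in which case the node should not be re-registered automatically.
was_deregistered() {
    grep -qF -- "-> Node $1 deregistered by admin" "$REGLOG" 2>/dev/null
}
```

This simple form never clears a marker, so a node that is later registered again on purpose would still be skipped; a real implementation would need some way to override or expire the marker.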

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu23

---------------
eucalyptus (1.6.2-0ubuntu23) lucid; urgency=low

  * debian/eucalyptus-udeb.postinst, debian/eucalyptus-udeb.templates:
    add a debconf/preseed option to skip the euca_find_component
    checks in the installer
  * debian/eucalyptus-network.upstart: only rewrite ipaddr.conf if
    it does not exist (admin can force a rewrite by removing it);
    only write the pertinent addrs to ipaddr.conf, LP: #523126
  * debian/registration/node: ensure that nodes.list is used in
    building the $NODES ip list, might (in part) solve LP: #530091
  * debian/rules: drop the install-init --noscripts option, as this
    is not what we want and appears to have arrived from a bad
    copy-n-paste; this fix ensures that eucalyptus-nc is started
    on package install, LP: #545606
 -- Dustin Kirkland <email address hidden> Wed, 24 Mar 2010 18:10:05 -0700

Changed in eucalyptus (Ubuntu):
status: Confirmed → Fix Released