Comment 7 for bug 564355

Piotr T Zbiegiel (pzbiegiel) wrote :

I'm not sure about logs from all Eucalyptus components, but this problem seems to be centered in the cluster controller code. The most telling log lines I've seen after the second "euca-run-instances" command are:

[Thu Apr 15 14:29:46 2010][001328][EUCAINFO ] RunInstances(): called
[Thu Apr 15 14:29:46 2010][001328][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:3B:E6:07:11 on vlan 10
[Thu Apr 15 14:29:46 2010][001328][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()

Once the cluster controller fails to issue network addresses for the new instances, it doesn't bother to farm them out to the node controllers (NCs), so those instances are never started on any NC.
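
For anyone who wants to check their own cluster controller log for the same pattern, here is a rough Python sketch. It assumes every line follows the layout of the excerpt above ([timestamp][pid][LEVEL ] function(): message), and the helper name find_address_failures is made up for illustration; it is not part of Eucalyptus.

```python
import re

# Assumed line layout, taken from the log excerpt above:
# [timestamp][pid][LEVEL ] function(): message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<pid>\d+)\]\[(?P<level>\w+)\s*\]\s+"
    r"(?P<func>\w+)\(\):\s+(?P<msg>.*)$"
)
# Message layout of the vnetAddHost() failure, also taken from the excerpt.
ADD_HOST_RE = re.compile(
    r"failed to add host (?P<mac>[0-9A-Fa-f:]{17}) on vlan (?P<vlan>\d+)"
)


def find_address_failures(lines):
    """Return (timestamp, mac, vlan) for each vnetAddHost() error line."""
    failures = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # line does not follow the assumed layout
        if m.group("level") != "EUCAERROR" or m.group("func") != "vnetAddHost":
            continue
        host = ADD_HOST_RE.search(m.group("msg"))
        if host:
            failures.append(
                (m.group("ts"), host.group("mac"), int(host.group("vlan")))
            )
    return failures


# The three lines quoted in this comment, used as sample input.
sample = [
    "[Thu Apr 15 14:29:46 2010][001328][EUCAINFO ] RunInstances(): called",
    "[Thu Apr 15 14:29:46 2010][001328][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:3B:E6:07:11 on vlan 10",
    "[Thu Apr 15 14:29:46 2010][001328][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()",
]

for ts, mac, vlan in find_address_failures(sample):
    print(ts, mac, vlan)
```

To scan a real log, pass an open file object instead of the sample list. Grouping the results by VLAN should show which security group's network has stopped handing out addresses.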

It almost seems like the cluster controller forgets about the available network addresses on a given network and won't allocate addresses for new instances. The most distressing thing (and this doesn't happen every time) is that the network associated with a given security group is deallocated by the cluster controller. Its rule chain is removed from iptables, and I've even seen other users get issued the same slice of network addresses for their new security groups, all while instances in the old security group are still in a running state.

I can confirm Aimon's comment. We have seen this behavior with ADDRSPERNET set to 256, 128, and 64.