Second euca-run-instances request in same security group causes eucalyptus to remove network associated with security group
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Eucalyptus | Fix Released | Undecided | Unassigned |
1.6.2 | Won't Fix | Undecided | Unassigned |
eucalyptus (CentOS) | New | Undecided | Unassigned |
eucalyptus (Ubuntu) | Invalid | High | C de-Avillez |
Lucid | Won't Fix | High | Unassigned |
Maverick | Invalid | High | C de-Avillez |
Bug Description
We are running eucalyptus 1.6.2-0ubuntu27 on Lucid beta1 in MANAGED-NOVLAN mode. I will retest with ubuntu30 as soon as is feasible, but since I see no mention of this issue or a fix in the changelog, I wanted to get the information into your hands.
Eucalyptus has trouble allocating additional VMs to existing security groups in some cases. I tried several tests and saw very similar results. Eucalyptus allows you to request VMs in a given security group. Once all the VMs are running, an additional euca-run-instances request for that security group will fail, and in some cases the network associated with that security group will be removed from iptables (even if there are running VMs within that security group). The network that was freed up can be re-allocated to another security group, but new VMs requested in that security group fail with the same "failed to add host" message.
-------
A typical cycle looks like this (command-line interspersed with snippets of cc.log):
$ euca-run-instances -n 250 -g default…
[Thu Apr 15 14:14:51 2010][001325]
[Thu Apr 15 14:14:51 2010][001324]
[Thu Apr 15 14:14:51 2010][001324]
[Thu Apr 15 14:14:51 2010][001327]
#….Proceeds to run 250 instances successfully…..
$ euca-run-instances -n 1 -g default….
[Thu Apr 15 14:29:46 2010][001376]
[Thu Apr 15 14:29:46 2010][001368]
[Thu Apr 15 14:29:46 2010][001368]
[Thu Apr 15 14:29:46 2010][001328]
[Thu Apr 15 14:29:46 2010][001328]
[Thu Apr 15 14:29:46 2010][001328]
#…..After 15 minutes instance goes to terminated and TerminateInstance() is called many times (once per NC?)…….
[Thu Apr 15 14:39:51 2010][005458]
ls)
[Thu Apr 15 14:39:51 2010][001326]
[Thu Apr 15 14:39:51 2010][005459]
ls)
[Thu Apr 15 14:39:51 2010][001326]
[Thu Apr 15 14:39:51 2010][005460]
ls)
[Thu Apr 15 14:39:51 2010][001326]
[Thu Apr 15 14:39:51 2010][005461]
ls)
#……It then removes the network allocated for the user's default security group even though there are 250 running VMs!!!……
[Thu Apr 15 14:40:00 2010][001328]
#iptables shows that the chain user-default has disappeared!
-------
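The disappearance of the chain can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the security group's chain lives in the default filter table and that `iptables -S` is available on the host that manages the rules. `chain_in_rules` is a hypothetical helper name, and `user-default` is the chain name from the report above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if chain "$1" is declared in `iptables -S`
# style output read from stdin (a declared chain appears as "-N <name>").
chain_in_rules() {
    grep -qxF -- "-N $1"
}

# Example use on the host that manages the rules (needs root):
#   iptables -S | chain_in_rules user-default || echo "chain user-default is missing"
```

Running the check before and after the second euca-run-instances request would show whether the chain was dropped while instances were still running.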
I tried many different combinations of numbers of nodes, etc.
(ADDRSPERNET is 256)
250 + 1 additional (the 1 additional failed, network was removed and VMs are inaccessible)
100 + 1 additional (the 1 additional failed, network was removed and VMs are inaccessible)
20 + 20 additional (the 20 additional failed, network was removed and VMs are inaccessible)
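For reference, the totals in each of these cases stay below ADDRSPERNET. The sketch below only does that arithmetic; `fits_in_network` is a hypothetical helper, and the optional `reserved` argument is an assumption, since this report does not say how many addresses per network Eucalyptus keeps for itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if running + requested instances fit in the
# network. Usage: fits_in_network <running> <requested> <addrs_per_net> [reserved]
# "reserved" defaults to 0 because the report does not give a reserved count.
fits_in_network() {
    running=$1 requested=$2 addrs=$3 reserved=${4:-0}
    total=$((running + requested))
    usable=$((addrs - reserved))
    [ "$total" -le "$usable" ]
}

# The three combinations listed above, checked against ADDRSPERNET=256.
for pair in "250 1" "100 1" "20 20"; do
    set -- $pair
    if fits_in_network "$1" "$2" 256; then
        echo "$1 + $2 = $(($1 + $2)): within 256"
    else
        echo "$1 + $2 = $(($1 + $2)): exceeds 256"
    fi
done
```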
I did have some success adding to existing security groups by 10 or 20 nodes at a time. One security group grew to 80 nodes before I received the "failed to add host" messages. It seemed I was more successful when I was making requests rapidly (waiting only a few minutes between requests) rather than waiting for all the nodes to allocate in a given reservation. I am at a loss as to the exact cause, because some security groups are allowed to expand while others are cut off from receiving additional IPs well before they reach ADDRSPERNET.
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
Changed in eucalyptus (Ubuntu Lucid):
assignee: nobody → Dustin Kirkland (kirkland)
Changed in eucalyptus (Ubuntu):
milestone: lucid-updates → none
Changed in eucalyptus (Ubuntu Lucid):
milestone: lucid-updates → none
tags: added: server-mrs
I was able to repeat this behavior with ADDRSPERNET set to 128. The system seems more prone to this behavior when a user makes requests for large numbers of VMs in a security group and then attempts to add more. I am not sure whether the bug is triggered by the size of the requests or by how many IPs are already allocated in a given security group.