10.04 LTS: Failure to start instance due to network address failure

Bug #728018 reported by Torsten Spindler
This bug affects 1 person
Affects: eucalyptus (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I'm running tests that launch rather large numbers of instances on a UEC running Ubuntu 10.04 LTS. Instances regularly fail to start, apparently because they never get a network address assigned. In cc.log I found the following entries:

[Wed Mar 2 19:34:08 2011][011910][EUCADEBUG ] RunInstances(): running instance i-3D9C0776 with emiId emi-0CE515A7...
[Wed Mar 2 19:34:08 2011][011910][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:3D:9C:07:76 on vlan 10
[Wed Mar 2 19:34:08 2011][011910][EUCADEBUG ] RunInstances(): assigning MAC/IP: d0:0d:3D:9C:07:76/0.0.0.0/0.0.0.0/14
[Wed Mar 2 19:34:08 2011][011910][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()

The problem seems to be only temporary, as I can launch more instances later on.

The Eucalyptus version in use is 1.6.2-0ubuntu30.4.
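To see how often each function reports an error, cc.log excerpts like the one above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the four sample lines quoted in this report, and the names `LINE_RE`, `parse_line` and `count_errors` are made up for illustration, not part of Eucalyptus.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines in this report (an assumption), e.g.
# [Wed Mar 2 19:34:08 2011][011910][EUCAERROR ] vnetAddHost(): failed to add host ...
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\[(?P<pid>\d+)\]\[(?P<level>\w+)\s*\]\s*"
    r"(?P<func>\w+)\(\):\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one cc.log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_errors(lines):
    """Count EUCAERROR entries per reporting function."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "EUCAERROR":
            counts[record["func"]] += 1
    return counts


# The four log lines quoted in the bug description.
SAMPLE = """\
[Wed Mar 2 19:34:08 2011][011910][EUCADEBUG ] RunInstances(): running instance i-3D9C0776 with emiId emi-0CE515A7...
[Wed Mar 2 19:34:08 2011][011910][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:3D:9C:07:76 on vlan 10
[Wed Mar 2 19:34:08 2011][011910][EUCADEBUG ] RunInstances(): assigning MAC/IP: d0:0d:3D:9C:07:76/0.0.0.0/0.0.0.0/14
[Wed Mar 2 19:34:08 2011][011910][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()
"""

if __name__ == "__main__":
    for func, count in count_errors(SAMPLE.splitlines()).items():
        print(func, count)
```

For a real log, pass an open file object instead of `SAMPLE.splitlines()`; lines that do not match the assumed pattern are skipped.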

Torsten Spindler (tspindler) wrote :

Another failure during tonight's test run:

[Thu Mar 3 01:36:43 2011][011910][EUCADEBUG ] RunInstances(): running instance i-49AF07C3 with emiId emi-0CE515A7...
[Thu Mar 3 01:36:43 2011][011910][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:49:AF:07:C3 on vlan 10
[Thu Mar 3 01:36:43 2011][011910][EUCADEBUG ] RunInstances(): assigning MAC/IP: d0:0d:49:AF:07:C3/0.0.0.0/0.0.0.0/6
[Thu Mar 3 01:36:43 2011][011910][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()
[Thu Mar 3 01:36:43 2011][011910][EUCADEBUG ] RunInstances(): done
[Thu Mar 3 01:36:43 2011][011910][EUCAERROR ] vnetAttachTunnels(): bad input params
[Thu Mar 3 01:36:43 2011][011910][EUCADEBUG ] maintainNetworkState(): failed to attach tunnels for vlan 10 during maintainNetworkState()
[Thu Mar 3 01:36:43 2011][011910][EUCAERROR ] shawn(): network state maintainance failed
[Thu Mar 3 01:36:45 2011][012497][EUCADEBUG ] monitor_thread(): running
[Thu Mar 3 01:36:45 2011][012497][EUCAINFO ] refresh_resources(): called
[Thu Mar 3 01:36:45 2011][012497][EUCADEBUG ] refresh_resources(): calling http://172.24.55.254:8775/axis2/services/EucalyptusNC
[Thu Mar 3 01:36:45 2011][012497][EUCADEBUG ] refresh_resources(): time left for next op: 60

Torsten Spindler (tspindler) wrote :

For completeness, here is the network definition of the cloud (if wanted, I can grant SSH access to the system):

VNET_MODE="MANAGED-NOVLAN"
VNET_SUBNET="172.19.0.0"
VNET_NETMASK="255.255.0.0"
VNET_DNS="172.24.1.1"
VNET_ADDRSPERNET="32"
VNET_PUBLICIPS="172.24.129.136"
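As a rough aid for reading these values, the sketch below works out how many address blocks the subnet splits into and how many addresses each block holds. It assumes that VNET_ADDRSPERNET divides the subnet into equal blocks. How many addresses Eucalyptus reserves inside each block is not stated in this report, so `reserved_per_net` is a hypothetical parameter and the default of 0 gives an upper bound only.

```python
import ipaddress


def vnet_capacity(subnet, netmask, addrs_per_net, reserved_per_net=0):
    """Return (number of address blocks, usable addresses per block).

    reserved_per_net is a hypothetical knob: the real number of reserved
    addresses per block is not given in the bug report.
    """
    network = ipaddress.ip_network(f"{subnet}/{netmask}")
    blocks = network.num_addresses // addrs_per_net
    usable = addrs_per_net - reserved_per_net
    return blocks, usable


if __name__ == "__main__":
    # Values taken from the configuration above.
    blocks, usable = vnet_capacity("172.19.0.0", "255.255.0.0", 32)
    print(blocks, usable)  # 2048 32
```

With these settings a single block holds at most 32 addresses, so a test run that places many instances in one block could run out of addresses even though the subnet as a whole is far from full. Whether that is what happens here is not confirmed by the report.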

Again, starting new instances afterwards works, and the error does not show up again.

Torsten Spindler (tspindler) wrote :

And another one:

[Thu Mar 3 11:48:55 2011][011912][EUCADEBUG ] RunInstances(): running instance i-3393055A with emiId emi-0CE515A7...
[Thu Mar 3 11:48:55 2011][011912][EUCAERROR ] vnetAddHost(): failed to add host d0:0d:33:93:05:5A on vlan 10
[Thu Mar 3 11:48:55 2011][011912][EUCADEBUG ] RunInstances(): assigning MAC/IP: d0:0d:33:93:05:5A/0.0.0.0/0.0.0.0/16
[Thu Mar 3 11:48:55 2011][011912][EUCAERROR ] RunInstances(): could not find/initialize any free network address, failing doRunInstances()
[Thu Mar 3 11:48:55 2011][011912][EUCADEBUG ] RunInstances(): done
[Thu Mar 3 11:48:55 2011][011912][EUCAERROR ] vnetAttachTunnels(): bad input params
[Thu Mar 3 11:48:55 2011][011912][EUCADEBUG ] maintainNetworkState(): failed to attach tunnels for vlan 10 during maintainNetworkState()
[Thu Mar 3 11:48:55 2011][011912][EUCAERROR ] shawn(): network state maintaina

Is there any more meaningful debug information I can provide than the excerpts from the cc.log?

Torsten Spindler (tspindler) wrote :

The problem seems to occur quite randomly. Here are the statistics from the last 5 test runs:

Started: 25 Clients, 4 Servers.
Stopped: 21

Started: 340 Clients, 27 Servers.
Stopped: 360

Started: 70 Clients, 8 Servers.
Stopped: 71

Started: 70 Clients, 5 Servers.
Stopped: 68

Started: 30 Clients, 6 Servers.
Stopped: 26
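The gap between started and stopped instances in each run can be computed directly from the figures above. The sketch assumes that every instance that actually came up is later counted under "Stopped", so the difference approximates the number of failed launches; the report does not confirm that reading.

```python
# (clients started, servers started, instances stopped), copied from the five runs above.
RUNS = [
    (25, 4, 21),
    (340, 27, 360),
    (70, 8, 71),
    (70, 5, 68),
    (30, 6, 26),
]


def shortfall(clients, servers, stopped):
    """Instances started minus instances stopped for one test run."""
    return clients + servers - stopped


if __name__ == "__main__":
    for clients, servers, stopped in RUNS:
        started = clients + servers
        print(f"started={started} stopped={stopped} gap={shortfall(clients, servers, stopped)}")
```

The gaps come out as 8, 7, 7, 7 and 10, so under that assumption the number of missing instances stays in a narrow range even though the run sizes range from 29 to 367 instances.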

Torsten Spindler (tspindler) wrote :

From time to time I see these timeouts. Could they be related?

         4 0 8
path=/services/Eucalyptus/?AWSAccessKeyId=WKy3rMzOWPouVOxK1p3Ar1C2uRBwa2FBXnCw&Action=TerminateInstances&InstanceId.1=i-47F10968&InstanceId.2=i-4D930877&InstanceId.3=i-2BFF05A8&InstanceId.4=i-48E108C9&InstanceId.5=i-51070907&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2011-03-04T14%3A08%3A28&Version=2009-11-30&Signature=8N3z8cgovnd94xj2WuFwyDdoF9DohA5hYqe%2ByREbBsE%3D
Failure: 408 Request Timeout

Torsten Spindler (tspindler) wrote :

I wonder if the timeout for the termination request leads to a situation where the pool of network addresses becomes empty. Is there a way to see the internal IP address allocation?

Torsten Spindler (tspindler) wrote :

The only workaround for the problem seems to be to terminate instances and then try again.
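A test harness could automate that workaround roughly as follows. This is a generic sketch, not Eucalyptus code: `launch` and `free_addresses` are hypothetical callables that the harness would have to supply (for example wrappers around whatever client it already uses to run and terminate instances).

```python
import time


def launch_with_retry(launch, free_addresses, attempts=3, delay=5.0):
    """Try to launch; after each failure, terminate instances and try again.

    launch: hypothetical callable returning True when the instance started.
    free_addresses: hypothetical callable that terminates idle instances.
    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        if launch():
            return attempt
        free_addresses()   # the workaround: terminate instances first
        time.sleep(delay)  # give the cloud time to release the addresses
    return None


if __name__ == "__main__":
    # Simulated cloud: the launch fails twice, then succeeds.
    outcomes = iter([False, False, True])
    print(launch_with_retry(lambda: next(outcomes), lambda: None, delay=0))  # 3
```

The attempt limit keeps the harness from looping forever if the address pool never recovers.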

Scott Moser (smoser)
Changed in eucalyptus (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
mahmoh (mahmoh)
Changed in eucalyptus (Ubuntu):
status: Triaged → Invalid