Instance fails to start

Bug #610479 reported by C de-Avillez on 2010-07-27
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fix Released
eucalyptus (Ubuntu)
Dustin Kirkland 
Dustin Kirkland 

Bug Description

Instances fail to start, at a rate near 10%. All failed instances show up in euca-describe-instances with both public and private IPs set to, and stay in pending for a while (in our tests, around 20 minutes) before being forcefully terminated.

Maverick, Eucalyptus 2.0.

eucalyptus-cc 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-cloud 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-common 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-gl 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-java-common 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-sc 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
eucalyptus-walrus 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed
libeucalyptus-commons-ext-java 0.5.0-0ubuntu2 eucalyptus-commons-ext install ok installed
uec-component-listener 2.0~bzr1211-0ubuntu1 eucalyptus install ok installed

C de-Avillez (hggdh2) wrote :

I searched the logs for one such instance:

ubuntu@cempedak:/var/log/eucalyptus$ grep i-58C70904 *
cc.log.2:[Mon Jul 26 20:32:27 2010][013178][EUCADEBUG ] RunInstances(): running instance i-58C70904 with emiId emi-3C7E1C5B...
cc.log.2:[Mon Jul 26 20:42:45 2010][013178][EUCADEBUG ] TerminateInstances(): params: userId=(null), instIdsLen=1, firstInstId=i-58C70904
cloud-debug.log.2:20:32:27 INFO [ResourceToken:ClusterSink.16] :1280190747674:ResourceToken:77c754b4-9383-476a-bb07-a516fbb228a0:TOKEN_SPLIT:ResourceToken [addresses=[], amount=1, cluster=UEC-TEST1, correlationId=77c754b4-9383-476a-bb07-a516fbb228a0, creationTime=Mon Jul 26 20:32:27 EDT 2010, instanceIds=[i-58C70904], networkTokens=[NetworkToken [cluster=UEC-TEST1, indexes=[4], name=admin-uectest-g0, networkName=uectest-g0, userName=admin, vlan=10]], sequenceNumber=398, userName=admin, vmType=c1.xlarge]:ClusterAllocator.<init>.129
cloud-debug.log.2:20:32:27 DEBUG [QueuedEventCallback:UEC-TEST1-ClusterAllocator-208] :1280190747934:QueuedEventCallback:QUEUE:class[VmRunType reservationId=r-2EAC0570 userData= min=1 max=1 vlan=10 launchIndex=0 imageInfo=VmImageInfo [ancestorIds=[], imageId=emi-3C7E1C5B, imageLocation=, kernelId=eki-D9422170, kernelLocation=, productCodes=[], ramdiskId=null, ramdiskLocation=null, size=1476395008] vmTypeInfo=VmTypeInfo [name='c1.xlarge', memory=2048, disk=20, cores=4] keyInfo=VmKeyInfo [fingerprint=10:14:76:bd:e7:4c:ec:96:20:da:2f:2d:5f:87:61:0c:ad:53:a6:40, name=uectest-k0, value=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCQTKr0e39y1gpz2e0w80xfgmmI5tIoGd7cLYLb2ii12gJJImJwyOTaViXoUn26aQ5iN3z4zzXGW0LxupqLHWDApTZYG4Aqu2GuaVyxcBTgCJ7qthoKrn7wjifJsr6gF5a8LrPWdPu+8WKEIxLy/o45Y99jDuBZHWvNG4SbnUS94XX7Z1dLwaSYMMVGryB0TFCJ8UJXNLOGsKqJnshJwjMiGi5LONhjmIfrqzncghcd1M+9gGJ1P30EG5rrgTaWyIGetI6oKY6EKip0FJjBXFBhEI9rEHruSZIt4A0ADxlldqYD72XSHTUnn3dSD+wtlzwKEF5uR7Ss5HXXe0ANbGvf admin@eucalyptus] instanceIds=[i-58C70904] macAddresses=[d0:0d:58:C7:09:04] networkNames=[uectest-g0] networkIndexList=[4] correlationId=77c754b4-9383-476a-bb07-a516fbb228a0-220298 userId=admin effectiveUserId=eucalyptus _return=false statusMessage=null]
cloud-debug.log.2:20:42:45 INFO [VmInstance:New I/O client worker #2-10] i-58C70904 state change: PENDING -> TERMINATED
cloud-debug.log.2:20:42:45 DEBUG [QueuedEventCallback:New I/O client worker #2-10] :1280191365437:QueuedEventCallback:QUEUE:class edu.ucsb.eucalyptus.msgs.TerminateInstancesType:[TerminateInstancesType instancesSet=[i-58C70904] correlationId=30b262eb-d044-492a-9040-1901ebbbff48 userId=null effectiveUserId=null _return=false statusMessage=null]:VmInstance.setState.247
cloud-debug.log.2:20:42:45 INFO [VmInstance:New I/O client worker #2-10] :1280191365438:VmInstance:VM_STATE:user=admin:instance=i-58C70904:type=c1.xlarge:state=TERMINATED:details=[]:SystemState.handle.147
cloud-debug.log.2:20:42:45 INFO [TerminateInstan...
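
The grep output above can also be used to estimate how long the instance sat before being terminated. What follows is a minimal sketch, not part of the original report: it assumes cc.log lines begin with a bracketed timestamp in the form shown above ("[Mon Jul 26 20:32:27 2010]"), and the function names `cc_timestamps` and `elapsed_seconds` are hypothetical helpers introduced only for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the cc.log lines quoted above:
#   cc.log.2:[Mon Jul 26 20:32:27 2010][013178][EUCADEBUG ] RunInstances(): ...
CC_TIMESTAMP = re.compile(r"\[(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\]")


def cc_timestamps(lines, instance_id):
    """Return the timestamps of all cc.log lines that mention instance_id."""
    stamps = []
    for line in lines:
        if instance_id not in line:
            continue
        match = CC_TIMESTAMP.search(line)
        if match:
            stamps.append(
                datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            )
    return stamps


def elapsed_seconds(lines, instance_id):
    """Seconds between the first and last cc.log entry for instance_id.

    Returns None when fewer than two matching entries are found.
    """
    stamps = cc_timestamps(lines, instance_id)
    if len(stamps) < 2:
        return None
    return (max(stamps) - min(stamps)).total_seconds()
```

Fed the two cc.log lines quoted above, this gives the interval between the RunInstances and TerminateInstances entries (20:32:27 to 20:42:45), which is only an approximation of the time spent in pending.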


description: updated
Thierry Carrez (ttx) on 2010-07-28
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Dave Walker (davewalker) wrote :

@C de-Avillez: Is this behavior presenting similar characteristics to the pre SRU Lucid (1.6.2) version? Is it totally unrelated to bug #566792 ?

C de-Avillez (hggdh2) wrote :

@Dave W:

> Is this behavior presenting similar characteristics to the pre SRU Lucid (1.6.2) version?

No, the signature is distinct. On bug 566792 the instance would start up and reach RUNNING, with both private and public IPs set to the same (private) address. Here the instance never seems to get out of PENDING, and both (private|public) IP addresses are shown as

> Is it totally unrelated to bug #566792 ?

Given the above, yes, it seems totally unrelated. I have not yet had time to dig into it, though. I will try to repeat -- heh. I *will* repeat, it is guaranteed to fail -- and follow an instance to the NC.

C de-Avillez (hggdh2) wrote :

milestoning to Alpha3

Changed in eucalyptus (Ubuntu):
milestone: none → maverick-alpha-3
Thierry Carrez (ttx) on 2010-08-03
Changed in eucalyptus (Ubuntu Maverick):
milestone: maverick-alpha-3 → ubuntu-10.10-beta
Ye Wen (wenye) wrote :

Could you verify this is still happening with the latest commits? I guess one of my fixes solves the problem.

C de-Avillez (hggdh2) wrote :

This still seems to happen on r1219. I have just run a small (200-instance) test, and I got one such error. I will upload the logs to the bzr repository shortly.

C de-Avillez (hggdh2) wrote :

logs uploaded:

Pushed up to revision 28.

Thierry Carrez (ttx) on 2010-08-11
Changed in eucalyptus (Ubuntu Maverick):
assignee: nobody → Dave Walker (davewalker)
Dave Walker (davewalker) wrote :

Recent results indicate that this issue is fixed in an upstream snapshot.

Changed in eucalyptus (Ubuntu Maverick):
status: Confirmed → Fix Committed
Dave Walker (davewalker) wrote :

Marking as Fix Released, as the fix is now released :)

Changed in eucalyptus (Ubuntu Maverick):
status: Fix Committed → Fix Released
C de-Avillez (hggdh2) wrote :

Reopening. On my first test run on r1230 I got the following results:

2010-08-13 15:16:03,002 SUMMARY:INFO not-tested=16
2010-08-13 15:16:03,002 SUMMARY:INFO being-tested=0
2010-08-13 15:16:03,002 SUMMARY:INFO success=130
2010-08-13 15:16:03,002 SUMMARY:INFO failed=54
2010-08-13 15:16:03,002 SUMMARY:INFO rescheduled=0

I did not look at all of them, but pretty much all 54 failures did *not* reach a successful running state. Of course, this was a stress test (200 instances, started as fast as possible, on a single NC with 16 cores). I then ran another test with a larger interval between euca-run-instances calls, so that VMs should always have been available. I still got a failure rate of about 8%, all of them failures to start.

Logs are at lp:~hggdh2/uec-qa, revision 35.
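
The summary counts above can be turned into a failure rate with a few lines of Python. This is a minimal sketch, not the test harness used in the report: it assumes the "SUMMARY:INFO key=value" line format shown above, the helper names `parse_summary` and `failure_rate` are hypothetical, and the choice to divide by tested instances only (success plus failed, leaving out not-tested) is an assumption about how the rate should be counted.

```python
import re

# Assumed layout, taken from the summary lines quoted above:
#   2010-08-13 15:16:03,002 SUMMARY:INFO failed=54
SUMMARY = re.compile(r"SUMMARY:INFO\s+(?P<key>[\w-]+)=(?P<value>\d+)")


def parse_summary(lines):
    """Collect the key=value counters from SUMMARY:INFO lines into a dict."""
    counts = {}
    for line in lines:
        match = SUMMARY.search(line)
        if match:
            counts[match.group("key")] = int(match.group("value"))
    return counts


def failure_rate(counts):
    """Failed instances as a fraction of tested (success + failed) instances.

    Returns None when nothing was tested.
    """
    failed = counts.get("failed", 0)
    tested = counts.get("success", 0) + failed
    if tested == 0:
        return None
    return failed / tested
```

For the stress run quoted above this is 54 failures out of 184 tested instances, or roughly 29%.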

Changed in eucalyptus (Ubuntu Maverick):
status: Fix Released → Triaged
Changed in eucalyptus (Ubuntu Maverick):
assignee: Dave Walker (davewalker) → Dustin Kirkland (kirkland)
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 2.0~bzr1231-0ubuntu1

eucalyptus (2.0~bzr1231-0ubuntu1) maverick; urgency=low

  * New upstream snapshot, -r1231, bugs fixed by upstream:
    - LP: #566792 - metadata service returns empty data with 200 OK
    - LP: #606243 - euca-describe-availability-zones verbose corrupted
    - LP: #563175 - should hold on to console logs after terminated
    - LP: #613832 - Cannot mark address as allocating
    - LP: #610479 - Instance fails to start
 -- Dustin Kirkland <email address hidden> Tue, 17 Aug 2010 12:49:28 -0400

Changed in eucalyptus (Ubuntu Maverick):
status: In Progress → Fix Released
Changed in eucalyptus:
status: New → Fix Released