EC2 builds pile-up due to lack of instance slots

Bug #1021191 reported by Paul Sokolovsky
This bug affects 1 person
Affects: Linaro AWS Tools
Status: Fix Released
Importance: Critical
Assigned to: Paul Sokolovsky

Bug Description

This (EU) morning we had a fairly large pile-up of pending builds in the Jenkins services: ~15 builds on ci.linaro.org and ~5 builds on android-build. Both hosts are capped at 28 instances.

On a quick look, there were 6 cbuild instances running at that time (most for more than a day), which took up a bunch of slots. On a closer look, we also have a few undocumented long-running instances (some for more than a month), which account for other slots. More detailed info is provided below.

Paul Sokolovsky (pfalcon) wrote :

Output of ./check-aws-resources :

FAIL: test_no_unknown_old_instance (__main__.TestEC2Instances)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./check-aws-resources", line 527, in test_no_unknown_old_instance
    self.assertEqual([], no_match)
AssertionError: Lists differ: [] != [Instance:i-91eee8f7, Instance...

Second list contains 8 additional elements.
First extra element 0:
Instance:i-91eee8f7

- []
+ [Instance:i-91eee8f7,
+ Instance:i-6bf7f10d,
+ Instance:i-1f150379,
+ Instance:i-91cddbf7,
+ Instance:i-dae678a3,
+ Instance:i-fcc8ad85,
+ Instance:i-aeddb8d7,
+ Instance:i-18dcba61]

Changed in linaro-aws-tools:
importance: Undecided → Critical
status: New → Confirmed
assignee: nobody → Paul Sokolovsky (pfalcon)
milestone: none → 2012.07
Paul Sokolovsky (pfalcon) wrote :

$ ec2-describe-instances | grep cbuild | wc -l
6

$ ec2-describe-instances | grep cbuild
INSTANCE i-f8cb0580 ami-ac9943c5 ec2-23-20-134-164.compute-1.amazonaws.com ip-10-244-98-63.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T01:14:09+0000 us-east-1b aki-805ea7e9 monitoring-disabled 23.20.134.164 10.244.98.63 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-f4428c8c ami-37af765e ec2-23-20-53-120.compute-1.amazonaws.com ip-10-243-55-127.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T02:47:50+0000 us-east-1b aki-407d9529 monitoring-disabled 23.20.53.120 10.243.55.127 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-96945dee ami-a29943cb ec2-50-16-78-249.compute-1.amazonaws.com ip-10-212-174-31.ec2.internal running cbuild@crucis 0 c1.xlarge 2012-07-04T03:54:09+0000 us-east-1b aki-825ea7eb monitoring-disabled 50.16.78.249 10.212.174.31 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-fa915982 ami-ac9943c5 ec2-23-22-214-251.compute-1.amazonaws.com ip-10-196-195-210.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T07:03:43+0000 us-east-1b aki-805ea7e9 monitoring-disabled 23.22.214.251 10.196.195.210 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-d89159a0 ami-ac9943c5 ec2-107-20-53-53.compute-1.amazonaws.com ip-10-244-231-254.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T07:03:48+0000 us-east-1b aki-805ea7e9 monitoring-disabled 107.20.53.53 10.244.231.254 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-c64d99be ami-ac9943c5 ec2-23-22-163-87.compute-1.amazonaws.com ip-10-76-114-53.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T21:14:51+0000 us-east-1b aki-805ea7e9 monitoring-disabled 23.22.163.87 10.76.114.53 ebs paravirtual xen sg-4e13aa27 default

Paul Sokolovsky (pfalcon) wrote :

linaro-aws-tools$ ./pretty-ec2-describe-instances | grep cbuild
i-f8cb0580 cbuild@crucis c1.medium 2012-07-04T01:14:09+0000
i-f4428c8c cbuild@crucis c1.medium 2012-07-04T02:47:50+0000
i-96945dee cbuild@crucis c1.xlarge 2012-07-04T03:54:09+0000
i-fa915982 cbuild@crucis c1.medium 2012-07-04T07:03:43+0000
i-d89159a0 cbuild@crucis c1.medium 2012-07-04T07:03:48+0000
i-c64d99be cbuild@crucis c1.medium 2012-07-04T21:14:51+0000
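
A rough sketch of how a condensed listing like this can be derived from raw ec2-describe-instances output. This is not the actual pretty-ec2-describe-instances script; the function name summarize is made up for illustration, and the field positions (1, 6, 8, 9) are an assumption based only on the whitespace-separated INSTANCE lines quoted earlier in this report.

```python
# Sketch only: assumes INSTANCE lines split on whitespace exactly as in the
# output quoted in this report (id, key name, type, launch time at 1, 6, 8, 9).
RAW = """\
INSTANCE i-f8cb0580 ami-ac9943c5 ec2-23-20-134-164.compute-1.amazonaws.com ip-10-244-98-63.ec2.internal running cbuild@crucis 0 c1.medium 2012-07-04T01:14:09+0000 us-east-1b aki-805ea7e9 monitoring-disabled 23.20.134.164 10.244.98.63 ebs paravirtual xen sg-4e13aa27 default
INSTANCE i-96945dee ami-a29943cb ec2-50-16-78-249.compute-1.amazonaws.com ip-10-212-174-31.ec2.internal running cbuild@crucis 0 c1.xlarge 2012-07-04T03:54:09+0000 us-east-1b aki-825ea7eb monitoring-disabled 50.16.78.249 10.212.174.31 ebs paravirtual xen sg-4e13aa27 default
"""


def summarize(raw):
    """Return (instance id, key name, type, launch time) for each INSTANCE line."""
    rows = []
    for line in raw.splitlines():
        fields = line.split()
        # Skip anything that is not a complete INSTANCE record.
        if len(fields) < 10 or fields[0] != "INSTANCE":
            continue
        rows.append((fields[1], fields[6], fields[8], fields[9]))
    return rows


for row in summarize(RAW):
    print(" ".join(row))
```

Splitting on whitespace breaks if a field is empty (for example a stopped instance without a public DNS name), so a real tool would need to split on the actual field separator.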

Paul Sokolovsky (pfalcon) wrote :

linaro-aws-tools$ ./pretty-ec2-describe-instances | grep -E '(mi|zh).+2012-0[56]'
i-91eee8f7 michaelh@crucis c1.medium 2012-05-28T09:14:48+0000
i-6bf7f10d michaelh@crucis c1.xlarge 2012-05-28T09:20:29+0000
i-fcc8ad85 zhenqiang@linaro m1.large 2012-06-15T02:30:14+0000
i-aeddb8d7 zhenqiang@linaro m1.large 2012-06-15T02:38:38+0000

All of these instances are unknown to check-aws-resources (see above) and have no descriptions in the AWS console.
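
The grep above depends on knowing the owners and months in advance. A possible alternative is to filter by launch time directly; the sketch below does that over the condensed listing format. The function name older_than, the 7-day threshold and the NOW timestamp are all assumptions for illustration, not part of linaro-aws-tools.

```python
from datetime import datetime, timedelta, timezone

# Sample rows copied from the listings in this report:
# id, key name, instance type, launch time.
LISTING = """\
i-91eee8f7 michaelh@crucis c1.medium 2012-05-28T09:14:48+0000
i-fcc8ad85 zhenqiang@linaro m1.large 2012-06-15T02:30:14+0000
i-c64d99be cbuild@crucis c1.medium 2012-07-04T21:14:51+0000
"""


def older_than(listing, now, max_age):
    """Return (id, key name, age in whole days) for instances older than max_age."""
    old = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        instance_id, key_name, _instance_type, launched = parts
        started = datetime.strptime(launched, "%Y-%m-%dT%H:%M:%S%z")
        age = now - started
        if age > max_age:
            old.append((instance_id, key_name, age.days))
    return old


# Assumed "current" time for the example; the report does not state one.
NOW = datetime(2012, 7, 6, 6, 0, tzinfo=timezone.utc)

for instance_id, key_name, days in older_than(LISTING, NOW, timedelta(days=7)):
    print(instance_id, key_name, f"{days} days")
```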

Changed in linaro-aws-tools:
status: Confirmed → Triaged
summary: - EC2 build pile-up
+ EC2 builds pile-up due to lack of instance slots
Paul Sokolovsky (pfalcon) wrote :

Sent email to ec2@, Michael Hope, and Zhenqiang Chen requesting more info about the instances above.

Paul Sokolovsky (pfalcon) wrote :

4 long-running instances were terminated. michaelh confirmed that up to 3 cbuild instances are usually running, with a few more occasionally for more test parallelization.
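
To catch this kind of slot usage earlier, a per-owner count could be compared against an expected maximum. A minimal sketch follows, assuming the condensed listing format shown earlier; the EXPECTED_MAX table and the over_limit function are hypothetical, and only the figure of 3 for cbuild comes from this report.

```python
from collections import Counter

# Hypothetical limits table; the value 3 is the usual cbuild count quoted above.
EXPECTED_MAX = {"cbuild@crucis": 3}

LISTING = """\
i-f8cb0580 cbuild@crucis c1.medium 2012-07-04T01:14:09+0000
i-f4428c8c cbuild@crucis c1.medium 2012-07-04T02:47:50+0000
i-96945dee cbuild@crucis c1.xlarge 2012-07-04T03:54:09+0000
i-fa915982 cbuild@crucis c1.medium 2012-07-04T07:03:43+0000
i-d89159a0 cbuild@crucis c1.medium 2012-07-04T07:03:48+0000
i-c64d99be cbuild@crucis c1.medium 2012-07-04T21:14:51+0000
"""


def over_limit(listing, limits):
    """Return {key name: running count} for owners above their expected maximum."""
    counts = Counter()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 4:
            counts[parts[1]] += 1
    return {key: n for key, n in counts.items()
            if key in limits and n > limits[key]}


for key, n in over_limit(LISTING, EXPECTED_MAX).items():
    print(f"{key}: {n} running, expected at most {EXPECTED_MAX[key]}")
```

Since a few extra cbuild instances are occasionally legitimate, a result like this is better treated as a warning than as a hard failure.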

Closing.

Changed in linaro-aws-tools:
status: Triaged → Fix Released