ci.linaro.org AMI, etc. setup has started to become disorganized and bitrot
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Linaro CI | Confirmed | High | Unassigned | | |
Bug Description
While working on the ci.linaro.org migration, I find many aspects of its current Jenkins setup, both global and per-job, to be unclear. Many questions were communicated to Fathi, and some of them are still unanswered.
Some jobs have been failing for a prolonged time (and affect the stability of the slaves they run on, for example by causing them to hang), and apparently nobody is looking at them. My attempts to debug them yield little because of the issues above (i.e. I don't see an easy and complete picture; instead it's pretty complicated, if not to say obfuscated).
My attempts to dig in and figure it out show that there is divergence from some previously agreed maintenance principles. For example, we agreed that we maintain custom, quickly launchable AMIs, the configuration for which is kept pretty well organized by linaro-ami-tool. However, looking at the "Precise-64 3exec" build slave configuration, I see that its init script has grown to 70 lines, with lots of stuff being installed on top of the custom AMI (which, per the convention mentioned, should really go into the AMI itself). In particular, that is the place where "linaro-cp.py" appears to be installed. Fathi mentioned it previously, but I couldn't find where it was installed (nor did Fathi give a clear pointer). Besides that, there are currently 11 different slave configurations, which is a bit too many to maintain well and problem-free.
This situation (largely) complicates the migration, and complicates debugging issues like lp:1324882 or the recently reported "session timeout too short" issue. Unfortunately, I don't have many suggestions for how to improve the situation, besides redoing the slave setup to get it under control, redoing the job configs to be explicit, clear, and standalone, etc. Of course, that would take a lot of effort and time, which again conflicts with the aim of migrating ci.l.o to Ubuntu 14.04.
Changed in linaro-ci: | |
importance: | Undecided → High |
status: | New → Confirmed |
* Only one question is left, and it is related to linaro-cp
* All the additions on top of the custom AMI are hot fixes, documented in the init script, and definitely not 70 lines
* linaro-cp is used on a dedicated slave, hence it is not on the AMI
* The session timeout is too short, but that is unrelated to any of the issues raised above
So if you feel that the AMI setup is starting to become disorganized, it's because you aren't in the loop on some of the changes.