ci.linaro.org is going down intermittently
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linaro CI | Fix Released | Critical | James Tunnicliffe |
Bug Description
ci.linaro.org is going out of service intermittently because of a JVM out-of-memory error, as shown in the jenkins.log excerpt below:
"
[Winstone 2012/03/01 07:15:37] - Untrapped Error in Servlet
java.lang.OutOfMemoryError: Java heap space
"
I have seen this intermittently since last week or so, but it is becoming much more frequent now.
I am seeing this with Jenkins version 1.419.
I worked around the problem by restarting Jenkins, but this is not a permanent solution.
The following reasons are commonly cited for this kind of error:
1. Jenkins is growing in data size, requiring a bigger heap space. In this case you just want to give it a bigger heap.
2. Jenkins is temporarily processing a large amount of data (like test reports), requiring more headroom in memory. In this case you just want to give it a bigger heap.
3. Jenkins is leaking memory, in which case we need to fix that.
Reason 2 does not seem to be the cause of the OOM, as only 3 jobs are scheduled to run at a time. That leaves 1 or 3 as the likely culprits.
I think I have given the JVM enough memory ("-Xms640m -Xmx1024m"), considering that the actual RAM on the machine is 1GB.
I think the 1GB of RAM is also a constraint here.
There are some options for getting a dump of the JVM at https://wiki.jenkins-ci.org/display/JENKINS/I%27m+getting+OutOfMemoryError
But I am afraid that, as more jobs are added to ci.linaro.org, we will see this problem more often.
Here are the solutions I think might help us:
1) Try the latest Jenkins upgrade and see if it solves the problem. I have not decided on a version yet, but I see 1.439 has memory fixes.
2) Migrate ci.linaro.org to another EC2 instance with more memory for the time being, until we get set up on DC.
Changed in linaro-ci:
status: New → Triaged
Changed in linaro-ci:
assignee: nobody → Paul Sokolovsky (pfalcon)
Hello Deepti,
On Thu, 1 Mar 2012 13:29:41 +0530
Deepti Kalakeri <email address hidden> wrote:
> Hello Danilo/Paul,
>
> I am seeing a lot of Out of Memory error on ci.linaro.org.
Deepti has now opened a bug for this,
https://bugs.launchpad.net/linaro-ci/+bug/943901 (I'm cc:ing it).
> "
> [Winstone 2012/03/01 07:15:37] - Untrapped Error in Servlet
> java.lang.OutOfMemoryError: Java heap space
>
> "
> I have seen this since last week or so intermittently once in a while
> but it is becoming very obvious now.
> I am seeing this recently with the jenkins version 1.419.
> I fixed this problem by restarting the jenkins. But this is not a
> permanent solution.
> The following reasons are cited for such kind of errors:
>
> 1. Jenkins is growing in data size, requiring a bigger heap space.
> In this case you just want to give it a bigger heap.
> 2. Jenkins is temporarily processing a large amount of data (like
> test reports), requiring a bigger head room in memory. In this case
> you just want to give it a bigger heap.
> 3. Jenkins is leaking memory, in which case we need to fix that.
>
> The reason 2 does not seem to be the cause of the OOM as only 3 jobs
> are scheduled to run at a time. In that case, the culprits can be 1
> or 3.
>
> I think I have given enough memory to the JVM " -Xms640m -Xmx1024m "
> with the actual RAM on the machine being 1GB.
>
> I think the 1GB RAM is also a constraint here.
Well, the small EC2 instance we use for Jenkins masters has 1.7GB of RAM, so
there's room for a heap increase. And surprisingly, it turns out
android-build still uses the Java defaults (doesn't pass explicit
-Xms/-Xmx), so we can't compare to that.
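To make the "room for heap increase" point concrete, here is a rough sketch that compares the configured -Xmx against the two RAM figures mentioned in this thread (1GB and 1.7GB). The helper names are made up for illustration, and the calculation ignores the memory the JVM needs outside the heap, so treat the numbers as an upper bound on the headroom.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xms/-Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_jvm_size(value):
    """Convert a JVM size string such as '640m' or '1g' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def heap_settings(jvm_args):
    """Extract the -Xms/-Xmx values (in bytes) from a JVM argument string."""
    settings = {}
    for arg in jvm_args.split():
        for flag in ("-Xms", "-Xmx"):
            if arg.startswith(flag):
                settings[flag] = parse_jvm_size(arg[len(flag):])
    return settings


def headroom_mib(jvm_args, ram_bytes):
    """RAM (in MiB) left outside the heap if it grows to its -Xmx limit."""
    max_heap = heap_settings(jvm_args)["-Xmx"]
    return (ram_bytes - max_heap) / 1024**2


# Flags as quoted in the bug description.
ARGS = "-Xms640m -Xmx1024m"

if __name__ == "__main__":
    for label, ram in (("1 GB", 1024**3), ("1.7 GB", int(1.7 * 1024**3))):
        left = headroom_mib(ARGS, ram)
        print(f"{label} RAM: {left:.0f} MiB left outside the heap")
```

With 1GB of RAM a 1024m heap leaves nothing for the OS or the rest of the JVM, which would fit with the box struggling; with 1.7GB there is some margin for a modest increase.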
What's more worrying, though, is that the OOMs are accompanied by 99% CPU
usage by Java. Actually, after the restart, Jenkins is back to eating
99% CPU time in 5 minutes or less (the site still responds). So I wouldn't
dismiss point 2, "large processing", above, especially since you mentioned
that there have already been some issues due to frequent SCM polls.
Actually, I saw an exception related to SCM polling in the logs during
the 99% CPU usage:
Mar 1, 2012 10:20:28 AM hudson.triggers.SCMTrigger$Runner runPolling
SEVERE: Failed to record SCM polling
java.lang.NullPointerException
at hudson.plugins.bazaar.BazaarSCM.calcRevisionsFromBuild(BazaarSCM.java:196)
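To get a feel for how often each kind of failure shows up, something like the following could be run over jenkins.log. It is only a sketch: the sample lines are assembled from the two excerpts quoted in this thread, and the regular expression assumes each exception class name starts its own line, which may not hold for every entry in the real log.

```python
import re
from collections import Counter

# Matches a fully qualified java.* exception or error class at the start of
# a line, e.g. "java.lang.OutOfMemoryError: Java heap space".
EXCEPTION_RE = re.compile(r"^(java\.[\w.]+(?:Error|Exception))\b")


def count_exceptions(lines):
    """Count occurrences of each java.* exception class in log lines."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample built from the excerpts in this bug; a real run would pass an
# open file object for jenkins.log instead.
SAMPLE = """\
[Winstone 2012/03/01 07:15:37] - Untrapped Error in Servlet
java.lang.OutOfMemoryError: Java heap space
Mar 1, 2012 10:20:28 AM hudson.triggers.SCMTrigger$Runner runPolling
SEVERE: Failed to record SCM polling
java.lang.NullPointerException
at hudson.plugins.bazaar.BazaarSCM.calcRevisionsFromBuild(BazaarSCM.java:196)
""".splitlines()

if __name__ == "__main__":
    for name, count in count_exceptions(SAMPLE).most_common():
        print(f"{count:5d}  {name}")
```

Comparing the counts before and after a heap increase or an upgrade would give a rough indication of whether the change helped.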
So, my proposals for how to deal with it:
1. Increase the heap size by 256MB as a stop-gap measure.
2. Consider upgrading to a newer version of Jenkins (read changelogs, test
on a sandbox).
3. Review/investigate how Jenkins does SCM polling. Consider again that
the most efficient way to handle SCM tip builds would be
interrupt-driven, not polling (i.e. have a trigger in the SCM repository
queue a build).
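As an illustration of the interrupt-driven idea in point 3, a repository-side hook could ask Jenkins to queue a build through its remote trigger URL (job/NAME/build?token=TOKEN) instead of waiting to be polled. This is a minimal sketch: the server URL, job name and token below are hypothetical, the job would need "Trigger builds remotely" enabled, and depending on the security settings Jenkins may additionally require authentication, so the bare request may be rejected as written.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_trigger_url(jenkins_url, job_name, token):
    """Build the remote-trigger URL for a Jenkins job."""
    base = jenkins_url.rstrip("/")
    job = quote(job_name, safe="")
    query = urlencode({"token": token})
    return f"{base}/job/{job}/build?{query}"


def trigger_build(jenkins_url, job_name, token, timeout=10):
    """Ask Jenkins to queue a build; returns the HTTP status code."""
    url = build_trigger_url(jenkins_url, job_name, token)
    request = Request(url, data=b"", method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values; an SCM post-commit hook would call
    # trigger_build() with the real ones.
    print(build_trigger_url("https://jenkins.example.org", "my job", "s3cret"))
```

The hook itself would live on the SCM side (for example as a post-commit script), so Jenkins only does work when something actually changed.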
> There are some options to get the dump of the JVM @
> https://wiki.jenkins-ci.org/display/JENKINS/I%27m+getting+OutOfMemoryError .
> I will have to try that.
Well, investigating JVM memory usage is for sure a technically sound
plan, especially if you have experience with it.
>
> But, am afraid with the increasing jobs being added to ci.linaro.org
> we would see this problem more often.
>
> Here are the solutions what I think might help us:
>
> 1) Try and see if we the l...