Android mirror cronjobs causing high process count

Bug #1268903 reported by Ben Copeland
This bug affects 1 person
Affects: Linaro Android Infrastructure
Status: Fix Released
Importance: Critical
Assigned to: Paul Sokolovsky

Bug Description

/mnt/upstream-mirror/linaro-android-gerrit-support/android.git.linaro.org/cron-runner.sh does not exit, and stale instances pile up into a high process count.

Today:
ps aux|grep cron-runner.sh|wc -l
37

Sunday 12/01/14:
ps aux|grep cron-runner.sh|wc -l
49

This causes slowdowns on android.git.linaro.org, and if the processes are not killed the server will become unresponsive.

Ben Copeland (bcc) wrote :
Changed in linaro-android-infrastructure:
assignee: nobody → Paul Sokolovsky (pfalcon)
importance: Undecided → Critical
milestone: none → 2014.01
Changed in linaro-android-infrastructure:
status: New → In Progress
Paul Sokolovsky (pfalcon) wrote :

cron-runner.sh is a wrapper for git fetch/push operations. It is currently set to run every 4 hours, and each invocation normally takes under 20 minutes, so I'm not yet sure what caused the pile-up quoted above. Even a big code drop in upstream repos shouldn't cause a dozen parallel runs.

Anyway, I can see 2 parallel runs now, and will try to figure out what they are and when they were started. Then I'm going to kill everything and watch it over the next few days. We can, and of course should, add locking to avoid parallel runs (but first we need to understand why they happened at all).
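
For reference, the locking mentioned above could look roughly like the sketch below. It is only an illustration, not the change that was actually applied: it assumes the flock(1) utility from util-linux is installed, and the lock file path and the run_exclusive helper name are hypothetical.

```shell
#!/bin/bash
# Hypothetical lock file path; not taken from the real cron setup.
LOCK_FILE="${LOCK_FILE:-/tmp/cron-runner-push.lock}"

# Run the given command only if nobody else holds the lock.
# Exits with 75 (an arbitrary "busy, try later" code picked for this
# sketch) when a previous run is still holding the lock.
run_exclusive() {
    (
        # fd 9 stays open on the lock file for the life of this subshell.
        # -n makes flock fail at once instead of queueing behind the
        # running job, so overlapping cron runs are skipped, not stacked.
        flock -n 9 || exit 75
        "$@"
    ) 9>"$LOCK_FILE"
}
```

With this, the cron entry would call something like `run_exclusive android.git.linaro.org/cron-runner.sh push --force`. The same effect is available directly from a crontab line as `flock -n <lockfile> <command>`. The kernel drops the lock when the holding process exits, even if it is killed, so no stale lock file needs cleaning up.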

summary: - Cronjob causing high process count
+ Android mirror cronjobs causing high process count
Paul Sokolovsky (pfalcon) wrote :

Current state:

$ ps aux|grep cron-runner.sh
ubuntu 24980 0.0 0.0 4404 608 ? Ss 16:30 0:00 /bin/sh -c (cd ~/linaro-android-gerrit-support; android.git.linaro.org/cron-runner.sh push --force) || echo "Push error"
ubuntu 24988 0.0 0.0 11016 1376 ? S 16:30 0:00 /bin/bash android.git.linaro.org/cron-runner.sh push --force
ubuntu 24990 0.0 0.0 11020 640 ? S 16:30 0:00 /bin/bash android.git.linaro.org/cron-runner.sh push --force
ubuntu 27468 0.0 0.0 4404 608 ? Ss 12:30 0:00 /bin/sh -c (cd ~/linaro-android-gerrit-support; android.git.linaro.org/cron-runner.sh push --force) || echo "Push error"
ubuntu 27480 0.0 0.0 11016 1380 ? S 12:30 0:00 /bin/bash android.git.linaro.org/cron-runner.sh push --force
ubuntu 27482 0.0 0.0 11020 648 ? S 12:30 0:00 /bin/bash android.git.linaro.org/cron-runner.sh push --force

So there are 2 stale "push" (not fetch) jobs running in parallel, one started at 12:30 and one at 16:30.
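
Each cron invocation shows up as three lines in the listing above (the `/bin/sh -c` wrapper plus two bash processes), so a raw line count overstates the number of runs. A rough way to count runs rather than processes is to count only the wrapper lines. The count_runs helper below is a hypothetical sketch that assumes the `ps aux` column layout shown above, with the command starting in column 11.

```shell
#!/bin/bash
# Count distinct cron invocations of cron-runner.sh in `ps aux` output
# read from stdin. Only the "/bin/sh -c" wrapper lines are counted,
# on the assumption that cron starts exactly one wrapper per run.
count_runs() {
    awk '$11 == "/bin/sh" && $12 == "-c" && /cron-runner\.sh/ { n++ }
         END { print n + 0 }'
}
```

Usage would be `ps aux | count_runs`. The awk process itself is not counted, because its own command column is `awk`, not `/bin/sh`.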

Paul Sokolovsky (pfalcon) wrote :

$ cat /mnt/upstream-mirror/gerrit-mirror-logs/20140116T0430-push.log
INFO:./git-gerrit-mirror:=== Processing: git://gcc.gnu.org (1 repositories) ===
INFO:./git-gerrit-mirror:Pushing in git/gcc.git
android.git.linaro.org/cron-runner.sh: line 12: 23988 Killed $MYPATH/git-gerrit-mirror --mirror-dir=$MIRROR_DIR $* > $LOG 2>&1

real 317m39.531s
user 0m0.204s
sys 0m0.296s

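This push ran for over 5 hours (317 minutes) before being killed, against a normal run time of under 20 minutes. So, I'm afraid it's a vicious cycle: a slowdown in Gerrit makes push jobs run slowly, and they then pile up and drag Gerrit down even further.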
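
One way to break a cycle like this is to put an upper bound on how long a single run may take, so a hung push is stopped before the next scheduled run starts. The sketch below is only an illustration of that idea and is not known to be what was deployed: it assumes GNU coreutils `timeout`, and both the run_bounded helper and the 1 hour limit are hypothetical.

```shell
#!/bin/bash
# Hypothetical limit; the report only says a normal run takes under
# 20 minutes and that runs are scheduled every 4 hours.
MAX_RUNTIME="${MAX_RUNTIME:-1h}"

# Run the given command, sending SIGTERM once MAX_RUNTIME has elapsed
# and SIGKILL 30 seconds later if it still has not exited.
# GNU timeout exits with status 124 when the time limit was hit.
run_bounded() {
    timeout -k 30s "$MAX_RUNTIME" "$@"
}
```

A cron entry could then call `run_bounded android.git.linaro.org/cron-runner.sh push --force` and treat exit status 124 as "push timed out" in the error message.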

Paul Sokolovsky (pfalcon) wrote :

A few adjustments were made to avoid the process pile-up reported here. But the underlying issue that causes the cronjob to hang appears to be related to how Gerrit serves (or has started to serve) TCP/IP connections, or to some TCP/IP issue overall: https://linaroithelp.zendesk.com/requests/1771

Paul Sokolovsky (pfalcon) wrote :

OK, after a server reboot, mirror pushes appear to run fast again, as expected. Closing.

Changed in linaro-android-infrastructure:
status: In Progress → Fix Released