On Wed, Feb 23, 2011 at 1:27 AM, Данило Шеган <email address hidden> wrote:
> Processes that get killed as part of the rollout usually do not "clean
> up properly".
Any reason why not? We kill things in a staggered process, starting by
simply disabling cron, and only actually killing at the last possible
moment. We can make sure we start with SIGINT, which will trigger stack
unwinding.
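
A minimal sketch of that escalation, assuming the job is a Python process that still has Python's default SIGINT handler (which raises KeyboardInterrupt, so `finally` blocks and context managers get a chance to run). The name `stop_gracefully` and the grace period are hypothetical and not part of the actual rollout tooling:

```python
import signal
import subprocess


def stop_gracefully(proc, grace_seconds=10.0):
    """Ask a child process to stop with SIGINT, then hard-kill if it lingers.

    `proc` is a subprocess.Popen object. Returns the name of the last
    signal that was sent, so the caller can log whether cleanup had a
    chance to run.
    """
    # SIGINT surfaces as KeyboardInterrupt in a Python child that uses the
    # default handler, which unwinds the stack through finally blocks.
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGINT"
    except subprocess.TimeoutExpired:
        # Last resort: SIGKILL cannot be caught, so no cleanup code runs.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

A script that catches KeyboardInterrupt broadly, or that is blocked inside C code, may still ignore the SIGINT, which is why the hard kill stays as a fallback.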
> I am pretty sure one of the culprits is the
> translations-export-to-branch.py script. What's more, it sometimes
> (though very rarely) gets killed by the OOM killer, which is definitely
> not something we can anticipate from inside the script, so when that
> happens, we'd need notification from operations about it (if they can
> arrange that).
We can put a ulimit on it - that will cause a MemoryError exception to
unwind the stack, rather than the process being hard-killed.
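
As a rough illustration of the ulimit idea, the sketch below launches a command with a capped address space using Python's `resource` module (the same limit that `ulimit -v` sets in a shell). It assumes Linux, where RLIMIT_AS is enforced. The wrapper name `run_with_memory_limit` and the choice of limit are hypothetical:

```python
import resource
import subprocess


def run_with_memory_limit(argv, max_bytes):
    """Run `argv` with its virtual address space capped at `max_bytes`.

    Inside a Python child, an allocation that would exceed the cap fails
    and normally surfaces as MemoryError instead of an OOM kill.
    """
    def apply_limit():
        # Runs in the child just before exec. Only the soft limit is
        # lowered, and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(
        argv, preexec_fn=apply_limit, capture_output=True, text=True
    )
```

This is not a guarantee: an allocation failure inside a C extension may abort the process rather than raise MemoryError, so the limit makes clean unwinding likely, not certain.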