Comment 6 for bug 126204

N7DR (doc-evans) wrote : Re: [Bug 126204] Re: Batch jobs intermittently fail to leave "=" queue when complete

On 18/08/07, Scott Kitterman <email address hidden> wrote:
> There is a standard cron job that restarts syslogd on a daily basis.
> That appears to be what this is. I'm not sure how or if that might be
> relevant.
>

No, me neither; but it sure seems to be strongly correlated.
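
A minimal sketch of how one might list the jobs still sitting in the "=" queue, so that the timing can be compared against the daily syslogd restart. It assumes that each line of atq output starts with the job number and ends with the queue letter followed by the user name. The function name and the sample lines are hypothetical, made up for illustration, and are not taken from a real system.

```python
def running_queue_jobs(atq_output):
    """Return (job_id, user) pairs for jobs that atq lists in the "=" queue."""
    jobs = []
    for line in atq_output.splitlines():
        fields = line.split()
        # Assumed layout: job number first, queue letter second-to-last, user last.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-2] == "=":
            jobs.append((int(fields[0]), fields[-1]))
    return jobs


# Hypothetical sample text standing in for real atq output.
sample = (
    "41\t2007-08-18 07:30 = doc\n"
    "42\t2007-08-18 09:00 b doc\n"
)
print(running_queue_jobs(sample))
```

Running something like this shortly after a job should have finished would show whether it is still listed under "=".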

> SIGTERM = Signal("TERM", 15, "Termination")
>
> Looking at the man page for at, I find this:
>
> "At and batch as presently implemented are not suitable when users are
> competing for resources. If this is the case for your site, you
> might want to consider another batch system, such as nqs."
>

That's incredibly vague... "competing for resources" is meaningless.
My jobs aren't doing anything that would be competing for anything
remotely unusual: they are mostly CPU-bound, with occasional writes to
a file descriptor (for an ordinary file in the current working
directory of the job). If that's "competing for resources" then
*anything* could be so called, and it would be unsafe to run anything
at all through the batch system.

> It looks to me like you are running into this known limitation of at.
> What I would suggest is you either follow the recommendations in the man
> page and use a different batch system (that will be most robust I would
> guess) or adjust your cron job to make sure it doesn't run at the same
> time other cron jobs are running.

As far as I can tell, there is no mention of "nqs" in the dapper
64-bit repositories. Where would I get it?

There's also no way to follow your latter suggestion: I have no way of
knowing how long each individual job will take to run, so I can't
simply (for example) stop executing new jobs at (say) 7:15. Some of
the jobs run for two minutes, some run for over 200, so it's simply
impractical to try to avoid having one running at 7:44 or
thereabouts (which is when the daily job appears to run).

Is the daily system job important? Maybe I could simply stop that from
executing?