Canonical Juju

Bug #1771906
Comment #16

John A Meinel (jameinel) wrote on 2018-08-13: Re: [Bug 1771906] Re: txnpruner failing to prune on large controller

It seems cursors default to timing out after 10 minutes of inactivity, though
how you turn that off seems to be driver dependent. Both Java and PyMongo
expose some sort of "no_cursor_timeout" option.
Or, in the mongo shell:
https://docs.mongodb.com/manual/reference/method/cursor.noCursorTimeout/#cursor.noCursorTimeout
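
For illustration only, a minimal PyMongo-flavoured sketch of opting out of
the idle timeout. The no_cursor_timeout=True argument to find() is PyMongo's;
the helper name iter_without_timeout and the collection passed in are
hypothetical, and this is not what mgopurge or juju actually do (they use the
Go mgo driver). A cursor opened this way is not reaped by the server for
being idle, so it has to be closed explicitly.

```python
def iter_without_timeout(collection, query=None):
    """Yield every document matching `query` from a cursor that the
    server will not expire for being idle.

    `collection` is assumed to be a pymongo Collection (hypothetical here).
    """
    cursor = collection.find(query or {}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            yield doc
    finally:
        # With the idle timeout disabled, nothing else cleans this up,
        # so always close the cursor ourselves.
        cursor.close()
```

The try/finally matters more than usual here: if the consumer crashes or
stops iterating early, an unclosed no-timeout cursor can linger on the server.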

I wouldn't think we have long periods of inactivity, but maybe after
creating the temp table we are busy with other work, and don't go back to
one of our other queries often enough to keep its cursor alive?

Batching the work into smaller groups probably works around this as well.
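
A rough sketch of the batching idea, assuming the work can be split up by
document id. fetch_ids_after and remove_ids are hypothetical callables
standing in for short-lived queries; neither is an mgo or mgopurge API. Each
pass issues a fresh, small query and resumes from the last id seen, so no
single cursor has to stay open across the whole run.

```python
def prune_in_batches(fetch_ids_after, remove_ids, batch_size=1000):
    """Remove documents in small batches, resuming by last-seen id.

    fetch_ids_after(last_id, limit): hypothetical; returns up to `limit`
        ids in ascending order, all strictly greater than `last_id`
        (or from the start when `last_id` is None).
    remove_ids(ids): hypothetical; removes those ids and returns how
        many were removed.
    """
    total = 0
    last_id = None
    while True:
        # A new, bounded query each time instead of one long-lived cursor.
        ids = fetch_ids_after(last_id, batch_size)
        if not ids:
            return total
        total += remove_ids(ids)
        last_id = ids[-1]
```

If a batch fails, only that batch needs retrying, rather than restarting a
scan over millions of txns.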

On Mon, Jul 23, 2018 at 8:19 AM, Paul Collins <email address hidden>
wrote:

> We just had another occurrence of this on our largest internal
> production controller.
>
> We received alerts for high load on a unit that turned out to be the
> mongodb primary. We ran "mgopurge -stages resume,prune", which
> eventually removed approximately 27.8 million completed txns.
>
> However, mgopurge crashed twice with "ERROR failed stage prune: Cursor
> not found, cursor id: ${bignum}" before a final run completed cleanly.