juju shouldn't let txn-queues grow out of control

Bug #1778907 reported by Junien Fridrick on 2018-06-27
This bug affects 1 person
Affects: juju (Importance: High, Assigned to: John A Meinel)
Affects: 2.3 (Importance: High, Assigned to: John A Meinel)

Bug Description

Hi,

We got hit by a bug today where the txn-resumer was unable to resume a transaction (because of a bug in the code).

In less than 2 days, the txn-queue for a few documents grew to over 25k. We thankfully caught the problem relatively fast and were able to deal with it, but agents now need to process these 25k txns.

If an operation fails, juju shouldn't let txn-queues grow out of control because of constant retries.

Thanks

Junien Fridrick (axino) on 2018-06-27
summary: - juju shouldn't let txn-queues growing out of control
+ juju shouldn't let txn-queues grow out of control

What version of Juju was running? We have put some changes in place to
keep the txn-queue from growing too large. I believe we had a release bug
where early 2.3 releases weren't getting all of the patches for our
dependencies.

John
=:->


Junien Fridrick (axino) wrote :

Ah yes sorry, this was a 2.3.7 controller.

So we do have this patch:
https://github.com/juju/juju/blob/develop/patches/max_txn_queue_length_pr463.diff

which theoretically enforces MaxTxnQueueLength (default of 1000): a txn is
removed immediately if it would be the 1001st txn in the queue.

However, looking at the source for 2.3.7 and 2.3.8, I don't see that patch
in the source tarball:
https://launchpad.net/juju/+download

I *do* see this other patch having been applied:

https://github.com/juju/juju/blob/develop/patches/mgo_server_abended_255.diff

I do see the max_txn_queue_length patch applied in the 2.4 series.

And now that I've dug into it, I remember there was a bug in the 2.3 branch
where the file was named ".patch" instead of ".diff", so it wasn't
getting applied.
It looks like that still hasn't been fixed in the 2.3 series, so I'll go do
that now.
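
To show how a wrong extension can silently drop a patch, here is a minimal sketch in Go. It assumes the release tooling only picks up files ending in ".diff"; the actual build scripts are not shown in this thread, and `selectPatches` and the ".patch" file name below are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// selectPatches splits names into those a ".diff"-only filter would apply
// and those it would skip without reporting an error.
func selectPatches(names []string) (applied, skipped []string) {
	for _, name := range names {
		if filepath.Ext(name) == ".diff" {
			applied = append(applied, name)
		} else {
			skipped = append(skipped, name)
		}
	}
	return applied, skipped
}

func main() {
	names := []string{
		"mgo_server_abended_255.diff",
		"max_txn_queue_length_pr463.patch", // hypothetical misnamed copy
	}
	applied, skipped := selectPatches(names)
	fmt.Println("applied:", applied)
	fmt.Println("skipped:", skipped)
}
```

Nothing fails in a setup like this, so a misnamed patch can be missing from a release without anyone noticing.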


John A Meinel (jameinel) wrote :

This should already be in all of the 2.4 releases, so I'm marking it fixed there.

Changed in juju:
assignee: nobody → John A Meinel (jameinel)
importance: Undecided → High
status: New → Fix Released