The action-pruner does not start if the backlog is too large

Bug #2011576 reported by Przemyslaw Lal
This bug affects 1 person
Affects        | Status  | Importance | Assigned to | Milestone
Canonical Juju | Triaged | Medium     | Unassigned  |

Bug Description

On a model with a very large number of actions and operations in MongoDB (approx. 1.7M actions and 650k operations), the action-pruner did not start on any of the 3 controller instances (HA).

Deleting the first 6.5k stale actions and running mgopurge made the action-pruner start on one controller instance.

However, even after starting successfully, it didn't prune anything and manual cleanup was still required.
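To gauge how large the backlog is per model, and whether it shrinks once the pruner runs, a minimal mongo shell sketch could group a collection by model. `backlogByModel` is a hypothetical helper, not part of Juju, and it assumes the `model-uuid` and `enqueued` fields that the workaround queries in this report filter on.

```javascript
// Hypothetical diagnostic, not part of Juju: group one collection by model
// and report how many documents each model has and its oldest "enqueued" time.
// Note: this scans the whole collection.
function backlogByModel(coll) {
  return coll
    .aggregate([
      {
        $group: {
          _id: "$model-uuid", // one group per model
          count: { $sum: 1 }, // number of documents for that model
          oldest: { $min: "$enqueued" }, // earliest enqueued timestamp
        },
      },
      { $sort: { count: -1 } }, // largest backlog first
    ])
    .toArray();
}
```

In the shell it would be called as `backlogByModel(db.actions)` or `backlogByModel(db.operations)`. With millions of documents this is a full collection scan, so it is best run when the extra load on the primary is acceptable.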

- In which Juju version was this bug seen (e.g. 2.9.22)?

2.9.38.1, upgraded from 2.9.18, which was itself upgraded from 2.9.12.
The action-pruner most likely stopped working in 2.9.18.

- On which cloud was the bug seen (e.g. aws, microk8s, lxd)?

OpenStack provider; the controllers run on regular OpenStack KVM instances.

- Are there any relevant logs from the command or controller? (For a command, include the --debug argument; for the controller/model, use juju debug-log -m controller and juju debug-log -m mymodel. Be sure to scrub the logs of confidential details.)

Logs and juju_engine_reports have been shared with https://launchpad.net/~manadart and https://launchpad.net/~simonrichardson privately.

Przemyslaw Lal (przemeklal) wrote:

Workaround steps, executed on the primary MongoDB instance while the controllers were down:

db.actions.find({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}}).count()
db.actions.deleteMany({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}})
db.operations.find({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}}).count()
db.operations.deleteMany({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}})

agent=$(cd /var/lib/juju/agents; echo machine-*)
pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
mgopurge --username "$agent" --password "$pw"

# Start all 3 controllers. According to juju_engine_report, the action-pruner then started successfully for 2c0996b0-4650-4a26-8713-64a8bdf7d8a7 on one of the controllers.
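The database half of the workaround can be wrapped in one mongo shell function, so that counting and deleting use exactly the same filter and a dry run comes first. This is a minimal sketch, not a Juju tool: `pruneStaleByModel` and its `dryRun` flag are hypothetical, and it assumes it is pasted into the mongo shell on the primary with the controllers stopped, as in the steps above.

```javascript
// Hypothetical helper, not part of Juju: count, and optionally delete,
// actions and operations of one model enqueued on or before `cutoff`.
// Assumes the mongo shell is connected to the primary, controllers stopped.
function pruneStaleByModel(db, modelUuid, cutoff, dryRun = true) {
  // Same filter as the manual commands: one model, enqueued <= cutoff.
  const filter = { "model-uuid": modelUuid, enqueued: { $lte: cutoff } };
  const result = {};
  for (const name of ["actions", "operations"]) {
    const coll = db.getCollection(name);
    const matched = coll.find(filter).count();
    // Only delete when explicitly asked; a dry run just reports counts.
    const deleted = dryRun ? 0 : coll.deleteMany(filter).deletedCount;
    result[name] = { matched, deleted };
  }
  return result;
}
```

For the model in this report, a dry run would be `pruneStaleByModel(db, "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", new Date("2023-03-01T00:00:00Z"))`; passing `false` as the fourth argument performs the deletion. mgopurge is then run as in the steps above.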

Changed in juju:
status: New → Triaged
importance: Undecided → Medium