The action-pruner does not start if the backlog is too large
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju | Triaged | Medium | Unassigned | | |
Bug Description
On a model with a very large number of actions and operations in MongoDB (approx. 1.7M actions and 650k operations), the action-pruner did not start on any of the 3 controller instances (HA).
Deleting the first 6.5k stale actions and running mgopurge made the action-pruner start on one controller instance.
However, even after starting successfully, it didn't prune anything and manual cleanup was still required.
- Which Juju version was this bug seen in (e.g. 2.9.22)?
2.9.38.1, upgraded from 2.9.18, which had previously been upgraded from 2.9.12.
The action-pruner most likely stopped working in 2.9.18.
- What cloud was the bug seen on (e.g. aws, microk8s, lxd etc.)?
OpenStack provider, the controllers run on regular OpenStack KVM instances
- Are there any relevant logs from the command or controller (for a command, include the --debug argument, for the controller/model use juju debug-log -m controller and juju debug-log -m mymodel, also be sure to scrub the log of confidential details)?
Logs and juju_engine_reports have been shared with https:/
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Workaround steps, executed on the primary MongoDB instance while the controllers were down:
db.actions.find({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}}).count()
db.actions.deleteMany({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}})
db.operations.find({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}}).count()
db.operations.deleteMany({"model-uuid" : "2c0996b0-4650-4a26-8713-64a8bdf7d8a7", "enqueued" : {$lte: ISODate("2023-03-01T00:00:00Z")}})
agent=$(cd /var/lib/juju/agents; echo machine-*)
pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
mgopurge --username $agent --password $pw
# Start all 3 controllers. According to juju_engine_report, the action-pruner then started successfully for model 2c0996b0-4650-4a26-8713-64a8bdf7d8a7 on one of the controllers.
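The workaround above removes each backlog with a single deleteMany. On collections holding roughly 1.7M actions and 650k operations, deleting in bounded batches is one way to limit the load placed on the primary at any one time. Below is a minimal Python sketch of that idea, run against an in-memory list that stands in for the actions or operations collection. The helper names (`matches`, `prune_in_batches`), the default batch size, and the reduced document shape (`_id`, `model-uuid`, `enqueued` only) are hypothetical illustrations, not Juju's action-pruner and not a MongoDB driver API.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the workaround above.
MODEL_UUID = "2c0996b0-4650-4a26-8713-64a8bdf7d8a7"
CUTOFF = datetime(2023, 3, 1, tzinfo=timezone.utc)


def matches(doc, model_uuid, cutoff):
    # Same predicate as {"model-uuid": ..., "enqueued": {$lte: ISODate(...)}}.
    return doc["model-uuid"] == model_uuid and doc["enqueued"] <= cutoff


def prune_in_batches(docs, model_uuid, cutoff, batch_size=1000):
    """Hypothetical batched prune over an in-memory list of documents.

    Each pass collects at most batch_size matching _ids and removes them,
    so no single delete touches more than batch_size documents.
    Returns (remaining_docs, deleted_count, batch_count).
    """
    remaining = list(docs)
    deleted = batches = 0
    while True:
        batch_ids = set()
        for doc in remaining:
            if matches(doc, model_uuid, cutoff):
                batch_ids.add(doc["_id"])
                if len(batch_ids) == batch_size:
                    break
        if not batch_ids:
            break  # nothing left that matches the filter
        remaining = [d for d in remaining if d["_id"] not in batch_ids]
        deleted += len(batch_ids)
        batches += 1
    return remaining, deleted, batches


# Example: 2,500 stale entries for the model, 10 newer ones, 5 for another model.
old, new = CUTOFF - timedelta(days=1), CUTOFF + timedelta(days=1)
docs = [{"_id": i, "model-uuid": MODEL_UUID, "enqueued": old} for i in range(2500)]
docs += [{"_id": 2500 + i, "model-uuid": MODEL_UUID, "enqueued": new} for i in range(10)]
docs += [{"_id": 3000 + i, "model-uuid": "other-model", "enqueued": old} for i in range(5)]

remaining, deleted, batches = prune_in_batches(docs, MODEL_UUID, CUTOFF)
print(deleted, batches, len(remaining))  # 2500 3 15
```

Against a real collection, each pass would roughly correspond to finding up to `batch_size` matching `_id`s with the same filter as the workaround and then deleting them with `{"_id": {$in: [...]}}`; whether that is preferable to one deleteMany depends on the deployment.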
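The credential lookup in the workaround packs several steps into shell one-liners. The sketch below restates it in Python, assuming that agent.conf contains a top-level `statepassword: <value>` line and that exactly one `machine-*` directory exists under /var/lib/juju/agents on a controller. The function names are hypothetical, and the sketch only builds the mgopurge argument list rather than running it.

```python
import glob
import os

AGENTS_DIR = "/var/lib/juju/agents"


def find_machine_agent(agents_dir=AGENTS_DIR):
    """Roughly `cd /var/lib/juju/agents; echo machine-*`, but fails loudly
    instead of echoing the literal pattern when nothing matches."""
    found = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(agents_dir, "machine-*"))
    )
    if len(found) != 1:
        raise RuntimeError(f"expected one machine agent in {agents_dir}, found {found}")
    return found[0]


def state_password(conf_text):
    """Roughly `grep statepassword agent.conf | cut '-d ' -sf2`:
    the second space-separated field of the first matching line."""
    for line in conf_text.splitlines():
        # cut -s skips lines that contain no delimiter at all.
        if "statepassword" in line and " " in line:
            return line.split(" ")[1]
    raise KeyError("no statepassword line found")


def mgopurge_args(agent, password):
    """Argument list for `mgopurge --username $agent --password $pw`."""
    return ["mgopurge", "--username", agent, "--password", password]
```

Unlike the shell pipeline, which prints field 2 of every matching line, `state_password` returns only the first match; on a standard agent.conf there should be just one `statepassword` line.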