txnpruner apparently not running on 2.4.3 controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
Our large production Juju 2 controller has apparently been failing to prune txns. I used mgopurge to deal with them:
2018-09-25 01:56:16 DEBUG pruning completed: removed 43466583 txns
2018-09-25 01:56:18 INFO clean and prune cleaned 32412 docs in 71 collections
removed 43466583 transactions and 11345 stash documents
I then poked around and found that the txnpruner stopped working a few days ago, with one dubiously successful run:
2018-09-18 00:25:27 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 12893141, pruning: too many new transactions
2018-09-18 00:26:27 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-18 00:26:27 ERROR juju.worker.
2018-09-18 01:26:30 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 13087808, pruning: too many new transactions
2018-09-18 01:27:30 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-18 01:27:30 ERROR juju.worker.
2018-09-18 02:27:33 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 13282793, pruning: too many new transactions
2018-09-18 02:28:33 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-18 02:28:33 ERROR juju.worker.
[...]
2018-09-18 23:22:23 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 17237575, pruning: too many new transactions
2018-09-18 23:23:23 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-19 01:22:53 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 17569913, pruning: too many new transactions
2018-09-19 01:23:53 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-19 02:23:56 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 17766989, pruning: too many new transactions
2018-09-19 02:31:18 INFO juju.txn prune.go:176 txn batch pruned in 7m22.09937523s. txns now: 16790490, inspected 70 collections, 59333 docs (785 cleaned)
2018-09-19 02:32:18 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
2018-09-19 03:32:22 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 16987610, pruning: too many new transactions
2018-09-19 03:33:22 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.
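To gauge how quickly the backlog grows between failed prune attempts, the "txns now" figures can be pulled out of the INFO lines and turned into an hourly rate. The following is only a rough sketch: the function names are hypothetical, and the regular expression assumes the exact log format shown in the excerpt above.

```python
import re
from datetime import datetime

# Matches the juju.txn prune.go INFO lines that report txn counts.
# Assumption: the format is exactly as in the log excerpt above.
PRUNE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) INFO juju\.txn prune\.go:\d+ "
    r"txns after last prune: (\d+), txns now: (\d+)"
)


def parse_prune_lines(lines):
    """Return (timestamp, txns_after_last_prune, txns_now) per matching line."""
    samples = []
    for line in lines:
        m = PRUNE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2)), int(m.group(3))))
    return samples


def growth_per_hour(samples):
    """Average txn growth per hour between the first and last sample."""
    (t0, _, n0), (t1, _, n1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (n1 - n0) / hours


# Lines copied from the excerpt above; the WARNING line is skipped by the regex.
SAMPLE = [
    "2018-09-18 00:25:27 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 12893141, pruning: too many new transactions",
    '2018-09-18 00:26:27 WARNING juju.txn oracle.go:228 cleanup of "txns.prunetemp" failed: read tcp 10.25.2.',
    "2018-09-18 01:26:30 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 13087808, pruning: too many new transactions",
    "2018-09-18 02:27:33 INFO juju.txn prune.go:131 txns after last prune: 196941, txns now: 13282793, pruning: too many new transactions",
]

if __name__ == "__main__":
    samples = parse_prune_lines(SAMPLE)
    print(f"{growth_per_hour(samples):.0f} new txns per hour")
```

On the three INFO lines used here this works out to roughly 191,000 new transactions per hour, which is close to the "txns after last prune" baseline of 196941 accumulating every hour.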
However, the txnpruner has not started running at all. I tried to check juju_engine_report: it hangs on machines 0 and 1, and on machine 2 it reports:
transaction-
  error: '"is-primary-
  inputs:
  - clock
  - state
  - is-primary-
  - migration-fortress
  - migration-
  resource-log:
  - name: migration-
    type: '*engine.Flag'
  - name: migration-fortress
    type: '*fortress.Guest'
  - name: is-primary-
    type: '*engine.Flag'
  state: stopped
I also checked juju_goroutines on machines 0 and 1 for mentions of pruning. It seems that some pruners are running on machine 1 and none on machine 0:
ubuntu@
Querying @jujud-machine-0 introspection socket: /debug/
ubuntu@
ubuntu@
Querying @jujud-machine-1 introspection socket: /debug/
# 0x1e326dc github.
# 0x1e326dc github.
# 0x1e32958 github.
# 0x1e33478 github.
# 0x1e33969 github.
# 0x1e32958 github.
# 0x1eba0c8 github.
# 0x1eba5b9 github.
# 0xc0b5bf github.
# 0xc0ae5a github.
# 0xc0b975 github.
# 0xc02833 github.
ubuntu@
Machine agent uptimes:
root 26734 53.3 11.0 3142772 1811352 ? Sl Sep21 2508:13 /var/lib/
root 9935 170 70.2 13919392 11539156 ? Rl Sep18 15283:34 /var/lib/
root 17006 74.3 26.3 8675108 4322348 ? Sl Sep18 7375:51 /var/lib/
Changed in juju:
status: Expired → Triaged
importance: Undecided → High
Thanks. The pruners are controlled as part of the workers running on the controllers. Have you been able to restart the Juju agent on controller machine 0 to see whether the pruners start up? I assume something went wrong that caused them to become unavailable.