Comment 9 for bug 1771906

Revision history for this message
Junien F (axino) wrote :

Hi,

This is happening on a 2.3.7 controller with ~30M docs in juju.txns.

juju logs the following :

$ sudo egrep 'txnpruner|prune.go|juju.txn' /var/log/juju/machine-?.log|tail
2018-06-27 09:45:34 ERROR juju.worker.dependency engine.go:551 "transaction-pruner" manifold worker returned unexpected error: pruning failed, txnpruner stopping: failed to retrieve final txns count: read tcp A.B.C.4:33651->A.B.C.2:37017: i/o timeout
2018-06-27 10:45:37 INFO juju.txn prune.go:127 txns after last prune: 80758, txns now: 30356480, pruning: too many new transactions
2018-06-27 10:46:37 WARNING juju.txn oracle.go:220 cleanup of "txns.prunetemp" failed: read tcp A.B.C.4:36240->A.B.C.2:37017: i/o timeout
2018-06-27 10:46:37 ERROR juju.worker.dependency engine.go:551 "transaction-pruner" manifold worker returned unexpected error: pruning failed, txnpruner stopping: failed to retrieve final txns count: read tcp A.B.C.4:36240->A.B.C.2:37017: i/o timeout
2018-06-27 11:46:40 INFO juju.txn prune.go:127 txns after last prune: 80758, txns now: 30438897, pruning: too many new transactions
2018-06-27 11:47:40 WARNING juju.txn oracle.go:220 cleanup of "txns.prunetemp" failed: read tcp A.B.C.4:41384->A.B.C.2:37017: i/o timeout
2018-06-27 11:47:40 ERROR juju.worker.dependency engine.go:551 "transaction-pruner" manifold worker returned unexpected error: pruning failed, txnpruner stopping: failed to retrieve final txns count: read tcp A.B.C.4:41384->A.B.C.2:37017: i/o timeout
2018-06-27 12:47:43 INFO juju.txn prune.go:127 txns after last prune: 80758, txns now: 30521252, pruning: too many new transactions
2018-06-27 12:48:43 WARNING juju.txn oracle.go:220 cleanup of "txns.prunetemp" failed: read tcp A.B.C.4:38816->A.B.C.2:37017: i/o timeout
2018-06-27 12:48:43 ERROR juju.worker.dependency engine.go:551 "transaction-pruner" manifold worker returned unexpected error: pruning failed, txnpruner stopping: failed to retrieve final txns count: read tcp A.B.C.4:38816->A.B.C.2:37017: i/o timeout

It would appear that juju is timing out after 60s, but the aggregation takes 488s :

Jun 27 12:55:51 juju-foo-0 mongod.37017[2055]: [conn144066] command juju.txns command: aggregate { aggregate: "txns", pipeline: [ { $match: { _id: { $lt: ObjectId('5b33795f0000000000000000') }, s: { $gte: 5 } } }, { $project: { _id: 1 } }, { $out: "txns.prunetemp" } ], cursor: { batchSize: 1616 }, allowDiskUse: true } keyUpdates:0 writeConflicts:0 numYields:241900 reslen:115 locks:{ Global: { acquireCount: { r: 961941, w: 475690, W: 1 } }, Database: { acquireCount: { r: 243124, w: 475688, R: 1, W: 2 } }, Collection: { acquireCount: { r: 243124, w: 40 } }, Metadata: { acquireCount: { w: 30440288 } }, oplog: { acquireCount: { w: 475648 } } } protocol:op_query 488082ms