mgopurge is the current release version.
Juju is 2.3.3 running on xenial.
I can reconstruct this rough timeline:
At some point before the upgrade to 2.3.8, the complaints about this single transaction had already started. Juju was at version 2.2.6 then.
September the 3rd.
The upgrade to 2.3.8 was done.
We were a little concerned about the big transaction backlog at the time, but decided it should not interfere with the upgrade.
Intermediate time.
ERROR juju.worker.dependency engine.go:551 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: cannot find transaction 5b87d51c5540e3051751d249_0c5d64c0 in queue for document {actions 6a783ac4-0b48-45a3-87fc-9646f8bd82de:eb08e238-dfa8-4c46-842e-47a4eb929adf}
happening constantly, every few seconds.
2018-09-09
The error about the single transaction turned into a "document too large" error.
2018-09-09 20:50:05 ERROR juju.worker.dependency engine.go:551 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: cannot find transaction 5b87d51c5540e3051751d249_0c5d64c0 in queue for document {actions 6a783ac4-0b48-45a3-87fc-9646f8bd82de:eb08e238-dfa8-4c46-842e-47a4eb929adf}
2018-09-09 20:50:09 ERROR juju.worker.dependency engine.go:551 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: Resulting document after update is larger than 16777216
October 12th
Controllers stopped.
db.txns.update({"_id": ObjectId("5b87d51c5540e3051751d249")}, {"$set": {"s": 1}, "$unset": {"n": 1}})
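For the record, a hedged mongo-shell sketch of inspecting the stuck transaction before force-editing it as above (assumptions: run against the controller's MongoDB, and Juju's data lives in the standard "juju" database):

```javascript
// mongo shell, connected to the controller's MongoDB.
// Assumption: Juju's collections live in the "juju" database.
var jujuDB = db.getSiblingDB("juju");
// Show only the transaction's state ("s"), leaving out the
// operations array, which can be enormous for a runaway txn.
jujuDB.txns.find(
    {"_id": ObjectId("5b87d51c5540e3051751d249")},
    {"s": 1}
).pretty();
```

This records what state the document was in before the $set/$unset was applied.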
Controllers started. Very heavy system load. Unreliable operations.
Ran mgopurge. Noticed the resume stage would OOM.
Ran mgopurge purge. It cleaned up a lot of finished transactions.
mgopurge resume would still OOM.
jujud would still not work well.
2018-10-13 17:38:27 ERROR juju.worker.dependency engine.go:551 "log-pruner" manifold worker returned unexpected error: failed to prune logs by time: read tcp 100.107.2.44:48498->100.107.2.44:37017: i/o timeout
The log db was massively bloated.
The juju logs db was dropped.
jujud would start and log no errors, but very quickly ran out of memory.