insert/remove cleanups spike caused juju controllers to become unresponsive.
Bug #1886498 reported by Nick Moffitt
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
First, here are the graphs for this incident:
Second, here's everything mongodb spat to syslog all day:
https:/
Per my chat with Simon at https:/
1. insert/remove cleanups spike.
2. txn_ops and locks high
3. deployments slow way down
4. txn_ops and locks resolve
5. status queries continue to become slower for some time
6. Everything resolves.
Is there a way we can prevent these "cleanup" spikes?
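For anyone trying to catch one of these spikes in the act, here is a minimal polling sketch. It assumes the controller's mongod listens on localhost:37017 (as in the syslog entries below) and that pending cleanup documents live in a juju.cleanups collection; the database/collection names and connection details are assumptions, not confirmed Juju internals.

```python
import time
from pymongo import MongoClient

# Connection details are assumptions: adjust the URI, credentials, and TLS
# options to match the controller (Juju's mongod requires auth in practice).
client = MongoClient("mongodb://localhost:37017")
cleanups = client["juju"]["cleanups"]

while True:
    # A sudden jump in pending cleanup documents would correspond to step 1
    # of the timeline above; watching it alongside txn_ops helps correlate.
    pending = cleanups.count_documents({})
    print(f"{time.strftime('%H:%M:%S')} pending cleanups: {pending}")
    time.sleep(10)
```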
From the logs it looks like mongo struggled to get resources from the machine. It took 8.5 seconds to acquire a lock, and Juju really struggled to recover.
------
Jul 6 12:51:32 juju-4da59b22-9710-4e69-840a-be49ee864a97-machine-0 mongod.37017[16302]: [ftdc] serverStatus was very slow: { after basic: 0, after asserts: 0, after connections: 0, after extra_info: 0, after globalLock: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after repl: 0, after security: 0, after storageEngine: 0, after tcmalloc: 0, after wiredTiger: 1010, at end: 1010 }
Jul 6 12:51:39 juju-4da59b22-9710-4e69-840a-be49ee864a97-machine-0 mongod.37017[16302]: [conn188253] command admin.system.users command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: "xxx" } keyUpdates:0 writeConflicts:0 numYields:0 reslen:155 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 8443463 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 8455ms
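To make the numbers concrete: timeAcquiringMicros is reported in microseconds, so the 8443463 µs above is ~8.44 s of the 8455 ms total, i.e. the command spent nearly all of its wall-clock time waiting on the global read lock. A small sketch of that arithmetic, plus a live probe of the same lock queue via serverStatus (connection details are assumptions):

```python
from pymongo import MongoClient

# From the conn188253 entry: time spent waiting to acquire the global read
# lock, in microseconds, versus the command's total wall-clock time.
time_acquiring_micros = 8_443_463
total_ms = 8_455
wait_s = time_acquiring_micros / 1_000_000
print(f"lock wait: {wait_s:.2f}s of {total_ms / 1000:.3f}s total")  # ~8.44s of 8.455s

# serverStatus exposes the same contention live: globalLock.currentQueue
# counts operations currently blocked waiting on the global lock.
client = MongoClient("mongodb://localhost:37017")  # credentials/TLS as needed
queue = client.admin.command("serverStatus")["globalLock"]["currentQueue"]
print(f"queued readers: {queue['readers']}, queued writers: {queue['writers']}")
```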