Massive goroutine leak (logsink 2.5.0)
Bug #1813104 reported by Christopher Lee
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | Critical | John A Meinel |
2.5 | Fix Released | Critical | John A Meinel |
Bug Description
We noticed a large leak of goroutines while monitoring a 2.5.0 environment.
An extract of a goroutine dump[1] points us to logsink/
It appears that the logsink ServeHTTP is starting up goroutines that never get aborted (perhaps related to the mux issue xtian has seen in the past?)
[1] https:/
[2] https:/
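For illustration, here is a minimal, hypothetical sketch of the kind of pattern that would match these symptoms. The handler name, channels, and structure below are assumptions, not the actual Juju logsink code; the point is that a per-connection goroutine whose lifetime is not tied to the websocket never exits, and each leaked goroutine keeps its websocket.Conn (and the buffers behind it) alive.

```go
// Hypothetical sketch of the suspected leak pattern (not Juju's logsink code).
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

func serveLogs(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()

	// done is closed when the read loop below sees the connection fail.
	// Without a signal like this, the writer goroutine (and the buffers the
	// websocket.Conn holds) leaks every time a client disconnects.
	done := make(chan struct{})
	logs := make(chan []byte)

	go func() {
		for {
			select {
			case <-done:
				return // connection gone: exit instead of leaking
			case msg := <-logs:
				if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
					return
				}
			}
		}
	}()

	for {
		// The read loop ends when the peer disconnects; closing done then
		// releases the writer goroutine.
		if _, _, err := conn.ReadMessage(); err != nil {
			close(done)
			return
		}
	}
}

func main() {
	http.HandleFunc("/logsink", serveLogs)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```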
Changed in juju:
status: Confirmed → Triaged
importance: High → Critical

Changed in juju:
milestone: 2.5.1 → 2.6-beta1

Changed in juju:
status: Triaged → Fix Committed

Changed in juju:
status: Fix Committed → Fix Released
(Note that the original heap dumps have 1 line from stderr at the top that has to be trimmed. I've attached a tar of the cleaned-up dumps to this bug.)
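As an aside on how dumps like these are typically obtained: the standard way to expose heap profiles and full goroutine dumps from a Go process is net/http/pprof, sketched below as a generic illustration. jujud gathers its profiles through its own introspection facility, so the endpoint, port, and file names here are assumptions for demonstration only.

```go
// Generic illustration of capturing heap and goroutine dumps from a Go
// process; not jujud's actual introspection mechanism.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Heap profiles can then be fetched and compared, e.g.:
	//   curl -o heap.00 http://localhost:6060/debug/pprof/heap
	//   ... wait while the leak grows ...
	//   curl -o heap.01 http://localhost:6060/debug/pprof/heap
	//   go tool pprof -top -base heap.00 ./yourbinary heap.01
	// A full goroutine dump comes from:
	//   curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```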
If you run:

$ go tool pprof -top -base juju_heap_profile-2019-01-24.00 jujud juju_heap_profile-2019-01-24.01
File: jujud
Type: inuse_space
Showing nodes accounting for 1620.64MB, 76.19% of 2127.01MB total
Dropped 743 nodes (cum <= 10.64MB)
      flat  flat%   sum%        cum   cum%
 1330.01MB 62.53% 62.53%  1330.01MB 62.53%  github.com/juju/juju/vendor/github.com/gorilla/websocket.newConnBRW
  246.83MB 11.60% 74.13%   248.85MB 11.70%  github.com/juju/juju/state/watcher.(*HubWatcher).queueChange
   80.63MB  3.79% 77.92%    80.63MB  3.79%  crypto/tls.(*block).reserve

1 vs 2:
      flat  flat%   sum%        cum   cum%
1136454738B 84.39% 84.39% 1136454738B 84.39%  github.com/juju/juju/vendor/github.com/gorilla/websocket.newConnBRW
  65547996B  4.87% 89.25%   65547996B  4.87%  crypto/tls.(*block).reserve

2 vs 3:
 1007.44MB 85.21% 85.21%  1007.44MB 85.21%  github.com/juju/juju/vendor/github.com/gorilla/websocket.newConnBRW
   48.33MB  4.09% 89.29%    48.33MB  4.09%  crypto/tls.(*block).reserve
It is fairly clear that the memory is all in gorilla websocket caches.
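For context on why the leak shows up under newConnBRW specifically: each websocket.Conn created by an Upgrader carries its own read and write buffers, whose sizes are set by the Upgrader's ReadBufferSize and WriteBufferSize fields, so connections that are never released accumulate exactly this kind of inuse_space. The values below are arbitrary assumptions, not Juju's configuration.

```go
// Illustration only: per-connection buffers are allocated at upgrade time,
// so a conn that is never closed or dropped keeps them reachable.
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	ReadBufferSize:  4096, // per-connection read buffer (arbitrary value)
	WriteBufferSize: 4096, // per-connection write buffer (arbitrary value)
}

func handler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil) // buffers are allocated here
	if err != nil {
		return
	}
	_ = conn // a conn held forever keeps its buffers alive
}

func main() {
	http.HandleFunc("/ws", handler)
	log.Fatal(http.ListenAndServe("localhost:8081", nil))
}
```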
From the goroutine information, you can see that the logSinkHandler is likely somehow holding on to websockets.

Note that
https://grafana.admin.canonical.com/d/sR1-JkYmz/juju2-controllers-thumpers?orgId=1&from=1548230449490&to=1548287973681&var-controller=prodstack-45-bootstack-ps45-prodstack-is&var-host=All&var-node=All
shows the clear memory growth, but the "ConnectionCount" is perfectly stable.
It also shows the goroutine growth.
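A cheap way to get that goroutine-growth signal onto a dashboard alongside a connection count is to export runtime.NumGoroutine(). The snippet below is a generic illustration, not how the Juju controller metrics in the Grafana graphs above are actually produced.

```go
// Generic illustration of exporting the live goroutine count so it can be
// correlated with connection counts on a dashboard.
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
)

func main() {
	http.HandleFunc("/metrics/goroutines", func(w http.ResponseWriter, r *http.Request) {
		// A goroutine count that climbs while the connection count stays
		// flat is the signature seen in this bug.
		fmt.Fprintf(w, "goroutines %d\n", runtime.NumGoroutine())
	})
	log.Fatal(http.ListenAndServe("localhost:9090", nil))
}
```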