juju debug-log sort may exceed MongoDB's limits

Bug #1590605 reported by Menno Finlay-Smits
This bug affects 2 people

Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Menno Finlay-Smits

Bug Description

The backend code used for the debug-log API does a sort which may exceed MongoDB's 32MB in-memory sort limit when there are a lot of logs to return.

The error is:

debug-log handler error: tailer stopped: errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."

The solution is probably to ensure that the sort uses the existing indexes on the logs collection, as the limit doesn't apply when indexes are used.
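
For illustration only, here is a minimal sketch of that idea using the gopkg.in/mgo.v2 driver. The database, collection, and field names ("logs", "t", "x") and the server address are assumptions, not Juju's actual schema; the point is that when the sort keys match an existing index, MongoDB returns documents in index order and never hits the 32MB in-memory sort limit:

  // Sketch only: read a logs collection in timestamp order, relying on an
  // assumed index on the "t" field so MongoDB never sorts in RAM.
  package main

  import (
      "fmt"
      "log"
      "time"

      "gopkg.in/mgo.v2"
      "gopkg.in/mgo.v2/bson"
  )

  type logRecord struct {
      Time    time.Time `bson:"t"`
      Message string    `bson:"x"`
  }

  func main() {
      session, err := mgo.Dial("localhost:27017")
      if err != nil {
          log.Fatal(err)
      }
      defer session.Close()

      logs := session.DB("logs").C("logs")

      // Sorting only on "t" (assumed to be indexed) lets MongoDB walk the
      // index instead of buffering every matching document in memory.
      iter := logs.Find(bson.M{}).Sort("t").Iter()
      var rec logRecord
      for iter.Next(&rec) {
          fmt.Println(rec.Time, rec.Message)
      }
      if err := iter.Close(); err != nil {
          log.Fatal(err)
      }
  }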

To reproduce, generate a model with lots of logs:

  juju bootstrap ... --config logging-config='<root>=TRACE'
  while true; do juju status ; done # leave this for a while

Then ask debug-log to return all the logs:

   juju debug-log --replay

The command will fail and the error above should be in the controller machine's logs.

Tags: debug-log
description: updated
Changed in juju-core:
status: New → Triaged
tags: added: debug-log
Revision history for this message
Ursula Junque (ursinha) wrote :

I have a container with 2GB of RAM and hit this issue with juju2-beta8 while deploying landscape-dense-maas. I left it running for a while and that was enough to hit this bug.
The symptom is that, all of a sudden, juju debug-log stops working, with no errors:

$ juju debug-log -m maastwo:controller
$ echo $?
0
$ juju debug-log -m maastwo:controller --debug
2016-06-10 22:09:55 INFO juju.cmd supercommand.go:60 running juju [2.0-beta8 gc go1.6.1]
2016-06-10 22:09:55 INFO juju.juju api.go:212 connecting to API addresses: [192.168.200.2:17070]
2016-06-10 22:09:55 INFO juju.api apiclient.go:494 dialing "wss://192.168.200.2:17070/model/13b58f00-4c31-4eaa-8dff-4d1250c040f4/api"
2016-06-10 22:09:55 INFO juju.api apiclient.go:271 connection established to "wss://192.168.200.2:17070/model/13b58f00-4c31-4eaa-8dff-4d1250c040f4/api"
2016-06-10 22:09:55 DEBUG juju.juju api.go:361 API hostnames unchanged - not resolving
2016-06-10 22:09:57 DEBUG juju.api apiclient.go:520 health ping failed: connection is shut down
2016-06-10 22:09:57 INFO cmd supercommand.go:454 command finished
$

I logged into the machine and checked /var/log/juju/machine-0.log; here's the excerpt from when debug-log was executed:
2016-06-10 22:13:09 INFO juju.apiserver apiserver.go:311 [2E] API connection from 192.168.200.1:37352, active connections: 13
2016-06-10 22:13:09 DEBUG juju.apiserver utils.go:61 validate model uuid: controller model - 13b58f00-4c31-4eaa-8dff-4d1250c040f4
2016-06-10 22:13:09 DEBUG juju.apiserver apiserver.go:291 <- [2E] <unknown> {"RequestId":1,"Type":"Admin","Version":3,"Request":"Login","Params":"'params redacted'"}
2016-06-10 22:13:09 DEBUG juju.apiserver admin.go:170 hostPorts: [[192.168.200.2:17070 127.0.0.1:17070 [::1]:17070]]
2016-06-10 22:13:09 DEBUG juju.apiserver apiserver.go:305 -> [2E] user-admin@local 33.688574ms {"RequestId":1,"Response":"'body redacted'"} Admin[""].Login
2016-06-10 22:13:09 DEBUG juju.apiserver apiserver.go:291 <- [2E] user-admin@local {"RequestId":2,"Type":"Client","Version":1,"Request":"AgentVersion","Params":"'params redacted'"}
2016-06-10 22:13:09 DEBUG juju.apiserver apiserver.go:305 -> [2E] user-admin@local 584.081µs {"RequestId":2,"Response":"'body redacted'"} Client[""].AgentVersion
2016-06-10 22:13:09 INFO juju.apiserver debuglog.go:76 debug log handler starting
2016-06-10 22:13:09 DEBUG juju.apiserver utils.go:61 validate model uuid: controller model - 13b58f00-4c31-4eaa-8dff-4d1250c040f4
2016-06-10 22:13:11 ERROR juju.apiserver debuglog.go:95 debug-log handler error: tailer stopped: Executor error: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.
2016-06-10 22:13:11 INFO juju.apiserver apiserver.go:316 [2E] user-admin@local API connection terminated after 1.750783377s, active connections: 12

The problem appears to be that MongoDB enforces a 32MB limit on in-memory sorts. The logs at this point are ~70MB. From my (short) investigation I gathered that this is solved by using indexes, so it's likely a code change, not a config one.
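
If the needed index were missing, the code change would be along these lines. This is only a sketch with the gopkg.in/mgo.v2 driver; the database, collection, and field names are assumptions, not Juju's schema. Creating an index on the sort key lets MongoDB satisfy the sort from the index instead of buffering ~70MB of log documents in RAM:

  // Sketch only: ensure an index exists on the field the log tailer sorts by.
  package main

  import (
      "log"

      "gopkg.in/mgo.v2"
  )

  func main() {
      session, err := mgo.Dial("localhost:27017")
      if err != nil {
          log.Fatal(err)
      }
      defer session.Close()

      logs := session.DB("logs").C("logs")
      err = logs.EnsureIndex(mgo.Index{
          Key:        []string{"t"}, // assumed timestamp field used for sorting
          Background: true,          // build without blocking other operations
      })
      if err != nil {
          log.Fatal(err)
      }
  }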

The workaround is to log into the machine and "tail -f /var/log/juju/machine-0.log", which can be noisy if you have DEBUG on but still does the trick.

Revision history for this message
Ursula Junque (ursinha) wrote :

Output from my previous comment pastebinned, for readability: https://pastebin.canonical.com/158557/ :)

I think there are two issues here: one is the actual MongoDB problem; the other is juju being oblivious to a critical error (it returns 0). Should I file a separate bug for the latter?

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta11 → 2.0-beta12
Changed in juju-core:
milestone: 2.0-beta12 → 2.0-beta13
Revision history for this message
Uros Jovanovic (uros-jovanovic) wrote :

This bug prevents controllers from being used for anything more than tests of bringing them up and tearing them down, since it hits after a few hours.

Changed in juju-core:
importance: High → Critical
Changed in juju-core:
milestone: 2.0-beta13 → 2.0-beta12
Changed in juju-core:
assignee: nobody → Menno Smits (menno.smits)
assignee: Menno Smits (menno.smits) → Tim Penhey (thumper)
Changed in juju-core:
assignee: Tim Penhey (thumper) → Menno Smits (menno.smits)
Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

It looks like the recent addition of the "_id" field to the log sort fields was the underlying cause.
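
For illustration only (gopkg.in/mgo.v2, with an assumed single-field index on the timestamp field "t"; this is not the code from the fix): once "_id" is added to the sort keys, the sort no longer matches that index, so MongoDB falls back to an in-memory sort and trips the 32MB cap on a large logs collection.

  // Sketch only: index-backed sort vs. one that spills to an in-memory sort,
  // assuming the only index on the collection is on "t".
  package main

  import (
      "log"

      "gopkg.in/mgo.v2"
      "gopkg.in/mgo.v2/bson"
  )

  func main() {
      session, err := mgo.Dial("localhost:27017")
      if err != nil {
          log.Fatal(err)
      }
      defer session.Close()

      logs := session.DB("logs").C("logs")
      var first bson.M

      // Sort keys match the assumed index on "t": MongoDB streams results
      // in index order, so no RAM limit is involved.
      if err := logs.Find(nil).Sort("t").One(&first); err != nil {
          log.Println("indexed sort:", err)
      }

      // The extra "_id" key is not covered by that index, so MongoDB falls
      // back to an in-memory sort capped at 32MB, which fails on a large
      // collection.
      if err := logs.Find(nil).Sort("t", "_id").One(&first); err != nil {
          log.Println("in-memory sort:", err)
      }
  }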

Proposed fix here: https://github.com/juju/juju/pull/5785

Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta12 → none
milestone: none → 2.0-beta12