"cannot allocate memory" when running "juju run"

Bug #1382556 reported by Jacek Nykis
This bug affects 9 people
Affects    Status        Importance  Assigned to  Milestone
juju-core  Fix Released  Critical    Nate Finch
1.25       Fix Released  Critical    Nate Finch

Bug Description

I executed:
juju run "uptime" --all

and it returned messages like this for every machine:
- Error: 'fork/exec /usr/bin/ssh: cannot allocate memory'
  MachineId: "9"
  Stdout: ""

I can "juju ssh" to my instances fine and they all have memory available.

According to the documentation, the above command should be fine:
https://github.com/juju/cheatsheet
"juju run "uname -a" --all - Run a command (uname) on every instance. (1.17.2 and newer)"

I am running juju 1.18.4

Abel Deuring (adeuring)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: run
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Interesting. So the real question is whether you have memory available on machine 0: juju run effectively does a parallel ssh from there, and that is the machine the error message refers to.
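
For context, here is a minimal sketch (not Juju's actual code; the address is made up) of why the error text points at the state server: the fork/exec of /usr/bin/ssh happens on machine 0, and it is that step that fails before ssh ever reaches the target machine.

package main

import (
    "fmt"
    "os/exec"
)

func main() {
    // juju run --all effectively does something like this once per target
    // machine, in parallel, from machine 0.
    cmd := exec.Command("/usr/bin/ssh", "ubuntu@10.0.0.9", "uptime")
    out, err := cmd.CombinedOutput()
    if err != nil {
        // If the jujud process is very large and the kernel refuses to fork,
        // the error comes back from os.StartProcess and prints as:
        //   fork/exec /usr/bin/ssh: cannot allocate memory
        fmt.Println("Error:", err)
        return
    }
    fmt.Printf("Stdout: %q\n", out)
}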

Revision history for this message
Nicolas Thomas (thomnico) wrote :

I can reproduce this easily:
 Error: 'fork/exec /usr/bin/ssh: cannot allocate memory'
  MachineId: 7/lxc/1
  Stdout: ""

$ juju ssh 0 free
             total used free shared buffers cached
Mem: 4048280 3861856 186424 60 13192 207888
-/+ buffers/cache: 3640776 407504
Swap: 4047868 815052 3232816

So there is memory available.

I noticed this:
top - 10:06:09 up 19:24, 1 user, load average: 3.71, 2.67, 2.37
Tasks: 96 total, 3 running, 93 sleeping, 0 stopped, 0 zombie
%Cpu(s): 35.2 us, 9.3 sy, 0.1 ni, 47.8 id, 4.9 wa, 0.0 hi, 2.7 si, 0.0 st
KiB Mem: 4048280 total, 3861844 used, 186436 free, 13296 buffers
KiB Swap: 4047868 total, 807168 used, 3240700 free. 208476 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16637 root 20 0 5683348 2.923g 12772 R 85.0 75.7 582:43.67 jujud
16579 root 20 0 7467200 201732 13932 S 39.2 5.0 438:24.19 mongod
    8 root 20 0 0 0 0 R 6.5 0.0 0:15.88 rcuos/0
    1 root 20 0 37208 2660 704 S 0.0 0.1 0:03.14 init

This is while the environment is deployed and otherwise idle.

tags: added: cpe-critsit
Revision history for this message
James Page (james-page) wrote :

Also observed this after an upgrade from 1.24.7 to 1.25.0

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Can we add Juju ssh tests to all the units we create in the course of our usual scale testing, Mark Ramm?

Revision history for this message
Andrew McDermott (frobware) wrote :

@james-page, @thomnico: could you please log in to the node and run:

 $ kill -SIGQUIT $(pgrep jujud)

and attach machine-0.log from /var/log/juju. I want to see how many goroutines are currently active.

I fixed a similar issue recently: https://bugs.launchpad.net/juju-core/+bug/1496750
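
For anyone who would rather not rely on the SIGQUIT dump: a small, self-contained sketch (not part of jujud; the SIGUSR1 handler here is purely an assumption for illustration) of how a long-running Go process can report its active goroutine count and stacks on demand.

package main

import (
    "log"
    "os"
    "os/signal"
    "runtime"
    "runtime/pprof"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGUSR1)
    go func() {
        for range sigs {
            log.Printf("goroutines active: %d", runtime.NumGoroutine())
            // Write full stacks (similar to what SIGQUIT prints) to stderr.
            pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
        }
    }()
    select {} // stand-in for the process's real work
}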

Changed in juju-core:
importance: Medium → Critical
Revision history for this message
Andrew McDermott (frobware) wrote :

@thomnico - I noticed your machine reports an uptime of 19 hours. Has the machine been quiescent all that time? Do you have any steps I can crib to reproduce this issue?

Revision history for this message
Dave Cheney (dave-cheney) wrote : Re: [Bug 1382556] Re: "cannot allocate memory" when running "juju run"

It looks like the machine is not idle: jujud is pegging one core and mongodb is consuming 50% of another. Something has gone into a restart loop.

Revision history for this message
Andrew McDermott (frobware) wrote :

I see the same memory growth but without the upgrade step. I bootstrapped with 1.25, added additional machines via ensure-availability, and I noticed today that the jujud machine process was around 1GB in size.

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

This issue was discussed in detail on juju-dev a while back: https://lists.ubuntu.com/archives/juju-dev/2015-June/004566.html

With the way Go currently works, it seems like our only practical solutions are to either:

1. Increase the system swap
2. Use a separate "runner" service that Juju calls (via an API) to run external commands, as per John's initial email in the thread above. (A rough sketch of this idea follows.)
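
To make option 2 concrete, here is a rough sketch of the idea. Everything about it (the socket path, the line-based protocol) is invented for illustration and is not Juju's actual design: a tiny runner process with a small heap executes commands on behalf of the large jujud process, so the expensive fork happens in the small process.

package main

import (
    "bufio"
    "log"
    "net"
    "os/exec"
)

func main() {
    // Hypothetical socket path; jujud would talk to this over an API.
    ln, err := net.Listen("unix", "/tmp/juju-runner.sock")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handle(conn)
    }
}

// handle reads one shell command per connection, runs it, and writes back
// the combined output (or the error text).
func handle(conn net.Conn) {
    defer conn.Close()
    line, err := bufio.NewReader(conn).ReadString('\n')
    if err != nil {
        return
    }
    out, err := exec.Command("/bin/sh", "-c", line).CombinedOutput()
    if err != nil {
        out = append(out, []byte("\nerror: "+err.Error())...)
    }
    conn.Write(out)
}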

Revision history for this message
Dave Cheney (dave-cheney) wrote :

Yes, those stop-gap measures will work for a while.

Why is nobody interested in investigating why the Juju process has ballooned to 2.93 GB RSS and is spinning at 100% CPU?

Revision history for this message
Andrew McDermott (frobware) wrote :

This is a leak, IMO, and I would guess it is held open by all the goroutines that are still active (see below) and the memory those goroutines close over.

I modified the runtime so that allocations through runtime.malloc() are multiplied by 1000 (on Go 1.2); otherwise the growth rate was too slow. Relatively quickly you see the memory grow to around 30% of my VM.

There are many goroutines active:

root@maas-node2:/var/log/juju# grep goroutine machine-0.log | wc -l
694

$ grep -A 10 maas2-home ~/.juju/environments.yaml
    maas2-home:
        type: maas
        maas-server: 'http://10.17.17.200/MAAS/'
        maas-oauth: '
        bootstrap-timeout: 1800
        enable-os-refresh-update: false
        enable-os-upgrade: false
        default-series: trusty
        logging-config: <root>=DEBUG

There is no load on my machine:

ubuntu@maas-node2:~$ uptime
 08:38:39 up 23:09, 1 user, load average: 0.00, 0.03, 0.05

top - 08:40:32 up 23:11, 1 user, load average: 0.19, 0.08, 0.06
Tasks: 82 total, 1 running, 81 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.5 sy, 0.0 ni, 98.4 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 1017856 total, 863576 used, 154280 free, 83444 buffers
KiB Swap: 2035708 total, 35852 used, 1999856 free. 349028 cached Mem

ubuntu@maas-node2:~$ top -b -n 1 | egrep '(jujud|mongod)'
 3901 root 20 0 4156184 51236 9496 S 0.0 5.0 5:54.65 mongod
10321 root 20 0 1349356 316960 18060 S 0.0 31.1 0:05.52 jujud

There are no services deployed either:

$ juju status
environment: maas2-home
machines:
  "0":
    agent-state: started
    agent-version: 1.25.1.1
    dns-name: maas-node2.maas
    instance-id: /MAAS/api/1.0/nodes/node-2ca6a94c-4db7-11e5-94e8-525400a09c8e/
    series: trusty
    hardware: arch=amd64 cpu-cores=1 mem=1024M tags=virtual
    state-server-member-status: has-vote
services: {}

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Eric Snow (ericsnowcurrently)
milestone: none → 1.26-alpha2
Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

Bug 1496016 may be related.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

From what I can tell there are 2 different issues here:

1. memory/goroutine leaks (e.g. see https://www.vividcortex.com/blog/2014/01/15/two-go-memory-leaks/)
2. external commands are run via a fork

For the leaks, we should make the effort to track down all leaks and fix them, regardless of how they manifest.

For the external commands, there are several considerations:

1. exec.Cmd.Start() (and Run) is a light wrapper around os.StartProcess()
2. os.StartProcess() is a wrapper around syscall.ForkExec()
3. forks should be using COW, but there may be Go runtime (or Juju) impact on the realized benefit
4. in core exec.Cmd.Start() is called (at least indirectly) through juju/utils/exec.go, juju/utils/ssh, and utils/exec
5. the "forker" stategy John Meinel proposed on the list should help if done early in jujud's main
6. updating juju/utils/ssh to use the forker daemon would have the biggest immediate impact

I spent some time today looking into a reasonable approach to the forker strategy, with special consideration for the juju/utils/ssh package.
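
A minimal sketch of the forker strategy in point 5, under stated assumptions: the --forker flag, the newline protocol, and the one-line reply are all invented here, and this is not the patch under review. The point is only that the helper is re-exec'd while the process is still small, so later forks happen from a tiny address space.

package main

import (
    "bufio"
    "fmt"
    "os"
    "os/exec"
    "strings"
)

func main() {
    if len(os.Args) > 1 && os.Args[1] == "--forker" {
        runForker() // child: stays tiny and does the forking on request
        return
    }

    // Parent: start the helper early, before allocating significant memory.
    helper := exec.Command(os.Args[0], "--forker")
    in, _ := helper.StdinPipe()
    out, _ := helper.StdoutPipe()
    if err := helper.Start(); err != nil {
        panic(err)
    }

    // ...much later, with a multi-GB heap, ask the helper to run a command
    // instead of forking directly from the large process.
    fmt.Fprintln(in, "uptime")
    reply, _ := bufio.NewReader(out).ReadString('\n')
    fmt.Print("helper says: ", reply)
}

func runForker() {
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        cmdOut, err := exec.Command("/bin/sh", "-c", scanner.Text()).Output()
        if err != nil {
            fmt.Println("error:", err)
            continue
        }
        // Reply with just the first line to keep the toy protocol one-line.
        fmt.Println(strings.SplitN(strings.TrimRight(string(cmdOut), "\n"), "\n", 2)[0])
    }
}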

Revision history for this message
Dave Cheney (dave-cheney) wrote :

> 3. forks should be using COW, but there may be Go runtime (or Juju) impact on the realized benefit

This is correct, but from the kernel's point of view the child process _may_ touch every page it inherited, so the kernel accounts for the child at 100% of the cost of the parent. The accounting is affected by the amount of swap available and by the kernel's overcommit setting.
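
To illustrate the accounting Dave describes, here is a Linux-only diagnostic sketch (not a Juju tool; the program itself is just an assumption for illustration, though the proc paths are standard): whether a large fork is refused depends on the overcommit policy and the commit limit, not on what free shows.

package main

import (
    "fmt"
    "os"
    "strings"
)

func show(path string) {
    b, err := os.ReadFile(path)
    if err != nil {
        fmt.Println(path, "->", err)
        return
    }
    fmt.Printf("%s: %s\n", path, strings.TrimSpace(string(b)))
}

func main() {
    // 0 = heuristic overcommit, 1 = always allow, 2 = strict accounting.
    show("/proc/sys/vm/overcommit_memory")
    show("/proc/sys/vm/overcommit_ratio")

    // CommitLimit and Committed_AS show how close the system is to the point
    // where fork() of a large process starts failing with ENOMEM.
    data, err := os.ReadFile("/proc/meminfo")
    if err != nil {
        fmt.Println(err)
        return
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "CommitLimit") || strings.HasPrefix(line, "Committed_AS") {
            fmt.Println(line)
        }
    }
}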

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

I've been working on an implementation of the "forker" strategy. A patch should be up for review tomorrow.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

Unfortunately the forker approach touches a lot of places in Juju. To minimize any potential negative impact I'm taking extra care, so it is taking longer than I'd hoped. That said, it's looking good for tomorrow.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

I also briefly looked into using posix_spawn() or vfork() to get around the issues with fork+exec, but making that work would involve delicate work (including cgo and raw syscalls) that likely wouldn't be worth the trouble.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

I have the first part of the solution up for review: http://reviews.vapour.ws/r/3141/.

The next patch will update juju to use the abstraction that the first patch introduces.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

Given the feedback I've gotten on that first patch, I'm going to table it and work up a short-term hack for the problem.

Revision history for this message
William Reade (fwereade) wrote :

-1 to a "short-term hack" approach. Please direct your attention towards the excessive memory usage; and let me know if I can help at all there.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

ack

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

We can either put in a stop-gap (i.e. "short term hack"), bump this from the 1.25.1 milestone, or take the stance that it's OK for the 1.25.1 milestone to drag out a bit. We'll need guidance from Cheryl on which approach she'd like us to take.

Revision history for this message
Eric Snow (ericsnowcurrently) wrote :

I'm splitting out the issues that are larger than juju run itself. They have less impact on the juju run bug here but require substantially more work than this bug does.

  lp:1516669
  lp:1516676

Also related but not necessary for this bug:

  lp:1516668

Revision history for this message
Nate Finch (natefinch) wrote :

Note that our current strategy on this bug is to prevent juju run --all from spawning N concurrent processes all at once (where N is the number of machines in the environment), since each fork initially reserves as much memory as jujud itself and we can therefore easily go OOM. Instead, we can limit the concurrency to a much smaller number. This will likely slow down the results somewhat, but slower results are better than no results.

Revision history for this message
Nate Finch (natefinch) wrote :

Note that I think the best stop-gap we can do right now, something that is actually viable to get into 1.25, is to limit the parallelism in juju run so that we're not forking N processes simultaneously, where N == the number of machines in the environment. This is a pretty trivial fix, and it at least gives us some breathing room.

Finding every major memory leak in 500,000 LOC is not something we can get done in any kind of short timeframe. Even when we do fix those leaks, forking will still be a problem if you have 150 machines in your environment and we're effectively multiplying Juju's memory footprint by 150, so the parallelism limitation above will still be needed and useful.
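
For illustration, a sketch of that bounded-parallelism stop-gap, with assumed names and values (the limit of 16, the runOnMachine helper, and the placeholder addresses are not Juju's actual API): a buffered channel acts as a semaphore so only a handful of fork/execs are in flight at once.

package main

import (
    "fmt"
    "os/exec"
    "sync"
)

func runOnMachine(addr, command string) (string, error) {
    out, err := exec.Command("/usr/bin/ssh", addr, command).CombinedOutput()
    return string(out), err
}

func runAll(addrs []string, command string, limit int) []string {
    results := make([]string, len(addrs))
    sem := make(chan struct{}, limit) // at most `limit` forks in flight
    var wg sync.WaitGroup

    for i, addr := range addrs {
        wg.Add(1)
        go func(i int, addr string) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot
            defer func() { <-sem }() // release it
            out, err := runOnMachine(addr, command)
            if err != nil {
                out = fmt.Sprintf("Error: %v", err)
            }
            results[i] = out
        }(i, addr)
    }
    wg.Wait()
    return results
}

func main() {
    machines := []string{"ubuntu@10.0.0.9", "ubuntu@10.0.0.10"} // placeholders
    for _, r := range runAll(machines, "uptime", 16) {
        fmt.Println(r)
    }
}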

Changed in juju-core:
assignee: Eric Snow (ericsnowcurrently) → Nate Finch (natefinch)
Revision history for this message
Nate Finch (natefinch) wrote :

I can't actually reproduce this problem in the released version of 1.25.0. I have an EC2 environment with 10 machines and 3 LXC containers on each machine, all m1.smalls. Machine 0 has ~250 MB free at steady state. Doing juju run --all does not seem to impact the free memory at all, and I don't see any out-of-memory problems.

Can anyone tell me what I'm doing wrong?

I have a fix coded up, but if I can't reproduce the error, I can't know whether the fix actually works.

Revision history for this message
Nate Finch (natefinch) wrote :

I upped it to 20 real machines... still no dice.

Changed in juju-core:
assignee: Nate Finch (natefinch) → Eric Snow (ericsnowcurrently)
Changed in juju-core:
assignee: Eric Snow (ericsnowcurrently) → Nate Finch (natefinch)
Nate Finch (natefinch)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released