qemu-arm-static chroots give copious memory errors when setting up java build dependencies

Bug #906922 reported by Nick Moffitt on 2011-12-20
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
Linaro QEMU           Fix Released  Undecided   Unassigned   2012.03
qemu-linaro (Ubuntu)  Fix Released  Medium      Loïc Minier

Bug Description

When building ant in a qemu-arm-static chroot (0.15.91-2011.11-0ubuntu1~0.IS.10.04.2) on an amd64 Hardy host (PPA buildd environment) we see the following during build dependency satisfaction:

 Setting up gcj-4.6-jre-headless (4.6.2-2ubuntu3) ...
 GC Warning: Out of Memory! Returning NIL!
 GC Warning: Out of Memory! Returning NIL!
[...ad nauseam...]
 GC Warning: Out of Memory! Returning NIL!
 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
 dpkg: error processing libecj-java-gcj (--purge):
  subprocess installed post-removal script returned error exit status 2

Trying http://archive.ubuntu.com/ubuntu/pool/universe/q/qemu-linaro/qemu-user-static_1.0-2011.12-0ubuntu1_amd64.deb on osageorange's oneiric-armel chroot, we get much the same thing from libecj-java-gcj.

It would seem that there's some fatal interaction between qemu-arm-static and some way in which gcj or its maintainer scripts manage memory.

I've attached the ant build log, but this has of course affected other java packages that share these dependencies. I'm happy to help demonstrate the issue on osageorange if needs be (an apt-get -f install is currently all that's needed to trigger it again).

Nick Moffitt (nick-moffitt) wrote :

Er sorry, of course the buildd was in a precise chroot, not a Hardy one.

Steve Langasek (vorlon) wrote :

Matthias, could this be related to the memory mapping issues previously identified on ARM that affect Java? If so, can you explain what the issue is so that we can get this fixed in qemu-linaro?

Changed in qemu-linaro (Ubuntu):
assignee: nobody → Matthias Klose (doko)
importance: Undecided → Medium
status: New → Triaged

Nick Moffitt (nick-moffitt) wrote :

Hmm. So I was going to file another bug about a binutils build on precise running out of ptys, but on osageorange's oneiric-armel chroot, it exhausts virtual memory:

 g++ -DHAVE_CONFIG_H -I. -I../../gold -I../../gold -I../../gold/../include -I../../gold/../elfcpp -DLOCALEDIR="\"/usr/share/locale\"" -DBINDIR="\"/usr/bin\"" -DTOOLBINDIR="\"/usr/arm-linux-gnueabi/bin\"" -DMULTIARCH_DIRNAME=\"arm-linux-gnueabi\" -W -Wall -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=options.o -g -O2 -MT options.o -MD -MP -MF .deps/options.Tpo -c -o options.o ../../gold/options.cc
 virtual memory exhausted: Cannot allocate memory
 make[5]: *** [options.o] Error 1

Do you think this could be related, or should I file a separate bug?

Nick Moffitt (nick-moffitt) wrote :

The same thing happened to me with bzr: precise build using 0.15.91 hit max open files limit, but oneiric build using 1.0 did the following:

 gcc -pthread -fno-strict-aliasing -g -O0 -Wall -Wstrict-prototypes -g -O2 -fPIC -I/usr/include/python2.7_d -c bzrlib/_dirstate_helpers_pyx.c -o build/temp.linux-armv7l-2.7-pydebug/bzrlib/_dirstate_helpers_pyx.o
 bzrlib/_dirstate_helpers_pyx.c: In function '__pyx_pf_6bzrlib_21_dirstate_helpers_pyx_5bisect_dirblock':
 bzrlib/_dirstate_helpers_pyx.c:3554:13: warning: variable '__pyx_v_cache' set but not used [-Wunused-but-set-variable]
 virtual memory exhausted: Cannot allocate memory

   Cannot build extension "bzrlib._dirstate_helpers_pyx".
   Use "build_ext --allow-python-fallback" to use slower python implementations instead.

 error: command 'gcc' failed with exit status 1
 [150659 refs]
 dh_auto_build: python-dbg setup.py build --force returned exit code 1
 make[1]: *** [override_dh_auto_build] Error 1

Matthias Klose (doko) wrote :

building the test case from bug 861296:

(oneiric-armel)doko@osageorange:~/tmp$ gcc -g mmap-test.c
(oneiric-armel)doko@osageorange:~/tmp$ ./a.out
Couldn't allocate the heap: 32Mb

Matthias Klose (doko) wrote :

so this triggers with the first alloc; I don't think that has anything to do with the current 2GB limit

Peter Maydell (pmaydell) wrote :

An untested workaround you could try: arrange to set the environment variable QEMU_RESERVED_VA=0xf7000000 so the qemu in the chroot can see it.

Steve Langasek (vorlon) wrote :

double-checked that this symptom is the same with both 1.0 and 0.15.91, so yes, it doesn't seem related to the 2GB limit that affects java elsewhere - even though the behavior between 1.0 and 0.15.91 is different for bzr and binutils.

Changed in qemu-linaro (Ubuntu):
assignee: Matthias Klose (doko) → nobody

Steve Langasek (vorlon) wrote :

Peter, I can confirm that export QEMU_RESERVED_VA=0xf7000000 fixes this for Matthias's test case.
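For reference, a minimal sketch of applying that workaround from a wrapper on the host. The helper name run_with_reserved_va and the chroot path in the usage comment are hypothetical; only the QEMU_RESERVED_VA variable and the 0xf7000000 value come from this thread.

```shell
#!/bin/sh
# Sketch only: run_with_reserved_va is a hypothetical helper, not part of qemu.
# It runs the given command with QEMU_RESERVED_VA set to the value tried in
# this thread, without changing the environment of the calling shell.
run_with_reserved_va() {
    env QEMU_RESERVED_VA=0xf7000000 "$@"
}

# Example use (the chroot path is an assumption for illustration):
#   run_with_reserved_va chroot /srv/chroots/oneiric-armel ./a.out
```

This only helps if the variable survives into the environment that the binfmt-launched qemu sees, so anything that scrubs the environment on the way into the chroot would defeat it.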

Peter Maydell (pmaydell) wrote :

Right, so this is really down to qemu not being very good at handling mmap() in the 32-bit-guest-on-64-bit-host case -- it tends to fail mmap() even when there's more address space available because it hasn't managed to get the host kernel to allocate it within the 32-bit region of the address space the guest can see.

We're currently tossing around the idea of making (the equivalent of) that reserved-va setting the default upstream.
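To see why this happens, it can help to look at where a 64-bit host kernel places mappings by default. The sketch below is not qemu code; it only counts /proc/<pid>/maps entries whose start address needs more than 32 bits, which is the part of the address space a 32-bit guest cannot see. The function name is made up for illustration, and it assumes the kernel prints addresses with no leading zeros beyond eight hex digits.

```shell
#!/bin/sh
# Sketch only: count_mappings_above_4g is a hypothetical helper.
# Reads /proc/<pid>/maps-style lines on stdin and prints how many mappings
# start at or above 4 GiB, i.e. outside what a 32-bit guest can address.
count_mappings_above_4g() {
    count=0
    while read -r range _rest; do
        start=${range%%-*}   # "7f2c5e000000-7f2c5e021000" -> "7f2c5e000000"
        # Assumption: addresses are printed with at most 8 digits unless they
        # need more, so more than 8 hex digits means the address is >= 2^32.
        if [ "${#start}" -gt 8 ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example use on a 64-bit host:
#   count_mappings_above_4g < /proc/self/maps
```

On a typical amd64 host most shared libraries and the stack land far above 4 GiB, which is consistent with the description above of the host kernel not allocating within the 32-bit region unless it is asked to.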

Nick Moffitt (nick-moffitt) wrote :

Steve: if this can be worked around with the QEMU_RESERVED_VA environment variable, then is there any straightforward place to put this (such as in the binfmt glue) for right now? I wonder if this could be worked around in the packaged version easily.

Steve Langasek (vorlon) wrote :

I don't think there's a good way to put env vars into the binfmt glue; I think we'd be better off patching it into qemu itself the way Peter says upstream is considering.

Nick Moffitt (nick-moffitt) wrote :

Has anyone here or upstream got a good patch to do this? We need a workaround as soon as possible, and my initial hacky attempts don't seem to have worked.

Loïc Minier (lool) wrote :

Peter, is there a chance of regressing important and currently working functionality by setting QEMU_RESERVED_VA=0xf7000000 as the default in qemu-linaro in Ubuntu? If not, any reason for not doing it upstream until a better mmap implementation appears?

Peter Maydell (pmaydell) wrote :

Loic: You don't want to do it on 32 bit platforms, but for 64 bit hosts I think it should be OK.

The only thing that I can think of that is likely to break is if the user has a "ulimit -v" setting which we would now be breaching.
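A rough way to check for that case before enabling the setting, assuming "ulimit -v" reports its limit in KiB (as bash and dash do) or the word "unlimited". The function name is hypothetical, and this is only a sanity check on the numbers, not a statement of how qemu itself accounts for the reservation.

```shell
#!/bin/sh
# Sketch only: reserved_va_exceeds_limit is a hypothetical helper.
# Usage: reserved_va_exceeds_limit RESERVED_VA_BYTES LIMIT_KIB
# Exits 0 when the reservation is larger than the virtual memory limit,
# and non-zero when it fits or when the limit is "unlimited".
reserved_va_exceeds_limit() {
    case "$2" in
        unlimited) return 1 ;;
    esac
    reserved_kib=$(( $1 / 1024 ))   # hex input such as 0xf7000000 is accepted
    [ "$reserved_kib" -gt "$2" ]
}

# Example use:
#   if reserved_va_exceeds_limit 0xf7000000 "$(ulimit -v)"; then
#       echo "ulimit -v is smaller than the reserved region" >&2
#   fi
```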

Peter Maydell (pmaydell) wrote :

Oh, and also you (probably) don't want to set the environment variable for running 64 bit guests as I suspect it will unnecessarily restrict the total amount of RAM that they can use.

Re: doing it upstream: the current status of the discussion is here (plus followups):
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg01697.html
(that patch is the equivalent of setting the environment variable for 64 bit hosts running 32 bit guests).

Basically there's general agreement that defaulting -R is probably the right thing, but it needs a little more investigation for the interactions with (a) ARM commpage and (b) user specified ulimit -v, and nobody's got round to actually investigating, writing and submitting a clean patch for this.

Loïc Minier (lool) on 2012-02-17
Changed in qemu-linaro (Ubuntu):
assignee: nobody → Loïc Minier (lool)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu-linaro - 1.0.50-2012.01-0ubuntu4

---------------
qemu-linaro (1.0.50-2012.01-0ubuntu4) precise; urgency=low

  * New patch, 0001_linux-user-reserve-4GB-of-vmem-for-32-on-64, from
    http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg01697.html; fixes
    mmap when running a 32-bits guest on a 64-bits host; LP: #906922.
 -- Loic Minier <email address hidden> Fri, 17 Feb 2012 11:27:00 +0100

Changed in qemu-linaro (Ubuntu):
status: Triaged → Fix Released

Peter Maydell (pmaydell) wrote :

This workaround turns out to cause some regressions in other cases, for example:
http://comments.gmane.org/gmane.comp.emulators.qemu/138180

which seems to be because when we use -R (either explicitly or implicitly because of this patch) we tend to map the guest stack immediately above the guest data/BSS segment. This means brk() will always fail, which is bad news for guest binaries that rely on it.

Steve Langasek (vorlon) wrote :

Nick, I think you're certainly exercising qemu-arm-static more than anyone else at the moment with the autobuilder prototyping. Are you seeing different regressions now as a result of this patch? Should we consider reverting this patch until there's a clean upstream solution?

Loïc Minier (lool) wrote :

On Fri, Mar 02, 2012, Peter Maydell wrote:
> This workaround turns out to cause some regressions in other cases, for example:
> http://comments.gmane.org/gmane.comp.emulators.qemu/138180
>
> which seems to be because when we use -R (either explicitly or
> implicitly because of this patch) we tend to map the guest stack
> immediately above the guest data/BSS segment. This means brk() will
> always fail, which is bad news for guest binaries that rely on it.

 Do you recommend reverting it?

--
Loïc Minier

Peter Maydell (pmaydell) wrote :

I'm not sure. I don't have a feel for whether it has fixed more cases than it has broken or vice-versa, I'm afraid.

Nick Moffitt (nick-moffitt) wrote :

So currently all of the armel PPA buildds that are in auto-mode on production launchpad are using qemu-arm-static to run their build chroots.

On staging I saw a dramatic decrease in the number of OOM-type failures and catastrophic mmap() failures, with a slight uptick in sig11s (on different packages, and hard to reproduce via re-builds). We decided it was Good Enough For Now and moved things into production.

Probably the best way to stress-test this now would be to run a bunch of armel PPA builds through production launchpad (say, a precise main rebuild or something) and examine the build failures. We definitely felt that this patch gave a worthwhile improvement over the old behavior.

Loïc Minier (lool) wrote :

In my case, I'm using qemu-arm-static from precise against an unstable Debian armel chroot and that used to work fine but now I get:
bash: xmalloc: ../bash/variables.c:1969: cannot allocate 28 bytes (24576 bytes allocated)
when starting bash; if I run "sh" (which is dash) I don't have the problem.

From sh, I can reproduce the issue:
# bash
bash: xmalloc: ../bash/variables.c:1969: cannot allocate 28 bytes (24576 bytes allocated)
# QEMU_RESERVED_VA=0 bash
(unstable-armel)root@bee:~# exit
# QEMU_RESERVED_VA=0xf7000000 bash
bash: xmalloc: ../bash/variables.c:2159: cannot allocate 26 bytes (24576 bytes allocated)

Steve Langasek (vorlon) wrote :

On Fri, Mar 09, 2012 at 08:35:44PM -0000, Loïc Minier wrote:
> In my case, I'm using qemu-arm-static from precise against an unstable Debian armel chroot and that used to work fine but now I get:
> bash: xmalloc: ../bash/variables.c:1969: cannot allocate 28 bytes (24576 bytes allocated)
> when starting bash; if I run "sh" (which is dash) I don't have the problem.

> From sh, I can reproduce the issue:
> # bash
> bash: xmalloc: ../bash/variables.c:1969: cannot allocate 28 bytes (24576 bytes allocated)
> # QEMU_RESERVED_VA=0 bash
> (unstable-armel)root@bee:~# exit
> # QEMU_RESERVED_VA=0xf7000000 bash
> bash: xmalloc: ../bash/variables.c:2159: cannot allocate 26 bytes (24576 bytes allocated)

Yes, this is one of the known regressions being discussed upstream, from
what I saw.

Peter Maydell (pmaydell) wrote :

Yes, I've seen the bash failures too. They should be fixed by http://patchwork.ozlabs.org/patch/144476/ which I'm intending to put into qemu-linaro for next week's release.

Peter Maydell (pmaydell) wrote :

I've committed to qemu-linaro the default-to-R-on-64-bit-hosts patch (so Steve, you'll want to drop it from the packaging) and also the followup patch which fixes the bash issues Loic lists. These will both be in qemu-linaro 2012.03 (due this Thursday!)

Changed in qemu-linaro:
milestone: none → 2012.03
status: New → Fix Committed

Peter Maydell (pmaydell) on 2012-03-15
Changed in qemu-linaro:
status: Fix Committed → Fix Released