ttDM EFT production : madspin issue (?)

Bug #1435389 reported by mia tosi
This bug affects 2 people
Affects: MadGraph5_aMC@NLO
Status: Won't Fix
Importance: Undecided
Assigned to: Pierre Artoisenet

Bug Description

I'm trying to generate LHE files using the EffDM_UFO-no_b_mass model.
The process I'm interested in is ttbar + DM DM + 0/1/2 jets,
and I'm using MadGraph5_aMCatNLO [version 2.2.2]
and MadSpin for the top and W decays.

While I am able to produce LHE files up to 1-jet multiplicity,
I'm not able to get the output for 2 jets in the final state:
the MadGraph step works, while apparently MadSpin runs out of memory.

Please find attached:
* the MS_debug file
* the datacards
* the model

In case additional info is needed, please let me know.
Thanks much.

Revision history for this message
mia tosi (mia-tosi-1) wrote :
  • run card (13.6 KiB)
Revision history for this message
mia tosi (mia-tosi-1) wrote :
  • output (63.6 MiB)
Revision history for this message
mia tosi (mia-tosi-1) wrote :

I forgot to stress one probably important point:

I was able to privately produce 1M LHE events using
MG5v1.5.11_CERN_23082013_patched19092013.tar.gz

Revision history for this message
mia tosi (mia-tosi-1) wrote :

while the MadSpin step was successfully done with MG5_aMC_v2_1_2_beta2

Changed in mg5amcnlo:
assignee: nobody → Pierre Artoisenet (partois)
Revision history for this message
Kristian Hahn (kristian-hahn) wrote :

I find similar behavior when generating t t~ DM DM with an explicit mediator in a simplified model. MadSpin ultimately succeeds but claims ~60 GB of physical memory. This result is with v2.2.3. I noticed a related update in v2.2.2: "OM: Reduce the amount of RAM used by MadSpin in gridpack mode." Is the MadSpin memory optimization for gridpacks still ongoing?

Revision history for this message
Kristian Hahn (kristian-hahn) wrote :

Failed to mention that the above was for +0,1,2j. Memory consumption with +0,1j peaks at about 3.5 GB.
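
For reference, peak figures like these can be obtained on Linux by polling VmRSS in /proc/<pid>/status while the job runs. A minimal, generic sketch (this is not part of MadGraph/MadSpin; the function name is made up):

    import time

    def peak_rss_mb(pid, interval=5.0):
        """Poll /proc/<pid>/status and return the largest VmRSS seen, in MB."""
        peak = 0
        while True:
            try:
                with open('/proc/%d/status' % pid) as f:
                    for line in f:
                        if line.startswith('VmRSS:'):
                            peak = max(peak, int(line.split()[1]) // 1024)  # kB -> MB
            except (IOError, OSError):   # the process has exited
                return peak
            time.sleep(interval)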

Revision history for this message
mia tosi (mia-tosi-1) wrote :

thanks for your feedback,
I've just launched a new test asking for 80 GB (last time I had already set up the LSF jobs to request 60 GB, actually)

hope to have some good news

but, in the meantime, let me cross-check:
is there any patch available to fix such an issue?
thanks

Revision history for this message
mia tosi (mia-tosi-1) wrote :

is there any news on a possible fix for this memory issue?
thanks much
  mia

Revision history for this message
Pierre Artoisenet (partois) wrote :

Hi Olivier, Mia,

I have no idea. Do I understand correctly from the log file that the compilation of the helas library fails?
I will try to reproduce the bug tomorrow.

Pierre

Revision history for this message
mia tosi (mia-tosi-1) wrote :

thanks much
 mia

Revision history for this message
mia tosi (mia-tosi-1) wrote :

ciao,
is there any news on this issue?

please find below the output I got from my last (still failing) test:

Resource usage summary:

    CPU time : 39237.86 sec.
    Max Memory : 76613 MB
    Max Swap : 77789 MB

    Max Processes : 33
    Max Threads : 33

  mia

Revision history for this message
Pierre Artoisenet (partois) wrote :

I could not reproduce the bug because the compilation of the code in the *production event* phase takes forever on my laptop ...
How much time did it take on your machine?

Pierre

Revision history for this message
mia tosi (mia-tosi-1) wrote :

I'm running on LSF queues;
it takes at least 8 hours.
  mia

Revision history for this message
Kristian Hahn (kristian-hahn) wrote :

Some more information:

- gridpack MadSpin for ttbar+chichi+0,1,2j just barely succeeds on our 64 GB head node. MadSpin memory consumption plateaus at 60 GB. I actually don't understand why MadSpin works on this machine, given what I observe on our compute nodes (below).

- the same process fails on our 96 GB compute nodes. Here MadSpin memory consumption climbs to 77 GB during the calculation of the full ME. Once it plateaus (~10 hours), MadSpin continues at 77 GB for ~1 hour and then crashes (MS_debug attached).

INFO: generating the full square matrix element (with decay)
INFO: generate p p > t t~ chi chi~ , (t~ > b~ w- , w- > all all QCD=99), (t > b w+ , w+ > all all QCD=99) @ 0 --no_warning=duplicate;add process p p > t t~ chi chi~ j , (t~ > b~ w- , w- > all all QCD=99), (t > b w+ , w+ > all all QCD=99) @ 1 --no_warning=duplicate;add process p p > t t~ chi chi~ j j , (t~ > b~ w- , w- > all all QCD=99), (t > b w+ , w+ > all all QCD=99) @ 2 --no_warning=duplicate;
Command "launch" interrupted with error:
MadGraph5Error : Impossible to compile /projects/d20385/gridpacking/work/work_S_MFM_10_MMed_1000_gSM_1.0_gDM_1.0/gpack/DMScalar_ttbar012j_mphi_1000_mchi_10_gSM_1p0_gDM_1p0/DMScalar_ttbar012j_mphi_1000_mchi_10_gSM_1p0_gDM_1p0_gridpack/work/process/madspingrid/full_me/Source directory
        Trying to launch make command returns:
            [Errno 12] Cannot allocate memory
        In general this means that your computer is not able to compile.
Please report this bug to developers

           More information is found in 'MS_debug'.

           Please attach this file to your report.

The crash seems to occur in the various.misc compile call, I believe in the call to subprocess.Popen. Apparently Popen calls fork, which has led to some documented OOM problems for Python users (e.g. http://stackoverflow.com/questions/1216794/python-subprocess-popen-erroring-with-oserror-errno-12-cannot-allocate-memory). I've tried one of the simplest suggested fixes (i.e. manually calling gc.collect before the Popen), but this does nothing.
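
One workaround that comes up in those threads is to fork all later subprocesses from a small helper process created before the parent balloons, so the fork never has to duplicate a 70+ GB address space. A rough, illustrative sketch only (class and method names are invented; this is not how MadSpin is actually structured):

    import subprocess

    class CompileHelper(object):
        """Tiny long-lived worker: spawn it while the parent is still small,
        then send it shell commands (e.g. the 'make' calls) over stdin."""

        def __init__(self):
            self.proc = subprocess.Popen(['/bin/sh'], stdin=subprocess.PIPE,
                                         universal_newlines=True)

        def run(self, cmd, cwd='.'):
            # no quoting or status reporting here; a real version would need both
            self.proc.stdin.write('cd %s && %s\n' % (cwd, cmd))
            self.proc.stdin.flush()

        def close(self):
            self.proc.stdin.close()
            self.proc.wait()

    # usage sketch:
    #   helper = CompileHelper()      # create at the very start of the run
    #   ...                           # parent grows while building the full ME
    #   helper.run('make', cwd='full_me/Source')
    #   helper.close()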

What is actually being held in this 77 GB? It seems that at the time of the crash MadSpin is just writing / compiling Fortran ...

Thanks,
Kristian

Changed in mg5amcnlo:
status: New → Won't Fix