MadGraph5_aMC@NLO

Bug #1759375
Comment #3

Olivier Mattelaer (olivier-mattelaer) wrote on 2018-03-29: Re: [Bug 1759375] unweighting issue for large production

Hi Philipp,

So far I have run this on a cluster, and the error seems to be related to an issue in the model:
on the cluster, the evaluated cross-section is: nan +- nan

I have run the debugger and the problem seems to be in some of the channels of
d3d3bar_c2bare2

The contribution is so small that we hit numerical inaccuracy
(the code tries to evaluate 1e-171*1e-171, for example).
Other numerical issues follow from that one, including some divisions by zero.
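
To illustrate why a product like 1e-171*1e-171 is a problem, here is a minimal Python sketch of double-precision underflow. It is not the MadGraph code; the helper name product_underflows is made up for this example.

```python
import sys

def product_underflows(a, b):
    """Return True if a*b rounds to exactly zero although both factors are nonzero."""
    return a != 0.0 and b != 0.0 and a * b == 0.0

tiny = 1e-171

# Smallest normal double is about 2.2e-308, so 1e-342 cannot be represented.
print(sys.float_info.min)
print(tiny * tiny)                     # the product silently becomes 0.0
print(product_underflows(tiny, tiny))  # True

# Anything that later divides by this product fails. Python raises an
# exception here; non-trapping IEEE 754 arithmetic would give inf (or nan
# for 0/0), which is how a nan can end up in a cross-section.
try:
    1.0 / (tiny * tiny)
except ZeroDivisionError:
    print("division by zero after underflow")
```

Once one weight has become nan, every sum that includes it is nan as well, which is consistent with the "nan +- nan" cross-section seen on the cluster.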

So you need to be smarter in the way you generate that sample
(i.e. you have to simplify your model):
1) you have kept the masses of the light quarks different from zero.
This is highly inefficient for LHC runs.
2) your model seems to have extremely small couplings.

So what I suggest is that you take your param_card and
1) set all the light-quark masses to zero (including the b mass, since it is in the initial state)
2) move that card to your UFO model directory under the name restrict_mybenchmark.dat
3) then change the way you import your model:
import model UFO_MSSMTriRpV-mybenchmark
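
Steps 1) and 2) can be scripted. The following is only a sketch: it assumes the param_card uses the usual SLHA-style layout with a "Block mass" section whose entries start with the PDG code, and it only touches that block (any related entries elsewhere in the card, such as Yukawa couplings, would have to be handled separately). The function name and the file paths in the usage note are hypothetical.

```python
LIGHT_QUARKS = frozenset({1, 2, 3, 4, 5})  # PDG codes for d, u, s, c, b

def zero_light_quark_masses(card_text, pdg_ids=LIGHT_QUARKS):
    """Return card_text with the selected masses in 'Block mass' set to zero."""
    out = []
    in_mass = False
    for line in card_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith(("block", "decay")):
            # A new block (or a decay entry) starts; track whether it is the mass block.
            parts = lowered.split()
            in_mass = parts[0] == "block" and len(parts) > 1 and parts[1] == "mass"
        elif in_mass and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            if fields[0].isdigit() and int(fields[0]) in pdg_ids:
                # Keep the trailing comment, replace only the value.
                comment = line[line.index("#"):] if "#" in line else ""
                line = f"    {fields[0]} 0.000000e+00 {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"
```

A possible use, with placeholder paths: read the existing card with open("param_card.dat").read(), pass the text through zero_light_quark_masses, and write the result to restrict_mybenchmark.dat inside the UFO model directory before running the import command above.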

By doing this,
1) the survey runs in 2 seconds instead of 20 min!
2) you avoid such numerical issues.

Cheers,

Olivier

> On 28 Mar 2018, at 13:32, Philipp Millet <email address hidden> wrote:
>
> Dear Olivier,
>
> attached to this message you will find the run/param/proc card and the
> UFO model to reproduce this bug. I am using MG 2_6_1 and LHAPDF 6.1.5.
>
> Thanks a lot,
> Philipp
>
> ** Attachment added: "cmssm_m0_1030_m12_940.tar.gz"
> https://bugs.launchpad.net/mg5amcnlo/+bug/1759375/+attachment/5093479/+files/cmssm_m0_1030_m12_940.tar.gz
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1759375
>
> Title:
> unweighting issue for large production
>
> Status in MadGraph5_aMC@NLO:
> New
>
> Bug description:
> Dear all,
>
> MG_2_6_1 crashes with the following error message while combining
> events.
>
> Command "generate_events run_01" interrupted with error:
> UnboundLocalError : local variable 'max_wgt' referenced before assignment
> Please report this bug on https://bugs.launchpad.net/mg5amcnlo
> More information is found in '/disk1/millet/CMSSW/CMSSW_7_1_21/src/mc-production/hadronization/output/test/MG5_aMC_v2_6_1/output_cmssm/run_01_cmssm_debug.log'.
> Please attach this file to your report.
>
> Please find the MG output, the logfile mentioned in the error message (also contains param/run card) below. Does anybody know why this crash happens? The same model works fine for different param cards / a subset of the processes.
> If you need more information (or e.g. the UFO model I am using) please let me know.
>
> Any help would be greatly appreciated,
> Philipp
>
> files:
> https://cernbox.cern.ch/index.php/s/1PPK37fIXOoBY1P
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/mg5amcnlo/+bug/1759375/+subscriptions
