MadGraph5_aMC@NLO

unweighting issue for large production

Bug #1759375 reported by Olivier Mattelaer on 2018-03-27

This bug affects 1 person

Affects            Status     Importance  Assigned to  Milestone
MadGraph5_aMC@NLO  Won't Fix  Undecided   Unassigned   -

Bug Description

Dear all,

MG_2_6_1 crashes with the following error message while combining events.

Command "generate_events run_01" interrupted with error:
UnboundLocalError : local variable 'max_wgt' referenced before assignment
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/disk1/millet/CMSSW/CMSSW_7_1_21/src/mc-production/hadronization/output/test/MG5_aMC_v2_6_1/output_cmssm/run_01_cmssm_debug.log'.
Please attach this file to your report.

Please find the MG output, the logfile mentioned in the error message (also contains param/run card) below. Does anybody know why this crash happens? The same model works fine for different param cards / a subset of the processes.
If you need more information (or e.g. the UFO model I am using) please let me know.

Any help would be greatly appreciated,
Philipp

files:
https://cernbox.cern.ch/index.php/s/1PPK37fIXOoBY1P

Olivier Mattelaer (olivier-mattelaer) wrote on 2018-03-27:

Hi,

This will take quite a while to debug, since I need to run it myself to be able to reproduce it.
Can you send me (or, better, attach here) all the information needed to reproduce this issue?

Thanks,

Olivier

Philipp Millet (millet-u) wrote on 2018-03-28:

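cmssm_m0_1030_m12_940.tar.gz (1000.9 KiB, application/x-tar)
https://bugs.launchpad.net/mg5amcnlo/+bug/1759375/+attachment/5093479/+files/cmssm_m0_1030_m12_940.tar.gz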

Dear Olivier,

attached to this message you will find the run/param/proc card and the UFO model to reproduce this bug. I am using MG 2_6_1 and LHAPDF 6.1.5.

Thanks a lot,
Philipp

Olivier Mattelaer (olivier-mattelaer) wrote on 2018-03-29: Re: [Bug 1759375] unweighting issue for large production

Hi Philipp,

I have run this on a cluster so far, and the error seems to be related to an issue in the model:
on the cluster, the evaluated cross-section is: nan +- nan

I have run the debugger, and the problem seems to come from some of the channels of
d3d3bar_c2bare2

The contribution is so small that we hit numerical inaccuracies
(the code tries to evaluate 1e-171*1e-171, for example).
Other numerical issues follow from that one, including some divisions by zero.
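
To illustrate the numerical point above, here is a minimal Python sketch of how a product such as 1e-171*1e-171 underflows to zero in double precision, and how the resulting nan can then slip through a comparison. The function `largest_weight` is a hypothetical helper written for this illustration; it is not MadGraph's code, and the thread does not establish that the `max_wgt` error arises in exactly this way.

```python
import math

# The smallest positive double is about 4.9e-324, so 1e-342 is not representable.
tiny = 1e-171
product = tiny * tiny              # silently underflows to 0.0

# Python raises on float division by zero; compiled code following the
# IEEE 754 defaults would typically return inf (or nan) instead.
try:
    ratio = tiny / product
except ZeroDivisionError:
    ratio = math.inf

bad_weight = product * ratio       # 0.0 * inf gives nan


def largest_weight(weights):
    """Hypothetical illustration (not MadGraph code): track the largest weight."""
    current = 0.0
    for wgt in weights:
        if wgt > current:          # any comparison with nan is False
            current = wgt
            max_wgt = current      # only bound inside this branch
    return max_wgt                 # UnboundLocalError if the branch never ran


print(product, bad_weight)
print(largest_weight([1.0, 3.0, 2.0]))
try:
    largest_weight([bad_weight])
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

Under these assumptions, a list of weights that are all nan never enters the branch, so the variable is never bound, which is the same class of error as the one reported.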

So you need to be smarter in the way you generate that sample
(i.e. you have to simplify your model):
1) you have kept the masses of the light quarks different from zero.
This is highly inefficient for LHC runs.
2) your model seems to have extremely small couplings.

So what I suggest is that you take your param_card and
1) set all the light-quark masses to zero (including the b mass, since it is in the initial state)
2) move that card to your UFO model directory under the name restrict_mybenchmark.dat
3) then change the way you import your model:
import model UFO_MSSMTriRpV-mybenchmark

By doing this,
1) the survey runs in 2 seconds instead of 20 min!
2) you avoid such numerical issues.
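
The three steps above can be sketched in Python as follows. This is only a sketch under stated assumptions: it assumes the param_card uses the usual SLHA-style layout ("Block mass" followed by lines of the form "PDG-id value # comment"), and the helper names `zero_light_quark_masses` and `write_restriction_card`, as well as any paths passed to them, are hypothetical and not part of MadGraph.

```python
from pathlib import Path

LIGHT_QUARK_PDG_IDS = {1, 2, 3, 4, 5}  # d, u, s, c, b


def zero_light_quark_masses(card_text, pdg_ids=LIGHT_QUARK_PDG_IDS):
    """Return card_text with the MASS-block entries for pdg_ids set to zero."""
    out = []
    in_mass_block = False
    for line in card_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("block"):
            parts = lowered.split()
            in_mass_block = len(parts) > 1 and parts[1] == "mass"
        elif lowered.startswith("decay"):
            in_mass_block = False
        elif in_mass_block and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            if len(fields) >= 2 and fields[0].isdigit() and int(fields[0]) in pdg_ids:
                # Keep the trailing comment (parameter name) if there is one.
                comment = line[line.index("#"):] if "#" in line else ""
                line = f"  {fields[0]} 0.000000e+00 {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


def write_restriction_card(param_card, model_dir, name="mybenchmark"):
    """Write the modified card as restrict_<name>.dat inside model_dir."""
    target = Path(model_dir) / f"restrict_{name}.dat"
    target.write_text(zero_light_quark_masses(Path(param_card).read_text()))
    return target


# Small made-up card fragment, for illustration only.
example_card = """\
Block mass
    5 4.700000e+00 # MB
    6 1.730000e+02 # MT
Block sminputs
    1 1.279000e+02 # aEWM1
"""
restricted = zero_light_quark_masses(example_card)
print(restricted)
```

After writing the card with `write_restriction_card`, the model would be loaded with the import line given above (import model UFO_MSSMTriRpV-mybenchmark). Only the MASS block is touched here; whether other parameters tied to the quark masses also need adjusting depends on the model.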

Cheers,

Olivier

Olivier Mattelaer (olivier-mattelaer) on 2018-04-11

Changed in mg5amcnlo:
status:	New → Won't Fix

Philipp Millet (millet-u) wrote on 2018-04-12:

Dear Olivier,

Sorry for the late reply, and thank you for your quick answer and cross-checks. Yes, the coupling for this channel is very small. The channel was part of a larger set of processes, some of which do contribute meaningfully, but the one given in this example made the whole job crash. If I understand you correctly, this has to be either fixed in this specific UFO model or bypassed by being smarter about how the sample is generated?

Thanks
Philipp