Attachment: corrupt(?) log_MINT0.txt

Bug #1456964 reported by Josh Bendavid
Affects: MadGraph5_aMC@NLO
Status: Won't Fix
Importance: Low
Assigned to: Rikkert Frederix
Milestone: (none)

Bug Description

When generating an NLO process

(high-mass Drell-Yan + jets in this case, cards here:
https://github.com/cms-sw/genproductions/tree/master/bin/MadGraph5_aMCatNLO/cards/production/13TeV/dyellell012j_5f_NLO_FXFX_M3000
)

one of the jobs for the first step ("Setting up grid") fails with the errors below.
Looking at the directory for the specific job indicated, I noticed something strange: log_MINT0.txt (attached) has a lot of binary junk at the beginning. I'm not sure whether this is related to the failure.
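A quick way to check whether a log file starts with binary junk is to probe its first few kilobytes for non-text bytes. This is a minimal sketch, not a MadGraph tool; the threshold and probe size are arbitrary assumptions.

```python
# Heuristic check for binary junk at the start of a log file.
# The 10% threshold and 4 KiB probe size are illustrative choices.
def looks_binary(data: bytes, threshold: float = 0.1) -> bool:
    """Return True if more than `threshold` of the bytes are non-text."""
    if not data:
        return False
    # Printable ASCII plus tab, newline, carriage return count as text.
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    junk = sum(b not in text_bytes for b in data)
    return junk / len(data) > threshold

def log_starts_with_junk(path: str, probe: int = 4096) -> bool:
    """Check only the beginning of the file, where the junk was observed."""
    with open(path, "rb") as f:
        return looks_binary(f.read(probe))
```

Running `log_starts_with_junk("GF7/log_MINT0.txt")` on the affected job's directory would flag the corrupted file before the missing-output check fires.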

CRITICAL: Fail to run correctly job 655476238.
            with option: {'log': None, 'stdout': None, 'argument': ['2', 'F', '0'], 'nb_submit': 5, 'stderr': None, 'prog': 'ajob7', 'output_files': ['GF7'], 'time_check': 1431752459.19803, 'cwd': '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg', 'required_output': ['GF7/log_MINT0.txt', 'GF7/results.dat'], 'input_files': ['/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/MGMEVersion.txt', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/randinit', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/symfact.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/iproc.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/initial_states_map.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/configs_and_props_info.dat', 
'/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/leshouche_info.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/param_card.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/FKS_params.dat', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/MadLoop5_resources.tar.gz', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/madevent_mintMC', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/madinMMC_F.2', '/afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/lib/PDFsets']}
            file missing: /afs/cern.ch/work/k/kplee/private/GridpackProduction/genproductions/bin/MadGraph5_aMCatNLO/dyellell012j_5f_NLO_FXFX_M3000/dyellell012j_5f_NLO_FXFX_M3000_gridpack/work/processtmp/SubProcesses/P2_dxg_epemdxg/GF7/log_MINT0.txt
            Fails 5 times
            No resubmition. 

Revision history for this message
Josh Bendavid (joshbendavid) wrote :
Changed in mg5amcnlo:
assignee: nobody → Rikkert Frederix (frederix)
Rikkert Frederix (frederix) wrote :

Hi Josh,

Yes, this is very likely the problem. I've seen this kind of binary junk being written on AFS systems before. However, the code seems to have tried to run this job 5 times, and it seems unlikely that it failed in the same way all 5 times for this channel.

However, there is something else worrying as well. By looking at the results in the log file:

accumulated results ABS integral = 0.5385E-10 +/- 0.7298E-13 ( 0.136 %)
accumulated results Integral = 0.2442E-11 +/- 0.6376E-13 ( 2.611 %)
accumulated results Virtual = 0.1518E-11 +/- 0.1351E-12 ( 8.901 %)
accumulated results Virtual ratio = -.4504E+02 +/- 0.1745E-01 ( 0.039 %)
accumulated results ABS virtual = 0.1082E-09 +/- 0.1338E-12 ( 0.124 %)
accumulated results Born*ao2pi = 0.5401E-13 +/- 0.1457E-15 ( 0.270 %)

it seems to me that the contribution from the virtual corrections is large and that there are very large cancellations in the virtual. (The integral of the absolute value of the virtual corrections is twice as large as the integral of the absolute value of the total integrand.) This strongly suggests that there is either a serious problem with the renormalisation scale used for this process (possibly: it is set automatically in the FxFx scheme, which has not really been tested with such an extreme cut on the invariant mass of the leptons), or a real problem with the stability of the virtual corrections. The latter might be an issue, given that you have some phase-space points that were still marked as unstable even in quadruple precision.
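The size of the cancellation can be read off directly from the numbers quoted from the log above (all values are taken verbatim from log_MINT0.txt):

```python
# Values copied from the "accumulated results" lines in log_MINT0.txt.
abs_integral = 0.5385e-10   # ABS integral: integral of |total integrand|
integral     = 0.2442e-11   # Integral: the net result
abs_virtual  = 0.1082e-09   # ABS virtual: integral of |virtual corrections|

# |virtual| integrates to roughly twice the |total integrand| ...
ratio = abs_virtual / abs_integral      # about 2.0

# ... while the net integral is only a few percent of the absolute one,
# i.e. positive and negative contributions cancel almost completely.
cancellation = integral / abs_integral  # about 0.045 (4.5%)
```

These two numbers are what make the virtual contribution look pathological: a well-behaved channel would have the virtual well below the total in absolute value.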

I'm not sure which of the above is the problem. But I think that both will go away if you use a much larger merging scale, which also lets you use a much harder generation cut on the light jet. Using a large merging scale is not strange here: compared to a 3 TeV invariant-mass cut, a jet of (several) hundred GeV can still be considered soft. Unfortunately, this means you will have to rely heavily on the parton shower to describe those soft-ish jets, and the shower, as you know, lacks the correlations between jets that are present in the matrix elements.

To really understand this problem would require quite a bit of investigation and might therefore take a couple of months.

best,
Rikkert

Changed in mg5amcnlo:
importance: Undecided → Medium
Josh Bendavid (joshbendavid) wrote :

Thanks Rikkert.

Indeed, ptj was set to just 10 GeV here. This is because the sample is part of a series of mll-binned processes being generated to cover the full phase space, and the parameters were simply inherited from the inclusive version we had produced with mll > 50 GeV.

I suppose the merging-scale dependence should be small enough that we can increase ptj and qcut for the higher-mass bins without introducing any serious discontinuities when the samples are combined later.
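One way to make the inherited cut scale with the hard process, rather than staying fixed at the inclusive value, is to tie ptj to the lower edge of each mll bin. This is a purely illustrative sketch: the 3% fraction and the 10 GeV floor are assumptions for this example, not MadGraph defaults or a recommendation from this thread.

```python
# Hypothetical per-bin generation cut: ptj grows with the mll bin edge,
# so a 3 TeV bin gets a much harder (but still relatively soft) jet cut.
# frac=0.03 and floor=10.0 GeV are illustrative, not validated values.
def ptj_for_bin(mll_min: float, frac: float = 0.03, floor: float = 10.0) -> float:
    """Return a generation cut (GeV) for a bin with lower edge mll_min (GeV)."""
    return max(floor, frac * mll_min)
```

With these assumed numbers, the inclusive mll > 50 GeV sample keeps ptj = 10 GeV, while the M3000 bin would use ptj = 90 GeV, still "soft" next to the 3 TeV invariant-mass cut in Rikkert's sense.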

Rikkert Frederix (frederix) wrote :

Hi Josh,

I investigated this a bit further and could not find a bug in the code. It really seems that the very small merging scale (or rather, generation cut) is the cause of the problems (together with the AFS file system). Given that such a small merging scale is not really motivated when very large scales are involved, I'll set the status of this bug to "Won't Fix". Maybe in the future, when we investigate possible improvements to the setting of the merging scale and the generation cut, this problem will be revisited.

best,
Rikkert

Changed in mg5amcnlo:
status: New → Won't Fix
importance: Medium → Low