MG5 crashes for many processes, but not for a few

Bug #1882254 reported by Marija Glisic
This bug affects 2 people
Affects: MadGraph5_aMC@NLO
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm trying to generate a sample with 12 processes in it (C1N2N1 production, which covers C1N1, C1C1, N2N1, and C1N2), and it crashes with a very vague error. Running with only three processes (C1N1 production, which has p p -> c1 n1, p p -> c1 n1 j, and p p -> c1 n1 j j) succeeds. I've included an excerpt of run_01_tag_1_debug.log below.

-------------

generate_events run_01
Traceback (most recent call last):
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/interface/extended_cmd.py", line 1514, in onecmd
    return self.onecmd_orig(line, **opt)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/interface/extended_cmd.py", line 1463, in onecmd_orig
    return func(arg, **opt)
  File "/afs/cern.ch/user/m/mglisic/test/PROC_MSSM_SLHA2_0/bin/internal/madevent_interface.py", line 2469, in do_generate_events
    self.run_generate_events(switch_mode, args)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/interface/common_run_interface.py", line 6961, in new_fct
    original_fct(obj, *args, **opts)
  File "/afs/cern.ch/user/m/mglisic/test/PROC_MSSM_SLHA2_0/bin/internal/madevent_interface.py", line 2531, in run_generate_events
    self.exec_cmd('combine_events', postcmd=False,printcmd=False)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/interface/extended_cmd.py", line 1543, in exec_cmd
    stop = Cmd.onecmd_orig(current_interface, line, **opt)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/interface/extended_cmd.py", line 1463, in onecmd_orig
    return func(arg, **opt)
  File "/afs/cern.ch/user/m/mglisic/test/PROC_MSSM_SLHA2_0/bin/internal/madevent_interface.py", line 3610, in do_combine_events
    get_wgt, log_level=5, trunc_error=1e-2, event_target=self.run_card['nevents'])
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/various/lhe_parser.py", line 1143, in unweight
    return super(MultiEventFile, self).unweight(outputpath, get_wgt_multi, **opts)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/various/lhe_parser.py", line 435, in unweight
    banner.modify_init_cross(cross)
  File "/cvmfs/atlas.cern.ch/repo/sw/software/21.6/sw/lcg/releases/LCG_88/MCGenerators/madgraph5amc/2.6.7p3.atlas5/x86_64-slc6-gcc62-opt/madgraph/various/banner.py", line 359, in modify_init_cross
    raise Exception
Exception

Olivier Mattelaer (olivier-mattelaer) wrote :

Does it work better if you force the numbering of each process?

generate p p > n1 c1 $ susystrong @1
add process p p > n1 c1 j $ susystrong @2
add process p p > n1 c1 j j $ susystrong @3
add process p p > n1 n2 $ susystrong @4
add process p p > n1 n2 j $ susystrong @5
add process p p > n1 n2 j j $ susystrong @6
add process p p > c1 n2 $ susystrong @7
add process p p > c1 n2 j $ susystrong @8
add process p p > c1 n2 j j $ susystrong @9
add process p p > c1 c1 $ susystrong @10
add process p p > c1 c1 j $ susystrong @11
add process p p > c1 c1 j j $ susystrong @12

Also, what is the reason for generating all of them in one go?
I thought that for CKKW-L with Py8 (and simply for the matching scale) it would be better to have 4 separate computations.

Cheers,

Olivier

Marija Glisic (mglisic) wrote :

Hi Olivier,

Thanks for your response. I have tried with the added @1, @2, ... but got the same result. Do you know of anything else I could try?
We're trying to keep our number of requested samples down, instead of having four different samples for the same mass and lifetime, and thus quadrupling our request.

Thank you,
Marija

Olivier Mattelaer (olivier-mattelaer) wrote :

Hi,

I cannot reproduce this error (I tried on a simpler process, since I do not have the ability to compute the full process on my laptop). One suggestion would be to do:
define X = n1 c1
define Y = n2 c1
generate p p > X Y $ susystrong @1
add process p p > X Y j $ susystrong @2
add process p p > X Y j j $ susystrong @3

Cheers,

Olivier

Changed in mg5amcnlo:
status: New → Incomplete
Mike Hance (mhance) wrote :

Hi Olivier,

I'm having the same problem in 2.7.2, though only with relatively small runs. Asking MG to generate 20k events with a proc card like Marija's threw the error, but increasing to 200k events worked fine. I'm not sure if this helps you to debug the issue at all.

Cheers,

-Mike

Mike Hance (mhance) wrote :

Hmm, maybe I spoke too soon -- I still see ~10% of my jobs failing, even when running with more events. The only difference between successful and failing jobs is the random seed. So something must be rather unstable.

-Mike

Mike Hance (mhance) wrote :

Hi again,

I'm still playing around with this, and maybe this helps more:

From the log file, it looks like we're crashing with this exception in banner.py:

    def modify_init_cross(self, cross):
        """modify the init information with the associate cross-section"""
        assert isinstance(cross, dict)
        # assert "all" in cross
        assert "init" in self

        cross = dict(cross)
        for key in cross.keys():
            if isinstance(key, str) and key.isdigit() and int(key) not in cross:
                cross[int(key)] = cross[key]

        all_lines = self["init"].split('\n')
        new_data = []
        new_data.append(all_lines[0])
        for i in range(1, len(all_lines)):
            line = all_lines[i]
            split = line.split()
            if len(split) == 4:
                xsec, xerr, xmax, pid = split
            else:
                new_data += all_lines[i:]
                break
            if int(pid) not in cross:
                raise Exception
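
For debugging, the check in this excerpt can be restated outside MadGraph so that it names the offending ID instead of raising a bare Exception. This is only a sketch: `check_init_pids` is a hypothetical helper, not a MadGraph function, and it assumes the init text starts with the beam line, as the excerpt does. Read literally, the quoted check fails when an ID that appears in the init block has no entry in `cross`.

```python
def check_init_pids(init_block, cross):
    """Hypothetical restatement of the quoted check, with a descriptive error."""
    cross = dict(cross)
    # Same normalisation as the excerpt: digit-string keys also get an int key.
    # list() keeps the loop safe under Python 3, where keys() is a live view.
    for key in list(cross.keys()):
        if isinstance(key, str) and key.isdigit() and int(key) not in cross:
            cross[int(key)] = cross[key]

    lines = init_block.split('\n')
    # lines[0] is assumed to be the beam line; process lines have 4 fields.
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 4:
            break  # first non-process line ends the scan, as in the excerpt
        pid = int(fields[3])
        if pid not in cross:
            known = sorted(k for k in cross if isinstance(k, int))
            raise ValueError(
                "process ID %d is in the init block but has no "
                "cross-section entry (known IDs: %s)" % (pid, known))


# Made-up numbers, for illustration only.
example_init = (
    "2212 2212 6.5e+03 6.5e+03 0 0 260000 260000 -4 2\n"
    "   +1.0e+00 +1.0e-02 +1.0e+00 1\n"
    "   +2.0e+00 +1.0e-02 +1.0e+00 4\n"
    "<generator>example</generator>"
)
try:
    check_init_pids(example_init, {'1': 1.0})
except ValueError as err:
    print(err)
```
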

I think this is complaining that we can't find a process ID in the init block from the LHE files that are being merged together. I see two partial LHE files in Events/run_01/:

atlas02 | test_guess_008_26 [51]: ls PROC_MSSM_SLHA2_0/Events/run_01/
total 4.1M
-rw-r--r--. 1 nobody users 44K Aug 5 21:51 run_01_tag_1_banner.txt
-rw-r--r--. 1 nobody users 1.1M Aug 5 21:52 partials0.lhe.gz
-rw-r--r--. 1 nobody users 3.0M Aug 5 21:52 partials1.lhe.gz

Looking at the init blocks for those two files:

<init>
2212 2212 6.500000e+03 6.500000e+03 0 0 260000 260000 -4 12
   +7.3565871e-03 +6.7896839e-05 +1.1766687e+01 11
   +4.7601446e-03 +8.1563638e-05 +6.9288533e+00 10
   +1.8776607e+00 +4.9773670e-03 +7.8777596e+00 13
   +2.1637021e-03 +1.3261012e-05 +6.0894384e+00 12
   +5.8831157e-01 +3.3798544e-03 +7.8188791e+00 15
   +1.1177685e+00 +5.1275003e-03 +7.6873225e+00 14
   +1.2539300e-04 +1.6456028e-06 +7.8213219e+00 17
   +1.9666514e-04 +3.5053501e-06 +7.7764568e+00 16
   +6.5569949e-05 +8.3775434e-07 +7.8094409e+00 18
   +2.3384995e+00 +6.0486786e-03 +7.8168534e+00 1
   +4.3344898e-01 +4.0951418e-03 +7.8023543e+00 3
   +1.4318232e+00 +8.0008571e-03 +7.8106911e+00 2
<generator name='MadGraph5_aMC@NLO' version='2.6.7'>please cite 1405.0301 </generator>
</init>

<init>
2212 2212 6.500000e+03 6.500000e+03 0 0 260000 260000 -4 6
   +4.8839995e-03 +4.5076354e-05 +3.0997368e+00 11
   +5.3667600e-03 +9.1957815e-05 +3.1045436e+00 10
   +1.8619488e+00 +4.9357178e-03 +3.1033280e+00 13
   +2.7757069e-03 +1.7011904e-05 +3.1044497e+00 12
   +9.1119429e-02 +8.2545211e-04 +3.0978857e+00 15
   +1.1358744e+00 +5.2105551e-03 +3.1064749e+00 14
<generator name='MadGraph5_aMC@NLO' version='2.6.7'>please cite 1405.0301 </generator>
</init>

So it looks like the process IDs are not unique, and not all process IDs are listed (e.g. 10 is repeated, and 4 isn't listed anywhere). I guess the exception arises when the unweighting routine is trying to find a line for a process ID that isn't in the init block.
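
A comparison like this one can be scripted rather than done by eye. The sketch below is an illustration under stated assumptions, not MadGraph code: `init_process_ids` and `compare_ids` are hypothetical helpers, the two blocks are abbreviated copies of the ones above, and the expected ID set is a made-up example. Whether an ID appearing in more than one partial file is itself an error is not established in this thread.

```python
from collections import Counter


def init_process_ids(init_text):
    """Return the process IDs listed in an LHE init block, in file order."""
    # Drop blank lines and tag lines such as <init> or <generator ...>.
    lines = [l for l in init_text.splitlines()
             if l.strip() and not l.lstrip().startswith("<")]
    ids = []
    # lines[0] is the beam line; each process line has 4 fields, ID last.
    for line in lines[1:]:
        fields = line.split()
        if len(fields) == 4:
            ids.append(int(fields[3]))
    return ids


def compare_ids(blocks, expected):
    """Return (IDs seen more than once, expected IDs seen nowhere)."""
    seen = Counter(pid for block in blocks for pid in init_process_ids(block))
    repeated = sorted(pid for pid, n in seen.items() if n > 1)
    missing = sorted(set(expected) - set(seen))
    return repeated, missing


# Abbreviated from the two init blocks quoted above.
BLOCK_A = """<init>
2212 2212 6.500000e+03 6.500000e+03 0 0 260000 260000 -4 12
   +7.3565871e-03 +6.7896839e-05 +1.1766687e+01 11
   +4.7601446e-03 +8.1563638e-05 +6.9288533e+00 10
   +2.3384995e+00 +6.0486786e-03 +7.8168534e+00 1
   +4.3344898e-01 +4.0951418e-03 +7.8023543e+00 3
   +1.4318232e+00 +8.0008571e-03 +7.8106911e+00 2
</init>"""
BLOCK_B = """<init>
2212 2212 6.500000e+03 6.500000e+03 0 0 260000 260000 -4 6
   +4.8839995e-03 +4.5076354e-05 +3.0997368e+00 11
   +5.3667600e-03 +9.1957815e-05 +3.1045436e+00 10
</init>"""

# Hypothetical expected set, chosen only to illustrate a missing ID.
repeated, missing = compare_ids([BLOCK_A, BLOCK_B], expected=[1, 2, 3, 4, 10, 11])
print("repeated:", repeated, "missing:", missing)
```
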

In my case, I'm trying to run with:

generate p p > n2 x1+ / susystrong @1
add process p p > n2 x1+ j / susystrong @2
add proces...


Changed in mg5amcnlo:
status: Incomplete → In Progress