pythia8: generate events interrupted with error

Bug #1803202 reported by shenty1991 on 2018-11-13
This bug affects 1 person
Affects: MadGraph5_aMC@NLO
Importance: Undecided
Assigned to: Valentin Hirschi

Bug Description

Hello,

When I try to shower events through Pythia8, the run terminates with the following error. A small sample of 100 events works, but samples of 2k and 50k events both fail.

INFO: Running Pythia8 [arXiv:1410.3012]
No user-defined value for Pythia8 parameter 'JetMatching:nJetMax'. Setting it automatically to 1.
Splitting .lhe event file for PY8 parallelization...
Submitting Pythia8 jobs...
Pythia8 shower jobs: 0 Idle, 4 Running, 0 Done [1 seconds]
/Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh: line 2: 30834 Segmentation fault: 11 ./MG5aMC_PY8_interface PY8Card.dat >&PY8_log.txt
WARNING: program /Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh 1 launch ends with non zero status: 139. Stop all computation
Pythia8 shower jobs: 0 Idle, 3 Running, 1 Done [4m52s]
Pythia8 shower jobs: 0 Idle, 2 Running, 2 Done [4m52s]
Pythia8 shower jobs: 0 Idle, 1 Running, 3 Done [4m52s]
Pythia8 shower jobs: 0 Idle, 0 Running, 4 Done [4m52s]

Command "generate_events 100TeV_2kEvents" interrupted with error:
Exception : program /Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh 1 launch ends with non zero status: 139. Stop all computation
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/100TeV_2kEvents_tag_1_debug.log'.
Please attach this file to your report.

quit

shenty1991 (shenty1991) wrote :
Changed in mg5amcnlo:
assignee: nobody → Valentin Hirschi (valentin-hirschi)
shenty1991 (shenty1991) wrote :

Hello,

I am wondering whether this report has been reviewed. I still cannot find a solution to this bug.

Thank you!

shenty1991 (shenty1991) wrote :

Hello,

This bug is still there. Can anyone help me with this?

I would advise contacting Stephan Prestel and Valentin Hirschi directly about this.
I cannot do more than that.

Olivier

> On 4 Dec 2018, at 03:25, shenty1991 <email address hidden> wrote:
>
> Hello,
>
> This bug is still there, anyone can help me with this?
>

The issue comes from the parallelisation of the Pythia8 shower, which unfortunately has to be managed by MadGraph since Pythia8 doesn't offer this functionality.

Note that I turn on the multicore parallel Pythia8 shower only when a sufficiently large number of events needs to be showered, which is probably why you see no problem with 100 events. You can test this by running the `shower pythia8 run_XX` command right *after* specifying `set nb_core 1`, which disables the PY8 parallelisation and should in effect "solve" your issue.

The way I implemented this parallelisation is by splitting the original .lhe file into several smaller ones, showering them on separate cores, and then recombining the generated HepMC files (which is a delicate task because these files are large).
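A minimal sketch of the splitting step described above, for readers who want to reproduce a split by hand. It is not MadGraph's actual implementation: the names `split_lhe` and `EVENT_RE` are hypothetical, and the sketch assumes a plain-text LHE file whose events are delimited by `<event>` ... `</event>` tags. It copies the preamble (header and `<init>` block) unchanged into every chunk and does not adjust any event counts or cross-section information stored there.

```python
import re

# Matches one complete <event> ... </event> block (non-greedy, across lines).
EVENT_RE = re.compile(r"<event\b.*?</event>", re.DOTALL)


def split_lhe(text, n_chunks):
    """Split LHE text into at most n_chunks LHE documents.

    Every chunk keeps the original preamble (everything before the first
    event) and gets its own closing </LesHouchesEvents> tag.
    Hypothetical helper for illustration only.
    """
    events = EVENT_RE.findall(text)
    if not events:
        raise ValueError("no <event> blocks found")

    preamble = text[: EVENT_RE.search(text).start()]
    footer = "</LesHouchesEvents>\n"

    # Never create more chunks than there are events.
    n_chunks = max(1, min(n_chunks, len(events)))
    base, extra = divmod(len(events), n_chunks)

    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional event each.
        size = base + (1 if i < extra else 0)
        body = "\n".join(events[start:start + size])
        chunks.append(preamble + body + "\n" + footer)
        start += size
    return chunks
```

Writing each returned chunk to its own file gives inputs that can be showered independently, which is the situation in which the crash reported here occurs.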

Therefore, to help us diagnose the problem, it would be helpful if you could investigate why the Pythia8 shower crashed on these individual smaller LHE files whereas it did not on the larger file.
You can do so by monitoring these runs more closely in the 'Events/run_XX' directory, where you will find the './run_shower.sh' script that lets you launch the PY8 shower by hand, exactly like MadGraph does. When the parallelisation is active, that 'Events/run_XX' directory also contains a 'PY8_parallelization' folder with the split runs; there you will again find a script for running the shower, as well as the *PY8 log files* of the runs on the smaller split LHE files that failed.
Looking into these "split logs" to understand why the PY8 shower failed in this "parallelised case", and possibly sharing the interesting parts of these logs here, could help us fix this issue with the PY8 parallelisation.
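A rough sketch of how the split logs could be collected in one pass. The layout `split_*/PY8_log.txt` is taken from the paths and attachment names in this thread; the function name `summarize_split_logs` and the keyword list are assumptions for illustration, not an official list of Pythia8 error markers.

```python
from pathlib import Path

# Assumed markers of trouble; adjust to whatever the logs actually contain.
KEYWORDS = ("error", "abort", "segmentation")


def summarize_split_logs(parallel_dir, tail=5):
    """Return, per split directory, the flagged lines and the last lines of its log.

    parallel_dir is the PY8_parallelization folder of a run.
    Hypothetical helper for illustration only.
    """
    report = {}
    for log in sorted(Path(parallel_dir).glob("split_*/PY8_log.txt")):
        lines = log.read_text(errors="replace").splitlines()
        flagged = [ln for ln in lines if any(k in ln.lower() for k in KEYWORDS)]
        report[log.parent.name] = {
            "flagged": flagged,
            "tail": lines[-tail:] if tail > 0 else [],
        }
    return report
```

A log that ends abruptly with no flagged lines, as seems to be the case for the attachment discussed in this thread, would be consistent with a crash that killed the process before it could print a message.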

Thank you for your help with this and sorry for the late reply.

Just for information, I cannot reproduce this behavior on Ubuntu 16.04.5 LTS (run on Azure),
on Debian (run via Singularity: https://www.singularity-hub.org/containers/5631),
or on my Mac.

Cheers,

Olivier

shenty1991 (shenty1991) wrote :

Hello Valentin,

Is this the log file that you want to investigate?

Thank you very much!

Yes, this would be the log file; however, it does not seem to indicate any
error.
Could you try to run it manually using the script inside that "split_0"
directory, with the command:

"./run_shower.sh"

And report here whether it ran to completion?

On Tue, Dec 4, 2018 at 11:26 PM shenty1991 <email address hidden> wrote:

> Hello Valentin,
>
> Is this the log file that you want to investigate?
>
> Thank you very much!
>
> ** Attachment added: "log file in PY8_parallelization/split_0"
>
> https://bugs.launchpad.net/mg5amcnlo/+bug/1803202/+attachment/5219140/+files/PY8_log.txt
>

--
Valentin

shenty1991 (shenty1991) wrote :

Did you mean "./run_PY8.sh" ?

If so, I ran this command under "split_0". My terminal showed nothing while the script was running. The new log file is attached.

Looking back into your original report I see that the following triggered a
crash:

/Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh:
line 2: 30834 Segmentation fault: 11

When the parallelisation is active, only the `./run_PY8.sh` from within the
`split_XX` directories should be executed and not the main shower script
located at:

/Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh

Calling the above should normally be harmless, but it may not be in your
setup.
So could you do the following:

a) After having reproduced your crash, run the following main script
manually:

/Users/shenty1991/Dropbox/MG5_aMC_v2_6_4/signal_ee2j_O2/Events/100TeV_2kEvents/PY8_parallelization/run_PY8.sh

b) Then, within *each* of the split_XX directories, run the `./run_PY8.sh`
script.

Then let me know whether any of the "manual" runs above ended in a
Segmentation fault, similar to the one that occurred when the parallelised
showering was steered by MadGraph.
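A small sketch for step b), assuming the `split_*/run_PY8.sh` layout mentioned in this thread and that `bash` is available. The helper names `describe_status` and `run_split_scripts` are hypothetical. The decoding relies on a common shell convention: a process killed by signal N is usually reported as exit status 128 + N, so the status 139 in the original report corresponds to signal 11 (SIGSEGV), matching the "Segmentation fault: 11" message.

```python
import signal
import subprocess
from pathlib import Path


def describe_status(returncode):
    """Turn a return code into a short description (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # Python's subprocess reports death by signal N as -N.
        signum = -returncode
    elif returncode > 128:
        # Shells usually report death by signal N as 128 + N.
        signum = returncode - 128
    else:
        return f"exit code {returncode}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a known signal number, so report the raw code.
        return f"exit code {returncode}"


def run_split_scripts(parallel_dir, script="run_PY8.sh"):
    """Run the shower script in every split_* directory, one after another."""
    results = {}
    for split in sorted(Path(parallel_dir).glob("split_*")):
        if (split / script).is_file():
            proc = subprocess.run(["bash", script], cwd=split)
            results[split.name] = describe_status(proc.returncode)
    return results
```

Running the splits sequentially like this also indicates whether the crash depends on several jobs running at the same time or occurs for a single split on its own.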

Thank you for your help in investigating this matter.

On Wed, Dec 5, 2018 at 3:20 PM shenty1991 <email address hidden> wrote:

> Did you mean "./run_PY8.sh" ?
>
> If yes, I tried this command under "split_0". My terminal shows nothing
> when the script was running. And attached is the new log file
>
> ** Attachment added: "PY8_log.txt"
>
> https://bugs.launchpad.net/mg5amcnlo/+bug/1803202/+attachment/5219451/+files/PY8_log.txt
>