Otherwise, the debug file is not very precise for this kind of function.
You should get a clearer error if you comment out the
@multiple_try()
line (just above the definition of the function).
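One plausible reason a retry decorator makes the error less precise is sketched below. This is a simplified, hypothetical retry decorator (`retry_sketch` is an illustrative name, not MadGraph's `multiple_try`, whose code may differ). It retries a function a few times and then raises a new exception of the same type whose message starts with "[Fail N times]", as in the log. Because the new exception is raised inside the wrapper, its traceback points there rather than at the line that actually failed.

```python
import functools
import time


def retry_sketch(max_tries=5, wait=0.0):
    """Hypothetical retry decorator for illustration (not MadGraph's multiple_try)."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except Exception as error:  # any failure triggers another attempt
                    last_error = error
                    time.sleep(wait)
            # Raise a fresh exception of the same type that keeps only the message.
            # It is raised here, outside the failing call, so its traceback shows
            # this line; the original traceback is not attached to it.
            # Assumes the exception type accepts one message argument (TypeError does).
            raise type(last_error)("[Fail %d times] \n %s" % (max_tries, last_error))
        return wrapper
    return decorator


@retry_sketch(max_tries=5)
def submit_sketch():
    output = ("Submitted batch job", None)  # second element is None
    return output[0] + "\n" + output[1]     # str + None raises TypeError


try:
    submit_sketch()
except TypeError as error:
    print(error)  # message starts with "[Fail 5 times]"
```

Python 3 words the concatenation message differently from Python 2's "cannot concatenate 'str' and 'NoneType' objects", but the mechanism is the same.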
My guess is that the reported problem is actually related to these lines:
    if not id.isdigit():
        raise ClusterManagmentError, 'fail to submit to the cluster: \n%s' \
              % (output[0] + '\n' + output[1])
This means that the code fails to identify the job id for the run (for whatever reason),
and then crashes while building the error message from badly formatted output, which leads to this error.
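The TypeError itself is easy to reproduce: in the quoted lines, `output[0] + '\n' + output[1]` fails as soon as either element is `None`. One way this can happen (an assumption about how `output` is produced, not a reading of the MadGraph source) is when it comes from `subprocess.Popen.communicate()`, which returns `None` for stderr unless `stderr=subprocess.PIPE` was passed. The sketch below uses hypothetical helpers, `parse_sbatch_job_id` and `format_submit_error`, and assumes slurm's usual `Submitted batch job <id>` reply. It searches for the id instead of relying on the position of a word, and builds the error text without assuming both streams are strings.

```python
import re
import subprocess
import sys

# slurm's sbatch normally answers "Submitted batch job <id>" on stdout.
SBATCH_REPLY = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_job_id(stdout_text):
    """Hypothetical helper: return the job id as a string, or None if absent."""
    match = SBATCH_REPLY.search(stdout_text or "")
    return match.group(1) if match else None


def format_submit_error(output):
    """Hypothetical helper: join (stdout, stderr), treating None as empty text."""
    return "fail to submit to the cluster: \n%s" % "\n".join(part or "" for part in output)


# Demonstration with a local command standing in for sbatch. stderr is not
# piped, so communicate() returns None as the second element of the tuple.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('sbatch: error: Batch job submission failed')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
output = proc.communicate()
print(output[1] is None)               # True: stderr was not captured
print(parse_sbatch_job_id(output[0]))  # None: no job id in this reply
print(format_submit_error(output))     # no TypeError, unlike output[0] + '\n' + output[1]
```

Slurm's `sbatch --parsable` option, which prints only the job id (plus the cluster name on multi-cluster setups), is another way to avoid parsing free-form text.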
> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared?
The crash occurs when the code tries to add the systematic-uncertainty information to the events.
So you will be missing that information, but otherwise the sample is fine.
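To check what the sample in `Events/run_01/unweighted_events.lhe.gz` contains, a short script can count its events and how many of them carry a per-event weight block. It assumes the systematics weights are stored as LHE 3.0 `<rwgt>` blocks inside each `<event>`; if your version stores them differently, adjust the tag. `count_lhe_events` and `inspect_lhe` are hypothetical names.

```python
import gzip
import sys


def count_lhe_events(lines):
    """Return (number of <event> blocks, number of those containing an <rwgt> block)."""
    n_events = 0
    n_with_weights = 0
    in_event = False
    has_weights = False
    for line in lines:
        tag = line.strip()
        if tag.startswith("<event"):
            in_event, has_weights = True, False
        elif in_event and tag.startswith("<rwgt"):
            has_weights = True
        elif tag.startswith("</event>"):
            n_events += 1
            if has_weights:
                n_with_weights += 1
            in_event = False
    return n_events, n_with_weights


def inspect_lhe(path):
    """Open a gzipped LHE file in text mode and count its events."""
    with gzip.open(path, "rt") as handle:
        return count_lhe_events(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    events, weighted = inspect_lhe(sys.argv[1])
    print("%d events, %d with <rwgt> weights" % (events, weighted))
```

Saved as, say, `inspect_lhe.py` and run with the path of the event file as its argument, the first number can be compared with the 50000 events reported in the Results Summary.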
> Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?
That I do not know.
Cheers,
Olivier
> On Jan 16, 2017, at 03:39, Matthew Reece <email address hidden> wrote:
>
> Public bug reported:
>
> I am trying to run MadGraph on Harvard's Odyssey cluster, which uses slurm. First I tried to do this by editing the me5_configuration.txt file:
> cluster_type = slurm
> cluster_queue = serial_requeue
> cluster_size = 50
>
> Every attempt failed, probably because Odyssey's default memory and time
> allocations are too low for all of the subprocesses to finish. I can't
> find a way to change these in MadGraph without editing the code
> directly, so I modified the call to sbatch in cluster.py:
>
> command = ['sbatch', '-o', stdout,
>            '-J', me_dir,
>            '-e', stderr,
>            '--mem=2500',
>            '-t', '150', prog] + argument
>
> After this change, all of the running seems to proceed smoothly until
> the very end, when I get the output below. (The run_01_tag_1_debug.log
> is attached.)
>
> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared? Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?
>
>
> INFO: Combining Events
> === Results Summary for run: run_01 tag: tag_1 ===
>
> Cross-section : 4.025 +- 0.003022 pb
> Nb of events : 50000
>
> INFO: Running Systematics computation
> INFO: Trying to download NNPDF23_lo_as_0130_qed
> NNPDF23_lo_as_0130_qed.tar.gz: 26.3 MB [100.0%]
> INFO: NNPDF23_lo_as_0130_qed successfully downloaded and stored in /n/hetgfs1/mreece/share/LHAPDF
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> Start waiting for update. (more info in debug mode)
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> Command "generate_events " interrupted with error:
> TypeError : [Fail 5 times]
> cannot concatenate 'str' and 'NoneType' objects
> Please report this bug on https://bugs.launchpad.net/mg5amcnlo
> More information is found in '/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/run_01_tag_1_debug.log'.
> Please attach this file to your report.
> quit
> INFO: storing files of previous run
> INFO: Done
> INFO:
>
> INFO:
>
> ** Affects: mg5amcnlo
> Importance: Undecided
> Status: New
>
> ** Attachment added: "run_01_tag_1_debug.log"
> https://bugs.launchpad.net/bugs/1656728/+attachment/4804868/+files/run_01_tag_1_debug.log
>
> --
> You received this bug notification because you are subscribed to
> MadGraph5_aMC@NLO.
> https://bugs.launchpad.net/bugs/1656728
>
> Title:
> "cannot concatenate" error at end of run
>
> Status in MadGraph5_aMC@NLO:
> New
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/mg5amcnlo/+bug/1656728/+subscriptions
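The reporter's change hard-codes `--mem=2500` and `-t 150` into the `sbatch` call. A minimal sketch of building the same argument list with the limits as parameters follows; `build_sbatch_command`, its defaults, and the values in the usage line are hypothetical and not part of MadGraph. In slurm, `--mem` is given in megabytes by default, and a bare number passed to `-t` is read as minutes.

```python
def build_sbatch_command(prog, arguments, stdout, stderr, job_name,
                         mem_mb=2500, time_minutes=150):
    """Hypothetical helper: assemble an sbatch call with explicit resource limits."""
    return ["sbatch",
            "-o", stdout,                 # file for the job's standard output
            "-J", job_name,               # job name
            "-e", stderr,                 # file for the job's standard error
            "--mem=%d" % mem_mb,          # memory per node, in megabytes
            "-t", str(time_minutes),      # wall-clock limit, in minutes
            prog] + list(arguments)


command = build_sbatch_command("ajob1", ["1"], "job.out", "job.err", "my_run", mem_mb=4000)
print(" ".join(command))  # sbatch -o job.out -J my_run -e job.err --mem=4000 -t 150 ajob1 1
```

Keeping the limits as parameters means a site-specific value changes in one place instead of inside the list literal.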