MadGraph5_aMC@NLO

Bug #1656728
Comment #2

Olivier Mattelaer (olivier-mattelaer) wrote on 2017-01-16: Re: [Bug 1656728] [NEW] "cannot concatenate" error at end of run

Hi,

If you want to implement your own cluster type, the correct way is to use the plugin mechanism:
https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Plugin
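
Going by the report quoted below, what a custom cluster type mainly needs to change is the sbatch command line. A minimal sketch of that part only, in plain Python: the function name and its parameters are hypothetical and are not part of MadGraph or of the plugin interface (see the wiki page above for the real one); only the sbatch flags themselves (-o, -e, -J, -p, --mem, -t) are standard Slurm options.

```python
def build_sbatch_command(prog, argument, stdout, stderr, job_name,
                         memory_mb=None, time_limit_min=None, queue=None):
    """Return the sbatch argument list for one job.

    Hypothetical helper for illustration: the parameter names are
    assumptions, only the sbatch flags are real.
    """
    command = ['sbatch', '-o', stdout, '-J', job_name, '-e', stderr]
    if queue:
        # -p selects the Slurm partition (queue)
        command += ['-p', queue]
    if memory_mb is not None:
        # --mem is interpreted in megabytes when no unit is given
        command.append('--mem=%d' % memory_mb)
    if time_limit_min is not None:
        # -t with a bare number is a time limit in minutes
        command += ['-t', str(time_limit_min)]
    return command + [prog] + list(argument)


# Reproduces the command quoted in the report, with the limits as options
cmd = build_sbatch_command('ajob1', ['0', '1'], 'out.log', 'err.log',
                           'me_dir', memory_mb=2500, time_limit_min=150)
```

Keeping the limits as parameters means they can be changed per cluster without touching cluster.py again.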

Otherwise, the debug file is not very precise for this kind of function.
You should get a clearer error if you comment out the
@multiple_try()
line (just above the definition of the function).
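
To illustrate why a retry decorator can hide the real error, here is a minimal sketch. It is not the MadGraph implementation of multiple_try: the decorator name retry, its argument nb_try and the function submit are hypothetical, and the message format is only modelled on the "[Fail 5 times]" line in the log quoted below.

```python
import functools


def retry(nb_try=5):
    """Hypothetical stand-in for a retry decorator such as @multiple_try()."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(nb_try):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    # Remember the failure and try again
                    last_error = error
            # Only a summary is re-raised: the traceback that points at the
            # failing line inside func is no longer attached to it.
            raise type(last_error)('[Fail %i times] \n %s' % (nb_try, last_error))
        return wrapper
    return decorator


calls = []


@retry(nb_try=5)
def submit():
    calls.append(1)
    return 'job ' + None  # always fails with a TypeError


try:
    submit()
except TypeError as error:
    summary = str(error)
```

Without the decorator, the TypeError propagates directly from the failing line, so the traceback shows exactly where the concatenation goes wrong.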

My guess is that the problem reported is actually related to these lines:
       if not id.isdigit():
           raise ClusterManagmentError, 'fail to submit to the cluster: \n%s' \
                   % (output[0] + '\n' + output[1])

This means that the code fails to identify the job id used for the run (for whatever reason),
and it then crashes because of badly formatted output, which leads to this error.
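
One way such a "cannot concatenate" error can arise, shown as a sketch under assumptions: subprocess's Popen.communicate() returns None for any stream that was not opened with subprocess.PIPE, so joining output[0] and output[1] fails whenever stderr was not captured separately. The names SubmissionError, format_submit_output and extract_job_id are hypothetical, the output is assumed to be text rather than bytes, and the job id is assumed to be the last word of a line such as "Submitted batch job 12345".

```python
class SubmissionError(Exception):
    """Hypothetical stand-in for the cluster error class."""


def format_submit_output(output):
    """Join (stdout, stderr), treating a missing stream as empty text."""
    stdout, stderr = output
    return (stdout or '') + '\n' + (stderr or '')


def extract_job_id(output):
    """Return the job id from a (stdout, stderr) pair, or raise a clear error."""
    tokens = (output[0] or '').split()
    job_id = tokens[-1] if tokens else ''
    if not job_id.isdigit():
        raise SubmissionError('fail to submit to the cluster: \n%s'
                              % format_submit_output(output))
    return job_id


# stderr is None here, as when it was not captured with subprocess.PIPE
good = ('Submitted batch job 12345\n', None)
bad = ('sbatch: error: invalid partition\n', None)

job_id = extract_job_id(good)

try:
    extract_job_id(bad)
except SubmissionError as error:
    message = str(error)  # contains the text printed by sbatch
```

With the None guarded, a failed submission reports what sbatch actually printed instead of a TypeError raised while building the message.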

> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared?

The crash occurs when the code tries to add systematic uncertainty information to the events.
So you will miss that information, but otherwise this sample is fine.

> Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?

That I do not know.

Cheers,

Olivier

> On Jan 16, 2017, at 03:39, Matthew Reece <email address hidden> wrote:
>
> Public bug reported:
>
> I am trying to run MadGraph on Harvard's Odyssey cluster, which uses slurm. First I tried to do this by editing the me5_configuration.txt file:
> cluster_type = slurm
> cluster_queue = serial_requeue
> cluster_size = 50
>
> Every attempt failed, probably because Odyssey's default memory and time
> allocations are too low for all of the subprocesses to finish. I can't
> find a way to change these in MadGraph without editing the code
> directly, so I modified the call to sbatch in cluster.py:
>
> command = ['sbatch', '-o', stdout,
> '-J', me_dir,
> '-e', stderr,
> '--mem=2500',
> '-t', '150', prog] + argument
>
> After this change, all of the running seems to proceed smoothly until
> the very end, when I get the output below. (The run_01_tag_1_debug.log
> is attached.)
>
> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared? Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?
>
>
> INFO: Combining Events
> === Results Summary for run: run_01 tag: tag_1 ===
>
> Cross-section : 4.025 +- 0.003022 pb
> Nb of events : 50000
>
> INFO: Running Systematics computation
> INFO: Trying to download NNPDF23_lo_as_0130_qed
> NNPDF23_lo_as_0130_qed.tar.gz: 26.3 MB [100.0%]
> INFO: NNPDF23_lo_as_0130_qed successfully downloaded and stored in /n/hetgfs1/mreece/share/LHAPDF
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> Start waiting for update. (more info in debug mode)
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
> Command "generate_events " interrupted with error:
> TypeError : [Fail 5 times]
> cannot concatenate 'str' and 'NoneType' objects
> Please report this bug on https://bugs.launchpad.net/mg5amcnlo
> More information is found in '/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/run_01_tag_1_debug.log'.
> Please attach this file to your report.
> quit
> INFO: storing files of previous run
> INFO: Done
> INFO:
>
> INFO:
>
> ** Affects: mg5amcnlo
> Importance: Undecided
> Status: New
>
> ** Attachment added: "run_01_tag_1_debug.log"
> https://bugs.launchpad.net/bugs/1656728/+attachment/4804868/+files/run_01_tag_1_debug.log
