Bug #1656728 "cannot concatenate" error at end of run : Bugs : MadGraph5_aMC@NLO

Matthew Reece (mreece82) wrote on 2017-01-16:

#1

run_01_tag_1_debug.log (26.0 KiB, text/plain)

Olivier Mattelaer (olivier-mattelaer) wrote on 2017-01-16: Re: [Bug 1656728] [NEW] "cannot concatenate" error at end of run

#2

Hi,

If you want to implement your own cluster type, the correct approach is to use the plugin mechanism:
https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Plugin

Otherwise, the debug file is not very precise for this kind of function.
You should get a clearer error if you comment out the
   @multiple_try()
line (just above the definition of the function).

My guess is that the reported problem is actually related to these lines:
       if not id.isdigit():
           raise ClusterManagmentError, 'fail to submit to the cluster: \n%s' \
                   % (output[0] + '\n' + output[1])

That is, the code fails to identify the job id of the submitted job (for whatever reason)
and then crashes while formatting the resulting error message, which leads to this error.
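A minimal Python 3 sketch of the failure mode described above. It assumes, as the quoted line suggests, that output is a (stdout, stderr) pair like the one subprocess.Popen.communicate() returns; that call gives None for any stream not captured with a pipe, and concatenating None to a string raises TypeError while the error message itself is being built. The names parse_sbatch_job_id, format_submit_failure and ClusterSubmitError are hypothetical, not MadGraph code, and the job-id pattern assumes sbatch's usual "Submitted batch job <id>" confirmation line. Python 3 also words the TypeError differently from the Python 2 message in the log.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers, not MadGraph code. "output" mimics the (stdout, stderr)
# pair returned by subprocess.Popen.communicate(); stderr is None when not piped.
Output = Tuple[str, Optional[str]]

# Assumes sbatch's usual confirmation line, e.g. "Submitted batch job 4242".
SBATCH_ID_RE = re.compile(r"Submitted batch job (\d+)")


class ClusterSubmitError(Exception):
    """Raised when no job id can be found in the sbatch output."""


def naive_message(output: Output) -> str:
    # Same expression as the quoted line: raises TypeError if output[1] is None.
    return output[0] + "\n" + output[1]


def format_submit_failure(output: Output) -> str:
    # Substitute "" for a missing stream instead of concatenating None.
    stdout, stderr = output
    return "fail to submit to the cluster: \n%s\n%s" % (stdout or "", stderr or "")


def parse_sbatch_job_id(output: Output) -> str:
    # Search all of stdout, so extra lines printed before the confirmation are tolerated.
    match = SBATCH_ID_RE.search(output[0] or "")
    if match is None:
        raise ClusterSubmitError(format_submit_failure(output))
    return match.group(1)


if __name__ == "__main__":
    print(parse_sbatch_job_id(("Submitted batch job 4242\n", None)))  # prints 4242
    bad = ("sbatch: error: example failure text\n", None)  # placeholder output
    try:
        naive_message(bad)
    except TypeError as exc:
        print("naive formatting fails:", exc)
    try:
        parse_sbatch_job_id(bad)
    except ClusterSubmitError as exc:
        print("readable error:", exc)
```

The design choice here is simply that a failure path should never depend on a stream that may be missing; whether MadGraph's own code captures stderr in this case is not established by the thread.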

> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared?

The crash occurs when the code tries to add the systematics uncertainty information to the events.
So you will be missing that information, but otherwise the sample is fine.

> Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?

That I do not know.

Cheers,

Olivier

> On Jan 16, 2017, at 03:39, Matthew Reece <1656728@bugs.launchpad.net> wrote:
> 
> Public bug reported:
> 
> I am trying to run MadGraph on Harvard's Odyssey cluster, which uses slurm. First I tried to do this by editing the me5_configuration.txt file:
> cluster_type = slurm
> cluster_queue = serial_requeue
> cluster_size = 50
> 
> Every attempt failed, probably because Odyssey's default memory and time
> allocations are too low for all of the subprocesses to finish. I can't
> find a way to change these in MadGraph without editing the code
> directly, so I modified the call to sbatch in cluster.py:
> 
>       command = ['sbatch', '-o', stdout,
>                  '-J', me_dir, 
>                  '-e', stderr, 
>                  '--mem=2500', 
>                  '-t', '150', prog] + argument
> 
> After this change, all of the running seems to proceed smoothly until
> the very end, when I get the output below. (The run_01_tag_1_debug.log
> is attached.)
> 
> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared? Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?
> 
> 
> INFO: Combining Events 
> === Results Summary for run: run_01 tag: tag_1 ===
> 
>    Cross-section :   4.025 +- 0.003022 pb
>    Nb of events :  50000
> 
> INFO: Running Systematics computation 
> INFO: Trying to download NNPDF23_lo_as_0130_qed 
> NNPDF23_lo_as_0130_qed.tar.gz:    26.3 MB [100.0%] 
> INFO: NNPDF23_lo_as_0130_qed successfully downloaded and stored in /n/hetgfs1/mreece/share/LHAPDF 
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem. 
> Start waiting for update. (more info in debug mode)
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem. 
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem. 
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem. 
> WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem. 
> Command "generate_events " interrupted with error:
> TypeError : [Fail 5 times] 
> 	 cannot concatenate 'str' and 'NoneType' objects 
> Please report this bug on https://bugs.launchpad.net/mg5amcnlo
> More information is found in '/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/run_01_tag_1_debug.log'.
> Please attach this file to your report.
> quit
> INFO: storing files of previous run 
> INFO: Done 
> INFO:  
> 
> INFO:
> 
> ** Affects: mg5amcnlo
>    Importance: Undecided
>        Status: New
> 
> ** Attachment added: "run_01_tag_1_debug.log"
>  https://bugs.launchpad.net/bugs/1656728/+attachment/4804868/+files/run_01_tag_1_debug.log
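A sketch of the same sbatch change with the resource requests made optional instead of hard-coded. The helper build_sbatch_command is hypothetical and only reproduces the argument list quoted in the report; how it would be wired into MadGraph (for example through the plugin mechanism suggested above) is not shown. In Slurm, --mem without a unit suffix is read as megabytes and a bare number given to -t is read as minutes, so the values in the usage line match the 2500 MB and 150 minutes of the quoted patch.

```python
from typing import List, Optional


def build_sbatch_command(prog: str, argument: List[str], stdout: str, stderr: str,
                         me_dir: str, mem_mb: Optional[int] = None,
                         time_min: Optional[int] = None) -> List[str]:
    """Hypothetical helper: assemble an sbatch argument list.

    Resource options are appended only when a value is given, so the default
    call reproduces a plain submission and site-specific limits stay optional.
    """
    command = ["sbatch", "-o", stdout, "-J", me_dir, "-e", stderr]
    if mem_mb is not None:
        command.append("--mem=%d" % mem_mb)   # memory per node, in megabytes
    if time_min is not None:
        command += ["-t", str(time_min)]      # wall-time limit, in minutes
    return command + [prog] + list(argument)


if __name__ == "__main__":
    # File and job names are placeholders, not values taken from MadGraph.
    print(build_sbatch_command("./ajob1", ["0"], "ajob1.out", "ajob1.err",
                               "PROC_dir", mem_mb=2500, time_min=150))
```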

Matthew Reece (mreece82) wrote on 2017-01-16:

#3

Hi Olivier,

Thanks for replying promptly. The lines you suspect are not the ones quoted in the bug report:

File "/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/bin/internal/misc.py", line 367, in deco_f_retry
raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)

Shouldn't this line read as follows?
raise error.__class__, '[Fail %i times] \n %s ' % (i+1, str(error))

Elsewhere in the same function you have used str(error) in printing output.

Best,
Matt

Olivier Mattelaer (olivier-mattelaer) wrote on 2017-01-16: Re: [Bug 1656728] "cannot concatenate" error at end of run

#4

Hi,

> The lines you suspect are not the ones
> quoted in the bug report:

This is due to the "multiple_try" decorator, which re-raises the error from its own code; because of it, I'm not sure which line is responsible for the bug.
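A minimal Python 3 sketch of why the traceback points at misc.py rather than at the failing line: a retry decorator that, after the last attempt, raises a fresh exception of the same class with a "[Fail N times]" message. multiple_try_sketch is a hypothetical stand-in whose behaviour is inferred from the deco_f_retry line quoted in this thread, not MadGraph's actual implementation. Because the new exception is raised inside the wrapper, the innermost traceback frame is the wrapper and the original location of the error is lost, which is also why commenting out the decorator gives a more precise traceback.

```python
import functools
import traceback


def multiple_try_sketch(nb_try=3):
    """Hypothetical retry decorator (not MadGraph's code); nb_try must be >= 1."""
    def decorator(func):
        @functools.wraps(func)
        def deco_f_retry(*args, **kwargs):
            last_error, attempts = None, 0
            for i in range(nb_try):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    # Remember the failure; Python 3 unbinds "error" after this block.
                    last_error, attempts = error, i + 1
            # Raise a new exception of the same class outside the except block.
            # Its traceback starts here, so the line that failed inside func is lost.
            raise last_error.__class__("[Fail %i times] \n %s " % (attempts, last_error))
        return deco_f_retry
    return decorator


@multiple_try_sketch(nb_try=3)
def submit():
    return "job-" + None  # the real bug is on this line


if __name__ == "__main__":
    try:
        submit()
    except TypeError as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        print(repr(str(exc)))
        print([frame.name for frame in frames])  # ends in the wrapper; "submit" is absent
```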

>  File "/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/bin/internal/misc.py", line 367, in deco_f_retry
>    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
> 
> Shouldn't this line read as follows?
>    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, str(error))

I do not think so.

The printout seems to work correctly, so I do not think this line is the problem.
Also, %s automatically calls str() on its argument, so writing str(error) is not required.
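A quick Python 3 check of the point above: the %s conversion calls str() on the object, so passing error or str(error) to the same template gives identical text, even for an exception class that overrides __str__. The helper fail_message and the class CustomError are illustrative only; the template string is the one from the misc.py line quoted in this thread.

```python
class CustomError(Exception):
    """Hypothetical exception whose __str__ differs from its args."""
    def __str__(self):
        return "custom text"


def fail_message(i, error, explicit_str):
    # Same template as the quoted misc.py line, with or without an explicit str().
    value = str(error) if explicit_str else error
    return "[Fail %i times] \n %s " % (i + 1, value)


if __name__ == "__main__":
    for err in (TypeError("cannot concatenate 'str' and 'NoneType' objects"), CustomError()):
        # Both spellings must produce the same text.
        assert fail_message(4, err, False) == fail_message(4, err, True)
        print(repr(fail_message(4, err, False)))
```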

Cheers,

Olivier
> On Jan 16, 2017, at 15:24, Matthew Reece <1656728@bugs.launchpad.net> wrote:
> 
> Hi Olivier,
> 
> Thanks for replying promptly. The lines you suspect are not the ones
> quoted in the bug report:
> 
>  File "/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/bin/internal/misc.py", line 367, in deco_f_retry
>    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
> 
> Shouldn't this line read as follows?
>    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, str(error))
> 
> Elsewhere in the same function you have used str(error) in printing
> output.
> 
> Best,
> Matt

Matthew Reece (mreece82) wrote on 2017-01-17:

#6

I'm still confused about exactly what is happening in the runs that fail, but by disabling LHAPDF, I was able to successfully run on our cluster with the modification I made to cluster.py. It is unclear to me if the error reflects something problematic about my LHAPDF installation or is a side effect of my modification to cluster.py. (I have successfully run some simpler processes, like p p > t t~, with LHAPDF turned on, so it doesn't always fail.)

In any case, I can get by without the systematics information, so I guess my current setup is good enough for now.

Thanks for your help.

Matt

Olivier Mattelaer (olivier-mattelaer) wrote on 2017-01-17: Re: [Bug 1656728] Re: "cannot concatenate" error at end of run

#7

Hi,

Then you can set the following in the run_card:
none = systematics_program

This will remove the computation of the systematics.
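As in the line above, the run_card puts the value on the left and the parameter name on the right, which is easy to get backwards when editing by hand. The helper below is a hypothetical sketch, not a MadGraph tool: it sets one parameter in run_card text, assuming parameter lines of the form "<value> = <name> ! optional comment" and comment lines starting with "#". The sample card is illustrative, not a copy of a real run_card.

```python
import re

# Assumed line shape: "<indent><value> = <name><optional ' ! comment'>".
# The value is matched lazily, so values containing spaces or '=' are handled.
LINE_RE = re.compile(r"^(\s*)(.*?)(\s*=\s*)(\w+)(\s*(?:!.*)?)$")


def set_run_card_value(text: str, name: str, value: str) -> str:
    """Return a copy of run_card-style text with parameter `name` set to `value`."""
    out, found = [], False
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and not line.lstrip().startswith("#") and m.group(4) == name:
            indent, _old, sep, key, rest = m.groups()
            line = indent + value + sep + key + rest  # keep spacing and comment
            found = True
        out.append(line)
    if not found:
        raise KeyError(name)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    card = ("#*** illustrative card ***\n"
            "  10000 = nevents ! Number of events\n"
            " systematics = systematics_program ! which program to use\n")
    print(set_run_card_value(card, "systematics_program", "none"))
```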

Cheers,

Olivier

> On Jan 17, 2017, at 06:09, Matthew Reece <1656728@bugs.launchpad.net> wrote:
> 
> I'm still confused about exactly what is happening in the runs that
> fail, but by disabling LHAPDF, I was able to successfully run on our
> cluster with the modification I made to cluster.py. It is unclear to me
> if the error reflects something problematic about my LHAPDF installation
> or is a side effect of my modification to cluster.py. (I have
> successfully run some simpler processes, like p p > t t~, with LHAPDF
> turned on, so it doesn't always fail.)
> 
> In any case, I can get by without the systematics information, so I
> guess my current setup is good enough for now.
> 
> Thanks for your help.
> 
> Matt

Revision history for this message

Olivier Mattelaer (olivier-mattelaer) wrote on 2017-01-17:

#8

Hi,

Then you can set the following in the run_card:
  none = systematics_program

This disables the systematics computation.

Cheers,

Olivier


Olivier Mattelaer (olivier-mattelaer) on 2017-02-06

Changed in mg5amcnlo:
status:	New → Invalid
