"cannot concatenate" error at end of run

Bug #1656728 reported by Matthew Reece
This bug affects 1 person
Affects              Status    Importance   Assigned to   Milestone
MadGraph5_aMC@NLO    Invalid   Undecided    Unassigned

Bug Description

I am trying to run MadGraph on Harvard's Odyssey cluster, which uses slurm. First I tried to do this by editing the me5_configuration.txt file:
 cluster_type = slurm
 cluster_queue = serial_requeue
 cluster_size = 50

Every attempt failed, probably because Odyssey's default memory and time allocations are too low for all of the subprocesses to finish. I can't find a way to change these in MadGraph without editing the code directly, so I modified the call to sbatch in cluster.py:

        command = ['sbatch', '-o', stdout,
                   '-J', me_dir,
                   '-e', stderr,
                   '--mem=2500',
                   '-t', '150', prog] + argument
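
For reference, the same change can be written so that the limits are not hard-coded. This is only a sketch: `build_sbatch_command` and its `mem_mb` and `time_min` parameters are hypothetical names and are not part of cluster.py. The sbatch options themselves (-o, -e, -J, --mem, -t) are the ones used above.

```python
def build_sbatch_command(prog, argument, stdout, stderr, me_dir,
                         mem_mb=2500, time_min=150):
    """Return the argument list for an sbatch submission.

    Hypothetical helper (not part of cluster.py): it only assembles the
    list, so the memory and time limits can be changed in one place.
    """
    return ['sbatch',
            '-o', stdout,            # file receiving the job's stdout
            '-J', me_dir,            # job name
            '-e', stderr,            # file receiving the job's stderr
            '--mem=%d' % mem_mb,     # memory request, in MB by default
            '-t', str(time_min),     # wall-time limit, in minutes
            prog] + list(argument)
```

Called with the default values, this reproduces the list shown above.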

After this change, all of the running seems to proceed smoothly until the very end, when I get the output below. (The run_01_tag_1_debug.log is attached.)

There are events in Events/run_01/unweighted_events.lhe.gz. Should I be able to use them without problems, since the crash occurred after the Results Summary appeared? Was the error message caused by the change I made to cluster.py, which I thought was innocuous (and which was absolutely necessary for MadGraph to even get through the refine stage at all on our cluster)?

INFO: Combining Events
  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section : 4.025 +- 0.003022 pb
     Nb of events : 50000

INFO: Running Systematics computation
INFO: Trying to download NNPDF23_lo_as_0130_qed
NNPDF23_lo_as_0130_qed.tar.gz: 26.3 MB [100.0%]
INFO: NNPDF23_lo_as_0130_qed successfully downloaded and stored in /n/hetgfs1/mreece/share/LHAPDF
WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
Start waiting for update. (more info in debug mode)
WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
WARNING: cluster.get_job_identifier runs unexpectedly. This should be fine but report this message if you have problem.
Command "generate_events " interrupted with error:
TypeError : [Fail 5 times]
  cannot concatenate 'str' and 'NoneType' objects
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/run_01_tag_1_debug.log'.
Please attach this file to your report.
quit
INFO: storing files of previous run
INFO: Done
INFO:

INFO:

Olivier Mattelaer (olivier-mattelaer) wrote : Re: [Bug 1656728] [NEW] "cannot concatenate" error at end of run

Hi,

If you want to implement your own cluster type, the correct approach is to use the plugin mechanism:
https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/Plugin

Otherwise, the debug file is not very precise for this kind of function.
You should get a clearer error if you comment out the
   @multiple_try()
line (just above the definition of the function).
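
To illustrate why the debug file is imprecise, here is a minimal sketch of a retry decorator of this kind. It is an assumption about the general shape only; the real decorator in misc.py may differ in its arguments and behaviour. The function `submit` is a hypothetical stand-in for the failing call.

```python
import functools

def multiple_try(nb_try=5):
    """Retry the decorated function, then re-raise with a summary message.

    Illustrative sketch only (assumes nb_try >= 1); not the MG5_aMC code.
    """
    def deco_retry(f):
        @functools.wraps(f)
        def deco_f_retry(*args, **kwargs):
            for i in range(nb_try):
                try:
                    return f(*args, **kwargs)
                except Exception as error:
                    last_error = error   # remember the most recent failure
            # A fresh exception is raised here, so the reported traceback
            # ends in the decorator, not at the line that originally failed.
            raise last_error.__class__(
                '[Fail %i times] \n %s ' % (nb_try, last_error))
        return deco_f_retry
    return deco_retry


@multiple_try()
def submit():
    # Hypothetical failing call: adding None to a string raises TypeError.
    return 'job output' + None
```

Calling `submit()` raises a TypeError whose message starts with "[Fail 5 times]", and the traceback points at `deco_f_retry`. With the decorator line commented out, the traceback would point at the line inside `submit` instead.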

My guess is that the problem reported is actually related to these lines:
       if not id.isdigit():
           raise ClusterManagmentError, 'fail to submit to the cluster: \n%s' \
                   % (output[0] + '\n' + output[1])

This would mean that the code fails to identify the job id used for the run (for whatever reason)
and then crashes because building the error message from ill-formatted output triggers this error.
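
One way building such a message can itself fail, as a sketch: this assumes `output` is the `(stdout, stderr)` tuple returned by `subprocess.Popen.communicate()`, which the thread does not confirm. `communicate()` returns `None` for any stream that was not opened with `PIPE`, and adding `None` to a string raises a TypeError (Python 2 words it as in the log above; Python 3 uses a different message). The helper `format_submit_error` is a hypothetical name.

```python
import subprocess
import sys

def format_submit_error(output):
    """Build the error text without assuming both streams are strings."""
    parts = [part for part in output if part is not None]
    return 'fail to submit to the cluster: \n%s' % '\n'.join(parts)

# stderr is not piped here, so communicate() returns None in its place.
proc = subprocess.Popen([sys.executable, '-c', 'print("not a job id")'],
                        stdout=subprocess.PIPE, universal_newlines=True)
output = proc.communicate()

try:
    message = output[0] + '\n' + output[1]   # same pattern as the lines quoted above
except TypeError:
    message = format_submit_error(output)    # fall back to the safe version
```

Whether this is what happens in cluster.py depends on how the sbatch process is opened there, which is not shown in this report.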

> There are events in Events/run_01/unweighted_events.lhe.gz. Should I be
> able to use them without problems, since the crash occurred after the
> Results Summary appeared?

The crash occurs when the code tries to add systematic uncertainty information to the events.
So you will miss that information, but otherwise this sample is fine.

> Was the error message caused by the change I
> made to cluster.py, which I thought was innocuous (and which was
> absolutely necessary for MadGraph to even get through the refine stage
> at all on our cluster)?

That I do not know.

Cheers,

Olivier

Matthew Reece (mreece82) wrote :

Hi Olivier,

Thanks for replying promptly. The lines you suspect are not the ones quoted in the bug report:

  File "/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/bin/internal/misc.py", line 367, in deco_f_retry
    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)

Shouldn't this line read as follows?
    raise error.__class__, '[Fail %i times] \n %s ' % (i+1, str(error))

Elsewhere in the same function you have used str(error) in printing output.

Best,
Matt

Olivier Mattelaer (olivier-mattelaer) wrote : Re: [Bug 1656728] "cannot concatenate" error at end of run

Hi,

> The lines you suspect are not the ones
> quoted in the bug report:

This is due to the "multiple_try" function. Because of it, I am not sure which line is responsible for the bug.

> File "/net/hetgfs1/nfs_hetgfs1/mreece/MG5_aMC_v2_5_2/zzj12_xqcut150ht500_clusterA1/bin/internal/misc.py", line 367, in deco_f_retry
> raise error.__class__, '[Fail %i times] \n %s ' % (i+1, error)
>
> Shouldn't this line read as follows?
> raise error.__class__, '[Fail %i times] \n %s ' % (i+1, str(error))

I do not think so.

The printout seems to work correctly, so I do not think that this line is problematic.
Also, in this case %s automatically calls str(), so writing str(error) explicitly is not required.
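
A quick check of this point, as a sketch: `%s` formatting calls `str()` on its argument, so both forms of the line produce the same text. The `TypeError` instance is constructed by hand here purely for illustration.

```python
error = TypeError("cannot concatenate 'str' and 'NoneType' objects")

implicit = '[Fail %i times] \n %s ' % (5, error)       # %s calls str(error)
explicit = '[Fail %i times] \n %s ' % (5, str(error))  # explicit conversion

same = implicit == explicit   # the two messages are identical
```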

Cheers,

Olivier

Matthew Reece (mreece82) wrote :

I'm still confused about exactly what is happening in the runs that fail, but by disabling LHAPDF, I was able to successfully run on our cluster with the modification I made to cluster.py. It is unclear to me if the error reflects something problematic about my LHAPDF installation or is a side effect of my modification to cluster.py. (I have successfully run some simpler processes, like p p > t t~, with LHAPDF turned on, so it doesn't always fail.)

In any case, I can get by without the systematics information, so I guess my current setup is good enough for now.

Thanks for your help.

Matt

Olivier Mattelaer (olivier-mattelaer) wrote : Re: [Bug 1656728] Re: "cannot concatenate" error at end of run

Hi,

In that case you can set the following in the run_card:
none = systematic_program

This disables the computation of the systematics.

Cheers,

Olivier

Changed in mg5amcnlo:
status: New → Invalid