madgraph crash when combining runs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MadGraph5_aMC@NLO | Invalid | Undecided | Unassigned |
Bug Description
In MadGraph 2.6.1 I am generating 100k events. After the jobs completed, I got the following crash during the combination of runs:
INFO: Idle: 0, Running: 1, Completed: 159 [ 9h 2m ]
INFO: Idle: 0, Running: 1, Completed: 159 [ 9h 12m ]
INFO: All jobs finished
INFO: Idle: 0, Running: 0, Completed: 160 [ 9h 22m ]
INFO: Combining runs
Error when reading /afs/cern.
Command "generate_events " interrupted with error:
ValueError : empty string for float()
Please report this bug on https:/
More information is found in '/afs/cern.
Please attach this file to your report.
quit
INFO:
The folder G103i2 contains 4 files:
-bash-4.1$ more results.dat
end-code not correct 2
-bash-4.1$
-bash-4.1$ more input_sg.txt
5000 9 3
-3745428.24213
2
1
0
103
-bash-4.1$
-bash-4.1$ more moffset.dat
61
-bash-4.1$
events.lhe is empty.
The STDOUT of ajob160 is the following:
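The "empty string for float()" traceback is consistent with the combiner trying to parse this corrupted results.dat: when an expected numeric field is empty, Python's float() raises exactly that error (the quoted message is the Python 2 wording; Python 3 reports "could not convert string to float: ''"). A minimal sketch, not MadGraph's actual parsing code, reproducing the exception type:

```python
def parse_result_field(field):
    """Convert one results.dat field to a float (illustrative helper only)."""
    return float(field)  # an empty or corrupt field raises ValueError

try:
    parse_result_field("")  # what the combiner presumably hit for G103i2
except ValueError as exc:
    print("combination aborts:", exc)
```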
-bash-4.1$ more /afs/cern.
@(#)CERN job starter $Date: 2010/06/23 14:22:16 $
Working directory is </pool/
At line 398 of file unwgt.f (unit = 25, file = '')
Fortran runtime error: Connection timed out
rm: cannot remove `results.dat': No such file or directory
ERROR DETECTED
Job finished at Wed Feb 14 19:03:35 CET 2018 on node
under linux version Scientific Linux CERN SLC release 6.9 (Carbon)
CERN statistics: This process used approximately : 0:09:22 KSI2K hours (562 KSI2K seconds)
Is there a way to rerun only the specific job and continue from that point, without rerunning all the jobs?
Or to combine the rest of the events, i.e. excluding this directory?
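Before deciding what to rerun, it can help to list every channel directory that ended up in this state. A hedged sketch, assuming the standard SubProcesses/P*/G* layout (the function name is my own, not a MadGraph utility):

```python
import glob
import os

def broken_channels(root="SubProcesses"):
    """Return channel dirs whose events.lhe is missing/empty or results.dat is absent."""
    bad = []
    for d in sorted(glob.glob(os.path.join(root, "P*", "G*"))):
        lhe = os.path.join(d, "events.lhe")
        res = os.path.join(d, "results.dat")
        if (not os.path.isfile(res)
                or not os.path.isfile(lhe)
                or os.path.getsize(lhe) == 0):
            bad.append(d)
    return bad

print(broken_channels())
```

Any directory reported here (such as G103i2 above) would make the combination step fail in the same way.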
Hi,
A connection timeout probably means that some disk volume could not be mounted on the machine you were using, so I do not think I can do anything about this on the MadGraph side.
If you relaunch the same job, do you get the same output/bug?
I would guess you will not be able to reproduce it (unless this is a hardware problem on a specific node and you happen to land on the same node again).
Cheers,
Olivier