Reproducibility problem in 2.6.2 (in combine?)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MadGraph5_aMC@NLO | Fix Released | Undecided | Unassigned |
Bug Description
Dear authors,
I'm running tests with MG5_aMC 2.6.2 - in case you have a CERN account, it's this one:
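/afs/cern.ch/sw/lcg/external/MCGenerators_lcgcmt67c/madgraph5amc/2.6.2.atlas/x86_64-slc6-gcc47-opt/
We see some non-reproducibility, even when setting a random number seed.
I attach a tarball of the cards directories so that you have the run, param, and proc cards. For the actual runs, iseed was set to 1234 (of course, the code resets it to 0 after the run).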
Looking at the LHE files, they seem to contain the same events, but in different order. Just diffing the two files, you can see an example right away:
1487a1488,1496
> 6 1 +8.9242237e-05 3.75379100e+02 7.95049700e-03 1.05265500e-01
> 21 -1 0 0 501 502 +0.0000000000e+00 +0.0000000000e+00 +2.1196628570e+01 2.1196628570e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
> 2 -1 0 0 502 0 -0.0000000000e+00 -0.0000000000e+00 -1.6619322556e+03 1.6619322556e+03 0.0000000000e+00 0.0000e+00 -1.0000e+00
> 24 2 1 2 0 0 +8.9075749711e+01 +3.5726986208e+01 -1.0898807218e+03 1.1239759236e+03 2.5743150990e+02 0.0000e+00 0.0000e+00
> 1000023 1 3 3 0 0 +8.0116545705e+01 +1.1392922116e+02 -6.6081705506e+02 6.8029530983e+02 8.2000000000e+01 0.0000e+00 -1.0000e+00
> 1000024 1 3 3 0 0 +8.9592040060e+00 -7.8202234949e+01 -4.2906366674e+02 4.4368061374e+02 8.1000000000e+01 0.0000e+00 1.0000e+00
> 1 1 1 2 501 0 -8.9075749711e+01 -3.5726986208e+01 -5.5085490525e+02 5.5915296062e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
> </event>
> <event>
1506,1532c1515,1542
< 6 1 +8.9242237e-05 3.75379100e+02 7.95049700e-03 1.05265500e-01
< 21 -1 0 0 501 502 +0.0000000000e+00 +0.0000000000e+00 +2.1196628570e+01 2.1196628570e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
< 2 -1 0 0 502 0 -0.0000000000e+00 -0.0000000000e+00 -1.6619322556e+03 1.6619322556e+03 0.0000000000e+00 0.0000e+00 -1.0000e+00
< 24 2 1 2 0 0 +8.9075749711e+01 +3.5726986208e+01 -1.0898807218e+03 1.1239759236e+03 2.5743150990e+02 0.0000e+00 0.0000e+00
< 1000023 1 3 3 0 0 +8.0116545705e+01 +1.1392922116e+02 -6.6081705506e+02 6.8029530983e+02 8.2000000000e+01 0.0000e+00 -1.0000e+00
< 1000024 1 3 3 0 0 +8.9592040060e+00 -7.8202234949e+01 -4.2906366674e+02 4.4368061374e+02 8.1000000000e+01 0.0000e+00 1.0000e+00
< 1 1 1 2 501 0 -8.9075749711e+01 -3.5726986208e+01 -5.5085490525e+02 5.5915296062e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
< </event>
I believe that's the same event, but note that the line numbers are ~20 apart -- this event is first in one file and fourth in the other. I believe this indicates some non-reproducibility or a race condition in combine somewhere.
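One way to check the "same events, different order" claim without reading the diff by eye is to compare the two files as multisets of event blocks. The following is only a minimal sketch: the helper names are hypothetical, and it assumes each event sits between plain `<event>` and `</event>` tags and that identical events are written with identical text in both files.

```python
import re
from collections import Counter

# Matches one <event ...> ... </event> block and captures its body.
EVENT_RE = re.compile(r"<event\b[^>]*>(.*?)</event>", re.DOTALL)


def event_blocks(lhe_text):
    """Return the body of every <event> block, with whitespace normalized."""
    blocks = []
    for body in EVENT_RE.findall(lhe_text):
        lines = [" ".join(line.split()) for line in body.strip().splitlines()]
        blocks.append("\n".join(lines))
    return blocks


def same_events_any_order(text_a, text_b):
    """True if both texts contain the same events, ignoring their order."""
    return Counter(event_blocks(text_a)) == Counter(event_blocks(text_b))


def first_order_mismatch(text_a, text_b):
    """Index of the first position where the event sequences differ, or None."""
    a, b = event_blocks(text_a), event_blocks(text_b)
    for i, (event_a, event_b) in enumerate(zip(a, b)):
        if event_a != event_b:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))
```

Reading both LHE files into strings and calling `same_events_any_order` would confirm whether only the ordering differs; `first_order_mismatch` then reports where the two sequences first diverge.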
Because these events are then passed to programs like Pythia8 -- or even to MadSpin -- which process them in order, the final events appear random (different seeds are applied to the same particles). That means the events are effectively not reproducible once they are processed downstream, which is an issue for our bug hunting and for various other reasons.
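The effect of the ordering on downstream results can be illustrated with a toy model. This is not how Pythia8 or MadSpin actually consume random numbers; it only assumes that a downstream step draws from one seeded random stream while walking through the events in file order. The function name and event labels are made up for the illustration.

```python
import random


def process_in_order(event_ids, seed):
    """Toy downstream step: one random draw per event, taken in file order."""
    rng = random.Random(seed)
    return {event_id: rng.random() for event_id in event_ids}


# Same events and same seed, but the first event of run A is last in run B.
run_a = process_in_order(["ev1", "ev2", "ev3", "ev4"], seed=1234)
run_b = process_in_order(["ev2", "ev3", "ev4", "ev1"], seed=1234)

# The random stream is identical, but each event receives a different draw.
print(run_a["ev1"] == run_b["ev1"])
print(sorted(run_a.values()) == sorted(run_b.values()))
```

In this sketch the set of random numbers is the same in both runs, yet a given event receives a different number depending on its position, which is why a reordered input looks like a different seed from the point of view of a single event.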
If you could help us out with this it would be very much appreciated!
Thanks,
Zach
Changed in mg5amcnlo:
status: Fix Committed → Fix Released
Hi,
This has actually been the case for a couple of versions.
If you want to ensure 100% reproducibility, you also have to set the Python seed, in addition to the Fortran seed that is set within the run_card.
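As a generic illustration of what fixing the Python seed buys, the sketch below shows that a shuffle performed with Python's standard `random` module becomes repeatable once a seed is supplied. Where MG5_aMC uses Python randomness and how its seed is meant to be set are not stated in this thread, so the function below is purely hypothetical and is not MG5_aMC code.

```python
import random


def shuffled_order(n_items, seed=None):
    """Return a shuffled list of indices; repeatable only when a seed is given."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    order = list(range(n_items))
    rng.shuffle(order)
    return order


# With a fixed seed, two independent calls give the same ordering.
print(shuffled_order(10, seed=1234) == shuffled_order(10, seed=1234))
```

Calling `random.seed(1234)` once at the start of a script has the same effect for code that uses the module-level functions such as `random.shuffle`.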
Cheers,
Olivier
> On 4 May 2018, at 23:23, Zachary Marshall <email address hidden> wrote:
>
> Public bug reported:
>
> Dear authors,
>
> I'm running tests with MG5_aMC 2.6.2 - in case you have a CERN account, it's this one:
> /afs/cern.ch/sw/lcg/external/MCGenerators_lcgcmt67c/madgraph5amc/2.6.2.atlas/x86_64-slc6-gcc47-opt/
>
> We see some non-reproducibility, even when setting a random number seed.
> I attach a tarball of the cards directories so that you have the run,
> param, and proc cards. For the actual runs, iseed was set to 1234 (of
> course, the code resets it to 0 after the run).
>
> Looking at the LHE files, they seem to contain the same events, but in
> different order. Just diffing the two files, you can see an example
> right away:
>
> 1487a1488,1496
>> 6 1 +8.9242237e-05 3.75379100e+02 7.95049700e-03 1.05265500e-01
>> 21 -1 0 0 501 502 +0.0000000000e+00 +0.0000000000e+00 +2.1196628570e+01 2.1196628570e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
>> 2 -1 0 0 502 0 -0.0000000000e+00 -0.0000000000e+00 -1.6619322556e+03 1.6619322556e+03 0.0000000000e+00 0.0000e+00 -1.0000e+00
>> 24 2 1 2 0 0 +8.9075749711e+01 +3.5726986208e+01 -1.0898807218e+03 1.1239759236e+03 2.5743150990e+02 0.0000e+00 0.0000e+00
>> 1000023 1 3 3 0 0 +8.0116545705e+01 +1.1392922116e+02 -6.6081705506e+02 6.8029530983e+02 8.2000000000e+01 0.0000e+00 -1.0000e+00
>> 1000024 1 3 3 0 0 +8.9592040060e+00 -7.8202234949e+01 -4.2906366674e+02 4.4368061374e+02 8.1000000000e+01 0.0000e+00 1.0000e+00
>> 1 1 1 2 501 0 -8.9075749711e+01 -3.5726986208e+01 -5.5085490525e+02 5.5915296062e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
>> </event>
>> <event>
> 1506,1532c1515,1542
> < 6 1 +8.9242237e-05 3.75379100e+02 7.95049700e-03 1.05265500e-01
> < 21 -1 0 0 501 502 +0.0000000000e+00 +0.0000000000e+00 +2.1196628570e+01 2.1196628570e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
> < 2 -1 0 0 502 0 -0.0000000000e+00 -0.0000000000e+00 -1.6619322556e+03 1.6619322556e+03 0.0000000000e+00 0.0000e+00 -1.0000e+00
> < 24 2 1 2 0 0 +8.9075749711e+01 +3.5726986208e+01 -1.0898807218e+03 1.1239759236e+03 2.5743150990e+02 0.0000e+00 0.0000e+00
> < 1000023 1 3 3 0 0 +8.0116545705e+01 +1.1392922116e+02 -6.6081705506e+02 6.8029530983e+02 8.2000000000e+01 0.0000e+00 -1.0000e+00
> < 1000024 1 3 3 0 0 +8.9592040060e+00 -7.8202234949e+01 -4.2906366674e+02 4.4368061374e+02 8.1000000000e+01 0.0000e+00 1.0000e+00
> < 1 1 1 2 501 0 -8.9075749711e+01 -3.5726986208e+01 -5.5085490525e+02 5.5915296062e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
> < </event>
>
>
> I believe that's the same event, but note that the line numbers are ~20 apart -- this event is first in one file and fourth in the other. I believe this indicates some non-reproducibility or a race condition in combine somewhere.
>
> Becau...