race condition in simcontrol.py -i for large campaigns
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openWNS Wrowser | Fix Committed | Undecided | Daniel Bültmann |
Bug Description
There is a race condition in simcontrol.py which results in simulations being marked as crashed although they finished correctly.
This happens mostly for large campaigns.
In simcontrol.py, the jobInfo method, which is called when the script is invoked with the parameter '-i', first calls the consistency check. The consistency check queries the database for the IDs of all scenarios. For each scenario in state Queued or Running, it then asks SGE for the status of the job. If qstat reports "job unknown", the simulation is marked as crashed.
It can happen that a simulation finishes normally between the database query and the call to qstat for the respective job ID. The probability increases when a large number of jobs need to be checked, i.e. for large campaigns.
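The sequence described above can be reproduced with a small, self-contained sketch. Everything in it is a hypothetical stand-in and not the actual simcontrol.py code: `FakeDatabase` replaces the campaign database, `FakeScheduler` replaces SGE/qstat, and the `between` hook simulates a job finishing inside the race window. The second function re-reads the job state before marking it as crashed; this is only one conceivable way to narrow the window, not what this report proposes, and it does not close the window completely.

```python
# Hypothetical stand-ins for the database and SGE; names are illustrative only.
ACTIVE_STATES = ("Queued", "Running")


class FakeDatabase:
    def __init__(self, states):
        self.states = dict(states)

    def active_jobs(self):
        # Snapshot of all jobs that the database believes are still active.
        return [j for j, s in self.states.items() if s in ACTIVE_STATES]

    def state(self, job_id):
        return self.states[job_id]

    def set_state(self, job_id, state):
        self.states[job_id] = state


class FakeScheduler:
    def __init__(self, known_jobs):
        self.known_jobs = set(known_jobs)

    def knows(self, job_id):
        # False corresponds to qstat reporting "job unknown".
        return job_id in self.known_jobs


def finish_job(db, scheduler, job_id):
    # A job that ends normally records its result and leaves the scheduler.
    db.set_state(job_id, "Finished")
    scheduler.known_jobs.discard(job_id)


def naive_check(db, scheduler, between=None):
    snapshot = db.active_jobs()      # step 1: query the database
    if between is not None:
        between()                    # race window: jobs may finish here
    for job_id in snapshot:          # step 2: ask the scheduler per job
        if not scheduler.knows(job_id):
            db.set_state(job_id, "Crashed")


def rechecking_check(db, scheduler, between=None):
    snapshot = db.active_jobs()
    if between is not None:
        between()
    for job_id in snapshot:
        # Re-read the current state instead of trusting the stale snapshot.
        if not scheduler.knows(job_id) and db.state(job_id) in ACTIVE_STATES:
            db.set_state(job_id, "Crashed")
```

With a single running job that finishes inside the window, `naive_check` overwrites the correct result with "Crashed", while `rechecking_check` leaves it as "Finished". The longer the loop over the snapshot runs, the more jobs can finish in the meantime, which matches the observation that large campaigns are affected most.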
Also, executing the consistency check every time the status is requested is very time-consuming.
I suggest removing the call to the consistency check from the jobInfo method. Although this does not fix the race condition, falsely reported 'Crashed' scenarios are unlikely to occur anymore. It also greatly improves the responsiveness of 'simcontrol.py -i'.
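A minimal sketch of the suggested behaviour, assuming the two actions are dispatched from command-line options. Only the option names '-i' and '--consistency-check' come from this report; `build_parser`, `run`, and the callbacks passed to it are hypothetical and do not reflect the real structure of simcontrol.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="simcontrol.py")
    parser.add_argument("-i", dest="info", action="store_true",
                        help="show job status without touching job states")
    parser.add_argument("--consistency-check", dest="consistency_check",
                        action="store_true",
                        help="compare database states with the scheduler")
    return parser


def run(argv, job_info, consistency_check):
    # job_info and consistency_check are callables supplied by the caller
    # (hypothetical stand-ins for the real methods).
    args = build_parser().parse_args(argv)
    executed = []
    if args.consistency_check:
        # The expensive check runs only when explicitly requested.
        consistency_check()
        executed.append("consistency_check")
    if args.info:
        # Status display no longer triggers the consistency check.
        job_info()
        executed.append("job_info")
    return executed
```

In this arrangement `run(["-i"], ...)` only reports the status, and the consistency check (and with it the race window) is entered only when the user asks for it.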
Let me know if I should provide the respective modifications.
Changed in openwns-wrowser:
status: Confirmed → Fix Committed
Phone call with Klaus.
I will provide such a patch. Users can still run --consistency-check explicitly, and we will get rid of the disadvantages described above.