race condition in simcontrol.py -i for large campaigns

Bug #661125 reported by Daniel Bültmann
This bug affects 1 person
Affects: openWNS Wrowser
Status: Fix Committed
Importance: Undecided
Assigned to: Daniel Bültmann
Milestone: 0.9

Bug Description

There is a race condition in simcontrol.py which results in simulations being marked as crashed although they finished correctly.
This happens mostly for large campaigns.

In simcontrol.py, the jobInfo method, which is called when the script is invoked with the parameter '-i', first calls the consistency check. The consistency check queries the database for the IDs of all scenarios. After that it asks SGE for the status of each job that is in state Queued or Running. If qstat reports "job unknown", the simulation is marked as crashed.

A simulation can finish regularly between the database query and the call to qstat with the respective job ID. The probability increases when a large number of jobs needs to be checked, i.e. for campaigns with many scenarios.
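To illustrate the window described above, here is a minimal sketch of one way to narrow it: re-read the scenario state from the database when qstat no longer knows the job, and only then decide. This is not the code in simcontrol.py. The names `State`, `check_job`, `read_state` and `sge_knows_job` are hypothetical, and the sketch assumes that a simulation writes its final state to the database before its job disappears from SGE.

```python
from enum import Enum


class State(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    FINISHED = "Finished"
    CRASHED = "Crashed"


ACTIVE = (State.QUEUED, State.RUNNING)


def check_job(scenario_id, snapshot_state, read_state, sge_knows_job):
    """Return the state to record for one scenario.

    snapshot_state: state taken from the earlier bulk database query.
    read_state:     hypothetical callable that re-reads the current state.
    sge_knows_job:  hypothetical callable, False if qstat reports "job unknown".
    """
    if snapshot_state not in ACTIVE:
        # Nothing to verify for jobs that are not queued or running.
        return snapshot_state
    if sge_knows_job(scenario_id):
        return snapshot_state
    # qstat no longer knows the job. The snapshot may be stale, so re-read
    # the database before concluding that the job crashed.
    current = read_state(scenario_id)
    if current in ACTIVE:
        return State.CRASHED
    return current
```

With a stale snapshot of Running and a fresh read of Finished, the scenario keeps its Finished state instead of being marked as crashed. The extra query only happens for jobs that qstat does not know, so it adds little load for large campaigns.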

In addition, executing consistencyCheck every time the status is requested is very time-consuming.

I suggest removing the call to 'consistency check' from the 'jobInfo' method. Although this does not fix the race condition, falsely reported 'Crashed' scenarios are unlikely to occur anymore. It also greatly improves the responsiveness of 'simcontrol.py -i'.

Let me know if I should provide the respective modifications.

Daniel Bültmann (daniel.bueltmann) wrote :

Discussed by phone with Klaus.

I will provide such a patch. Users can still execute --consistency-check, and we will get rid of the disadvantages described above.
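A rough sketch of the intended behaviour, not the actual patch: '-i' only reports job status, and the consistency check runs only when requested explicitly. The use of argparse, the helper names `build_parser` and `planned_actions`, and the `dest` names are assumptions for illustration; the real option handling in simcontrol.py may look different.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the two options discussed here are modelled.
    parser = argparse.ArgumentParser(prog="simcontrol.py")
    parser.add_argument("-i", dest="info", action="store_true",
                        help="show job status without a consistency check")
    parser.add_argument("--consistency-check", dest="consistency_check",
                        action="store_true",
                        help="compare the database state against SGE")
    return parser


def planned_actions(argv):
    """Return the names of the steps that would run for the given arguments."""
    args = build_parser().parse_args(argv)
    actions = []
    if args.consistency_check:
        # Runs only on explicit request, so the race can no longer be
        # triggered by every status query.
        actions.append("consistencyCheck")
    if args.info:
        actions.append("jobInfo")
    return actions
```

Calling `planned_actions(["-i"])` yields only the jobInfo step, while passing both options runs the consistency check first and then reports the status.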

Changed in openwns-wrowser:
status: New → Confirmed
assignee: nobody → Daniel Bültmann (daniel.bueltmann)
milestone: none → 0.9
Changed in openwns-wrowser:
status: Confirmed → Fix Committed