Monasca agent is not thread safe

Bug #1476313 reported by Ryan Bak
This bug affects 1 person
Affects: Monasca
Status: Triaged
Importance: Undecided
Assigned to: David Schroeder
Milestone: (none listed)

Bug Description

The services_checks class of the Monasca agent uses the threading library but has no thread safety built in. This can cause unpredictable behavior. Two examples follow:

Throwing Errors:
The main thread of the agent runs _clean, which loops through jobs_status, but other threads can modify this dictionary in _process if there is an error. _clean then fails because it looks up a key that no longer exists in the dictionary. This bug is easy to hit, and we have hit it several times while using the Nagios plugin.
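
One possible way to avoid this failure is to guard every access to jobs_status with a single lock, so that _clean and _process never see the dictionary in a half-updated state. The sketch below is only an illustration: ServicesCheckSketch, start_job, the timeout value, and the timestamp layout of jobs_status are assumptions made for the example, not the agent's actual code.

```python
import threading
import time


class ServicesCheckSketch:
    """Hypothetical stand-in for the agent's services check class."""

    def __init__(self, timeout=180):
        self.timeout = timeout
        self.jobs_status = {}  # assumed layout: job name -> start timestamp
        self._jobs_lock = threading.Lock()

    def start_job(self, name, now=None):
        # Record when a job started (hypothetical helper for the example).
        with self._jobs_lock:
            self.jobs_status[name] = time.time() if now is None else now

    def _process(self, name):
        # Worker-thread side: remove the entry, tolerating a missing key.
        with self._jobs_lock:
            self.jobs_status.pop(name, None)

    def _clean(self, now=None):
        # Main-thread side: find and delete expired jobs under the same lock,
        # so no other thread can change the dictionary in between.
        now = time.time() if now is None else now
        with self._jobs_lock:
            expired = [name for name, started in self.jobs_status.items()
                       if now - started > self.timeout]
            for name in expired:
                del self.jobs_status[name]
        return expired
```

With the lock in place, a worker that removes a job while _clean is running simply waits its turn, and a second removal of the same key is a no-op rather than a KeyError.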

Lost Data:
In the submit_metric method of aggregator, if "context not in self.metrics", self.metrics[context] is set to a new metrics_class, and shortly after that the metric_class.sample() method is run. However, if two threads with identical context arrive at this point at the same time, both can determine that context is not in self.metrics. One thread then adds context to the dictionary and runs sample, and the other thread overwrites context in the dictionary, causing the first data point to be lost. I'm not sure whether we have ever hit this problem, but I'm also not sure that it would be easily detectable.
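
A minimal sketch of one way to close this window is to make the membership check, the insertion, and the call to sample() a single critical section. AggregatorSketch and CounterSketch are hypothetical names used only for illustration; the real aggregator's signatures and metric classes may differ.

```python
import threading


class CounterSketch:
    """Hypothetical metric class that just records its samples."""

    def __init__(self):
        self.samples = []

    def sample(self, value):
        self.samples.append(value)


class AggregatorSketch:
    """Hypothetical stand-in for the agent's aggregator."""

    def __init__(self, metric_class=CounterSketch):
        self.metric_class = metric_class
        self.metrics = {}
        self._metrics_lock = threading.Lock()

    def submit_metric(self, context, value):
        # Check, create, and sample while holding the lock, so two threads
        # with the same context cannot both create (and overwrite) the entry.
        with self._metrics_lock:
            if context not in self.metrics:
                self.metrics[context] = self.metric_class()
            self.metrics[context].sample(value)
```

Holding one lock for the whole method is the simplest option; a finer-grained design (for example, locking only the creation step) would reduce contention but needs more care to keep sample() itself safe.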

Allan G (greental)
Changed in monasca:
assignee: nobody → David Schroeder (david-schroeder)
status: New → Triaged