Comment 2 for bug 461269

Gary Poster (gary) wrote : Re: oops reports should be grouped by oops signature not exception type and exception value

As I discussed with Diogo, I am not convinced that this will be an improvement. We would like better OOPS grouping (into "infestations"), but this proposal simply makes the grouping more granular along a particular axis. In discussion, Diogo had no data analysis to indicate that this would in fact result in better grouping. It is another heuristic, and good heuristics are a matter of statistics, and we don't have any statistics.

I think working on OOPS grouping will involve two components. The first will be to try to identify better "first guess" heuristics by doing some OOPS analyses. The second will be to provide a way to dynamically teach the OOPS tools to group things differently when the heuristic, inevitably, fails. That might mean ways to link OOPS signatures into a single group, Bayesian approaches, regexes, or something else.
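To make the "teaching" part concrete, here is a rough Python sketch of two of the options above: manually linking signatures into a group, and regex rules. Everything in it is hypothetical (the `SignatureGrouper` class, its method names, and the idea that a signature is a plain string); it is not how the existing OOPS tools work.

```python
import re


class SignatureGrouper:
    """Hypothetical overrides layered on top of a first-guess heuristic."""

    def __init__(self):
        self._links = {}     # exact signature -> group name
        self._patterns = []  # (compiled regex, group name), in insertion order

    def link(self, signature, group):
        """Teach the tool that this exact signature belongs to `group`."""
        self._links[signature] = group

    def add_pattern(self, pattern, group):
        """Teach the tool that any signature matching `pattern` belongs to `group`."""
        self._patterns.append((re.compile(pattern), group))

    def group_for(self, signature):
        """Return the group for a signature: explicit links first, then regexes."""
        if signature in self._links:
            return self._links[signature]
        for regex, group in self._patterns:
            if regex.search(signature):
                return group
        # Nothing was taught: fall back to the heuristic, where each
        # signature is its own group.
        return signature


grouper = SignatureGrouper()
grouper.link("KeyError: 'foo'", "missing-key")
grouper.add_pattern(r"^TimeoutError: .* timed out after \d+ms$", "db-timeouts")
```

Explicit links take priority over regexes here only because they are the more specific instruction; that ordering is an assumption too.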

One idea: we come up with a number of axes for signatures, like exception type, exception value, pageid, normalized full traceback, and so on. Each axis has a unique weight. An infestation is identified, by default, by only a couple of axes (perhaps exception type and normalized exception value, as the current grouping does). It links to the collected values in the other axes that it contains. When you want to teach the OOPS tools that an OOPS should be grouped differently, you can change the axes used for identifying an infestation, and you can specify one or more matching values. There is a constraint that infestation signature rules cannot overlap (two cannot match the same exact signature). When you get a new OOPS, you find the infestations that match it on the different axes. The infestation with the most matching axes wins. If two infestations have the same number of matches, the one with the heaviest differing axis wins. It's an idea; maybe not a good one. It's pretty manual, for one thing, and would require a non-trivial amount of work.
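A minimal Python sketch of the matching rule described above, to show that the tie-break is well defined. The axis names, the weights, the `Infestation` class, and the `classify` function are all made up for illustration, and the sketch assumes an infestation matches only when every axis in its rule matches.

```python
from dataclasses import dataclass

# Hypothetical axes; each one has a unique weight.
AXIS_WEIGHTS = {
    "exception_type": 4,
    "exception_value": 3,
    "pageid": 2,
    "traceback": 1,
}


@dataclass
class Infestation:
    name: str
    rule: dict  # axis name -> set of one or more accepted values

    def matched_axes(self, oops):
        """Return the rule's axes if the OOPS matches all of them, else nothing."""
        if self.rule and all(
            oops.get(axis) in values for axis, values in self.rule.items()
        ):
            return frozenset(self.rule)
        return frozenset()


def classify(oops, infestations):
    """Pick the infestation with the most matching axes.

    Ties go to the infestation holding the heaviest differing axis. Because
    weights are unique, comparing the descending weight lists
    lexicographically gives exactly that result.
    """
    best, best_key = None, None
    for infestation in infestations:
        axes = infestation.matched_axes(oops)
        if not axes:
            continue
        weights = sorted((AXIS_WEIGHTS[a] for a in axes), reverse=True)
        key = (len(axes), weights)
        if best_key is None or key > best_key:
            best, best_key = infestation, key
    return best


infestations = [
    Infestation("default", {"exception_type": {"KeyError"},
                            "exception_value": {"'foo'"}}),
    Infestation("by_page", {"exception_type": {"KeyError"},
                            "pageid": {"Bug:+index"}}),
    Infestation("narrow", {"exception_type": {"KeyError"},
                           "exception_value": {"'foo'"},
                           "pageid": {"Person:+index"}}),
]
```

With these rules, a `KeyError: 'foo'` on `Bug:+index` matches both "default" and "by_page" on two axes; they differ in exception value versus pageid, and exception value is heavier in this made-up weighting, so "default" wins. The same error on `Person:+index` goes to "narrow", which matches on three axes. The non-overlap constraint is what guarantees that two tied infestations always differ in at least one axis.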

Anyway, this is not a clear-cut problem in my mind.