Comment 4 for bug 1069827

Jef Spaleta (jspaleta) wrote :

"If Ubuntu's install base doubled overnight and nothing else changed, the number of reports per day would double too, but the real error rate would be identical. What we need is a calculation that would reflect that as closely as possible."

Right... which is why I'm suggesting a surrogate for the install base. The calculation you want to perform, in your own words, is an error rate normalized to some time-evolving population. When you normalize, the denominator must be sensitive to spikiness on the same time scale as the numerator. The numerator captures the release spikes, so the denominator must capture them too: not perfectly, but with the same general spectral content.
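To make the point concrete, here is a toy illustration with invented numbers (not real tracker data): if the install base doubles overnight while the per-machine error rate is constant, the raw report count doubles, but dividing by a denominator that tracks the population gives a flat rate.

```python
# Invented example: population doubles overnight, true rate is constant.
install_base = [1_000_000] * 5 + [2_000_000] * 5  # machines able to report
true_rate = 0.002                                 # errors per machine per day

reports = [round(n * true_rate) for n in install_base]
normalized = [r / n for r, n in zip(reports, install_base)]

print(reports)     # raw counts double with the population
print(normalized)  # flat 0.002 when the denominator tracks the population
```

The whole question is what to use as `install_base` when you cannot observe it directly; that is what the surrogate has to supply.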

Normalizing to any running average is going to make you more sensitive to the numerator's spikiness. The 90-day running average is not the correct denominator for what you want to achieve. You have to find a surrogate for the install base that tracks the spikiness of the install-base size. And the discounting idea doesn't solve the underlying problem: you want to normalize against something that captures the spikiness of the number of machines in the wild that can report. A weighted average of previously reporting machines will not capture that, no matter how complicated you make it.
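A small sketch of the failure mode, with synthetic data (a 10-day window standing in for the 90-day one): normalize the same constant-rate report series by a trailing running average of past reports instead of the install base. The average lags the overnight doubling, so the "normalized" rate spikes even though the true per-machine rate never changed.

```python
import statistics

# Synthetic data: install base doubles overnight, true rate is constant.
install_base = [1_000_000] * 30 + [2_000_000] * 30
true_rate = 0.002
reports = [n * true_rate for n in install_base]

window = 10  # stand-in for the 90-day window
normalized = [
    reports[t] / statistics.mean(reports[t - window:t])
    for t in range(window, len(reports))
]

print(max(normalized) / min(normalized))  # 2x spike at the doubling, not flat
```

The smoothed denominator cannot follow the step in the population, so the step leaks into the "rate" as a transient spike: exactly the spikiness the normalization was supposed to remove.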

Again, I would ask you to step back and define what you are trying to achieve with this metric. Why is the spikiness in the unnormalized error rate bad for you? When do you expect the normalized error rate to actually show an increase instead of a flat line? You are bundling a lot of assumptions into that running-average methodology. If you are building a methodology to give you a curve that shows exactly what you expect, for no other reason than that you expect it, you aren't building a valid methodology.

There may be something more esoteric you can do with a matched spectral filter (averaging is just a flat spectral filter): if you can come up with an expected response over a release life cycle, you could build a filtered response based on that expectation.
If you could provide me with the error rate data from the full 11.10 cycle, I could use it to generate a spectral filter and see what happens to the 12.04 data. But even this can only be used to ask how one cycle compares to another, because it's expectation-based normalization, which is not what I think you want to measure, though it's still not clear to me what you want. All I know for sure is that a running average of historical data is not going to capture spikiness; averaging smooths, that's what it does.
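A minimal sketch of what that expectation-based normalization would look like. Everything here is invented (the exponential decay shape, the numbers); in practice the expected curve would be fit from a prior cycle such as 11.10. You divide the new cycle's reports by the expected shape; deviations from 1.0 flag behavior the expectation does not explain, and a cycle that simply matches expectation comes out flat.

```python
import math

# Assumed release-spike shape: exponential decay fit from a past cycle.
days = range(60)
expected = [2000 * math.exp(-d / 15) for d in days]

# New cycle that matches expectation, except one day with double the reports.
observed = list(expected)
observed[30] += expected[30]

normalized = [o / e for o, e in zip(observed, expected)]
print(round(normalized[0], 3), round(normalized[30], 3))  # 1.0 2.0
```

Which shows the limitation stated above: this can only tell you how a cycle deviates from the template cycle, not anything absolute about the error rate itself.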

-jef