inappropriate statistical parameters in ping output

Bug #1376606 reported by Steffen Michalek
This bug affects 1 person
Affects: iputils (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The output of the ping command gives the user several statistical parameters of the measured values (which, in the statistical sense, form a sample from a statistical population), e.g.
> rtt min/avg/max/mdev = 423.152/728.492/1306.341/220.001 ms, pipe 2

One of them is surely inappropriate for describing the data in this case; another is probably inappropriate.

First, the mean of a sample is a good estimator (in the statistical sense) of the mean of the underlying statistical distribution.
But this is only true for statistical distributions that possess a so-called "first moment" (or "expectation value"), i.e. a mean.

Some do not. In those cases, giving the mean of the sample is misleading, because it is only an unreliable, fluctuating property of the random sample, and not of the statistical population!
The mean of the random sample does not converge (e.g. with increasing sample size) to a location parameter of the underlying population or distribution.
A user will interpret the given value as information about some kind of "middle" of the latencies that will occur on the data connection. And this interpretation is wrong. Therefore, the statistical parameter "avg", the mean of the sample, is misleading and therefore inappropriate.
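
The non-convergence described above can be sketched with a textbook distribution that has no first moment. This is only an illustration: the standard Cauchy distribution is an assumption chosen for the demonstration, not a claim about how real ping latencies are distributed, and the function names are hypothetical.

```python
import math
import random
import statistics


def cauchy_samples(n, seed=0):
    """Draw n values from a standard Cauchy distribution (no finite mean)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: tan(pi * (u - 0.5)) for u uniform on [0, 1).
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def summarize(n, seed=0):
    """Return (sample mean, sample median) of n Cauchy draws."""
    xs = cauchy_samples(n, seed)
    return statistics.fmean(xs), statistics.median(xs)


if __name__ == "__main__":
    # The median settles near the true location 0 as n grows;
    # the mean is not guaranteed to settle at all.
    for n in (100, 10_000, 100_000):
        mean, median = summarize(n)
        print(f"n={n:>7}  mean={mean:10.3f}  median={median:8.3f}")
```

Running it with different seeds should show the median staying close to 0 for large n, while the mean can jump around from run to run.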

Latency measurements are a standard case in which distributions occur that do not possess first moments or expectation values (or, at least, contain a large number of outliers).
In such cases, the more robust (and easier) measure of location called the "median" should be used, see
http://en.wikipedia.org/wiki/Median
http://en.wikipedia.org/wiki/File:Comparison_mean_median_mode.svg
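
As a small sketch of why the median is more robust, consider a made-up set of round-trip times with one outlier. The values below are hypothetical and are not taken from any real ping run.

```python
import statistics

# Hypothetical round-trip times in ms: four typical replies and one outlier.
rtts_ms = [20.1, 20.3, 20.2, 20.4, 1300.0]

mean_ms = statistics.mean(rtts_ms)      # pulled far upwards by the outlier
median_ms = statistics.median(rtts_ms)  # stays with the typical replies

print(f"mean   = {mean_ms:.1f} ms")
print(f"median = {median_ms:.1f} ms")
```

Here the mean lands above 270 ms although four of the five replies took about 20 ms, while the median stays at 20.3 ms.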

(As a second reason, the skew of the latency measurements also indicates that the sample mean is not a good choice of estimator for the location of the distribution.)

Second, a better measure of dispersion should be used. Wikipedia:
"When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation."

I would argue for the median absolute deviation.
(I wrote "probably inappropriate" because "mdev" does not correspond to a specific statistical technical term, so I do not know what is calculated. If it is the "(square root of the) sample variance" or an "estimator for the standard deviation", then it is surely inappropriate.)
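
A minimal sketch of the proposed measure, the median absolute deviation, next to a standard deviation on the same kind of data. The sample values and the helper name `median_abs_deviation` are hypothetical, and the standard deviation is shown only for comparison, not as a statement of what "mdev" actually computes.

```python
import statistics


def median_abs_deviation(values):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# Hypothetical round-trip times in ms: four typical replies and one outlier.
rtts_ms = [20.1, 20.3, 20.2, 20.4, 1300.0]

mad_ms = median_abs_deviation(rtts_ms)
stdev_ms = statistics.pstdev(rtts_ms)  # population standard deviation

print(f"median absolute deviation = {mad_ms:.2f} ms")
print(f"standard deviation        = {stdev_ms:.2f} ms")
```

With these values the median absolute deviation is about 0.1 ms, reflecting the spread of the typical replies, whereas the standard deviation is several hundred ms because of the single outlier.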

Ubuntu release: 14.04.1 LTS
iputils-ping: 3:20121221-4ubuntu1.1

summary: - inappropriate statistical paramters in ping output
+ inappropriate statistical parameters in ping output