Comment 0 for bug 1569416

Alexander Maretskiy (maretskiy) wrote :

See the code below and the result of running it.
The streaming percentile sometimes differs from the real percentile by up to 80%.
This happens with very long lists that mix small and large values.

$ cat percentile_issue.py
from rally.common import streaming_algorithms as st

data = (
    list(range(10)),
    list(range(10)) * 10,
    list(range(10)) * 100,
    list(range(10)) * 1000,
    list(range(10)) * 10000,
    [1, 2, 3, 4, 99999] * 10000,
)

for lst in data:
    p = st.PercentileComputation(.95, len(lst))
    for i in lst:
        p.add(i)

    streaming = p.result()
    real = st.utils.percentile(lst, .95)

    diff = float(abs(real - streaming)) / max(real, streaming) * 100
    if diff > 5:  # a difference of up to 5% is OK
        print("%-8.2f %-8.2f Differs by %.1f%%" % (real, streaming, diff))
    else:
        print("%-8.2f %.2f" % (real, streaming))

$ python percentile_issue.py
8.55 8.55
9.00 9.00
9.00 9.00
9.00 9.00
9.00 4.50 Differs by 50.0%
99999.00 20001.80 Differs by 80.0%
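
For anyone who wants to check the "real" column without installing Rally, here is a minimal sketch of an exact percentile computed by sorting and linear interpolation. This is an assumption about what an exact computation looks like, not a copy of Rally's utils.percentile, and exact_percentile is a hypothetical helper name used only for illustration.

```python
import math


def exact_percentile(values, percent):
    """Exact percentile via sorting and linear interpolation.

    Hypothetical helper for illustration; not part of Rally.
    """
    if not values:
        return None
    ordered = sorted(values)
    k = (len(ordered) - 1) * percent  # fractional rank in the sorted list
    lower = int(math.floor(k))
    upper = int(math.ceil(k))
    if lower == upper:
        return float(ordered[lower])
    # Weight the two neighbouring values by their distance from k.
    return ordered[lower] * (upper - k) + ordered[upper] * (k - lower)


if __name__ == "__main__":
    samples = (
        list(range(10)),
        list(range(10)) * 10000,
        [1, 2, 3, 4, 99999] * 10000,
    )
    for lst in samples:
        print("%-8d %.2f" % (len(lst), exact_percentile(lst, .95)))
```

The values printed for these three lists should agree with the first column of the output above. Unlike the streaming version, this keeps the whole list in memory.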