PercentileComputation sometimes gives incorrect result

Bug #1569416 reported by Alexander Maretskiy on 2016-04-12
This bug affects 1 person
Affects: Rally
Importance: Undecided
Assigned to: Unassigned

Bug Description

See the code below and the result of running it.
The streaming result sometimes differs from the real percentile by up to 80%.
This happens with very long lists of mixed small and big values.

$ cat percentile_issue.py
from rally.common import streaming_algorithms as st

data = (
    list(range(10)),
    list(range(10)) * 10,
    list(range(10)) * 100,
    list(range(10)) * 1000,
    list(range(10)) * 10000,
    [1, 2, 3, 4, 99999] * 10000,
)

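for lst in data:
    # Feed every value into the streaming computation.
    p = st.PercentileComputation(.95, len(lst))
    for i in lst:
        p.add(i)

    streaming = p.result()
    # Exact percentile over the full list, for comparison.
    real = st.utils.percentile(lst, .95)

    # Relative difference between the two results, in percent.
    diff = float(abs(real - streaming)) / max(real, streaming) * 100
    if diff > 5:
        print("%-8.2f %-8.2f Differs by %.1f%%" % (real, streaming, diff))
    else:
        print("%-8.2f %.2f" % (real, streaming))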

$ python percentile_issue.py
8.55 8.55
9.00 9.00
9.00 9.00
9.00 9.00
9.00 4.50 Differs by 50.0%
99999.00 20001.80 Differs by 80.0%
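
For anyone who wants to check the "real" column without Rally installed, here is a minimal standalone sketch. It assumes the reference value is an exact percentile with linear interpolation between the two closest ranks, which is consistent with the 8.55 reported for list(range(10)) but is not confirmed against the Rally source. The names exact_percentile and relative_diff_percent are hypothetical helpers, not part of Rally.

```python
import math


def exact_percentile(values, percent):
    """Exact percentile with linear interpolation between closest ranks.

    Hypothetical helper: assumed to behave like the reference percentile
    used in the report, not copied from Rally.
    """
    if not values:
        return None
    ordered = sorted(values)
    # Fractional index of the requested percentile in the sorted list.
    k = (len(ordered) - 1) * percent
    lower = int(math.floor(k))
    upper = int(math.ceil(k))
    if lower == upper:
        return float(ordered[lower])
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (k - lower)


def relative_diff_percent(real, streaming):
    """Relative difference in percent, same formula as the script above."""
    return float(abs(real - streaming)) / max(real, streaming) * 100


if __name__ == "__main__":
    for lst in (list(range(10)), [1, 2, 3, 4, 99999] * 10000):
        print("%.2f" % exact_percentile(lst, .95))
```

Under this assumption the last two inputs have exact 95th percentiles of 9 and 99999, so the streaming values 4.50 and 20001.80 are the ones that are off.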
