Rally task detailed 95%ile Response Times different from HTML graphics report

Bug #1510175 reported by Alex Krzos
This bug affects 2 people
Affects: Rally
Status: Fix Released
Importance: High
Assigned to: Alexander Maretskiy

Bug Description

Viewing a task's response times with "rally task detailed" (or in the summary printed upon task completion) shows a different 95%ile response time than the HTML report generated with "rally task report".

Example:

# rally task detailed ac5f0429-396a-40e8-86bb-52c44ece3184
...
+-----------------------------------------------------------------------------+
| Response Times (sec) |
+--------+-------+--------+--------+--------+-------+-------+---------+-------+
| action | min | median | 90%ile | 95%ile | max | avg | success | count |
+--------+-------+--------+--------+--------+-------+-------+---------+-------+
| total | 0.166 | 0.517 | 1.544 | 1.938 | 3.616 | 0.715 | 100.0% | 5000 |
+--------+-------+--------+--------+--------+-------+-------+---------+-------+
...

vs

# rally task report ac5f0429-396a-40e8-86bb-52c44ece3184 --out output.html; cat output.html | grep "95%ile"
...
"table": {"rows": [["total", 0.[...].544, 1.941, 3.616, 0.715, "100.0%", 5000.0]], "cols": ["Action", "Min (sec)", "Median (sec)", "90%ile (sec)", "95%ile (sec)", "Max (sec)", "Avg (sec)", "Success", "Count"]},
...

In the example above, "rally task detailed" shows a 95%ile response time of 1.938 sec, while "rally task report" shows a 95%ile response time of 1.941 sec.

Alex Krzos (akrzos) wrote :
Alex Krzos (akrzos) wrote :
Alexander Maretskiy (maretskiy) wrote :

This is a known issue. The root cause is that the CLI and the HTML report use different approaches to calculating statistics.

The CLI command processes all the data at once (the old approach), whereas HTML report generation uses streaming processing, which pays off when there is a huge amount of data. As a trade-off, percentile values can differ slightly, which is not critical in most cases.
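To make this trade-off concrete, here is a minimal Python sketch. It is not Rally's code: `exact_percentile` and `BucketedPercentile` are hypothetical names, and chunk averaging is just one possible bounded-memory scheme, chosen for illustration. The function computes a linearly interpolated percentile over the whole sorted dataset (the "all data at once" approach); the class stores at most `max_points` values by averaging consecutive chunks, so its percentiles can drift from the exact ones.

```python
import math
import random


def exact_percentile(values, percent):
    """Linearly interpolated percentile over the full dataset.

    `percent` is a fraction in [0, 1]. Every value must be kept in memory,
    which mirrors processing the whole data at once.
    """
    if not values:
        return None
    ordered = sorted(values)
    k = (len(ordered) - 1) * percent
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return ordered[lo]
    # Weighted blend of the two neighbouring order statistics.
    return ordered[lo] * (hi - k) + ordered[hi] * (k - lo)


class BucketedPercentile:
    """Hypothetical bounded-memory estimator (illustration, not Rally code).

    The expected number of values (`length`) is known up front. Consecutive
    values are averaged into chunks so that at most `max_points` points are
    stored; the percentile is then taken over those chunk averages.
    """

    def __init__(self, length, max_points):
        # How many raw values are folded into each stored point.
        self._chunk = max(1, math.ceil(length / max_points))
        self._points = []
        self._buf_sum = 0.0
        self._buf_len = 0

    def add(self, value):
        self._buf_sum += value
        self._buf_len += 1
        if self._buf_len == self._chunk:
            self._points.append(self._buf_sum / self._buf_len)
            self._buf_sum, self._buf_len = 0.0, 0

    def result(self, percent):
        points = list(self._points)
        if self._buf_len:  # include a partially filled last chunk
            points.append(self._buf_sum / self._buf_len)
        return exact_percentile(points, percent)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, skewed "response times"; not data from this bug report.
    durations = [random.lognormvariate(-0.6, 0.6) for _ in range(5000)]
    estimator = BucketedPercentile(len(durations), max_points=500)
    for d in durations:
        estimator.add(d)
    print("exact 95%ile:   ", round(exact_percentile(durations, 0.95), 3))
    print("bucketed 95%ile:", round(estimator.result(0.95), 3))
```

When `max_points` is at least the number of values, each chunk holds a single value and both methods agree; shrinking `max_points` saves memory at the cost of accuracy. Other streaming schemes (sampling, quantile sketches) make this trade-off differently.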

Joe Talerico (jtaleric) wrote :

@maretskiy Sure, it is a small percentage, but it makes the user doubt the results. Why can't both commands use the same approach to computing the results?

Andriy Kurilin (andreykurilin) wrote :

I agree with jtaleric.

Changed in rally:
status: New → Triaged
importance: Undecided → Low
Roger Lopez (r-lopez) wrote :

To add: it is not only the small percentile differences; I am also seeing the success rates flip-flop. For example, I ran the NovaServers.boot_and_list_server scenario, and the log reports 100% success for nova.boot_server and 47.5% success for nova.list_servers, whereas the HTML report shows the opposite.

Log:

+------------------------------------------------------------------------------------------------+
| Response Times (sec) |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action | min | median | 90%ile | 95%ile | max | avg | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server | 86.67 | 328.305 | 337.028 | 338.162 | 339.256 | 272.641 | 100.0% | 183 |
| nova.list_servers | 3.337 | 4.649 | 5.227 | 5.506 | 6.123 | 4.659 | 47.5% | 183 |
| total | 91.243 | 208.544 | 291.381 | 326.766 | 334.988 | 209.405 | 47.5% | 183 |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 339.872634888
Full duration: 1070.80222201

HTML snippet attached.

Changed in rally:
importance: Low → High
Alexander Maretskiy (maretskiy) wrote :

The most accurate results come from "rally task detailed". However, "rally task report" processes data in a streaming manner, so an arbitrary amount of data can be processed. This is important when we have to process millions of iterations, which can be done successfully even with low memory usage.

So the preferable approach is the one used in "rally task report"; however, it is not accurate enough right now.

This bug should be fixed by making this streaming computation algorithm more accurate:

https://github.com/openstack/rally/blob/master/rally/common/streaming_algorithms.py#L154-L181
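The memory argument can be illustrated with a minimal Python sketch. The `StreamingStats` class is hypothetical and is not the linked Rally module. Count, min, max and average can all be maintained exactly in a single pass with constant memory. This is consistent with the max, avg and count columns agreeing between the two outputs in the original report. An exact percentile, by contrast, generally requires keeping every value, which is why a streaming report has to approximate it.

```python
import math


class StreamingStats:
    """Hypothetical one-pass accumulator for count, min, max and mean.

    Memory use is constant no matter how many values are added, and the
    results are exact (up to floating-point rounding), unlike percentiles.
    """

    def __init__(self):
        self.count = 0
        self.minimum = math.inf
        self.maximum = -math.inf
        self._mean = 0.0

    def add(self, value):
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        # Incremental mean update: only the current mean is stored.
        self._mean += (value - self._mean) / self.count

    @property
    def mean(self):
        return self._mean if self.count else None


if __name__ == "__main__":
    stats = StreamingStats()
    # A generator keeps memory flat regardless of the number of iterations.
    for value in ((i % 7) * 0.1 for i in range(100_000)):
        stats.add(value)
    print(stats.count, stats.minimum, stats.maximum, round(stats.mean, 3))
```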

Kyle Jorgensen (kyle-jorgensen) wrote :

I am also experiencing an issue where the 'rally task detailed' results don't display all the iterations of my scenario execution, whereas the 'rally task report' HTML/JSON output does display all the data. Could that be related to this bug?

Changed in rally:
assignee: nobody → Alexander Maretskiy (maretskiy)
Alexander Maretskiy (maretskiy) wrote :
Changed in rally:
status: Triaged → Fix Released