Rally report durations is difficult to understand and potentially incorrect
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Rally | Invalid | Undecided | Unassigned |
Bug Description
Looking at a rally report for a run with failed durations it is difficult to understand the graph and the aggregated metrics on the response timings.
Issues I am finding:
1. Under "Total durations", for scenarios that have more than one timed action and where the second action's count is lower than the first action's count, the "total" row still shows the count of the first action:
Ex:
Action ...(metrics)... count
ceilometer.
ceilometer.
total ... 2000
* Should the last part of the table still display 2000?
* Do note that I understand the success % is used in conjunction with the count to show the number of failures. *
2. I added an SLA, and the max displayed in the SLA section is greater than the max displayed in the "total durations" table. Are the metrics in the "total durations" table calculated purely from successful atomic tasks, or do they include failed tasks? (Some failures occur much faster than the expected response, so failures could show faster response timings.) Why would the max iteration timing be displayed in the SLA area but not reflected in the "total durations" table?
In short, is there better documentation explaining how to read the rally report format, so that it causes less confusion?
You are right that the results in the report are not obvious; however, this bug looks like a feature request, so I would insist on marking it as `Invalid'.
It would be nice to have a feature request submitted instead (https://github.com/openstack/rally/tree/master/doc/feature_request).
Also, you can participate in our weekly IRC meeting and discuss any problems and ideas (https://wiki.openstack.org/wiki/Meetings/Rally).
There is no descriptive explanation of the report in the documentation - this is what I'm actually working on.
I believe that a descriptive chapter will appear soon.
Another problem is that the iteration result does not save data about exactly which atomic action failed; the only data is whether the iteration failed (some exception was raised anywhere) or not. That is the main reason why the `Success' value looks strange.
This is not a bug, since it is explicitly implemented this way; however, it must be improved.
Regarding your questions (of course, this will also be reflected in the upcoming documentation chapter):
We have these results in the table:
Action | Success | Count
---|---|---
ceilometer.create_meter | 91.8 % | 2000
ceilometer.get_stats | 96.3 % | 1907
total | 91.8 % | 2000
We also know that there are 163 failures (so 163 iterations out of 2000 had some exception raised),
and that the scenario includes two atomic actions: create_meter and get_stats (which follows immediately after create_meter).
After investigating the Failures tab, we can see that 93 exceptions were raised by create_meter
and 70 by get_stats.
>> * Should the last part of the table still display 2000?
Yes.
Since create_meter is the first action, it has been run in each iteration, so we have Count 2000 for ceilometer.create_meter. No matter whether the action is successful, the Count shows how many times it was actually run.
Since create_meter raised an exception 93 times in 2000 iterations, the other action, `get_stats', was not started in those 93 iterations - that is why get_stats has Count 1907.
Now, the whole number of iterations started (no matter whether they were successful or not) is 2000,
so the total Count is 2000.
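The Count logic described above can be sketched in a few lines of Python. This is only an illustration: the list-of-failures data model and the `action_counts` helper are hypothetical and are not Rally's internal result format. The failure numbers (93 and 70) are the ones read from the Failures tab.

```python
# Hypothetical model (NOT Rally's internal format): each iteration is
# represented by the name of the atomic action that raised an exception,
# or None if the iteration completed without errors.
ACTIONS = ["ceilometer.create_meter", "ceilometer.get_stats"]


def action_counts(failed_at):
    """Count how many times each atomic action was started.

    An action is started only if every earlier action in the same
    iteration finished without raising an exception.
    """
    counts = {name: 0 for name in ACTIONS}
    for failed in failed_at:
        for name in ACTIONS:
            counts[name] += 1
            if name == failed:
                break  # later actions never start in this iteration
    return counts


# 2000 iterations: 93 fail in create_meter, 70 fail in get_stats.
failed_at = (["ceilometer.create_meter"] * 93
             + ["ceilometer.get_stats"] * 70
             + [None] * (2000 - 93 - 70))

counts = action_counts(failed_at)
total_count = len(failed_at)  # the "total" row counts started iterations

for name in ACTIONS:
    print(name, counts[name])
print("total", total_count)
```

With these inputs, create_meter is counted once per iteration, get_stats is skipped in the 93 iterations where create_meter raised, and the "total" row is simply the number of iterations started.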
Agreed, the word "total" looks confusing, since "total" usually means summed results.
>> * Do note that I understand the success % is used in conjunction
with the count to show the number of failures. *
Correct, but the values look really confusing.
Since an iteration failure is saved for the whole iteration, not for individual atomic actions, we have the following results:
Total Success is 91.8 % - this means that 91.8 % of the 2000 iterations ran without any exception,
so the remaining 8.2 % is our 163 errors (both failed create_meter and failed get_stats).
Create_meter Success is also 91.8 % because this action runs first in the scenario, so it starts in any case and
its success is bound to that of the whole iteration.
Get_stats ran only 1907 times and raised exceptions 70 times - that is why its Success is 96.3 %.
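As a rough check of the percentages, the arithmetic can be reproduced with the per-action failure counts from the Failures tab (93 in create_meter, 70 in get_stats). This is a sketch of the reasoning in this comment, not Rally's actual code; the `success_pct` helper and the constant names are made up for illustration.

```python
ITERATIONS = 2000
FAILED_IN_CREATE_METER = 93   # from the Failures tab
FAILED_IN_GET_STATS = 70      # from the Failures tab


def success_pct(failures, count):
    """Percentage of runs that did not fail."""
    return 100.0 * (count - failures) / count


# Failure is recorded per iteration, so the total counts both kinds.
failed_iterations = FAILED_IN_CREATE_METER + FAILED_IN_GET_STATS

total_success = success_pct(failed_iterations, ITERATIONS)

# create_meter runs first, so its reported Success is tied to the
# success of the whole iteration, not to its own 93 failures.
create_meter_reported = total_success

# get_stats only starts when create_meter did not raise.
get_stats_count = ITERATIONS - FAILED_IN_CREATE_METER
get_stats_success = success_pct(FAILED_IN_GET_STATS, get_stats_count)

# For comparison only: what create_meter would show if failures were
# attributed to the atomic action that actually raised.
create_meter_own = success_pct(FAILED_IN_CREATE_METER, ITERATIONS)

print("total         %.2f %%" % total_success)
print("create_meter  %.2f %% (reported)" % create_meter_reported)
print("get_stats     %.2f %% of %d runs" % (get_stats_success, get_stats_count))
print("create_meter  %.2f %% (own failures only)" % create_meter_own)
```

The last value is not shown in the report; it is included only to illustrate how far the reported create_meter Success is from a per-action figure.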