Rally report durations are difficult to understand and potentially incorrect

Bug #1607804 reported by Alex Krzos
Affects: Rally
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Looking at a Rally report for a run that contains failures, it is difficult to understand the graph and the aggregated response-timing metrics.

Issues I am finding:

1. Under "Total durations" for scenarios which have more than one timed action, and the second action has count < first action count still shows a total of the first action:
 Ex:
Action ...(metrics)... count
ceilometer.create_meter: ... 2000
ceilometer.get_stats ... 1907
total ... 2000

* Should the last part of the table still display 2000?

* Do note that I understand the success % is used in conjunction with the count to show the number of failures. *

2. I added an SLA, and the max displayed for the SLA is greater than the max displayed in the "Total durations" table. Are the metrics in the "Total durations" table calculated purely from successful atomic actions, or do they include failed ones? (Some failures occur much faster than the expected response, so failures could show faster response timings.) Why would the max iteration timing be displayed in the SLA section but not be reflected in the "Total durations" table?

In short, is there better documentation explaining how to read the Rally report format, so there is less confusion?

Revision history for this message
Alexander Maretskiy (maretskiy) wrote :

You are right that the results in the report are not obvious; however, this bug looks like a feature request, so I would insist on marking it as `Invalid'.
It would be nice to have a feature request submitted instead (https://github.com/openstack/rally/tree/master/doc/feature_request).
Also, you can participate in our weekly IRC meeting and discuss any problems and ideas (https://wiki.openstack.org/wiki/Meetings/Rally).

There is no descriptive explanation of the report in the documentation - this is what I'm actually working on.
I believe that this descriptive chapter will appear soon.

Another problem is that the iteration result does not record which exact atomic action failed; the only data is whether the iteration as a whole failed (an exception was raised anywhere) or not. That is the main reason why the `Success' value looks strange.
This is not a bug, since this behavior is implemented deliberately; however, it should be improved.

Regarding your questions (of course, this will also be reflected in the upcoming documentation chapter):

 We have the following results in the table:

 Action                    Success  Count
 ceilometer.create_meter   91.8 %   2000
 ceilometer.get_stats      96.3 %   1907
 total                     91.8 %   2000

 We also know that there are 163 failures (so 163 of the 2000 iterations had some exception raised),
 and that the scenario includes two atomic actions: create_meter and get_stats (which follows immediately after create_meter).

 After investigating the Failures tab, we can see that 93 exceptions were raised by create_meter
 and 70 by get_stats.
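
 To keep the arithmetic straight, here is a small Python sketch using only the numbers from this report (an illustration, not Rally code):

    # Numbers taken from the table and the Failures tab above.
    total_iterations = 2000
    failed_in_create_meter = 93   # create_meter raised, so get_stats never started
    failed_in_get_stats = 70      # create_meter succeeded, then get_stats raised

    total_failures = failed_in_create_meter + failed_in_get_stats
    assert total_failures == 163

    # get_stats only runs in iterations where create_meter did not raise.
    assert total_iterations - failed_in_create_meter == 1907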

 >> * Should the last part of the table still display 2000?

 Yes.

 Since create_meter is the first action, it has been run in every iteration, so we have Count 2000
 for ceilometer.create_meter. No matter whether the action is successful, Count shows how many
 times it was actually run.

 Since create_meter raised an exception in 93 of the 2000 iterations, the other action `get_stats'
 was never started in those 93 iterations - that is why get_stats has Count 1907.

 Now, the total number of iterations started (no matter whether they were successful or not) is 2000,
 so the total Count is 2000.
 Agreed, the word "total" looks confusing, since "total" usually means summarized results.

 >> * Do note that I understand the success % is used in conjunction
    with the count to show the number of failures. *

 Correct, but the values do look confusing.

 Since a failure is recorded for the whole iteration, not per atomic action, we have the following results:

 Total Success is 91.8 % - this means that 91.8 % of the 2000 iterations ran without any exception,
 so the remaining 8.2 % are our 163 errors (failed create_meter and failed get_stats combined).

 Create_meter Success is also 91.8 %, because this action runs first in the scenario, so it starts in any case,
 and its success is bound to the whole iteration.

 Get_stats has run only 1907 times and has raised exceptions 70 times - that is why its Success is 96.3 %.
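
 The same arithmetic as a quick Python sanity check of the Success values (again, just the report numbers, not Rally internals):

    total_iterations = 2000
    failed_iterations = 163        # 93 in create_meter + 70 in get_stats
    get_stats_count = 2000 - 93    # 1907, iterations where get_stats started

    successful_iterations = total_iterations - failed_iterations  # 1837

    # Success is tracked per iteration, so "total" and create_meter share one value.
    total_success = successful_iterations / total_iterations * 100
    get_stats_success = successful_iterations / get_stats_count * 100

    print(f"total / create_meter: {total_success:.2f} %")   # 91.85 %, shown as 91.8 % in the table
    print(f"get_stats:            {get_stats_success:.2f} %")  # 96.33 %, shown as 96.3 % in the table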

Changed in rally:
status: New → Invalid