Comment 1 for bug 1069840

June Yi (gochist) wrote:

Let's assume we put the following metric data.

Timestamp / Metric Value
2012-10-23 00:00:02 / 30.0
2012-10-23 00:00:59 / 10.0
2012-10-23 00:02:01 / 11.5
2012-10-23 00:03:03 / 14.2

Currently, Synaps aggregates these raw samples into a DataFrame with 1-minute resolution.

Timestamp / SampleCount / Average / Min / Max / Sum
2012-10-23 00:00:00 / 2 / 20.0 / 10.0 / 30.0 / 40.0
2012-10-23 00:01:00 / NaN / NaN / NaN / NaN / NaN
2012-10-23 00:02:00 / 1 / 11.5 / 11.5 / 11.5 / 11.5
2012-10-23 00:03:00 / 1 / 14.2 / 14.2 / 14.2 / 14.2
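The 1-minute aggregation can be reproduced with a small plain-Python sketch (standard library only, no pandas). It buckets each raw sample by the minute it falls in and keeps empty minutes as all-NaN rows, as in the table above. The function name `aggregate_per_minute` is hypothetical and is not taken from the Synaps code base.

```python
import math
from datetime import datetime, timedelta

# Raw samples from the example above: (timestamp, value)
RAW = [
    (datetime(2012, 10, 23, 0, 0, 2), 30.0),
    (datetime(2012, 10, 23, 0, 0, 59), 10.0),
    (datetime(2012, 10, 23, 0, 2, 1), 11.5),
    (datetime(2012, 10, 23, 0, 3, 3), 14.2),
]

NAN = math.nan


def aggregate_per_minute(samples):
    """Bucket (timestamp, value) samples into 1-minute rows.

    Returns one (minute, SampleCount, Average, Min, Max, Sum) tuple per
    minute from the first to the last bucket. Minutes without any sample
    get NaN in every statistic column.
    """
    buckets = {}
    for ts, value in samples:
        # Truncate the timestamp to the start of its minute
        minute = ts.replace(second=0, microsecond=0)
        buckets.setdefault(minute, []).append(value)

    rows = []
    minute, last = min(buckets), max(buckets)
    while minute <= last:
        values = buckets.get(minute)
        if values:
            total = sum(values)
            rows.append((minute, len(values), total / len(values),
                         min(values), max(values), total))
        else:
            # No raw data in this minute: keep the row, fill with NaN
            rows.append((minute, NAN, NAN, NAN, NAN, NAN))
        minute += timedelta(minutes=1)
    return rows


for row in aggregate_per_minute(RAW):
    print(row[0].strftime("%H:%M"), *row[1:])
```

Keeping the empty minute as a NaN row means every row covers exactly one minute, so a window of N minutes corresponds to N consecutive rows.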

Synaps evaluates alarms based on the DataFrame above, applying rolling functions provided by pandas.

When it rolls up the SampleCount column using 'rolling_count' with a 2-minute window, the result is as follows.
I call this the 'Rolling Sample Count'.

result of rolling count (window: 2min)
2012-10-23 00:00:00 / 1
2012-10-23 00:01:00 / 1
2012-10-23 00:02:00 / 1
2012-10-23 00:03:00 / 2
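The 'Rolling Sample Count' numbers can be checked with a plain-Python sketch that counts the non-NaN rows in each trailing window of 2 rows (2 minutes at 1-minute resolution). At the start of the series the window is simply shorter, which is why the first row yields 1. The helper `rolling_sample_count` is hypothetical; in later pandas releases the module-level `rolling_count` was replaced by the method form, roughly `series.rolling(2, min_periods=1).count()`.

```python
import math

# SampleCount column of the 1-minute DataFrame above (NaN = no data that minute)
SAMPLE_COUNT = [2, math.nan, 1, 1]


def rolling_sample_count(column, window):
    """For each row, count the non-NaN rows among the last `window` rows."""
    result = []
    for i in range(len(column)):
        start = max(0, i - window + 1)  # shorter window at the series start
        result.append(sum(1 for x in column[start:i + 1] if not math.isnan(x)))
    return result


print(rolling_sample_count(SAMPLE_COUNT, window=2))
```

Note that the actual SampleCount values (2, 1, 1) never enter the result; only whether each minute had data does.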

If it instead rolls them up using 'rolling_sum' with a 2-minute window, the result is as follows.
Before rolling, I filled NaN with 0.
I call this the 'Total Sample Count'.

result of rolling sum (window: 2min)
2012-10-23 00:00:00 / 2
2012-10-23 00:01:00 / 2
2012-10-23 00:02:00 / 1
2012-10-23 00:03:00 / 2
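The 'Total Sample Count' can be sketched the same way: fill NaN with 0, then sum the SampleCount values over each trailing window of 2 rows, again allowing a shorter window at the start so the first row is 2 rather than NaN. The helper `total_sample_count` is hypothetical; a rough pandas equivalent in later releases would be `series.fillna(0).rolling(2, min_periods=1).sum()`.

```python
import math

# SampleCount column of the 1-minute DataFrame above (NaN = no data that minute)
SAMPLE_COUNT = [2, math.nan, 1, 1]


def total_sample_count(column, window):
    """Sum the raw sample counts over the last `window` rows, NaN treated as 0."""
    filled = [0 if math.isnan(x) else x for x in column]
    result = []
    for i in range(len(filled)):
        start = max(0, i - window + 1)  # shorter window at the series start
        result.append(sum(filled[start:i + 1]))
    return result


print(total_sample_count(SAMPLE_COUNT, window=2))
```

Unlike the rolling count, this adds up the per-minute SampleCount values, so the two raw samples in 00:00 both count at 00:00 and 00:01.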

I think both kinds of sample count are valuable.

'Total Sample Count' is the total number of raw metric samples in the window.
'Rolling Sample Count' is the number of aggregated (1-minute) data points in the window, i.e. how many minutes in the window contain at least one sample.

Currently, Synaps reports the 'Total Sample Count' as SampleCount.
That's why it uses the rolling sum function for SampleCount.