Ceilometer

Aggregation transformer sometimes produces incorrect values

Bug #1539163 reported by Dan Travis on 2016-01-28

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Ceilometer	Fix Released	Undecided	Dan Travis

Bug Description

When chaining the aggregation transformer with the rate of change transformer, there appears to be a case where ceilometer will produce incorrectly transformed samples.

This issue was discovered while attempting to forward high rate data to an external system, while collecting aggregated data in Ceilometer's storage system. The attached pipeline.yaml below demonstrates this type of configuration.

This appears to occur because the aggregation transformer uses the timestamp of the first sample received for a meter type rather than the last. Consider this test case (with an explanation in comments):

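# Imports the test relies on; module paths are assumed from the
# Ceilometer unit test tree of this period.
import copy
import datetime

from oslo_context import context
from oslo_utils import timeutils
from oslotest import base

from ceilometer import sample
from ceilometer.transformer import conversions


class AggregatorTransformerTestCase(base.BaseTestCase):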
    SAMPLE = sample.Sample(
        name='cpu',
        type=sample.TYPE_CUMULATIVE,
        unit='ns',
        volume='1234567',
        user_id='56c5692032f34041900342503fecab30',
        project_id='ac9494df2d9d4e709bac378cceabaf23',
        resource_id='1ca738a1-c49c-4401-8346-5c60ebdb03f4',
        timestamp="2015-10-29 14:12:15.485877+00:00",
        resource_metadata={}
    )

    def test_rate_of_change_calculation_two_resources(self):
        resource_id = ['1ca738a1-c49c-4401-8346-5c60ebdb03f4',
                       '5dd418a6-c6a9-49c9-9cef-b357d72c71dd']

        aggregator = conversions.AggregatorTransformer(size="2")
        rate_of_change_transformer = conversions.RateOfChangeTransformer()

        sample_time = timeutils.parse_isotime('2016-01-01T12:00:00+00:00')

        for offset in range(2):
            sample = copy.copy(self.SAMPLE)
            sample.timestamp = timeutils.isotime(sample_time)
            sample.resource_id = resource_id[0]
            sample.volume = offset
            aggregator.handle_sample(context.get_admin_context(), sample)

            sample_time = sample_time + datetime.timedelta(0, 1)

        aggregated_samples = aggregator.flush(context.get_admin_context())
        self.assertEqual(len(aggregated_samples), 1)
        """
        aggregated_samples[0] contains a sample with the value from the second
        sample aggregated, but the timestamp from the first sample.
        """
        rate_of_change_transformer.handle_sample(context.get_admin_context(),
                                                 aggregated_samples[0])

        for offset in range(2):
            sample = copy.copy(self.SAMPLE)
            sample.timestamp = timeutils.isotime(sample_time)
            sample.resource_id = resource_id[offset]
            sample.volume = 2
            aggregator.handle_sample(context.get_admin_context(), sample)

            sample_time = sample_time + datetime.timedelta(0, 1)

        aggregated_samples = aggregator.flush(context.get_admin_context())
        self.assertEqual(len(aggregated_samples), 2)
        """
        aggregated_samples contains two samples. One from our second
        resource, and one from our first resource.
        """

        for sample in aggregated_samples:
            if sample.resource_id == resource_id[0]:
                rateOfChange = rate_of_change_transformer.handle_sample(
                    context.get_admin_context(), sample)

        """
        Given that for each sample the time was incremented by 1 second and
        the volume was incremented by 1, you would expect a rate of change
        of 1. However, the value 0.5 is calculated. This is because the
        first aggregated point passed to the rate of change transformer
        contained the timestamp from the first point and the value from the
        second. This affected the time delta used in the rate of change
        calculation, and produced an incorrect value.
        """
        self.assertEqual(rateOfChange.volume, 1)
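
The arithmetic behind the 0.5 result in the test above can be checked in isolation. The following is a minimal sketch in plain Python, not Ceilometer code: the `rate` helper and the tuple layout are hypothetical, and it assumes each sample is one second apart, as the test's closing comment describes.

```python
from datetime import datetime, timedelta

t0 = datetime(2016, 1, 1, 12, 0, 0)

# Samples for the first resource as (timestamp, cumulative volume).
first_batch = [(t0, 0), (t0 + timedelta(seconds=1), 1)]
second_batch = [(t0 + timedelta(seconds=2), 2)]


def rate(prev, cur):
    """Hypothetical helper: change in volume per second between two points."""
    (prev_ts, prev_vol), (cur_ts, cur_vol) = prev, cur
    return (cur_vol - prev_vol) / (cur_ts - prev_ts).total_seconds()


# Behaviour described in this report: last volume paired with first timestamp.
reported_point = (first_batch[0][0], first_batch[-1][1])
# Behaviour the test expects: last volume paired with last timestamp.
expected_point = first_batch[-1]

reported_rate = rate(reported_point, second_batch[-1])  # 1 unit over 2 seconds
expected_rate = rate(expected_point, second_batch[-1])  # 1 unit over 1 second
```

Pairing the last volume with the first timestamp stretches the time delta from one second to two, which halves the computed rate.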

I believe there's a fairly straightforward resolution to this. Basically, add an additional parameter to the constructor to allow the user to selectively include the timestamp from either the first or last sample provided to the aggregation transformer.
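
As a rough illustration of that proposal, here is a toy aggregator with a selectable timestamp policy. It is a sketch only: `ToyAggregator`, its `timestamp` argument values, and the dictionary layout are all hypothetical and do not reproduce the actual Ceilometer transformer or its constructor signature.

```python
class ToyAggregator:
    """Hypothetical stand-in for the aggregation transformer."""

    def __init__(self, timestamp="first"):
        if timestamp not in ("first", "last"):
            raise ValueError("timestamp must be 'first' or 'last'")
        self.timestamp = timestamp
        self.pending = {}  # resource_id -> {"timestamp": ..., "volume": ...}

    def handle_sample(self, resource_id, timestamp, volume):
        entry = self.pending.get(resource_id)
        if entry is None:
            # First sample seen for this resource since the last flush.
            self.pending[resource_id] = {"timestamp": timestamp,
                                         "volume": volume}
            return
        # Cumulative meter: keep the most recent reading.
        entry["volume"] = volume
        if self.timestamp == "last":
            entry["timestamp"] = timestamp

    def flush(self):
        """Return the aggregated points and reset the internal state."""
        out, self.pending = self.pending, {}
        return out


def aggregate(mode):
    """Feed two samples, one second apart, and return the aggregated point."""
    agg = ToyAggregator(timestamp=mode)
    agg.handle_sample("res-0", 0, 0)
    agg.handle_sample("res-0", 1, 1)
    return agg.flush()["res-0"]


first = aggregate("first")
last = aggregate("last")
```

With the "last" policy the aggregated point carries the timestamp that matches its volume, which is what a downstream rate calculation needs. Keeping "first" as the default in this sketch is a design assumption meant to leave existing behaviour unchanged.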

Dan Travis (datravis) wrote on 2016-01-28:

example_pipeline.yaml (1.3 KiB, text/plain)

OpenStack Infra (hudson-openstack) wrote on 2016-01-28: Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/273672

Changed in ceilometer:
assignee:	nobody → Dan Travis (datravis)
status:	New → In Progress

Dan Travis (datravis) wrote on 2016-01-28:

This was found on master, but I believe this affects Liberty as well.

OpenStack Infra (hudson-openstack) wrote on 2016-02-17: Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/273672
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=501143062b82da180b0f9c40e4a505d833892e92
Submitter: Jenkins
Branch: master

commit 501143062b82da180b0f9c40e4a505d833892e92
Author: Dan Travis <email address hidden>
Date: Thu Jan 28 16:36:27 2016 +0000

Adds timestamp option to Aggregation transformer

    Adds an argument to the Aggregation transformer constructor
    that allows a user to specify whether to include the timestamp
    from either the first or last sample received for a given
    aggregated sample.

    This addresses an issue with transformer chaining where incorrect
    values will sometimes be produced by the Rate of Change
    transformer when chaining the Aggregation transformer with the
    Rate of Change transformer.

Change-Id: Ib163a80a7e6ddaf58d7cc555fb4f4d87d570b1a1
Closes-Bug: #1539163