A lot of unneeded events/signals sent when using autoscaling group

Bug #1445361 reported by Marcin Zbik
This bug affects 1 person
Affects         Status        Importance  Assigned to    Milestone
OpenStack Heat  Fix Released  Medium      Angus Salkeld
Kilo            Fix Released  Medium      Angus Salkeld

Bug Description

When using an autoscaling group with alarms that have a short check interval (for example, checking CPU usage every minute), Heat (Ceilometer?) generates a huge number of events (in Horizon) like:
"alarm state changed from alarm to alarm (Remaining as alarm due to 1 samples outside threshold, most recent: 0.294406779661)"

What is more, Heat sends an AMQP notification "Orchestration Event: Autoscaling end" every minute, just like the events above.

All of this generates a lot of redundant, unnecessary data.

Angus Salkeld (asalkeld) wrote :
Steve Baker (steve-stevebaker) wrote :

Could you provide your templates, and a reason why your alarms might be triggering constantly? Alarms should only be triggered when something changes.

Changed in heat:
status: New → Incomplete
Marcin Zbik (zbikmarc+launchpad) wrote :

I will do that, but I have some problems with my environment that I need to fix first.

Marcin Zbik (zbikmarc+launchpad) wrote :

heat_template_version: '2013-05-23'
resources:
  alarm_1:
    properties:
      alarm_actions:
      - get_attr: [server_1_scaling_1, alarm_url]
      comparison_operator: gt
      evaluation_periods: 1
      meter_name: cpu_util
      period: 60
      query:
      - field: metadata.user_metadata.stack_id
        op: eq
        value: {get_param: 'OS::stack_id'}
      - {field: metadata.user_metadata.resource_name, op: eq, value: server_1}
      statistic: avg
      threshold: 80
    type: OS::Ceilometer::Alarm
  alarm_2:
    properties:
      alarm_actions:
      - get_attr: [server_1_scaling_2, alarm_url]
      comparison_operator: lt
      evaluation_periods: 1
      meter_name: cpu_util
      period: 60
      query:
      - field: metadata.user_metadata.stack_id
        op: eq
        value: {get_param: 'OS::stack_id'}
      - {field: metadata.user_metadata.resource_name, op: eq, value: server_1}
      statistic: avg
      threshold: '20'
    type: OS::Ceilometer::Alarm
  server_1:
    properties:
      desired_capacity: 1
      max_size: 10
      min_size: 1
      resource:
        properties:
          flavor: m1.small
          image: precise-server-cloudimg-amd64
          key_name: id_rsa
          metadata:
            metering.resource_name: server_1
            metering.stack_id: {get_param: 'OS::stack_id'}
          networks:
          - {network: d5dc235a-5373-4aaa-bf76-88ffbbb7295c}
          user_data: {get_resource: config}
          user_data_format: RAW
        type: OS::Nova::Server
    type: OS::Heat::AutoScalingGroup
  server_1_scaling_1:
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: server_1}
      cooldown: 300
      scaling_adjustment: 1
    type: OS::Heat::ScalingPolicy
  server_1_scaling_2:
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: server_1}
      cooldown: 300
      scaling_adjustment: '-1'
    type: OS::Heat::ScalingPolicy

Marcin Zbik (zbikmarc+launchpad) wrote :

Even with cooldown set to 300, a stack event is created every 60 seconds (when CPU usage is lower than 20%).
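
A simplified model may help explain why these events add nothing. The sketch below is not Heat's code: `Group`, `handle_signal`, and `simulate` are hypothetical names, and it assumes a signal is a no-op either while the cooldown is running or when the adjusted capacity, clamped to min_size/max_size, equals the current capacity. The numbers (period 60, cooldown 300, min_size 1, max_size 10) come from the template above.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for a scaling group with size limits."""
    capacity: int
    min_size: int
    max_size: int


def handle_signal(group, adjustment, now, last_action_at, cooldown):
    """Process one alarm signal and return (last_action_at, changed).

    changed is True only if the group's capacity was actually modified.
    """
    # Assumption: signals that arrive during the cooldown are ignored.
    if last_action_at is not None and now - last_action_at < cooldown:
        return last_action_at, False
    # Assumption: the requested capacity is clamped to the configured bounds.
    target = max(group.min_size, min(group.max_size, group.capacity + adjustment))
    if target == group.capacity:
        return last_action_at, False
    group.capacity = target
    return now, True


def simulate(signal_times, group, adjustment, cooldown):
    """Feed signals to the group; return the times at which capacity changed."""
    last_action_at = None
    acted = []
    for t in signal_times:
        last_action_at, changed = handle_signal(
            group, adjustment, t, last_action_at, cooldown)
        if changed:
            acted.append(t)
    return acted


signals = list(range(0, 600, 60))  # ten signals, one per minute

# Scale-down alarm firing while the group is already at min_size.
idle = simulate(signals, Group(capacity=1, min_size=1, max_size=10), -1, 300)

# Scale-up alarm firing continuously, for comparison.
busy = simulate(signals, Group(capacity=1, min_size=1, max_size=10), +1, 300)

print(idle, busy)
```

In this model a scale-down signal at min_size never changes anything, and a continuously firing scale-up alarm acts only once per cooldown window, so most of the one-per-minute signals carry no new information.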

Changed in heat:
status: Incomplete → Triaged
importance: Undecided → Medium
Angus Salkeld (asalkeld)
Changed in heat:
assignee: nobody → Angus Salkeld (asalkeld)
milestone: none → liberty-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/180375

Changed in heat:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/180375
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6a48c45bfd189086017e437c01b0bf7c7d3eb0fe
Submitter: Jenkins
Branch: master

commit 6a48c45bfd189086017e437c01b0bf7c7d3eb0fe
Author: Angus Salkeld <email address hidden>
Date: Wed May 6 11:28:44 2015 +1000

    Don't create events when signals don't perform an action

    If we get repeat signals that cause no concrete actions
    this can cause large quantities of senseless events that
    suppress useful events.

    Change-Id: I79374d27648319f241f36ab041784fab37823ddb
    Closes-bug: #1445361
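
The idea in the commit message can be illustrated with a minimal sketch. It is not the merged patch: `EventLog`, `signal`, and the two handlers are hypothetical, and the sketch simply assumes a signal handler can report whether it performed a concrete action.

```python
class EventLog:
    """Hypothetical in-memory event store."""

    def __init__(self):
        self.events = []

    def record(self, reason):
        self.events.append(reason)


def scale_down_at_minimum():
    """Hypothetical handler: the group is at min_size, so nothing is done."""
    return False


def scale_up():
    """Hypothetical handler: capacity is increased, so an action is performed."""
    return True


def signal(log, handler, reason, skip_noop_events=True):
    """Run the handler and record an event only when it is worth recording.

    With skip_noop_events=False every signal produces an event, which models
    the behaviour described in this bug. With the default, signals that cause
    no concrete action leave no event behind.
    """
    performed = handler()
    if performed or not skip_noop_events:
        log.record(reason)
    return performed


before = EventLog()
after = EventLog()

# Ten repeat signals that cause no action, one per alarm evaluation.
for _ in range(10):
    signal(before, scale_down_at_minimum, "signal: alarm to alarm",
           skip_noop_events=False)
    signal(after, scale_down_at_minimum, "signal: alarm to alarm")

# One signal that really changes the group.
signal(after, scale_up, "signal: scaled up")

print(len(before.events), len(after.events))
```

With gating enabled, the only event left in the log is the one that corresponds to a real change, so useful events are no longer buried under repeats.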

Changed in heat:
status: In Progress → Fix Committed
tags: added: kilo-backport-potential
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Angus Salkeld (asalkeld)
tags: removed: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/225536

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/kilo)

Reviewed: https://review.openstack.org/225536
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=0fa81c2f950a37328b9386e2683580c9f5d48ebc
Submitter: Jenkins
Branch: stable/kilo

commit 0fa81c2f950a37328b9386e2683580c9f5d48ebc
Author: Angus Salkeld <email address hidden>
Date: Fri Sep 25 11:13:41 2015 +1000

    Don't create events when signals don't perform an action

    If we get repeat signals that cause no concrete actions
    this can cause large quantities of senseless events that
    suppress useful events.

    Change-Id: I79374d27648319f241f36ab041784fab37823ddb
    Closes-bug: #1445361
    (cherry picked from commit 6a48c45bfd189086017e437c01b0bf7c7d3eb0fe)

Thierry Carrez (ttx)
Changed in heat:
milestone: liberty-1 → 5.0.0