notifications.error should be consumed or not be sent

Bug #1364708 reported by ZhiQiang Fan
This bug affects 6 people
Affects: Ceilometer
Status: Fix Released
Importance: Medium
Assigned to: gordon chung

Bug Description

In a Havana environment, I noticed that a message is sent to notifications.error when a Nova live migration fails. After tracing the code, I think the same issue exists in the stable/icehouse branch.

In Ceilometer stable/icehouse, we only consume messages sent to the notifications.info topic, see:
https://github.com/openstack/ceilometer/blob/stable/icehouse/ceilometer/compute/notifications/__init__.py#L43

In Nova stable/icehouse, if an instance migration fails, or some other important VM-related action fails, a message is sent to notifications.error, see:
https://github.com/openstack/nova/blob/stable/icehouse/nova/conductor/manager.py#L773
https://github.com/openstack/nova/blob/stable/icehouse/nova/scheduler/utils.py#L101
https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/notify/notifier.py#L256

The problem is that Ceilometer is connected to many OpenStack services; if Nova has this issue, other services may have it too.

I don't know why a service sends a notification that will not be consumed. This only confuses operators and adds (perhaps slightly, but truly) to the AMQP broker's load.

So my suggestion would be:
1) modify Ceilometer's code to consume notifications.{warn, error, critical} in addition to notifications.info, and drop those messages directly
2) notify the other projects' contributors and ask them to stop sending such notifications
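
To make the mismatch concrete, here is a minimal sketch in plain Python (no OpenStack imports). It assumes, as the topic names in this report suggest, that each notification lands on a queue named `<topic>.<priority>`. The helper names `queue_name` and `unconsumed_queues` are hypothetical and exist only for illustration.

```python
def queue_name(topic, priority):
    """Build the queue name for a topic/priority pair, e.g. notifications.error."""
    return "%s.%s" % (topic, priority)


def unconsumed_queues(topic, emitted, consumed):
    """Return the queues that producers write to but nobody reads from.

    emitted:  priorities the producing services send (e.g. Nova)
    consumed: priorities the consumer listens on (e.g. Ceilometer)
    """
    missing = set(emitted) - set(consumed)
    return sorted(queue_name(topic, priority) for priority in missing)


# Nova emits info and error; Ceilometer (stable/icehouse) listens on info only.
print(unconsumed_queues("notifications", ["info", "error"], ["info"]))
```

Either suggestion above empties the returned list: option 1 adds priorities to `consumed`, option 2 removes them from `emitted`.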

ZhiQiang Fan (aji-zqfan)
Changed in ceilometer:
assignee: nobody → ZhiQiang Fan (aji-zqfan)
wingwj (wingwj) wrote :

I agree with the first one.
But I don't think dropping them is a good idea. If we handle the notifications.info messages, there is no reason to ignore higher-level ones. Maybe we can discuss the topic on the ML.

Thanks.

ZhiQiang Fan (aji-zqfan) wrote :
gordon chung (chungg) wrote :

welcome back ZhiQiang Fan!... i'm not sure it's a bug or not... in most environments, ceilometer is not the only consumer of notifications (i.e. they could be for 3rd party purposes, to let a project/service other than ceilometer know of an error)... i'm not sure which case it's for... we may need to check with nova but i would think it's a useful event for ceilometer to consume

Changed in ceilometer:
importance: Undecided → Low
status: New → Triaged
gordon chung (chungg) wrote :

hi zqfan, are you planning on working on this item? if not, i can look at it as i'd like to have it in for kilo

Changed in ceilometer:
importance: Low → Medium
ZhiQiang Fan (aji-zqfan) wrote :

sure, go ahead

gordon chung (chungg)
Changed in ceilometer:
assignee: ZhiQiang Fan (aji-zqfan) → gordon chung (chungg)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/153362

Changed in ceilometer:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/153362
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=f7ed2c2a16e8dded6bd2258ea0a5e5954a13bada
Submitter: Jenkins
Branch: master

commit f7ed2c2a16e8dded6bd2258ea0a5e5954a13bada
Author: gordon chung <email address hidden>
Date: Thu Feb 5 15:27:38 2015 -0500

    start recording error notifications

    projects (specifically nova) send notifications on error topic when
    an error occurs. we should capture this in events.

    Change-Id: Ic42cbce948b8b409f83934146407b2480602921d
    Closes-Bug: #1364708

Changed in ceilometer:
status: In Progress → Fix Committed
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → kilo-3
Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: kilo-3 → 2015.1.0
Flávio Ramalho (flaviosr) wrote :

Did this really get fixed?

Running Kilo 2015.1.0 and getting:

# rabbitmqctl list_queues | grep notifications
notifications.error 49
notifications.info 0

Also, there is no consumer for the notifications.error queue.
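
For anyone who wants to check this automatically, here is a small sketch that parses the output of `rabbitmqctl list_queues name messages consumers` and reports notification queues that hold messages but have no consumer. The function name `backlogged_queues` is hypothetical, and the consumer counts in the sample are assumed for illustration (the listing above shows only the name and message columns).

```python
def backlogged_queues(listing, prefix="notifications."):
    """Return (queue, message_count) pairs for queues with messages and no consumer.

    `listing` is expected to contain three whitespace-separated columns per
    queue: name, messages, consumers. Any other line is ignored.
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].startswith(prefix):
            continue  # header/footer lines or unrelated queues
        name, messages, consumers = fields
        if not (messages.isdigit() and consumers.isdigit()):
            continue  # not a numeric row, skip defensively
        if int(messages) > 0 and int(consumers) == 0:
            stuck.append((name, int(messages)))
    return stuck


# Sample input; the third column (consumers) is an assumed value.
sample = "notifications.error\t49\t0\nnotifications.info\t0\t1\n"
print(backlogged_queues(sample))
```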

gordon chung (chungg) wrote :

do you have store_events enabled?

Flávio Ramalho (flaviosr) wrote :

I do.

Here is my ceilometer.conf

$ cat /etc/ceilometer/ceilometer.conf | egrep -v "^\s*(#|$)"
[DEFAULT]
auth_strategy = keystone
verbose = true
store_events = true
rpc_backend = rabbit
[database]
connection = mongodb://-:-@ceilometer:27017/ceilometer
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_user = ceilometer
admin_password = -
admin_tenant_name = service
[service_credentials]
os_auth_url = http://controller:5000/v2.0
os_username = ceilometer
os_tenant_name = service
os_password = -
os_endpoint_type = internalURL
os_region_name = lsd-0
[publisher]
telemetry_secret = -
[matchmaker_redis]
[matchmaker_ring]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
amqp_durable_queues = true
rabbit_hosts = 192.168.0.6:5672,192.168.0.7:5672
rabbit_userid = openstack
rabbit_password = -
rabbit_ha_queues = true

gordon chung (chungg) wrote :

hi Flavio,

store_events should actually be under [notification] section rather than [DEFAULT]. it could also be under [collector] section but this is deprecated.
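
A quick way to see which section an option actually sits in is sketched below, using only the Python standard library. The function name `sections_with_option` is hypothetical. One caveat: the stdlib `configparser` normally treats `[DEFAULT]` as a fallback for every section, which is not how the service reads its config here, so the sketch renames the parser's default section to make `[DEFAULT]` an ordinary section.

```python
import configparser


def sections_with_option(conf_text, option):
    """Return the names of the sections that explicitly set `option`."""
    parser = configparser.ConfigParser(
        default_section="__unused__",  # so [DEFAULT] is not inherited by other sections
        interpolation=None,            # values may contain '%' characters
    )
    parser.read_string(conf_text)
    return [name for name in parser.sections() if option in parser[name]]


# Trimmed-down version of the config posted above.
conf = """\
[DEFAULT]
auth_strategy = keystone
store_events = true
[notification]
"""
print(sections_with_option(conf, "store_events"))
```

If the result lists `DEFAULT` rather than `notification`, the option is in the wrong place.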

Flávio Ramalho (flaviosr) wrote :

Hi Gordon,

After moving store_events to [notification], ceilometer-agent-notification started throwing:

ceilometer.openstack.common.threadgroup TypeError: coercing to Unicode: need string or buffer, NoneType found

Putting event_pipeline.yaml (from https://github.com/openstack/ceilometer/blob/stable/kilo/etc/ceilometer/event_pipeline.yaml) in /etc/ceilometer solved it (fixed in https://review.openstack.org/#/c/152525/), and everything is now working perfectly.

Thanks Gordon.

gordon chung (chungg) wrote :

no problem.

i should mention, the error messages are only visible as events, not meters (as there are no (known) measurements in error notifications).

George (lmihaiescu) wrote :

Hi Gordon,

Is this fix going to be backported to Juno?

Thank you,
George

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ceilometer (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/345390

OpenStack Infra (hudson-openstack) wrote : Related fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/345390
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=a3eb0fafadcdcbb8f48f3db13ffc103717bae99b
Submitter: Jenkins
Branch: master

commit a3eb0fafadcdcbb8f48f3db13ffc103717bae99b
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Jul 21 14:33:10 2016 +0200

    consumes error notif. when event are disabled

    When we fixes #1364708, this was worked only when store_event=True.

    This change fixes the issue when store_event=False.

    Change-Id: I6748397718be03e3f93ae2ccaa99642decdd9745
    Related-bug: #1364708

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ceilometer (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/349836

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ceilometer (stable/liberty)

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/349838

OpenStack Infra (hudson-openstack) wrote : Related fix merged to ceilometer (stable/liberty)

Reviewed: https://review.openstack.org/349838
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=09afee9a8fd665097f912defc844bb80e54fc23d
Submitter: Jenkins
Branch: stable/liberty

commit 09afee9a8fd665097f912defc844bb80e54fc23d
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Jul 21 14:33:10 2016 +0200

    consumes error notif. when event are disabled

    When we fixes #1364708, this was worked only when store_event=True.

    This change fixes the issue when store_event=False.

    Change-Id: I6748397718be03e3f93ae2ccaa99642decdd9745
    Related-bug: #1364708
    (cherry picked from commit a3eb0fafadcdcbb8f48f3db13ffc103717bae99b)

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ceilometer (stable/mitaka)

Reviewed: https://review.openstack.org/349836
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=54a30501a9406742427b731d4f7fb2469529720e
Submitter: Jenkins
Branch: stable/mitaka

commit 54a30501a9406742427b731d4f7fb2469529720e
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Jul 21 14:33:10 2016 +0200

    consumes error notif. when event are disabled

    When we fixes #1364708, this was worked only when store_event=True.

    This change fixes the issue when store_event=False.

    Change-Id: I6748397718be03e3f93ae2ccaa99642decdd9745
    Related-bug: #1364708
    (cherry picked from commit a3eb0fafadcdcbb8f48f3db13ffc103717bae99b)

tags: added: in-stable-mitaka