ceilometer-compute service terminates in case of exception

Bug #1218889 reported by Stefano Zilli
This bug affects 3 people
Affects     Status        Importance  Assigned to       Milestone
Ceilometer  Fix Released  High        Harri Hämäläinen
Grizzly     Fix Released  High        Mathieu Gagné

Bug Description

If an exception is raised during the poll_and_publish call, the ceilometer-compute service terminates. It would be better to handle the exception and keep the service alive, because an outage of another service, such as Keystone, could otherwise kill every running ceilometer agent.

gordon chung (chungg)
Changed in ceilometer:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/46203

Changed in ceilometer:
assignee: nobody → Harri Hämäläinen (hhamalai)
status: Confirmed → In Progress
Harri Hämäläinen (hhamalai) wrote :

The pollsters handle exceptions as they should, but nova_client can still raise an exception that ultimately terminates the Ceilometer compute agent. As noted, this can happen because of an outage of another service or a temporary fault in the system.

The broader question is how Ceilometer should react in this case. No measurements can be taken on a host if Ceilometer cannot query the host information. On the other hand, no measurements can be taken if the Ceilometer compute agent is down either, so I prefer keeping the Ceilometer compute agent running and logging the error condition.
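
The approach preferred here (keep the agent alive and log the failure) can be illustrated with a minimal sketch. This is not the code from the actual fix: FlakyNovaClient, its instance_get_all_by_host method, the host name, and the publish callback are hypothetical stand-ins chosen only to show the pattern of catching a client exception inside one polling cycle.

```python
import logging

LOG = logging.getLogger(__name__)


class FlakyNovaClient:
    """Hypothetical stand-in for a nova client; fails on the first call only."""

    def __init__(self):
        self.calls = 0

    def instance_get_all_by_host(self, host):
        self.calls += 1
        if self.calls == 1:
            # Simulates another service being temporarily unavailable.
            raise ConnectionError("compute API temporarily unavailable")
        return ["instance-1", "instance-2"]


def poll_and_publish(client, host, publish):
    """Run one polling cycle.

    Returns True if instances were retrieved and published, False if
    instance discovery failed and the cycle was skipped.
    """
    try:
        instances = client.instance_get_all_by_host(host)
    except Exception:
        # Log the error with its traceback and skip this cycle, rather
        # than letting the exception propagate and terminate the agent.
        LOG.exception("Unable to retrieve instances on host %s", host)
        return False
    for instance in instances:
        publish(instance)
    return True


published = []
client = FlakyNovaClient()
# Two consecutive cycles: the first fails and is logged, the second succeeds.
results = [
    poll_and_publish(client, "compute-1", published.append)
    for _ in range(2)
]
```

In this sketch the first cycle produces no measurements, but the caller survives and the next cycle proceeds normally once the client recovers, which is the trade-off described above.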

Julien Danjou (jdanjou)
Changed in ceilometer:
milestone: none → havana-rc1
importance: Undecided → High
Mathieu Gagné (mgagne)
tags: added: grizzly-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/46203
Committed: http://github.com/openstack/ceilometer/commit/0e01da763ab7b3783ae86aa6582c38ebdc6e2378
Submitter: Jenkins
Branch: master

commit 0e01da763ab7b3783ae86aa6582c38ebdc6e2378
Author: Harri Hämäläinen <email address hidden>
Date: Thu Sep 12 09:34:34 2013 +0300

    Catch exceptions from nova client in poll_and_publish

    Ceilometer compute agent dies if nova client raises an exception while it is
    retrieving server instances. This might happen e.g. when some OpenStack API is
    temporarily unavailable

    Fixes LP Bug #1218889

    Change-Id: I808dcfae18d23240f8e095d6c97c8dede7dede8f

Changed in ceilometer:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/46793

OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/grizzly)

Reviewed: https://review.openstack.org/46793
Committed: http://github.com/openstack/ceilometer/commit/b2fe778c66d06d25f515cbbcb7da61939d68fc28
Submitter: Jenkins
Branch: stable/grizzly

commit b2fe778c66d06d25f515cbbcb7da61939d68fc28
Author: Harri Hämäläinen <email address hidden>
Date: Thu Sep 12 09:34:34 2013 +0300

    Catch exceptions from nova client in poll_and_publish

    Ceilometer compute agent dies if nova client raises an exception while it is
    retrieving server instances. This might happen e.g. when some OpenStack API is
    temporarily unavailable

    Conflicts:
     tests/agentbase.py
     tests/compute/test_manager.py

    Fixes LP Bug #1218889
    Change-Id: I808dcfae18d23240f8e095d6c97c8dede7dede8f
    (cherry picked from commit 0e01da763ab7b3783ae86aa6582c38ebdc6e2378)

tags: added: in-stable-grizzly
Thierry Carrez (ttx)
Changed in ceilometer:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ceilometer:
milestone: havana-rc1 → 2013.2
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential in-stable-grizzly