[17.11][ocata] Host power off event does not get passed from nova-os-api to ceilometer and, in turn, to aodh | nova-compute <-> designate relation breaks event delivery to ceilometer
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Nova Compute Charm | Fix Released | Undecided | Dmitrii Shcherbakov | | |
Bug Description
A bundle is identical to this:
nova-cc -> ceilometer-
After consecutively powering an instance on and off I did not get an event passed to ceilometer-
Tried on two clouds: a downsized non-HA cloud with 5 machines and HTTPS enabled, and a full HA cloud without HTTPS.
Alarm definition:
openstack alarm create --type event --name instance_off --description 'Instance powered OFF' --event-type "compute.
Because of that, the alarm never went from "insufficient data" to "alarm" state (https:/
+------
| alarm_id | type | name | state | severity | enabled |
+------
| 7db14634-
+------
After doing
juju config nova-cloud-
and retrying, the alarm successfully reached the proper state:
openstack alarm list
+------
| alarm_id | type | name | state | severity | enabled |
+------
| 7db14634-
+------
It seems that something is not reloaded or not set up properly, so events do not reach ceilometer at all.
The messagingv2 notification driver is present in nova.conf.
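To make the nova.conf check repeatable on every nova unit, a small helper along these lines could be used. This is only a sketch: the function name ini_get is made up for illustration, and it assumes the driver is configured as the driver option in the [oslo_messaging_notifications] section, which is where oslo.messaging reads it from; the charm may render nova.conf differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY inside [SECTION] of an INI file.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            # Section header: strip the brackets and compare the name.
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            in_section = ($0 == section)
            next
        }
        in_section {
            line = $0
            sub(/^[ \t]+/, "", line)
            # Skip comments and lines that are not key = value pairs.
            if (line ~ /^[#;]/ || index(line, "=") == 0) next
            k = substr(line, 1, index(line, "=") - 1)
            v = substr(line, index(line, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) print v
        }
    ' "$1"
}

# Assumed usage on a nova unit (section and option names are assumptions):
#   ini_get /etc/nova/nova.conf oslo_messaging_notifications driver
```

If the call prints messagingv2, the driver is at least configured; it says nothing about whether the running services were restarted after the file was written.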
I am deploying a dummy environment to reproduce it for the third time. I verified this functionality with the previous charm release and a bundle in the "spell" below, so this might be a regression.
Steps:
sudo add-apt-repository -y cloud-archive:ocata
sudo apt update && sudo apt install -yqq python-
conjure-up dshcherb/
#!/usr/bin/env bash
export OS_AUTH_URL=http://`juju run --unit keystone/0 "unit-get private-
export OS_REGION_
export OS_PROJECT_
export OS_PROJECT_
export OS_USER_
export OS_USERNAME=admin
export OS_PASSWORD=
export OS_INTERFACE=public
export OS_IDENTITY_
export OS_AUTH_
openstack flavor create --public small --id auto --ram 512 --disk 1 --vcpus 2
openstack server create --image xenial-lxd --nic net-id=ubuntu-net --flavor small testsrv --key-name ubuntu-keypair
openstack alarm create --type event --name instance_off --description 'Instance powered OFF' --event-type "compute.
openstack server stop testsrv
# wait
openstack alarm list
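The "# wait" step above can be replaced by polling, so that a run either sees the alarm fire or fails after a timeout. The following is a rough sketch, not part of the original steps: wait_for_alarm_state is a made-up name, the timeout and interval defaults are arbitrary, and it assumes the alarm name contains no spaces. It relies on the generic -f value and -c output options that the openstack client provides for list commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `openstack alarm list` until an alarm reaches a state.
# Usage: wait_for_alarm_state NAME STATE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_alarm_state() {
    local name="$1" want="$2" timeout="${3:-120}" interval="${4:-5}"
    local waited=0 state
    while :; do
        # One "name state" pair per line; the state may contain a space
        # ("insufficient data"), so keep everything after the first field.
        state=$(openstack alarm list -f value -c name -c state |
                awk -v n="$name" '$1 == n { $1 = ""; sub(/^ /, ""); print }')
        if [ "$state" = "$want" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "alarm '$name' is '${state:-missing}', wanted '$want'" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Assumed usage after `openstack server stop testsrv`:
#   wait_for_alarm_state instance_off alarm 120 5
```

On an affected cloud this would be expected to time out with the alarm still in "insufficient data".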
no longer affects: | charm-nova-cloud-controller |
Changed in charm-nova-compute: | |
status: | New → In Progress |
Changed in charm-nova-compute: | |
assignee: | nobody → Dmitrii Shcherbakov (dmitriis) |
Changed in charm-nova-compute: | |
milestone: | none → 18.02 |
Changed in charm-nova-compute: | |
status: | Fix Committed → Fix Released |
If I enable the management plugin in RabbitMQ and inspect the queues, I see only samples in the cloud where the problem exists, not notifications. This seems to confirm the theory that nova services are not emitting them (or are not able to submit them).
# rabbitmq-plugins enable rabbitmq_management
# # good cloud stats.publish message_ stats.deliver -u $u -p $p | grep notifications critical | | | sample | 906 | 906 | designate. info | 330 | | notifications. error | 1 | | notifications. info | 17
# python `updatedb && locate rabbitmqadmin` -V openstack list queues name message_
| notifications.audit | | |
| notifications.
| notifications.debug | | |
| notifications.error | 1 | 1 |
| notifications.info | 52 | 52 |
| notifications.
| notifications.warn | | |
| notifications_
| versioned_
| versioned_
I can also confirm that those stats increase on the "good" cloud after I start and stop an instance (twice for start and twice for power off, for the .start and .end events respectively).
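To compare the counters before and after a start/stop cycle, or between the two clouds, the table output can be reduced to plain "queue publish deliver" lines. This is a sketch only: queue_counters is a made-up name, and it assumes the three-column layout (name, publish count, deliver count) of the tables in this comment, with empty cells meaning that nothing was published or delivered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a "| name | publish | deliver |" table on stdin
# and print "name publish deliver", showing empty counters as 0.
queue_counters() {
    awk -F'|' '
        NF >= 4 {
            name = $2; pub = $3; del = $4
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", pub)
            gsub(/[ \t]/, "", del)
            # Skip the header row and rows without a queue name.
            if (name == "" || name == "name") next
            print name, (pub == "" ? 0 : pub), (del == "" ? 0 : del)
        }'
}

# Assumed usage: pipe the rabbitmqadmin output from this comment into it,
# save the result before and after a start/stop cycle, then diff the two files.
```

Border lines made of + and - contain no | separators, so they are dropped automatically.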
# # bad cloud stats.publish message_ stats.deliver -u $u -p $p | grep notifications critical | | | sample | 36 | 36 |
# python `updatedb && locate rabbitmqadmin` -V openstack list queues name message_
| notifications.audit | | |
| notifications.
| notifications.debug | | |
| notifications.error | | |
| notifications.info | | |
| notifications.
| notifications.warn | | |
| not...