aodh-api is restarted every 5 minutes

Bug #1689710 reported by Xav Paice
This bug affects 4 people
Affects: OpenStack AODH Charm
Status: Incomplete
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

Xenial, Mitaka, juju 2.1.2, charm version stable/17.02

Every 5 minutes, the aodh-api logs show what looks very much like a restart of the process. Checking with ps confirms that the process really is being restarted.

At the same minutes past each hour, the juju unit log on the host running aodh shows the update-status hook running (http://pastebin.ubuntu.com/24547150/).

Looking at the charm code, it appears that the write to configs triggers aodh.reload_and_restart(), which includes ch_host.service_restart('aodh-api').

If we could avoid re-writing the config every time the update-status hook runs, that would help immensely.
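As an illustration of the request above, here is a minimal sketch of "restart only when the rendered config really changed". It is not the charm's code: `file_digest`, `write_if_changed` and `render_and_maybe_restart` are hypothetical helper names, and the `restart` callable stands in for whatever the charm would use to restart aodh-api.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def write_if_changed(path, content):
    """Write content to path only when it differs; return True if written."""
    data = content.encode("utf-8")
    if file_digest(path) == hashlib.sha256(data).hexdigest():
        return False  # identical content: leave the file untouched
    with open(path, "wb") as handle:
        handle.write(data)
    return True


def render_and_maybe_restart(path, content, restart):
    """Call restart() only when the config on disk actually changed."""
    changed = write_if_changed(path, content)
    if changed:
        restart()
    return changed
```

With a gate like this, an update-status run that renders byte-identical content would leave the service alone, while a genuine config change would still trigger the restart.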

Xav Paice (xavpaice)
description: updated
Drew Freiberger (afreiberger) wrote :

Alex Kavanagh (ajkavanagh) wrote :

@afreiberger As you've already determined, the keystone issue seems to be caused by the keystone-ldap charm which is a reactive charm. aodh is ALSO a reactive charm. This is also happening with designate, which is ALSO a reactive charm.

The common theme is that they are all reactive charms and use the charms.openstack library. What's probably happening in each of the charms is that a reactive handler is triggering the same behaviour over and over again, and that behaviour has the unintended side-effect of triggering a restart. An example of this was in the designate charm, where the common haproxy code in charms.openstack was causing the haproxy.conf file to be re-written on every update-status, with the same information, but unfortunately in a non-deterministic order (this is almost fixed).
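To illustrate the ordering problem described above, here is a small sketch of rendering from a mapping in sorted key order, so that the same information always produces byte-identical output. This is not the charms.openstack haproxy code: `render_listen_stanzas` is a hypothetical name and the output format is made up for the example, not valid haproxy syntax. One common source of this kind of instability in older Python versions is iterating over a dict or set without sorting, though the source of the ordering in the charm is not stated here.

```python
def render_listen_stanzas(port_map):
    """Render one line per listener, always in sorted key order.

    port_map maps a listener name to a (frontend_port, backend_port) pair.
    """
    lines = []
    for name in sorted(port_map):
        frontend, backend = port_map[name]
        lines.append("listen {} {} -> {}".format(name, frontend, backend))
    return "\n".join(lines) + "\n"
```

Because the output no longer depends on iteration order, a content comparison before and after rendering sees no change when the inputs are the same.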

The options are either working through all of the charms and stopping the unintended behaviour (which is tricky, as charms.reactive _wants_ all handlers that have true conditions to run on every hook invocation, and it's hard to 'gate' behaviour to only run once - the states get a bit out of control), or fixing layer-openstack so that it includes a custom reactive 'update-status' handler that DOESN'T run the charms.reactive handler system but does allow the introspection of interface (relation) states.
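The second option can be pictured with a toy dispatcher. This is only a sketch of the idea and does not use the charms.reactive API: `run_hook`, the `(condition, handler)` pairs and `update_status_handler` are all hypothetical. On update-status it runs a single status handler and skips the general "run every handler whose condition is true" loop.

```python
def run_hook(hook_name, handlers, update_status_handler):
    """Toy dispatcher; returns the names of the handlers that ran.

    handlers is a list of (condition, handler) pairs of callables.
    """
    if hook_name == "update-status":
        # Short-circuit: only assess status, never render or restart.
        update_status_handler()
        return [update_status_handler.__name__]
    ran = []
    for condition, handler in handlers:
        if condition():
            handler()
            ran.append(handler.__name__)
    return ran
```

The trade-off is that anything which genuinely needs to happen periodically has to be called from the status handler explicitly.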

I'll raise a bug in charms.openstack to track this, and reference it back here.

Xav Paice (xavpaice) wrote :

Note that https://code.launchpad.net/~billy-olsen/charm-helpers/lp1698343/+merge/326495 includes a charmhelpers fix which may be the solution to all this, but it needs a fresh release of charmhelpers since it's not included in the current one on PyPI. Current version is 0.16.0.

Alex Kavanagh (ajkavanagh) wrote :

Related bug in charms.openstack (the base of the OpenStack reactive charms): https://bugs.launchpad.net/charms.openstack/+bug/1702316

Changed in charm-aodh:
status: New → Triaged
Alex Kavanagh (ajkavanagh) wrote :

This bug is now fixed in master:

root@juju-7dbb96-0:/var/log/aodh# date
Fri Aug 25 15:02:30 UTC 2017
root@juju-7dbb96-0:/var/log/aodh# ps ax | grep aodh
 1782 ? Ss 0:00 bash /var/lib/juju/init/jujud-unit-aodh-0/exec-start.sh
 1786 ? Sl 0:01 /var/lib/juju/tools/unit-aodh-0/jujud unit --data-dir /var/lib/juju --unit-name aodh/0 --debug
21191 ? Ss 0:00 /usr/bin/python /usr/bin/aodh-api --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-api.log
21200 ? Ss 0:00 /usr/bin/python /usr/bin/aodh-evaluator --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-evaluator.log
21209 ? Ssl 0:00 /usr/bin/python /usr/bin/aodh-notifier --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-notifier.log
21218 ? Ssl 0:00 /usr/bin/python /usr/bin/aodh-listener --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-listener.log
22666 pts/0 S+ 0:00 tail -f unit-aodh-0.log
23801 pts/1 S+ 0:00 grep --color=auto aodh
root@juju-7dbb96-0:/var/log/aodh# date
Fri Aug 25 15:09:47 UTC 2017
root@juju-7dbb96-0:/var/log/aodh# ps ax | grep aodh
 1782 ? Ss 0:00 bash /var/lib/juju/init/jujud-unit-aodh-0/exec-start.sh
 1786 ? Sl 0:01 /var/lib/juju/tools/unit-aodh-0/jujud unit --data-dir /var/lib/juju --unit-name aodh/0 --debug
21191 ? Ss 0:00 /usr/bin/python /usr/bin/aodh-api --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-api.log
21200 ? Ss 0:00 /usr/bin/python /usr/bin/aodh-evaluator --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-evaluator.log
21209 ? Ssl 0:00 /usr/bin/python /usr/bin/aodh-notifier --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-notifier.log
21218 ? Ssl 0:00 /usr/bin/python /usr/bin/aodh-listener --config-file=/etc/aodh/aodh.conf --log-file=/var/log/aodh/aodh-listener.log
22666 pts/0 S+ 0:00 tail -f unit-aodh-0.log
25851 pts/1 S+ 0:00 grep --color=auto aodh
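The manual comparison of the two `ps ax` listings above could be scripted roughly as follows. This is a sketch under assumptions: `daemon_pids` and `restarted` are hypothetical helpers, and the parsing assumes the five-column `ps ax` layout shown in this comment (PID, TTY, STAT, TIME, COMMAND) with the daemons installed under /usr/bin.

```python
def daemon_pids(ps_output, names):
    """Map each daemon name to its PID, parsed from `ps ax` output text."""
    found = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, command = int(fields[0]), fields[4] + " "
        for name in names:
            # Match the executable path so the grep/tail lines are ignored.
            if "/usr/bin/{} ".format(name) in command:
                found[name] = pid
    return found


def restarted(before, after):
    """Return the sorted names whose PID differs between two snapshots."""
    return sorted(name for name in before if after.get(name) != before[name])
```

An empty result from `restarted` for two snapshots taken more than 5 minutes apart would indicate that update-status no longer restarts the services.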

Note that the PIDs are the same in both listings, which were taken about 7 minutes apart, even though update-status runs every 5 minutes. The unit log shows one such run in between:

2017-08-25 15:04:16 INFO juju-log Reactive main running for hook update-status
2017-08-25 15:04:16 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:54:default_update_status
2017-08-25 15:04:16 INFO juju-log Invoking reactive handler: reactive/aodh_handlers.py:70:render_unclustered
2017-08-25 15:04:17 WARNING juju-log DEPRECATION: should not use port_map parameter in APIConfigurationAdapter.__init__()
2017-08-25 15:04:17 WARNING juju-log DEPRECATION: should not use service_name parameter in APIConfigurationAdapter.__init__()
2017-08-25 15:04:17 INFO juju-log Creating choice loader with dirs: [['templates/'], ['/var/lib/juju/agents/unit-aodh-0/.venv/lib/python3.5/site-packages/charmhelpers/contrib/openstack/templates']]
2017-08-25 15:04:17 WARNING juju-log Not adding haproxy listen stanza for aodh-api_int port is already in use
2017-08-25 15:04:17 WARNING juju-log Not adding haproxy listen stanza for aodh-api_public port is already in use
2017-08-25 15:04:17 INFO juju-log Writing file /etc/haproxy/haproxy.cfg root:root 444
2017-08-25 15:04:17 INF...


Alex Kavanagh (ajkavanagh) wrote :

Changing to incomplete and lowering the priority as I believe this is now resolved due to changes in charms.openstack around ordering of things, etc.

Changed in charm-aodh:
importance: Undecided → Low
status: Triaged → Incomplete