Juju Charms Collection
nova-compute package

OS-charms should check for expected services/processes before setting workload status to a ready state.

Bug #1524388 reported by Ryan Beisner on 2015-12-09

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
nova-compute (Juju Charms Collection)	Fix Released	High	Alex Kavanagh	Juju Charms Collection 16.04

Bug Description

As of the 15.10 charms, workload status can be set to "Unit is Ready," even when a critical service has failed to start.

Taking it one step further: a hook should probably fail in those cases.

I've observed bug leaks in the following, where this type of sanity check within the charm would have raised red flags before charm commits or SRUs:
- nova-compute
- swift-*
- rabbitmq-server

This also impacts automation and testability of our charms in that:

1. The amulet tests, mojo spec tests, and other tests wait for the charm to advertise "I'm Ready" via workload status before commencing. Service checks in the Amulet tests will catch this leak, but other functional tests that do not inspect or exercise all relevant processes may miss it.

2. Systems of automation, such as autopilot, mojo specs, and generic bundle deployment, would be better served by early failure (i.e., a failed hook or a not-ready service) before moving on to the next steps of the deployment automation.

This is targeted to the nova-compute charm for initial discussion. However, all OpenStack charms should be considered for this enhancement.

Related branches

lp:~ajkavanagh/charm-helpers/add-service-checks-lp1524388

Merged into lp:charm-helpers at revision 531

Liam Young (community): Approve on 2016-02-17

David Ames (community): Approve on 2016-02-16

lp:~ajkavanagh/charms/trusty/keystone/add-service-checks-lp1524388

Merged into lp:~openstack-charmers-archive/charms/trusty/keystone/next at revision 209

David Ames (community): Approve on 2016-02-19

Ryan Beisner (1chb1n) on 2015-12-09

description: updated

Alex Kavanagh (ajkavanagh) wrote on 2016-01-21:

I've done some digging through 4 charms and there appears to be a pattern (or perhaps the beginnings of one) that defines three useful functions: 'services()', 'restart_map()' and 'assess_status(configs)'.

The charmhelpers.core.host module provides a 'service_running(<service_name_string>)' function that returns True or False depending on whether the service is running.

The charmhelpers.core.host module also provides 'service(<action string>, <service name string>)', which uses the OS systemctl (systemd) or service commands to perform an action (start, stop, restart, etc.). This call blocks until the OS command finishes. Thus, a 'restart' or 'start' will either succeed or fail quickly, although the service may still fail later.

The proposal, therefore, is to either:

a) Modify assess_status(...) in all of the charms to call something like:

from functools import reduce
import operator

all_running = reduce(operator.and_, [service_running(s) for s in services()], True)
if not all_running:
    <set state to some failed state>

(Obviously, for efficiency, we might want to bail on the first 'not running' service, so we could rewrite that as a for/break loop.)
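
The early-exit variant mentioned above could look like the following minimal sketch. The name first_stopped_service is hypothetical, and the service_running callable is passed in as a parameter so the sketch runs without charmhelpers installed; in a charm it would presumably be charmhelpers.core.host.service_running, with the names coming from the charm's services() function.

```python
def first_stopped_service(services, service_running):
    """Return the first service that is not running, or None if all are up.

    `services` is an iterable of service names; `service_running` is a
    callable that takes a service name and returns a truthy/falsy value.
    Both are injected here (an assumption made for this sketch).
    """
    for name in services:
        if not service_running(name):
            # Bail out on the first failure instead of checking the rest.
            return name
    return None


if __name__ == "__main__":
    # Stand-in for a real service check: a dict of made-up service states.
    fake_state = {"service-a": True, "service-b": False}
    stopped = first_stopped_service(["service-a", "service-b"], fake_state.get)
    if stopped is not None:
        print("blocked: service not running: %s" % stopped)
```

Returning the offending service name rather than a bare boolean lets the status message say which service is down.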

b) Change set_os_workload_status(...) to test for whether the services that should be running are running, and set a failed state if they are not. This would require a charm sync across all the charms, but might be simpler from a conceptual perspective.

However, I don't know enough about how set_os_workload_status(...) is used to tell whether this would be a breaking change to how it was designed to be used.

Thoughts?

Ryan Beisner (1chb1n) wrote on 2016-01-26:

I'd lean toward (a).

IMHO, the self-aware(tm) charm will possess some basic functionality checks to know if it is running properly, before declaring itself ready. This may involve introspection of more than just running services or processes. It could also be checking for a listening socket, or some arbitrary test method. I think we will be best served by having all three. But a process check is a good start.
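
As an illustration of the listening-socket idea, here is a minimal sketch using only the Python standard library. The function name port_is_listening and its parameters are hypothetical and not part of charmhelpers; a successful TCP connect only shows that something accepts connections on the port, not that the service behind it is healthy.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. Connection refusals and
    timeouts are both reported as "not listening".
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True
```

A single-shot check like this can report a false negative while a service is still starting up, so it would likely need to be combined with retries.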

Idea: start out by just checking for running processes. Construct a mapping of <charm>: [<expected_processes>] in the form of a centralized helper dict (or yaml file) and process check helper. There may be another layer required in that data, as process names and their existence may vary across Ubuntu releases and/or OpenStack releases.

For example: assess_status remains blocked and the status is updated while expected_processes_are_running('keystone', UBUNTU_RELEASE, OS_RELEASE) is false; after some retries, the hook is ultimately failed once a generous retry threshold is exhausted.
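
The mapping and retry idea could be sketched roughly as follows. Everything here is an assumption made for illustration: the EXPECTED_PROCESSES layout, the placeholder process names, the wait_for_processes helper, and the injected is_running callable are not taken from any existing charm or from charmhelpers. Raising an exception stands in for failing the hook.

```python
import time

# Hypothetical data: expected processes per charm, keyed by
# (ubuntu_release, openstack_release). The None key is a default used when
# no release-specific entry exists. Process names are placeholders.
EXPECTED_PROCESSES = {
    "keystone": {
        ("trusty", "liberty"): ["proc-a"],
        None: ["proc-a", "proc-b"],
    },
}


def expected_processes(charm, ubuntu_release, os_release):
    """Look up the process list for a charm and release pair."""
    per_charm = EXPECTED_PROCESSES.get(charm, {})
    default = per_charm.get(None, [])
    return per_charm.get((ubuntu_release, os_release), default)


def expected_processes_are_running(charm, ubuntu_release, os_release, is_running):
    """True if every expected process passes the injected is_running check."""
    names = expected_processes(charm, ubuntu_release, os_release)
    return all(is_running(name) for name in names)


def wait_for_processes(charm, ubuntu_release, os_release, is_running,
                       retries=5, delay=1.0, sleep=time.sleep):
    """Retry the check; return the attempt number that succeeded.

    Raises RuntimeError once the retries are exhausted; an uncaught
    exception in a hook would cause the hook to fail.
    """
    for attempt in range(retries):
        if expected_processes_are_running(charm, ubuntu_release,
                                          os_release, is_running):
            return attempt + 1
        if attempt < retries - 1:
            sleep(delay)
    raise RuntimeError("expected processes for %s not running" % charm)
```

Note that a charm with no entry in the mapping passes trivially in this sketch, since there is nothing to check; a stricter design might treat a missing entry as an error.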

This could lead nicely into a new self-check action, where the same would basically re-trigger.

All of this foo would still require a sync into the charms, but no harm there.

Alex Kavanagh (ajkavanagh) on 2016-02-10

Changed in nova-compute (Juju Charms Collection):
status:	New → In Progress

David Ames (thedac) on 2016-02-16

Changed in nova-compute (Juju Charms Collection):
assignee:	nobody → Alex Kavanagh (ajkavanagh)
milestone:	none → 16.04
importance:	Undecided → High

Alex Kavanagh (ajkavanagh) wrote on 2016-03-18:

Note that for the moment, port checks are being disabled as some services are asynchronous with respect to their service start/stop scripts/functions.

Changed in nova-compute (Juju Charms Collection):
status:	In Progress → Fix Committed

James Page (james-page) on 2016-04-22

Changed in nova-compute (Juju Charms Collection):
status:	Fix Committed → Fix Released
