I've done some digging through 4 charms and there appears to be (or perhaps the beginnings of) a pattern that defines the following useful three functions 'services()', 'restart_map()' and 'assess_status(configs)'.
The charmhelpers.core.host module provides a 'service_running(<service_name_string>)' function that returns True/False if the service is running.
The charmelpers.core.host module also provides 'service(<action string>, <service name string>)' that uses the OS systemctl (systemd) of service commands to perform an action (like start, stop, restart, etc.). This is call blocks until the OS command finishes. Thus, either a 'restart' or 'start' will succeed or fail (quickly), unless the service later fails.
The proposal, therefore, is to either:
a) Modify assess_status(...) in all of the charms to call something like:
all_running = reduce(operator.and_, [service_running(s) for s in services()], True)
if not all_running:
<set state to some failed state>
(obviously, for efficiency, we might want to bail on the first 'not running' service, so we could re-write that as a for // break.)
OR
b) Change set_os_workload_status(...) to test for whether the services that should be running are running, and set a failed state if they are not. This would require a charm sync across all the charms, but might be simpler from a conceptual perspective.
However, I'm not sure (enough) how set_os_workload_status(...) is used to know whether this is a breaking change to how it was designed to be used.
I've done some digging through 4 charms and there appears to be (or perhaps the beginnings of) a pattern that defines the following useful three functions 'services()', 'restart_map()' and 'assess_ status( configs) '.
The charmhelpers. core.host module provides a 'service_ running( <service_ name_string> )' function that returns True/False if the service is running.
The charmelpers. core.host module also provides 'service(<action string>, <service name string>)' that uses the OS systemctl (systemd) of service commands to perform an action (like start, stop, restart, etc.). This is call blocks until the OS command finishes. Thus, either a 'restart' or 'start' will succeed or fail (quickly), unless the service later fails.
The proposal, therefore, is to either:
a) Modify assess_status(...) in all of the charms to call something like:
all_running = reduce( operator. and_, [service_running(s) for s in services()], True)
if not all_running:
<set state to some failed state>
(obviously, for efficiency, we might want to bail on the first 'not running' service, so we could re-write that as a for // break.)
OR
b) Change set_os_ workload_ status( ...) to test for whether the services that should be running are running, and set a failed state if they are not. This would require a charm sync across all the charms, but might be simpler from a conceptual perspective.
However, I'm not sure (enough) how set_os_ workload_ status( ...) is used to know whether this is a breaking change to how it was designed to be used.
Thoughts?