Restarting service before config blocks hooks

Bug #1641464 reported by Andrew McLeod
This bug affects 2 people
Affects: OpenStack Nova Compute Proxy Charm
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

https://github.com/openstack/charm-nova-compute-proxy/blob/master/hooks/fabfile.py#L46

The enable/start/restart service calls fire before configuration has taken place, and the blocked call prevents any further hooks from executing. This can be bypassed simply by forcing the restart into the background.

This fix would be pretty inelegant, and ideally the charm shouldn't attempt to start the service until all the relations are complete, but it is a potential one-character fix.
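
As a rough illustration of the idea of not waiting on the restart, here is a minimal Python sketch. It is not the charm's actual code: instead of shell backgrounding it uses systemctl's --no-block option, which queues the job and returns without waiting for the unit to finish starting. The helper names restart_command and restart_without_waiting are hypothetical.

```python
import subprocess


def restart_command(service, no_block=True):
    """Build a systemctl restart command for `service`.

    With no_block=True, systemctl queues the restart job and returns
    immediately instead of waiting for the unit to finish starting.
    """
    cmd = ["systemctl"]
    if no_block:
        cmd.append("--no-block")
    cmd.extend(["restart", service])
    return cmd


def restart_without_waiting(service):
    """Hypothetical helper: queue the restart and return straight away."""
    # check=True still raises if systemctl fails to queue the job at all.
    subprocess.run(restart_command(service), check=True)
```

The caller no longer learns whether the service actually came up, so this only sidesteps the blocking; it does not address starting an unconfigured service.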

Tags: uosci
Vance Morris (vmorris)
Changed in charm-nova-compute-proxy:
status: New → Confirmed
Ryan Beisner (1chb1n) wrote :

I think this is actually systemd waiting a 'long time' for a service that cannot start because it is not yet configured.

Ryan Beisner (1chb1n) wrote :

In Ubuntu packaging, we decided to make the systemd behavior more like the previous upstart behavior, i.e. much more aggressive, so that a service which is not configured will fail faster. For example:

http://paste.ubuntu.com/23553874/

Vance Morris (vmorris) wrote :

Could the charm hold off on starting the service until the appropriate configurations are in place?
I don't think that systemd is attempting to start the service immediately following the yum installation -- in my juju deploy logs, right after the installation of the packages, while still in the config-changed hook, I see the tasks 'enable_service' and 'start_service' run.

This is well before the relations are detected and configurations are made as a result.

Vance Morris (vmorris) wrote :

I should clarify - I don't think it's anything but the charm that's enabling and starting the openstack-nova-compute.service. Perhaps this was already understood, but I wanted to clarify it for myself.

Vance Morris (vmorris) wrote :

Thanks for that link, Ryan -- shouldn't there also be a TimeoutStartSec= set, then?

https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=

Andrew McLeod (admcleod) wrote :

I agree that the charm shouldn't start the service if it is not configured, and should probably attempt to stop the service if it was previously started and is now missing relations or configuration etc.
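
A minimal sketch of that decision logic, assuming the charm can list which relations it requires and which are currently complete. The function names and the relation names used in the usage note are illustrative only, not taken from the charm.

```python
def missing_relations(required, related):
    """Return the required relations that are not yet complete, sorted."""
    return sorted(set(required) - set(related))


def desired_service_action(required, related, running):
    """Decide what to do with the service: 'start', 'stop', or 'none'.

    The service is only started once every required relation is complete,
    and it is stopped if it is running while something is missing.
    """
    if missing_relations(required, related):
        return "stop" if running else "none"
    return "none" if running else "start"
```

For example, with hypothetical required relations ["amqp", "cloud-compute"] and only "amqp" complete, a stopped service is left alone and a running one is stopped; once both are complete, a stopped service is started.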

I manually attempted to start/stop the openstack-nova-compute.service while it was unconfigured and, as expected, systemctl hangs, because the service definition sets the start timeout to 0, which disables the timeout:

/etc/systemd/system/multi-user.target.wants/openstack-nova-compute.service

TimeoutStartSec=0

This overrides the system-wide defaults.

Changing this to another value (tested with 15; perhaps 60 seconds would be better) means the hooks will fire. They may error due to unfinished configuration or incomplete relation state, but these errors will attempt to resolve themselves. That makes modifying the timeout in the service definition file a convenient workaround until the charm is fixed.
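
For anyone wanting to script the edit rather than do it by hand, here is a minimal sketch that rewrites the TimeoutStartSec= line in the text of a unit file. The function name set_timeout_start_sec is hypothetical, and the sketch only transforms a string; reading and writing the file is left to the caller.

```python
import re


def set_timeout_start_sec(unit_text, seconds):
    """Return unit_text with TimeoutStartSec set to `seconds`.

    An existing TimeoutStartSec= line is replaced; if there is none,
    the setting is added directly under the [Service] header.
    """
    line = "TimeoutStartSec={}".format(seconds)
    existing = re.compile(r"^TimeoutStartSec=.*$", re.MULTILINE)
    if existing.search(unit_text):
        return existing.sub(line, unit_text)
    # No existing setting: insert it right after the [Service] header.
    return re.sub(
        r"^\[Service\]$",
        "[Service]\n" + line,
        unit_text,
        count=1,
        flags=re.MULTILINE,
    )
```

systemd has to reload its unit files (systemctl daemon-reload) before the new value takes effect.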

Vance Morris (vmorris) wrote :

I just ran a new deployment, paying attention to the juju debug-log this time.

Following the verification of *MOST* packages (89/99), the process hangs. No indication that the config-changed hook has executed at all.

Checking on the compute node:
[root@zs93k23 ~]# systemctl status openstack-nova-compute.service
● openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
   Active: activating (start) since Tue 2016-11-29 17:11:56 EST; 2min 6s ago
 Main PID: 63372 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─63372 /usr/bin/python2 /usr/bin/nova-compute

Nov 29 17:11:56 zs93k23 systemd[1]: Starting OpenStack Nova Compute Server...

Running systemctl stop on the service, I immediately notice the unit.nova-compute-proxy/0.config-changed hook fire.
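
The stuck state shows up as "activating (start)" on the Active: line of the status output above. A minimal sketch for spotting it in captured output follows; the helper names are hypothetical and the regular expression assumes the human-readable format shown in this report.

```python
import re

# Matches e.g. "   Active: activating (start) since Tue 2016-11-29 ..."
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\w+)\s+\(([^)]*)\)", re.MULTILINE)


def parse_active_state(status_output):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    match = ACTIVE_RE.search(status_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_stuck_starting(status_output):
    """True if the unit is still in the 'activating (start)' state."""
    return parse_active_state(status_output) == ("activating", "start")
```

This only reports the state at the moment the output was captured; how long a unit has been activating still has to be read from the "since" timestamp.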

The behaviour is different this time, though: the service stayed stopped and I had to start it manually, after which it started up normally!

juju debug-log with a few notes: http://paste.ubuntu.com/23555412/

Andrew McLeod (admcleod) wrote :

The behaviour with respect to which service is starting versus what is reported in the logs doesn't seem to be entirely in sync. Vance, if you happen to redeploy, can you test the following before killing the process: set TimeoutStartSec=15 in /etc/systemd/system/multi-user.target.wants/openstack-nova-compute.service, run "systemctl daemon-reload", and then run "pkill -f systemctl"? At that point (as was the case when I tested this) all hooks should fire appropriately, although sometimes with errors.

Vance Morris (vmorris) wrote :

It's possible that the mismatch here in #7 is due to my having removed the compute node, manually cleaned the OS database, and then redeployed using the proxy charm. I will try this workaround next time I deploy.

Vance Morris (vmorris) wrote :

@admcleod, your workaround in #8 is fine:

Deployed proxy charm, adding a new compute node (same name as previously removed node btw). The service hangs in the same manner as before:
[root@zs93k23 ~]# systemctl status openstack-nova-compute.service
● openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
   Active: activating (start) since Wed 2016-11-30 17:11:08 EST; 21s ago
 Main PID: 12932 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─12932 /usr/bin/python2 /usr/bin/nova-compute

Nov 30 17:11:08 zs93k23 systemd[1]: Starting OpenStack Nova Compute Server...

Applying workaround:
[root@zs93k23 ~]# vim /etc/systemd/system/multi-user.target.wants/openstack-nova-compute.service
[root@zs93k23 ~]# grep Timeout /etc/systemd/system/multi-user.target.wants/openstack-nova-compute.service
TimeoutStartSec=15
[root@zs93k23 ~]# systemctl daemon-reload

Approximately 15 seconds later, I see the hooks continue to fire in the unit debug log and shortly after this:

[root@zs93k23 ~]# systemctl status openstack-nova-compute.service
● openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-11-30 17:13:43 EST; 327ms ago
 Main PID: 13769 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─13769 /usr/bin/python2 /usr/bin/nova-compute

Nov 30 17:13:41 zs93k23 systemd[1]: Starting OpenStack Nova Compute Server...
Nov 30 17:13:43 zs93k23 nova-compute[13769]: Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
Nov 30 17:13:43 zs93k23 nova-compute[13769]: /usr/lib/python2.7/site-packages/pkg_resources/__init__.py:187: RuntimeWarning: You have iterated over ...
Nov 30 17:13:43 zs93k23 nova-compute[13769]: stacklevel=1,
Nov 30 17:13:43 zs93k23 systemd[1]: Started OpenStack Nova Compute Server.
Hint: Some lines were ellipsized, use -l to show in full.

James Page (james-page)
Changed in charm-nova-compute-proxy:
importance: Undecided → Medium
status: Confirmed → Triaged
Ryan Beisner (1chb1n)
Changed in charm-nova-compute-proxy:
status: Triaged → Won't Fix