reload.yml is sending sighup to non-sighupable services
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
kolla | Fix Released | Critical | Steven Dake | |
Liberty | Fix Released | Critical | Steven Dake | |
Mitaka | Fix Released | Critical | Steven Dake | |
Bug Description
The kolla gate is locking up when sending a SIGHUP to nova-novncproxy. An example log:
http://
See first timestamp:
2016-09-10 03:54:14.354580
See last timestamp:
2016-09-10 04:47:47.129644
The delta in the timestamp log is 53 minutes. The gate timer is 90 minutes.
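As a quick check of that arithmetic, here is a minimal sketch that recomputes the delta from the two timestamps above. It assumes GNU date (for the -d option) and treats both timestamps as UTC; the variable names are illustrative only.

```shell
# Timestamps copied from the report; fractional seconds are dropped for simplicity.
first="2016-09-10 03:54:14.354580"
last="2016-09-10 04:47:47.129644"

# Convert to epoch seconds (GNU date syntax; -u avoids local DST effects).
start=$(date -u -d "${first%.*}" +%s)
end=$(date -u -d "${last%.*}" +%s)

# Integer division truncates the leftover seconds.
delta_min=$(( (end - start) / 60 ))
echo "stalled for about ${delta_min} minutes of the 90-minute gate limit"
```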
The kolla code in question is as follows:
https:/
specifically:
command: docker exec -t nova_novncproxy kill -1 1
I highly doubt dumb-init is smart enough to propagate a SIGHUP to all of its children, and I'm not sure that would even work well.
I would suggest using killall for this use case, i.e. killall -SIGHUP nova-api or killall -SIGHUP nova-novncproxy.
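A rough sketch of what that suggestion could look like as a command, signalling the service by process name instead of PID 1. This is not the actual fix: the send_hup helper and the DOCKER override are hypothetical, it assumes killall (psmisc) is installed inside the container, and it says nothing about whether the target service actually handles SIGHUP.

```shell
# Sketch only: send SIGHUP to every process with the given name inside a
# container, rather than to PID 1. The helper name is hypothetical.
send_hup() {
    container="$1"   # e.g. nova_novncproxy (container name from the report)
    process="$2"     # e.g. nova-novncproxy (process name, an assumption)
    # DOCKER can be overridden (for example with "echo docker") for a dry run.
    ${DOCKER:-docker} exec -t "$container" killall -SIGHUP "$process"
}

# Dry run: print the command instead of executing it.
DOCKER="echo docker" send_hup nova_novncproxy nova-novncproxy
```

Running the helper without the DOCKER override would execute the command against a real container, so the dry run is a cheap way to review what would be sent first.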
This reared its head as a result of two changes:
dumb-init was added to the codebase as PID 1.
The gate was modified to run reconfigure and upgrade, both of which rely on PID 1 being one of the processes to reload.
I think this logic is flawed. nova-api, for example, can spawn several processes, and it is unclear to me whether the signal actually reaches all of the nova-api processes in the container or just the first one. So reconfigure probably doesn't work at all in the stable branches either.
Changed in kolla:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → newton-rc1
assignee: nobody → Steven Dake (sdake)
Partially fixed by: review.openstack.org/#/c/368334/
https:/
There may be other scenarios like this hiding in the codebase.