kolla

reload.yml is sending sighup to non-sighupable services

Bug #1622117 reported by Steven Dake on 2016-09-10

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
kolla	Fix Released	Critical	Steven Dake	kolla newton-rc1 "rc1"
Liberty	Fix Released	Critical	Steven Dake	kolla 1.1.3 "liberty"
Mitaka	Fix Released	Critical	Steven Dake	kolla 2.0.3 "mitaka"

Bug Description

The kolla gate is locking up when sending a SIGHUP to nova-novncproxy. An example log:
http://logs.openstack.org/90/368290/1/check/gate-kolla-dsvm-deploy-oraclelinux-source-centos-7-nv/fb5b27c/console.html

See the first timestamp:
2016-09-10 03:54:14.354580

See the last timestamp:
2016-09-10 04:47:47.129644

The delta between these two timestamps is 53 minutes. The gate timer is 90 minutes.
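For reference, the elapsed time between the two log lines can be checked with a few lines of Python. This is only an arithmetic sketch: the timestamps are copied from the console log above, and the 90-minute figure is the gate timer stated in this report.

```python
from datetime import datetime

# Timestamps copied from the gate console log quoted above.
FIRST = "2016-09-10 03:54:14.354580"
LAST = "2016-09-10 04:47:47.129644"
GATE_TIMER_MINUTES = 90  # gate timer stated in the bug report

fmt = "%Y-%m-%d %H:%M:%S.%f"
delta = datetime.strptime(LAST, fmt) - datetime.strptime(FIRST, fmt)

# Convert to minutes to compare against the gate timer.
elapsed_minutes = delta.total_seconds() / 60
print(f"elapsed: {delta} ({elapsed_minutes:.1f} min of {GATE_TIMER_MINUTES})")
```

This prints `elapsed: 0:53:32.775064 (53.5 min of 90)`, i.e. the hang alone consumed well over half of the gate's time budget.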

The kolla code in question is as follows:
https://github.com/openstack/kolla/blob/master/ansible/roles/nova/tasks/reload.yml#L16

Specifically:
command: docker exec -t nova_novncproxy kill -1 1

I highly doubt dumb-init is smart enough to propagate a SIGHUP to all of its children, and I'm not sure that would even work well.

I would suggest using killall for this use case, e.g. killall -SIGHUP nova-api or killall -SIGHUP nova-novncproxy.
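A rough sketch of the killall-style approach suggested above, written in Python for illustration only: it scans /proc for every process whose short name matches and signals each one, instead of signalling PID 1 (dumb-init) alone. The helpers pids_by_name and signal_by_name are hypothetical names, and the sketch assumes a Linux /proc filesystem and that it runs inside the target container (e.g. via docker exec). It is not what kolla merged; the merged fix removed novncproxy and spice from reload instead.

```python
import os
import signal
from typing import List


def pids_by_name(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only).

    The kernel stores at most 15 characters of the process name in comm,
    so longer names are compared by their first 15 characters only.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            # The process exited between listdir() and open().
            continue
        if comm == name[:15]:
            pids.append(int(entry))
    return pids


def signal_by_name(name: str, sig: int = signal.SIGHUP) -> List[int]:
    """Send `sig` to every process named `name`; return the PIDs signalled."""
    signalled = []
    for pid in pids_by_name(name):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            # The process disappeared before it could be signalled.
            pass
    return signalled
```

For example, signal_by_name("nova-api") would signal every process named nova-api, not only the first one. Forked workers normally inherit their parent's name, so they would be included as well.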

This reared its head as a result of two changes:
1. dumb-init was added to the codebase as PID 1, and
2. the gate was modified to run reconfigure and upgrade, both of which rely on PID 1 being the service process that needs to be reloaded.

I think this logic is flawed. nova-api, for example, can spawn several processes, and it is unclear to me that the signal actually reaches all of the nova-api processes in the container rather than just the first one. So reconfigure probably doesn't work at all in the stable branches either.

Steven Dake (sdake) on 2016-09-10

Changed in kolla:
status:	New → Confirmed
importance:	Undecided → Critical
milestone:	none → newton-rc1
assignee:	nobody → Steven Dake (sdake)

Steven Dake (sdake) wrote on 2016-09-10:

Partially fixed by:
https://review.openstack.org/#/c/368334/

There may be other scenarios like this hiding in the codebase.

summary:	- reload.yml is wrong throughout codebase, especially with dumbinit + reload.yml is sending sighup to non-sighupable services in reload.yml
summary:	- reload.yml is sending sighup to non-sighupable services in reload.yml + reload.yml is sending sighup to non-sighupable services

OpenStack Infra (hudson-openstack) wrote on 2016-09-10: Fix merged to kolla (master)

Reviewed: https://review.openstack.org/368334
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=8cd59db6f641c4ee494dae6916b30207476c4a9e
Submitter: Jenkins
Branch: master

commit 8cd59db6f641c4ee494dae6916b30207476c4a9e
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

Remove novncproxy and spice from reload

Since these services don't use RPC, we don't need to reload them. And
these caused problems in gates.

Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
Closes-Bug: #1622117

Changed in kolla:
status:	Confirmed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2016-09-10: Fix proposed to kolla (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/368339

OpenStack Infra (hudson-openstack) wrote on 2016-09-10: Fix proposed to kolla (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/368340

OpenStack Infra (hudson-openstack) wrote on 2016-09-14: Fix merged to kolla (stable/mitaka)

Reviewed: https://review.openstack.org/368339
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=1638942cea12aaf9f8398ed783c1e40377bf801c
Submitter: Jenkins
Branch: stable/mitaka

commit 1638942cea12aaf9f8398ed783c1e40377bf801c
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

Remove novncproxy and spice from reload

Since these services don't use RPC, we don't need to reload them. And
these caused problems in gates.

    Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
    Closes-Bug: #1622117
    (cherry picked from commit 8cd59db6f641c4ee494dae6916b30207476c4a9e)

OpenStack Infra (hudson-openstack) wrote on 2016-09-14: Fix merged to kolla (stable/liberty)

Reviewed: https://review.openstack.org/368340
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=b9b99f74a6678c0a6a2bdd992f26400676d21166
Submitter: Jenkins
Branch: stable/liberty

commit b9b99f74a6678c0a6a2bdd992f26400676d21166
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

Remove novncproxy and spice from reload

Since these services don't use RPC, we don't need to reload them. And
these caused problems in gates.

    Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
    Closes-Bug: #1622117
    (cherry picked from commit 8cd59db6f641c4ee494dae6916b30207476c4a9e)

OpenStack Infra (hudson-openstack) wrote on 2017-03-23: Fix included in openstack/kolla 2.0.3

This issue was fixed in the openstack/kolla 2.0.3 release.