reload.yml is sending sighup to non-sighupable services

Bug #1622117 reported by Steven Dake
This bug affects 1 person
Affects        Status        Importance  Assigned to  Milestone
kolla          Fix Released  Critical    Steven Dake
  Liberty      Fix Released  Critical    Steven Dake
  Mitaka       Fix Released  Critical    Steven Dake

Bug Description

The kolla gate locks up when sending a SIGHUP to nova-novncproxy. An example log:
http://logs.openstack.org/90/368290/1/check/gate-kolla-dsvm-deploy-oraclelinux-source-centos-7-nv/fb5b27c/console.html

See the first timestamp:
2016-09-10 03:54:14.354580

See the last timestamp:
2016-09-10 04:47:47.129644

The delta between these two timestamps is about 53 minutes. The gate timer is 90 minutes.
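As a quick check on the arithmetic, the two timestamps can be subtracted with Python's standard datetime module. This is not Kolla code. The variable names are illustrative, and the only inputs are the two timestamps and the 90-minute timer quoted above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# First and last timestamps quoted from the gate console log above.
first = datetime.strptime("2016-09-10 03:54:14.354580", FMT)
last = datetime.strptime("2016-09-10 04:47:47.129644", FMT)

delta = last - first
gate_timeout_s = 90 * 60  # the 90-minute gate timer, in seconds

print(delta)  # 0:53:32.775064
print(f"{delta.total_seconds() / gate_timeout_s:.1%} of the gate timer")
```

The gap works out to 53 min 32.8 s, just under 60% of the gate budget.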

The kolla code in question is as follows:
https://github.com/openstack/kolla/blob/master/ansible/roles/nova/tasks/reload.yml#L16

Specifically:
  command: docker exec -t nova_novncproxy kill -1 1

I highly doubt dumb-init is smart enough to propagate a SIGHUP to all of its children, and I'm not sure that would even work well.

I would suggest using killall for this use case, e.g. killall -SIGHUP nova-api or killall -SIGHUP nova-novncproxy.
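To make the killall suggestion concrete, here is a rough Python approximation of what `killall -SIGHUP <name>` does. It scans /proc, matches each process's short name, and signals every match rather than a single PID. This is a sketch that assumes a Linux /proc filesystem. `find_pids_by_name` and `signal_all_by_name` are hypothetical helpers, not part of Kolla. killall's real matching rules, for example for names longer than 15 characters, are more involved than this.

```python
import os
import signal
from typing import List

# Linux keeps a process's short name in /proc/<pid>/comm, truncated to
# 15 characters (TASK_COMM_LEN is 16 including the trailing NUL).
COMM_MAX = 15


def find_pids_by_name(name: str) -> List[int]:
    """Return the PIDs whose /proc/<pid>/comm matches `name` (Linux only)."""
    wanted = name[:COMM_MAX]
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # the process exited between listdir() and open()
        if comm == wanted:
            pids.append(int(entry))
    return pids


def signal_all_by_name(name: str, sig: int = signal.SIGHUP) -> List[int]:
    """Send `sig` to every process named `name`; return the PIDs signalled."""
    signalled = []
    for pid in find_pids_by_name(name):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # the process exited before it could be signalled
    return signalled
```

fork() copies the parent's comm. Workers forked by a service therefore normally share its name unless they rename themselves, which is why name-based signalling can reach more than PID 1. Whether each service reloads cleanly on SIGHUP is a separate question.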

This reared its head as a result of two changes:
- dumb-init was added to the codebase as PID 1
- the gate was modified to run reconfigure and upgrade, both of which rely on PID 1 being one of the processes to reload

I think this logic is flawed. nova-api, for example, can spawn several processes, and it is unclear to me that the signal reaches all of the nova-api processes in the container rather than just the first one. So reconfigure probably doesn't work at all in the stable branches either.
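The concern that only the first process is signalled comes down to how kill(2) works. Signalling one PID delivers nothing to that process's children unless something forwards the signal or a process-group signal is used. The sketch below demonstrates this baseline, using a shell as a stand-in for a service that forks a worker. It does not model dumb-init, which has its own forwarding behaviour. `hup_parent_only` is a hypothetical name, and the sketch assumes a POSIX system with `sh` and `sleep` on PATH.

```python
import os
import signal
import subprocess


def hup_parent_only():
    """SIGHUP a shell's PID only; report (parent_returncode, child_alive)."""
    # The shell starts a background `sleep` (the "worker"), prints its PID,
    # then waits on it. preexec_fn restores default SIGHUP handling in case
    # this script itself was started with SIGHUP ignored (e.g. under nohup).
    parent = subprocess.Popen(
        ["sh", "-c", "sleep 30 >/dev/null 2>&1 & echo $!; wait"],
        stdout=subprocess.PIPE,
        text=True,
        preexec_fn=lambda: signal.signal(signal.SIGHUP, signal.SIG_DFL),
    )
    child_pid = int(parent.stdout.readline())
    try:
        # Signal only the parent's PID, as `kill -1 1` targets only PID 1.
        os.kill(parent.pid, signal.SIGHUP)
        parent_rc = parent.wait(timeout=10)
        try:
            os.kill(child_pid, 0)  # signal 0 only checks that the PID exists
            child_alive = True
        except ProcessLookupError:
            child_alive = False
        return parent_rc, child_alive
    finally:
        parent.stdout.close()
        # Clean up the orphaned sleep, which the SIGHUP never reached.
        try:
            os.kill(child_pid, signal.SIGKILL)
        except ProcessLookupError:
            pass


if __name__ == "__main__":
    print(hup_parent_only())
```

On a typical Linux host the shell is terminated by the SIGHUP while the background sleep keeps running. To reach every worker, the signal has to be sent to each PID, as killall does, or to the process group with os.killpg.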

Steven Dake (sdake)
Changed in kolla:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → newton-rc1
assignee: nobody → Steven Dake (sdake)
Steven Dake (sdake) wrote :

Partially fixed by:
https://review.openstack.org/#/c/368334/

There may be other scenarios like this hiding in the codebase.

summary: - reload.yml is wrong throughout codebase, especially with dumbinit
+ reload.yml is sending sighup to non-sighupable services in reload.yml
summary: - reload.yml is sending sighup to non-sighupable services in reload.yml
+ reload.yml is sending sighup to non-sighupable services
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (master)

Reviewed: https://review.openstack.org/368334
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=8cd59db6f641c4ee494dae6916b30207476c4a9e
Submitter: Jenkins
Branch: master

commit 8cd59db6f641c4ee494dae6916b30207476c4a9e
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

    Remove novncproxy and spice from reload

    Since these services don't use RPC, we don't need to reload them. And
    these caused problems in gates.

    Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
    Closes-Bug: #1622117

Changed in kolla:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/368339

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/368340

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (stable/mitaka)

Reviewed: https://review.openstack.org/368339
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=1638942cea12aaf9f8398ed783c1e40377bf801c
Submitter: Jenkins
Branch: stable/mitaka

commit 1638942cea12aaf9f8398ed783c1e40377bf801c
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

    Remove novncproxy and spice from reload

    Since these services don't use RPC, we don't need to reload them. And
    these caused problems in gates.

    Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
    Closes-Bug: #1622117
    (cherry picked from commit 8cd59db6f641c4ee494dae6916b30207476c4a9e)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla (stable/liberty)

Reviewed: https://review.openstack.org/368340
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=b9b99f74a6678c0a6a2bdd992f26400676d21166
Submitter: Jenkins
Branch: stable/liberty

commit b9b99f74a6678c0a6a2bdd992f26400676d21166
Author: Michal (inc0) Jastrzebski <email address hidden>
Date: Sat Sep 10 16:14:18 2016 +0000

    Remove novncproxy and spice from reload

    Since these services don't use RPC, we don't need to reload them. And
    these caused problems in gates.

    Change-Id: I6967bdc7da0d0c3c06873e3d554124ca995f4c13
    Closes-Bug: #1622117
    (cherry picked from commit 8cd59db6f641c4ee494dae6916b30207476c4a9e)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla 2.0.3

This issue was fixed in the openstack/kolla 2.0.3 release.
