SIGHUP handling fails with multiprocessing

Bug #1042823 reported by David Kranz on 2012-08-28
This bug affects 3 people
Affects: Glance
Importance: Critical
Assigned to: Stuart McLaren

Bug Description

If you call 'glance-control api start' and then 'glance-control api reload', you get an infinite cascade of these messages in the log.
This only applies to deployments where the workers config option is set to a non-zero value.

2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5018
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5023
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5019
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5024
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5020
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5025
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5023
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5026
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5024
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5027
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5025
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5028
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5026
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5029
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5027
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5030
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5028
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5031
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5030
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5032

David Kranz (david-kranz) wrote :

Note that this does not happen when workers=0 in glance-api.conf.

Brian Waldon (bcwaldon) wrote :

Definitely reproducible just as David has documented.

Changed in glance:
importance: Undecided → Critical
milestone: none → folsom-rc1
status: New → Triaged
Brian Waldon (bcwaldon) wrote :

Separated out the traceback into bug 1044119

description: updated
Brian Waldon (bcwaldon) on 2012-08-30
description: updated
Brian Waldon (bcwaldon) on 2012-09-03
summary: - glance-control <service> reload doesn't work
+ glance-control <service> reload doesn't work with multiprocessing
Brian Waldon (bcwaldon) on 2012-09-05
summary: - glance-control <service> reload doesn't work with multiprocessing
+ SIGHUP handling fails with multiprocessing
Changed in glance:
assignee: nobody → Stuart McLaren (stuart-mclaren)

Fix proposed to branch: master
Review: https://review.openstack.org/12566

Changed in glance:
status: Triaged → In Progress
Stuart McLaren (stuart-mclaren) wrote :

When the SIGHUP was received, the child process which was in run_child() would return to the main
while loop ('while len(self.children) < CONF.workers:', in function 'start') and would end up calling run_child() again.
This was creating a new generation of 'grandchildren' processes.
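The failure mode described above can be illustrated with a minimal fork loop. This is only a sketch, not the Glance code or the committed patch: the names start() and run_child() are borrowed from the comment above, while the work callable and wait_all() are hypothetical stand-ins (work represents the server loop that returns when SIGHUP arrives). The point it shows is that a child must leave via os._exit() rather than return into the parent's spawn loop.

```python
import os


def run_child(work):
    """Run the worker body in the child, then terminate the child for good."""
    status = 1
    try:
        work()  # hypothetical stand-in for the server loop; may return on SIGHUP
        status = 0
    finally:
        # Exit here unconditionally. If the child returned instead, it would
        # fall back into the while loop in start() and fork its own children.
        os._exit(status)


def start(workers, work):
    """Fork `workers` children from the parent and return their pids."""
    children = set()
    while len(children) < workers:
        pid = os.fork()
        if pid == 0:
            run_child(work)  # never returns in the child
        children.add(pid)
    return children


def wait_all(children):
    """Reap every child and return a mapping of pid to raw wait status."""
    statuses = {}
    for pid in children:
        _, status = os.waitpid(pid, 0)
        statuses[pid] = status
    return statuses
```

Calling start(2, some_function) followed by wait_all() in the parent should leave exactly two reaped children and no further generations, since each child exits inside run_child().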

Reviewed: https://review.openstack.org/12566
Committed: http://github.com/openstack/glance/commit/3684d8d42a1f7f972a91763f5a85465c575ebbd8
Submitter: Jenkins
Branch: master

commit 3684d8d42a1f7f972a91763f5a85465c575ebbd8
Author: Stuart McLaren <email address hidden>
Date: Fri Sep 7 10:52:26 2012 +0000

    Handle multi-process SIGHUP correctly

    Child processes were returning to the main while loop
    and spawning a next generation of processes. Instead
    they should exit cleanly.

    Fix for bug 1042823.

    Change-Id: I8c0d150640c78487b43279a44d0d6d3ac3e386cc

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-09-11
Changed in glance:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-09-27
Changed in glance:
milestone: folsom-rc1 → 2012.2

I spent a lot of time dropping the database and restarting in different orders.

Then I came across this bug report.
It doesn't seem to matter what I set workers to. Still the same thing, with a log full of child processes.

Anyone help?

Stuart.

P.S. There are quite a few errors in http://docs.openstack.org/trunk/openstack-compute/install/apt/content/setting-up-tenants-users-and-roles-manually.html

After going through it once manually I didn't fancy doing it again: _ instead of - in operators, and in some places user instead of user-id...

Crue Jones (cruejones) wrote :

verified that my install has this patch and yet problem still exists - or maybe this is another bug that just looks similar?

2012-10-25 09:10:54 21970 ERROR eventlet.wsgi.server [-] Removing dead child 24014
2012-10-25 09:10:54 24015 DEBUG glance.common.config [-] Loading glance-registry-keystone from /etc/glance/glance-api-paste.ini load_paste_app /usr/lib/python2.7/dist-packages/glance/common/config.py:185
2012-10-25 09:10:54 21970 ERROR eventlet.wsgi.server [-] Removing dead child 24015
2012-10-25 09:10:54 24016 DEBUG glance.common.config [-] Loading glance-registry-keystone from /etc/glance/glance-api-paste.ini load_paste_app /usr/lib/python2.7/dist-packages/glance/common/config.py:185

get hundreds of the above streaming non-stop to registry.log.

Stuart McLaren (stuart-mclaren) wrote :

I'm not able to reproduce this with folsom/stable on ubuntu12. I wonder is it an environment/config issue you're seeing?

Crue Jones (cruejones) wrote :

I followed http://docs.openstack.org/trunk/openstack-compute/install/apt/content/configure-glance-files.html on ubuntu 12.10. I did notice there were some issues with the docs though: for instance, the keystone args changed and the ones listed no longer work (i.e. user vs. user_id). I wonder if the doc has other issues within the glance configs.

Are you doing a 'glance-control registry reload' or are you just starting the service (for the first time) when you see the log output?

Crue Jones (cruejones) wrote :

Started the service normally for the first time, on a brand new install and config. Also restarted multiple times with the standard "service <> restart" and through reboots.

If you're ok with attaching your config files in /etc/glance, I can try to reproduce using your config settings.

Jiangping He (jiangping-he) wrote :

I have exactly the same problem as Crue Jones. I did the following extra steps according to the user guide and it seems fine. I think it still needs confirmation from an expert.

1) Update /etc/glance/glance-registry-paste.ini and configure the admin_* values under [filter:authtoken]
[filter:authtoken]
paste.filter_factory = keystone.middleware.auth_token:filter_factory
admin_tenant_name = service
admin_user = glance
admin_password = glance

2) Update /etc/glance/glance-registry.conf under [paste_deploy]
[paste_deploy]
# Name of the paste configuration file that defines the available pipelines
config_file = /etc/glance/glance-registry-paste.ini
# (use the line above instead of the following line, which is what the user guide gives)
# config_file = /etc/glance/glance-api-paste.ini
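The mismatch in step 2 can be checked mechanically. The following is a rough sketch using only the Python standard library; the helper names paste_file_for() and points_at_own_paste_file() are hypothetical, and the naming rule "glance-<service>-paste.ini" is an assumption taken from the file names in this comment rather than something Glance enforces.

```python
import configparser


def paste_file_for(conf_text):
    """Return the [paste_deploy] config_file value, or None if it is not set."""
    # interpolation=None so that any '%' in the config is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("paste_deploy", "config_file", fallback=None)


def points_at_own_paste_file(conf_text, service):
    """True if the service's conf names its own paste file.

    Assumes the naming convention glance-<service>-paste.ini seen above.
    """
    path = paste_file_for(conf_text)
    return path is not None and path.endswith("glance-%s-paste.ini" % service)
```

For example, passing the contents of /etc/glance/glance-registry.conf with service set to "registry" would return False if config_file still points at glance-api-paste.ini.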

Brian Waldon (bcwaldon) wrote :

Try setting workers=0 in your glance-registry and glance-api configs, then starting the services. That would hopefully dump whatever exception is being raised to stdout/stderr.

Crue Jones (cruejones) wrote :

Sorry for the late response (Sandy hit my area hard). The above proposed solution from Jiangping He seems to have done the trick. Should the docs be updated to reflect this?
