SIGHUP handling fails with multiprocessing

Bug #1042823 reported by David Kranz
This bug affects 3 people
Affects:      Glance
Status:       Fix Released
Importance:   Critical
Assigned to:  Stuart McLaren
Milestone:    2012.2

Bug Description

If you call 'glance-control api start' and then 'glance-control api reload', you get an infinite cascade of these messages in the log. This only applies to deployments where the config option workers != 0.

2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5018
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5023
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5019
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5024
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5020
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5025
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5023
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5026
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5024
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5027
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5025
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5028
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5026
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5029
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5027
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5030
2012-08-28 11:54:19 4965 ERROR [eventlet.wsgi.server] Removing dead child 5028
2012-08-28 11:54:19 4965 INFO [eventlet.wsgi.server] Started child 5031
2012-08-28 11:54:19 4966 ERROR [eventlet.wsgi.server] Removing dead child 5030
2012-08-28 11:54:19 4966 INFO [eventlet.wsgi.server] Started child 5032

Revision history for this message
David Kranz (david-kranz) wrote :

Note that this does not happen when workers=0 in glance-api.conf.

Revision history for this message
Brian Waldon (bcwaldon) wrote :

Definitely reproducible just as David has documented.

Changed in glance:
importance: Undecided → Critical
milestone: none → folsom-rc1
status: New → Triaged
Revision history for this message
Brian Waldon (bcwaldon) wrote :

Separated out the traceback into bug 1044119

description: updated
Brian Waldon (bcwaldon)
description: updated
Brian Waldon (bcwaldon)
summary: - glance-control <service> reload doesn't work
+ glance-control <service> reload doesn't work with multiprocessing
Brian Waldon (bcwaldon)
summary: - glance-control <service> reload doesn't work with multiprocessing
+ SIGHUP handling fails with multiprocessing
Changed in glance:
assignee: nobody → Stuart McLaren (stuart-mclaren)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12566

Changed in glance:
status: Triaged → In Progress
Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

When the SIGHUP was received, the child process, which was in run_child(), would return to the main while loop (while len(self.children) < CONF.workers:, in the start() function) and would end up calling run_child() again. This created a new generation of 'grandchildren' processes.
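
A minimal, self-contained sketch of that failure pattern (an illustrative stand-in, not Glance's actual glance.common.wsgi code; WORKERS, children, run_server(), run_child() and start() are simplified names):

import os
import signal

WORKERS = 2          # stands in for CONF.workers
children = set()     # stands in for self.children

def run_server():
    # Stand-in for the eventlet WSGI loop: block until SIGHUP
    # arrives, then return, as the real server loop did on reload.
    signal.signal(signal.SIGHUP, lambda signum, frame: None)
    signal.pause()

def run_child():
    pid = os.fork()
    if pid == 0:        # child
        run_server()
        return          # BUG: after SIGHUP the child falls back into
                        # start()'s loops below and acts like a parent
    children.add(pid)   # parent

def start():
    while len(children) < WORKERS:   # a returning child re-enters this
        run_child()                  # loop and forks 'grandchildren'
    while True:
        pid, _ = os.wait()           # "Removing dead child ..."
        children.discard(pid)
        run_child()                  # "Started child ..."

Each child that catches SIGHUP resumes the spawn loop with its inherited copy of children and forks replacements of its own, while the real parent simultaneously reaps and respawns, producing the alternating 'Removing dead child'/'Started child' cascade shown above.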

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/12566
Committed: http://github.com/openstack/glance/commit/3684d8d42a1f7f972a91763f5a85465c575ebbd8
Submitter: Jenkins
Branch: master

commit 3684d8d42a1f7f972a91763f5a85465c575ebbd8
Author: Stuart McLaren <email address hidden>
Date: Fri Sep 7 10:52:26 2012 +0000

    Handle multi-process SIGHUP correctly

    Child processes were returning to the main while loop
    and spawning a next generation of processes. Instead
    they should exit cleanly.

    Fix for bug 1042823.

    Change-Id: I8c0d150640c78487b43279a44d0d6d3ac3e386cc
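
In terms of the sketch above (reusing its run_server() and children), the fix amounts to making the child exit explicitly once its server loop finishes, so it can never fall back into the parent's spawn loop. This is an illustration of the approach, not the literal patch; see the review linked above for the real change:

import os

def run_child():
    pid = os.fork()
    if pid == 0:         # child
        run_server()
        os._exit(0)      # exit cleanly; never return to start()'s loops
    children.add(pid)    # parent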

Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in glance:
milestone: folsom-rc1 → 2012.2
Revision history for this message
StuartIanNaylor (stuartiannaylor) wrote :

I spent a lot of time dropping the database and restarting services in different orders.

Then I came across this bug report.
It doesn't seem to matter what I set workers to; I still get the same thing, with a log full of child-process messages.

Can anyone help?

Stuart.

P.S. There are quite a few errors in http://docs.openstack.org/trunk/openstack-compute/install/apt/content/setting-up-tenants-users-and-roles-manually.html

After going through it once manually I didn't fancy doing it again: underscores where hyphens belong in some arguments, 'user' in places where 'user-id' is needed, and so on...

Revision history for this message
Crue Jones (cruejones) wrote :

Verified that my install has this patch, and yet the problem still exists. Or maybe this is another bug that just looks similar?

2012-10-25 09:10:54 21970 ERROR eventlet.wsgi.server [-] Removing dead child 24014
2012-10-25 09:10:54 24015 DEBUG glance.common.config [-] Loading glance-registry-keystone from /etc/glance/glance-api-paste.ini load_paste_app /usr/lib/python2.7/dist-packages/glance/common/config.py:185
2012-10-25 09:10:54 21970 ERROR eventlet.wsgi.server [-] Removing dead child 24015
2012-10-25 09:10:54 24016 DEBUG glance.common.config [-] Loading glance-registry-keystone from /etc/glance/glance-api-paste.ini load_paste_app /usr/lib/python2.7/dist-packages/glance/common/config.py:185

get hundreds of the above streaming non-stop to registry.log.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

I'm not able to reproduce this with folsom/stable on Ubuntu 12. I wonder if it's an environment/config issue you're seeing?

Revision history for this message
Crue Jones (cruejones) wrote :

I followed http://docs.openstack.org/trunk/openstack-compute/install/apt/content/configure-glance-files.html on Ubuntu 12.10. I did notice there were some issues with the docs, though; for instance, the keystone arguments changed and the ones listed no longer work (i.e. user vs. user_id). I wonder if the doc has other issues within the glance configs.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

Are you doing a 'glance-control registry reload' or are you just starting the service (for the first time) when you see the log output?

Revision history for this message
Crue Jones (cruejones) wrote :

I started the service normally for the first time on a brand new install and config. I also restarted multiple times with the standard "service <> restart" and through reboots.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

If you're OK with attaching your config files from /etc/glance, I can try to see if I can reproduce this using your config settings.

Revision history for this message
Jiangping He (jiangping-he) wrote :

I have exactly the same problem as Crue Jones. I did the following extra steps beyond the user guide and it seems fine now. I think it still needs confirmation from an expert.

1) Update /etc/glance/glance-registry-paste.ini and configure the admin_* values under [filter:authtoken]
[filter:authtoken]
paste.filter_factory = keystone.middleware.auth_token:filter_factory
admin_tenant_name = service
admin_user = glance
admin_password = glance

2) Update /etc/glance/glance-registry.conf under [paste_deploy]
[paste_deploy]
# Name of the paste configuration file that defines the available pipelines
config_file = /etc/glance/glance-registry-paste.ini
# instead of
# config_file = /etc/glance/glance-api-paste.ini
# as in the user guide

Revision history for this message
Brian Waldon (bcwaldon) wrote :

Try setting workers=0 in your glance-registry and glance-api configs, then starting the services. That would hopefully dump whatever exception is being raised to stdout/stderr.

Revision history for this message
Crue Jones (cruejones) wrote :

Sorry for the late response (Sandy hit my area hard). The solution proposed above by Jiangping He seems to have done the trick. Should the docs be updated to reflect this?
