wsgi.Server() starts but is broken on osx (test_multiprocessing never ends)

Bug #994609 reported by Patrick Mezard
This bug affects 2 people
Affects: Glance
Status: Fix Released
Importance: Medium
Assigned to: Patrick Mezard

Bug Description

  $ ./run_tests.sh glance.tests.functional.v1.test_multiprocessing:TestMultiprocessing.test_multiprocessing

never ends on OSX (10.6, macports python 2.7.3). The process loops endlessly, filling syslog with tracebacks ending with:

 Traceback (most recent call last):
   File "/Users/pmezard/dev/openstack/glance/bin/glance-api", line 52, in <module>
     server.wait()
   File "/Users/pmezard/dev/openstack/glance/glance/common/wsgi.py", line 210, in wait
     self.wait_on_children()
   File "/Users/pmezard/dev/openstack/glance/glance/common/wsgi.py", line 195, in wait_on_children
     self.run_child()
   File "/Users/pmezard/dev/openstack/glance/glance/common/wsgi.py", line 225, in run_child
     self.run_server()
   File "/Users/pmezard/dev/openstack/glance/glance/common/wsgi.py", line 235, in run_server
     eventlet.hubs.use_hub('poll')
   File "/Users/pmezard/dev/openstack/glance/.venv/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 66, in use_hub
     mod = __import__('eventlet.hubs.' + mod, globals(), locals(), ['Hub'])
   File "/Users/pmezard/dev/openstack/glance/.venv/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 12, in <module>
     EXC_MASK = select.POLLERR | select.POLLHUP
 AttributeError: 'module' object has no attribute 'POLLERR'

This is obviously caused by the "poll" API not being available in Python on OSX. The real problem is that the "poll" hub is selected only after the child process has forked (probably to avoid polluting the main thread's eventlet configuration), but the error is treated like any other, so children keep being respawned and the warnings fill syslog.
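For reference, the missing API can be detected up front with the standard library alone. This is only a minimal sketch; poll_available() is a hypothetical helper name, not something that exists in glance or eventlet:

```python
import select


def poll_available():
    """Return True if this Python build exposes select.poll and the
    constants that eventlet's poll hub reads at import time."""
    # eventlet/hubs/poll.py fails on select.POLLERR in the traceback above,
    # so check the constants as well as the poll() factory itself.
    return all(hasattr(select, name) for name in ("poll", "POLLERR", "POLLHUP"))


if __name__ == "__main__":
    print("poll available:", poll_available())
```

On the affected setup this should report False, which matches the AttributeError in the traceback.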

Pending better suggestions, I will start working on a two-step solution:
- Add a check_eventlet() which forks, tries to use the eventlet hub, and returns 0 on success. wsgi.Server() will call it before spawning workers and bail out on error.
- Introduce a configuration entry to change the default hub, which could be used at least for the tests.
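A rough sketch of the first step, assuming a POSIX platform with os.fork(). The helper names check_in_child() and use_poll_hub() are hypothetical, and this is not the code from the proposed review; it only illustrates the "fork, probe, report through the exit status" idea:

```python
import os


def check_in_child(probe):
    """Fork, run probe() in the child, and return True if the child
    exited with status 0 (i.e. probe() raised nothing)."""
    pid = os.fork()
    if pid == 0:
        # Child process: report the outcome through the exit status only,
        # and never fall back into the parent's code path.
        status = 1
        try:
            probe()
            status = 0
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    # Parent process: wait for the child and decode its exit status.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def use_poll_hub():
    """Probe that selects the 'poll' hub, as run_server() does."""
    import eventlet.hubs  # imported lazily so the check also catches ImportError
    eventlet.hubs.use_hub('poll')


if __name__ == "__main__":
    if not check_in_child(use_poll_hub):
        raise SystemExit("eventlet 'poll' hub is not usable on this platform")
```

Running the probe in a throwaway child keeps the parent's eventlet configuration untouched, and the parent can bail out once instead of respawning workers forever.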

Comments?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/7172

Changed in glance:
assignee: nobody → Patrick Mezard (pmezard)
status: New → In Progress
Brian Waldon (bcwaldon)
Changed in glance:
importance: Undecided → Critical
milestone: none → folsom-1
Patrick Mezard (pmezard) wrote :

Selecting "poll" was hardcoded by:

------------
commit e893b248a2f541eb8409c552b17b43c67430d117
Author: Stuart McLaren <email address hidden>
Date: Tue Dec 20 18:03:55 2011 +0000

    Multi-process Glance API server support.

    Implements blueprint multi-process-server. Allows several Glance API
    worker processes to be started, which can increase performance on machines
    with more than one CPU.

    Change-Id: I1cbb48945fd23afd71de3a30b80836b590c023a1
------------

I have no idea what kind of load is expected on glance-api, but if Stuart spent time making it multi-process, I suppose it is not negligible. So I would avoid the "pick whatever is there" solution, as people may rely on the "poll" performance profile and could be surprised if something else is picked. What about:

- introduce a DEFAULT.eventlet_hub configuration entry to allow explicit hub selection, defaulting to "poll"
- extend DEFAULT.eventlet_hub values to include "auto" for convenience, which would pick whatever is available. It could be used at least for tests if people are not interested in eventlet details.
- find a way to communicate "eventlet_hub" from the test environment. I had a solution using a GLANCE_TEST_EVENTLET_HUB environment variable; if it can be done with a test configuration file I would prefer that, but I could not find anything like it.
- make FunctionalTest.wait_for_servers() suggest using GLANCE_TEST_EVENTLET_HUB when it fails on timeout and an eventlet error message appears in the logs. This is fragile (it depends on the error string and localization), but it can still save time for newcomers like me.
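To make the first three points concrete, here is a minimal sketch of how the hub name could be resolved. resolve_hub() is a hypothetical helper, the "auto" candidate order is my assumption, and the hub names are assumed to match the eventlet.hubs module names ("poll", "selects"):

```python
import os
import select

# (hub name, attribute that must exist in the select module), in the
# order "auto" would try them. Both the order and the list are assumptions.
_AUTO_CANDIDATES = (("poll", "poll"), ("selects", "select"))


def resolve_hub(configured="poll", environ=os.environ):
    """Return the eventlet hub name to use.

    GLANCE_TEST_EVENTLET_HUB, when set, overrides the configured value;
    "auto" picks the first candidate this Python build supports.
    """
    name = environ.get("GLANCE_TEST_EVENTLET_HUB", configured)
    if name != "auto":
        return name  # explicit choice: use it as-is, no guessing
    for hub, attr in _AUTO_CANDIDATES:
        if hasattr(select, attr):
            return hub
    raise RuntimeError("no usable eventlet hub found")


if __name__ == "__main__":
    print(resolve_hub("auto"))
```

The result would then be passed to eventlet.hubs.use_hub() in run_server() instead of the hardcoded 'poll'. An explicit value is never second-guessed, so deployments relying on "poll" keep it.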

What do you think?

Brian Waldon (bcwaldon) wrote :

Well, I guess nobody else has any feedback :)

I don't know the best answer for you, but I'll suggest you move forward with a configuration-based approach. Using 'auto' sounds like it should work, and we can revisit this later if it causes problems.

Thierry Carrez (ttx)
Changed in glance:
importance: Critical → High
milestone: folsom-1 → folsom-2
Brian Waldon (bcwaldon) wrote :

Have you been able to make any more progress with this?

Thierry Carrez (ttx)
Changed in glance:
importance: High → Medium
milestone: folsom-2 → folsom-3
Brian Waldon (bcwaldon) wrote :

I'm going to close this bug as the hanging tests are at least timing out now. I'll file a new one representing the failing tests.

Changed in glance:
status: In Progress → Triaged
status: Triaged → Fix Committed
milestone: folsom-3 → folsom-2
Thierry Carrez (ttx)
Changed in glance:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in glance:
milestone: folsom-2 → 2012.2