novnc fails when amqp_rpc_single_reply_queue=True

Bug #1193031 reported by Nikola Đipanov
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  High        Unassigned
Grizzly                   Fix Released  High        Nikola Đipanov

Bug Description

When amqp_rpc_single_reply_queue=True is set, using qpid on Fedora 17, the novnc service fails to work reliably.

Reproduce by starting more than one instance, and then get their vnc consoles with:

$ nova get-vnc-console $INSTANCE_UUID_1 novnc
$ nova get-vnc-console $INSTANCE_UUID_2 novnc

Then connect to those consoles in your browser, use them, and refresh the page. The page will eventually get stuck on 'Waiting for VNC handshake' and generally stop working properly.

I dug a bit deeper into this and it seems to be related to a select call in websockify.py which does not work properly with eventlet monkey_patched sockets.

The solution is either to not monkey_patch in /cmd/nova-novncproxy and do the rpc call that validates the token with the consoleauth service in a separate process that is monkey_patched (the rpc won't work without it), or to make openstack/common/rpc/amqp/multicall not use green threads for this particular call (i.e. block the caller).

Confirmed with trunk on 75ead3a2a37efbc6a4fdea7e492ca41cdd559a8c (20-6-2013)

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Also, this fails even worse with Grizzly, as there is no monkey patching on nova-novncproxy. See https://review.openstack.org/#/c/33319/

Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Nikola Đipanov (ndipanov)
milestone: none → havana-2
Changed in nova:
importance: Critical → High
Revision history for this message
Xavier Queralt (xqueralt-deactivatedaccount) wrote :

I can confirm that this issue also appears on Ubuntu using Havana.

I've done some digging on this and came to the conclusion that eventlet's epoll hub implementation doesn't play well with the multiprocessing module. The issue doesn't happen if we force the use of the "poll" hub in the websocketproxy module, or if we just monkey_patch inside the new_client method (as this is run in a separate process).

websockify uses 'select' directly to decide when to write/read to a socket and multiprocessing to spawn the processes that will take care of the client. This makes me think that eventlet's epoll implementation is broken if used together with multiprocessing.

I'll update if I find something else while following the trail into eventlet's internals.

Revision history for this message
Xavier Queralt (xqueralt-deactivatedaccount) wrote :

Oh, to force eventlet's hub implementation to be poll, just add the following at the top of the module:

from eventlet import hubs
hubs.use_hub("poll")

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/37419

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → Xavier Queralt (xqueralt)
status: Confirmed → In Progress
Changed in nova:
milestone: havana-2 → havana-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/37419
Committed: http://github.com/openstack/nova/commit/cb25bc4530323aaa33d5c42eb01f998d463f2106
Submitter: Jenkins
Branch: master

commit cb25bc4530323aaa33d5c42eb01f998d463f2106
Author: Xavier Queralt <email address hidden>
Date: Wed Jul 17 01:31:36 2013 +0200

    Force reopening eventlet's hub after fork

    With this we reopen eventlet's hub after a fork (triggered from
    websockify when a new client connects) to prevent sharing epoll's fd
    with the parent, which may cause erratic behaviour.

    This caused novncproxy to stop working when it had more than two clients
    connected.

    Fixes bug #1193031

    Change-Id: I3ff9001543b84b1037597da243422490bb611657
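
The fd sharing described in the commit message can be reproduced without eventlet. The following is only a minimal sketch, assuming Linux (for select.epoll and os.fork) and using the standard library alone; it is not the code from the fix, and the function name parent_events is hypothetical. A child that keeps using the inherited epoll object registers a pipe in it, and the parent then receives an event for an fd it never registered. If the child closes the inherited epoll fd and opens its own, the parent's epoll stays clean.

```python
import os
import select


def parent_events(reopen_in_child):
    """Fork a child that registers a pipe with epoll.

    Returns the events the *parent's* epoll object reports afterwards.
    (Hypothetical helper for illustration; Linux only.)
    """
    parent_ep = select.epoll()
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit so it never runs parent code.
        try:
            ep = parent_ep
            if reopen_in_child:
                # Drop the inherited epoll fd and create a private instance,
                # analogous to reopening the hub after the fork.
                parent_ep.close()
                ep = select.epoll()
            ep.register(read_fd, select.EPOLLIN)
        finally:
            os._exit(0)

    # Parent: wait for the child, then make the pipe readable and poll.
    os.waitpid(pid, 0)
    os.write(write_fd, b"x")
    events = parent_ep.poll(0.5)

    parent_ep.close()
    os.close(read_fd)
    os.close(write_fd)
    return events


if __name__ == "__main__":
    print("shared epoll fd:  ", parent_events(reopen_in_child=False))
    print("reopened in child:", parent_events(reopen_in_child=True))
```

With the shared epoll fd, the parent is expected to see one event for the pipe even though only the child registered it; with the epoll instance reopened in the child, the parent's poll should come back empty.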

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (milestone-proposed)

Fix proposed to branch: milestone-proposed
Review: https://review.openstack.org/37507

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/37512

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/37526

Thierry Carrez (ttx)
Changed in nova:
milestone: havana-3 → havana-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (milestone-proposed)

Reviewed: https://review.openstack.org/37507
Committed: http://github.com/openstack/nova/commit/9315de02d67aaacc7e8da0de67ca477c2c0bc127
Submitter: Jenkins
Branch: milestone-proposed

commit 9315de02d67aaacc7e8da0de67ca477c2c0bc127
Author: Xavier Queralt <email address hidden>
Date: Wed Jul 17 01:31:36 2013 +0200

    Force reopening eventlet's hub after fork

    With this we reopen eventlet's hub after a fork (triggered from
    websockify when a new client connects) to prevent sharing epoll's fd
    with the parent, which may cause erratic behaviour.

    This caused novncproxy to stop working when it had more than two clients
    connected.

    Fixes bug #1193031

    Change-Id: I3ff9001543b84b1037597da243422490bb611657
    (cherry picked from commit cb25bc4530323aaa33d5c42eb01f998d463f2106)

Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/37526
Committed: http://github.com/openstack/nova/commit/11046d94456c63fda51f626e1b816fd57612a4ec
Submitter: Jenkins
Branch: stable/grizzly

commit 11046d94456c63fda51f626e1b816fd57612a4ec
Author: Nikola Dipanov <email address hidden>
Date: Wed Jul 17 18:34:36 2013 +0200

    Force reopening eventlet's hub after fork

    With this we reopen eventlet's hub after a fork (triggered from
    websockify when a new client connects) to prevent sharing epoll's fd
    with the parent, which may cause erratic behaviour.

    This caused novncproxy/spicehtml5proxy to stop working when it had more
    than two clients connected.

    Fixes bug #1193031

    Change-Id: I3ff9001543b84b1037597da243422490bb611657

Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2