RPC concurrency problem

Bug #716427 reported by Soren Hansen
This bug affects 2 people
Affects                    Status         Importance   Assigned to      Milestone
OpenStack Compute (nova)   Fix Released   Undecided    Unassigned
Bexar                      Fix Released   High         Thierry Carrez

Bug Description

If I have just a few clients connecting at the same time, I get this sort of stuff in nova-api.log:

2011-02-10 13:34:51,570 DEBUG nova.compute.api [-] Casting to scheduler for soren/soren's instance 7 from MainProcess (pid=14464) create /usr/lib/pymodules/python2.7/nova/compute/api.py:190
2011-02-10 13:34:51,571 DEBUG nova.rpc [-] Making asynchronous cast... from MainProcess (pid=14464) cast /usr/lib/pymodules/python2.7/nova/rpc.py:347
2011-02-10 13:34:51,574 ERROR nova.api [4QU6596JK6SLEQRAFSG8 soren soren] Unexpected error raised: Second simultaneous read on fileno 7 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False)

rpc.cast does not pass new=True to Connection.instance, so it reuses a shared connection and this error is not surprising. I could just change the instance() call to instance(True), but I wonder whether there really are situations where it's safe to use a shared connection when several eventlet greenthreads are running at once?
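
The error message suggests a pool, so here is a minimal sketch of that checkout-and-return idea. It is only an illustration under stated assumptions: it uses the standard library (threads and queue.Queue) in place of eventlet greenthreads and pools.Pool, and FakeConnection, ConnectionPool, item and cast_all are hypothetical names, not nova's code and not the fix that was committed.

```python
import queue
import threading
import time
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a broker connection (not nova's Connection)."""

    def __init__(self, ident):
        self.ident = ident
        self._busy = threading.Lock()

    def read(self):
        # Like the eventlet check in the log above, refuse a second
        # simultaneous reader instead of silently interleaving reads.
        if not self._busy.acquire(blocking=False):
            raise RuntimeError("simultaneous read on connection %d" % self.ident)
        try:
            time.sleep(0.01)  # pretend to wait for data on the socket
            return "reply-from-%d" % self.ident
        finally:
            self._busy.release()


class ConnectionPool:
    """Hypothetical pool: each caller gets exclusive use of one connection."""

    def __init__(self, size):
        self._free = queue.Queue()
        for ident in range(size):
            self._free.put(FakeConnection(ident))

    @contextmanager
    def item(self):
        conn = self._free.get()  # blocks until some connection is free
        try:
            yield conn
        finally:
            self._free.put(conn)  # hand it back for the next caller


def cast_all(pool, n_callers):
    """Run n_callers concurrent reads through the pool; return (replies, errors)."""
    replies, errors = [], []
    guard = threading.Lock()

    def worker():
        try:
            with pool.item() as conn:
                reply = conn.read()
            with guard:
                replies.append(reply)
        except RuntimeError as exc:
            with guard:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replies, errors
```

Calling cast_all(ConnectionPool(2), 10) lets ten callers share two connections without any two of them reading the same one at the same time. Whether pooling or a fresh connection per cast is the better choice for nova is exactly the open question above.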

Ewan Mellor (ewanmellor) wrote :

We're seeing this in our stress tests. This is clearly a very high priority issue. Thanks for the fix, Soren!

Changed in nova:
status: New → Confirmed
Changed in nova:
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
tags: added: bexar-post-release
Thierry Carrez (ttx)
tags: removed: bexar-post-release
Sateesh (sateesh-chodapuneedi) wrote :

Verified the fix, patched over lp:nova/bexar burwell, on nova-compute nodes running on top of XenServer and ESXi servers. The reported errors no longer appear in the nova-api.log files. Thanks Soren!

Soren Hansen (soren) wrote :

I've also verified it by comparing 2011.1 with 2011.1.1. This problem is gone in 2011.1.1. I simply ran euca-run-instances multiple times concurrently from a command line (e.g. "euca-run-instances blah & euca-run-instances blah & euca-run-instances &").
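
For anyone who wants to script the same kind of check, a rough Python equivalent of backgrounding several commands with "&" might look like the sketch below. run_concurrently is a hypothetical helper name, and the command to launch is left to the caller.

```python
import subprocess
import sys


def run_concurrently(argv, copies):
    """Start `copies` processes running argv at once; return their exit codes."""
    # Popen returns immediately, so all copies are running before we wait.
    procs = [subprocess.Popen(argv) for _ in range(copies)]
    return [p.wait() for p in procs]


# Example with a harmless command (the current Python interpreter doing nothing):
# run_concurrently([sys.executable, "-c", "pass"], 3)
```

Passing the euca-run-instances command line as argv would start the copies side by side, after which nova-api.log can be checked for the "Second simultaneous read" error.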

Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released