RPC concurrency problem

Bug #716427 reported by Soren Hansen on 2011-02-10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Undecided
Unassigned
Bexar
High
Thierry Carrez

Bug Description

If I have just a few clients connecting at the same time, I get this sort of stuff in nova-api.log:

2011-02-10 13:34:51,570 DEBUG nova.compute.api [-] Casting to scheduler for soren/soren's instance 7 from MainProcess (pid=14464) create /usr/lib/pymodules/python2.7/nova/compute/api.py:190
2011-02-10 13:34:51,571 DEBUG nova.rpc [-] Making asynchronous cast... from MainProcess (pid=14464) cast /usr/lib/pymodules/python2.7/nova/rpc.py:347
2011-02-10 13:34:51,574 ERROR nova.api [4QU6596JK6SLEQRAFSG8 soren soren] Unexpected error raised: Second simultaneous read on fileno 7 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False)

rpc.cast does not pass new=True to Connection.instance, so this is not surprising. I could just change the instance() call to instance(True), but I wonder if there really are situations where it's safe to use a shared connection when we have these eventlet thread things going on?
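
To make the difference between instance() and instance(True) concrete, here is a minimal sketch. The Connection class and cast() function below are hypothetical stand-ins written for illustration; they are not nova's actual rpc code. Only the instance(new) calling convention is taken from the description above.

```python
class Connection(object):
    """Hypothetical stand-in for an AMQP connection; not nova's real class."""

    _shared = None  # the single process-wide connection, created lazily

    @classmethod
    def instance(cls, new=False):
        # new=True: the caller gets a private connection of its own.
        if new:
            return cls()
        # new=False: every caller receives the same shared object, so two
        # greenthreads can end up reading from the same socket.
        if cls._shared is None:
            cls._shared = cls()
        return cls._shared


def cast(new):
    """Hypothetical cast(): returns the connection it would publish on."""
    return Connection.instance(new)


shared_a, shared_b = cast(False), cast(False)
fresh_a, fresh_b = cast(True), cast(True)

print(shared_a is shared_b)  # True: both callers share one connection
print(fresh_a is fresh_b)    # False: each caller has its own connection
```

With the shared variant, anything that reads from the underlying socket must be serialized across greenthreads. With a connection per call that constraint goes away, at the cost of opening more connections.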

Ewan Mellor (ewanmellor) wrote :

We're seeing this in our stress tests. This is clearly a very high priority issue. Thanks for the fix, Soren!

Changed in nova:
status: New → Confirmed
Changed in nova:
status: Confirmed → Fix Committed
Thierry Carrez (ttx) on 2011-02-15
tags: added: bexar-post-release
Thierry Carrez (ttx) on 2011-02-16
tags: removed: bexar-post-release
Sateesh (sateesh-chodapuneedi) wrote :

Verified the fix patched over lp:nova/bexar burwell in nova-compute nodes running on top of XenServer and ESXi server. Not seeing the reported errors in nova-api.log files. Thanks Soren!

Soren Hansen (soren) wrote :

I've also verified it by comparing 2011.1 against 2011.1.1. The problem is gone in 2011.1.1. I simply ran euca-run-instances multiple times from a command line (e.g. "euca-run-instances blah & euca-run-instances blah & euca-run-instances &").
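
For anyone who wants to script that kind of check, a rough sketch follows. The run_concurrently helper is a hypothetical name introduced here, and the command to launch is left to the caller; nothing below is part of nova or euca2ools.

```python
import subprocess


def run_concurrently(argv, count):
    """Start `count` copies of the command in argv at once.

    Returns the list of exit codes, in launch order.
    """
    # Launch everything first so the processes overlap in time...
    procs = [subprocess.Popen(argv) for _ in range(count)]
    # ...then wait for each one to finish.
    return [p.wait() for p in procs]


# Hypothetical usage, mirroring the shell one-liner above:
# run_concurrently(["euca-run-instances", "blah"], 3)
```

The exit codes alone do not show whether the bug triggered; nova-api.log still has to be checked for the "Second simultaneous read" error.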

Thierry Carrez (ttx) on 2011-04-15
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released