Paramiko does not properly work with eventlet concurrency

Bug #1321787 reported by Viktor Serhieiev
This bug affects 8 people
Affects                   Status        Importance  Assigned to  Milestone
Ironic                    Fix Released  High        Unassigned
tripleo                   Fix Released  High        Unassigned
python-eventlet (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

UPD(dtantsur): Old title: `ironic node-set-power-state off` fails during Tripleo deployment

The TripleO deployment with Ironic fails while it registers the baremetal nodes, with the following error:

+ ironic node-set-power-state 1c55376d-19c5-473c-9c28-33e3c4549785 off
SSH connection cannot be established: Failed to establish SSH connection to host 192.168.122.1. (HTTP 400)

Here is the traceback from ironic-conductor.log

2014-05-21 13:16:12.342 5073 ERROR paramiko.transport [-] Exception: Error reading SSH protocol bannerSecond simultaneous read on fileno 8 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False)
2014-05-21 13:16:12.357 5073 ERROR paramiko.transport [-] Traceback (most recent call last):
2014-05-21 13:16:12.357 5073 ERROR paramiko.transport [-] File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/paramiko/transport.py", line 1412, in run
2014-05-21 13:16:12.358 5073 ERROR paramiko.transport [-] self._check_banner()
2014-05-21 13:16:12.358 5073 ERROR paramiko.transport [-] File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/paramiko/transport.py", line 1539, in _check_banner
2014-05-21 13:16:12.358 5073 ERROR paramiko.transport [-] raise SSHException('Error reading SSH protocol banner' + str(e))
2014-05-21 13:16:12.358 5073 ERROR paramiko.transport [-] SSHException: Error reading SSH protocol bannerSecond simultaneous read on fileno 8 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False)
2014-05-21 13:16:12.358 5073 ERROR paramiko.transport [-]
Traceback (most recent call last):
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in fire_timers
    timer()
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 56, in __call__
    cb(*args, **kw)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 194, in main
    result = function(*args, **kwargs)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/conductor/utils.py", line 74, in node_power_action
    node.save(context)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/openstack/common/excutils.py", line 82, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/conductor/utils.py", line 67, in node_power_action
    curr_state = task.driver.power.get_power_state(task, node)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/drivers/modules/ssh.py", line 394, in get_power_state
    ssh_obj = _get_connection(node)
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/drivers/modules/ssh.py", line 250, in _get_connection
    return utils.ssh_connect(_parse_driver_info(node))
  File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/ironic/common/utils.py", line 113, in ssh_connect
    raise exception.SSHConnectFailed(host=connection.get('host'))
SSHConnectFailed: Failed to establish SSH connection to host 192.168.122.1.

Tags: ci driver ssh
Dmitry Tantsur (divius) wrote :

Sahara had similar problems: https://review.openstack.org/#/c/45716/

summary: - `ironic node-set-power-state off` fails during Tripleo deployment
+ Paramiko does not properly work with evenlet concurrency
description: updated
Changed in ironic:
status: New → Triaged
importance: Undecided → High
aeva black (tenbrae)
summary: - Paramiko does not properly work with evenlet concurrency
+ Paramiko does not properly work with eventlet concurrency
Lucas Alvares Gomes (lucasagomes) wrote :

Just hit this problem today, logs: http://paste.openstack.org/show/83620/ ... After the problem I could not delete my instance anymore.

aeva black (tenbrae) wrote :

For what it's worth, this seems to have been reported to Paramiko last year:
  https://github.com/paramiko/paramiko/issues/173

aeva black (tenbrae)
tags: added: driver
Changed in ironic:
milestone: none → juno-2
jan grant (jan-grant) wrote :

I'm following this up with the eventletdev list too. The "multiple simultaneous reads" condition is the root cause of various errors we've seen cropping up in production going back at least 18 months; ironic seems to tickle it the worst, due to its increased parallelism.

I've some WIP patches to eventlet that explore the problem, here: https://github.com/jan-g/eventlet

...but they're ugly hacks and still haven't bottomed out the problem yet.

It's entirely possible that this could be worked around, in large part, by putting the _same_ synchronized section around all uses of problematic libraries. That includes at the very least paramiko, and the calls to utils.execute that invoke qemu-img.

Robert Collins (lifeless) wrote :
jan grant (jan-grant) wrote :

It moves the problem around a bit :-) and potentially closes some problems, but doesn't make my threaded paramiko test work reliably.

Had another thought as to a potential approach to this; in the interim will follow up to os-dev.

jan grant (jan-grant) wrote :

It looks like we've got something which might work. It survives some parallel paramiko testing, anyway. We'll look to run tests with this tomorrow.

jan grant (jan-grant) wrote :

https://github.com/jan-g/eventlet/tree/wip has what we think is a working fix to eventlet. Trying it now.

jan grant (jan-grant) wrote :

(That looks positive. Going to look to sort out unit tests, etc, for that change.)

jan grant (jan-grant) wrote :

Okay. I've tidied up the eventlet patches. I'm told they're slated for release with 0.16.

Derek Higgins (derekh)
tags: added: ci
Changed in tripleo:
importance: Undecided → High
status: New → Triaged
aeva black (tenbrae)
Changed in ironic:
milestone: juno-2 → none
no longer affects: eventlet (Ubuntu)
Adam Gandelman (gandelman-a) wrote :

The fix appears to have been released upstream in eventlet 0.15.1

Changed in python-eventlet (Ubuntu):
status: New → Confirmed
Dmitry Tantsur (divius)
Changed in ironic:
milestone: none → juno-rc1
Dmitry Tantsur (divius) wrote :

Ironic now depends on eventlet>=0.15.1 and I hope this bug is fixed
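
For anyone checking an existing environment against that requirement, here is a rough sketch of the version comparison. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parsing only handles plain dotted numeric versions such as `0.15.1`, not pre-release or development suffixes.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.15.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="0.15.1"):
    """Return True if the installed version is at least the minimum.

    Comparing integer tuples avoids the string-comparison trap where
    '0.9.0' would sort after '0.15.1'.
    """
    return parse_version(installed) >= parse_version(minimum)
```

In a real environment the installed version string would typically come from `eventlet.__version__`, passed in as `meets_minimum(eventlet.__version__)`.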

Changed in ironic:
status: Triaged → Fix Committed
Derek Higgins (derekh)
Changed in tripleo:
status: Triaged → Fix Released
aeva black (tenbrae)
Changed in ironic:
status: Fix Committed → Fix Released
milestone: juno-rc1 → none
Chuck Short (zulcss)
Changed in python-eventlet (Ubuntu):
status: Confirmed → Fix Released