iter_nodes not robust to eventlet trampolining

Bug #1180110 reported by Darrell Bishop
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
High
Samuel Merritt

Bug Description

The generator, swift.proxy.controllers.base.Controller.iter_nodes(), is used concurrently by swift.proxy.controllers.base.Controller.make_requests(). If iter_nodes() trampolines, as it will when sending the StatsD UDP packet if StatsD logging is configured, then another greenthread may try to use the generator resulting in an exception like "ValueError: generator already executing" and a 500 response from the proxy.

I first observed this with the probe tests against my SAIO when I had StatsD logging enabled. I got a stacktrace like this in the proxy-server log: http://paste.openstack.org/show/lS1UzUupEgsXjyfY1VkF/

The probe test failure itself looked like this: http://paste.openstack.org/show/1qdH3ACijK12qSNt2WSx/

This blog post discusses the problem and a potential solution: http://anandology.com/blog/using-iterators-and-generators/

Revision history for this message
Samuel Merritt (torgomatic) wrote :

Marking as high priority since it causes 500s.

(Also, I'm going to fix it, so I can mark it whatever I want. :P)

Changed in swift:
assignee: nobody → Samuel Merritt (torgomatic)
importance: Undecided → High
status: New → Confirmed
Changed in swift:
status: Confirmed → In Progress
Revision history for this message
Sergio Rubio (rubiojr) wrote :

We have observed this too under high concurrency.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/29125
Committed: http://github.com/openstack/swift/commit/bd3ca778d10c461210f26ea3d194cacc70bef975
Submitter: Jenkins
Branch: master

commit bd3ca778d10c461210f26ea3d194cacc70bef975
Author: Samuel Merritt <email address hidden>
Date: Tue May 14 13:04:01 2013 -0700

    Fix ValueError from handoff logging.

    If you have StatsD logging turned on, then the iterator returned by
    Ring.get_part_nodes() will emit StatsD packets when it yields a
    handoff node. That network IO may make eventlet trampoline to another
    greenthread before returning from next(). Now, if that other
    greenthread tries to call next() on that same iterator, it blows up
    with a ValueError.

    Any socket IO inside a generator's next() method can cause this. It's
    easiest to reproduce with StatsD logging turned on, but logging to
    syslog can trigger it too.

    You can see this happen sometimes in the proxy's make_requests method
    if two of the primary nodes are down. Greenthread A goes into next()
    to get a handoff node, then sends a StatsD packet, and so eventlet
    trampolines to Greenthread B. Now, Greenthread B also calls next() to
    get a handoff node, and dies with a ValueError.

    This commit wraps up concurrently-accessed iter_nodes() iterators in a
    new thing called a GreenthreadSafeIterator that serializes access.

    Bug 1180110
    Change-Id: I8fe13d7295c056a2cab9e084f5966078a49bdc13

Changed in swift:
status: In Progress → Fix Committed
Changed in swift:
milestone: none → 1.9.0
Thierry Carrez (ttx)
Changed in swift:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.