glance qpid notifier can hang under heavy load

Bug #1229042 reported by Attila Fazekas on 2013-09-23
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Glance | | Medium | Attila Fazekas |
Grizzly | | Medium | Flavio Percoco |

Bug Description

Glance qpid notifier can hang under heavy image creation load.

The hang is caused by two issues:

- The qpid notifier instance can be called by multiple green threads concurrently, so one thread may recreate the connection object while another thread is still using it.
The connection object should be a local variable instead of an instance attribute, to avoid unwanted modification or replacement.
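A minimal sketch of the first fix, assuming one connection per notification. `FakeConnection` and `LocalConnectionNotifier` are hypothetical names, not the Glance `QpidStrategy` or the python-qpid API, and the committed patch may be structured differently. The point is only that `conn` lives in a local variable, so a concurrent caller cannot replace or close it in the middle of another caller's send. Ordinary OS threads are used so the sketch runs without eventlet; the same reasoning applies to green threads.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for an AMQP connection (not the python-qpid API)."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def send(self, message):
        if self.closed:
            raise RuntimeError("send() on a closed connection")
        self.sent.append(message)

    def close(self):
        self.closed = True


class LocalConnectionNotifier:
    """Hypothetical notifier that never stores the connection on ``self``.

    ``conn`` is local to notify(), so a concurrent caller cannot replace
    or close it while another caller is still sending on it.
    """

    def __init__(self, connect):
        self._connect = connect      # factory returning a fresh connection
        self._lock = threading.Lock()
        self.used = []               # connections used, kept for inspection

    def notify(self, message):
        conn = self._connect()       # local variable, private to this call
        try:
            conn.send(message)
        finally:
            conn.close()
        with self._lock:
            self.used.append(conn)


if __name__ == "__main__":
    notifier = LocalConnectionNotifier(FakeConnection)
    workers = [threading.Thread(target=notifier.notify, args=(f"image-{i}",))
               for i in range(50)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(notifier.used))  # 50
```

Opening a connection per notification trades some overhead for isolation; a shared, lock-protected connection would be another option, at the cost of serializing every sender.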

The second issue is that python-qpid uses a PipeWaiter built on select.select(). Glance does not monkey patch the select module to be green-thread friendly, so this call can block every green thread and cause hangs.
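A simplified pipe-based waiter illustrates the second issue. `PipeWaiterSketch` is a hypothetical name and not python-qpid's actual `PipeWaiter` code; it only assumes the pattern described above, a pipe whose read end is watched with `select.select()`. It runs with ordinary OS threads, where the blocking call is harmless; the docstring explains why the same call stalls every green thread when `select` is not monkey patched.

```python
import os
import select
import threading


class PipeWaiterSketch:
    """Hypothetical pipe-based waiter, in the spirit of python-qpid's PipeWaiter.

    wait() blocks in select.select() on the read end of a pipe until
    wakeup() writes a byte to the write end. With OS threads this works.
    Under eventlet, if ``select`` is not monkey patched, select.select()
    blocks the one OS thread that runs every green thread, so the green
    thread that would call wakeup() may never be scheduled.
    """

    def __init__(self):
        self._rfd, self._wfd = os.pipe()  # two file descriptors per waiter

    def wakeup(self):
        os.write(self._wfd, b"\0")

    def wait(self, timeout=None):
        readable, _, _ = select.select([self._rfd], [], [], timeout)
        if readable:
            os.read(self._rfd, 1)  # consume the wakeup byte
            return True
        return False  # timed out without a wakeup

    def close(self):
        # Forgetting this leaks both pipe(2) file descriptors.
        os.close(self._rfd)
        os.close(self._wfd)


if __name__ == "__main__":
    waiter = PipeWaiterSketch()
    print(waiter.wait(timeout=0))                 # False: nothing written yet
    timer = threading.Timer(0.05, waiter.wakeup)  # another thread wakes us
    timer.start()
    print(waiter.wait(timeout=5))                 # True
    timer.join()
    waiter.close()
```

Each instance holds two file descriptors, so waiters that are created repeatedly and never closed are one way pipe descriptors can leak.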

All other OpenStack components that use AMQP monkey patch the thread and select modules, but Glance does not.
It is usually good practice to make everything possible green-thread (eventlet) safe, so that the application yields to other green threads more often.

Cinder and Neutron make everything eventlet friendly; Nova excludes the 'os' module.
I would recommend making everything greenlet friendly.

Changed in glance:
status: New → Triaged
importance: Undecided → Medium
milestone: none → havana-rc1
Changed in glance:
assignee: nobody → Attila Fazekas (afazekas)
status: Triaged → In Progress
Attila Fazekas (afazekas) wrote:

A reproducer is attached.

Alan Pevec (apevec) on 2013-09-30
tags: added: grizzly-backport-potential

Reviewed: https://review.openstack.org/47786
Committed: http://github.com/openstack/glance/commit/2e7aa761b6c2b31f4cbd9703ee19090b6757508a
Submitter: Jenkins
Branch: master

commit 2e7aa761b6c2b31f4cbd9703ee19090b6757508a
Author: Attila Fazekas <email address hidden>
Date: Mon Sep 23 08:44:37 2013 +0200

    Fixing glance-api hangs in the qpid notifier

    Glance-api could hang in the qpid notifier under heavy image creation load.

    python-qpid uses the ``thread`` and ``select`` modules to manage
    the AMQP connection. When eventlet was not able to switch between green
    threads, this led to hangs and/or pipe(2) file descriptor leaks.

    * Monkey patching the ``select`` and ``thread`` modules to be eventlet friendly
      in order to avoid hanging issues.

    * The reference to the connection object in the QpidStrategy
      could be replaced by a concurrent thread, which could cause various issues.
      Storing the connection object only in local variables avoids
      unsafe concurrent manipulation.

    Fixing bug 1229042

    Change-Id: I8fa8c4f36892b96d406216cb3c64854a94ca9df7

Thierry Carrez (ttx) on 2013-10-01
Changed in glance:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-10-02
Changed in glance:
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/50258
Committed: http://github.com/openstack/glance/commit/9a557c8f54e8ac0db2f48bb95296ea2a9ef9a7bb
Submitter: Jenkins
Branch: stable/grizzly

commit 9a557c8f54e8ac0db2f48bb95296ea2a9ef9a7bb
Author: Attila Fazekas <email address hidden>
Date: Mon Sep 23 08:44:37 2013 +0200

    Fixing glance-api hangs in the qpid notifier

    Glance-api could hang in the qpid notifier under heavy image creation load.

    python-qpid uses the ``thread`` and ``select`` modules to manage
    the AMQP connection. When eventlet was not able to switch between green
    threads, this led to hangs and/or pipe(2) file descriptor leaks.

    * Monkey patching the ``select`` and ``thread`` modules to be eventlet friendly
      in order to avoid hanging issues.

    * The reference to the connection object in the QpidStrategy
      could be replaced by a concurrent thread, which could cause various issues.
      Storing the connection object only in local variables avoids
      unsafe concurrent manipulation.

    Fixing bug 1229042

    Change-Id: I8fa8c4f36892b96d406216cb3c64854a94ca9df7
    (cherry picked from commit 2e7aa761b6c2b31f4cbd9703ee19090b6757508a)

tags: added: in-stable-grizzly
Thierry Carrez (ttx) on 2013-10-17
Changed in glance:
milestone: havana-rc1 → 2013.2
Alan Pevec (apevec) on 2014-03-30
tags: removed: grizzly-backport-potential in-stable-grizzly