OpenStack Image Registry and Delivery Service (Glance)

glance-api unresponsive during long-lived I/O-bound operations

Reported by Russell Bryant on 2012-09-18
This bug affects 1 person
Affects: Glance
Importance: Critical
Assigned to: Eoghan Glynn

Bug Description

The following commit changed image downloading when using copy_from to be asynchronous:

commit 41c164139cab619d0a3e0d97b80037f85eb541ad
Author: Eoghan Glynn <email address hidden>
Date: Wed Sep 5 14:33:47 2012 +0000

    Asynchronously copy from external image source

    Fixes bug 1008874, bug 1046433.

    Avoid tieing up dispatch thread for large copy-from images,
    instead initiate copy asynchronously.

    The response status is not set to 202 Accepted as per standard
    RESTful idiom, as a non-error response code change requires
    an API version bump.

    Instead, the incomplete nature of the image registration is
    reflected in the image status.

    Change-Id: I06692422490de0a7d93f63bbd0ffb9c6435a0d2b
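The commit's approach is to start the copy in the background, return at once, and let the image status report progress. It can be sketched roughly as follows. This is a minimal sketch, not glance code: it uses Python's `asyncio` as a stand-in for eventlet greenthreads, and the in-memory `images` registry, `fake_download` and `add_image` are hypothetical. The status values `queued`, `saving` and `active` mirror glance image statuses.

```python
import asyncio

# Hypothetical in-memory registry standing in for glance's image metadata.
images = {}

async def fake_download(image_id, chunks):
    """Simulated copy from an external source; each chunk yields to the loop."""
    images[image_id]["status"] = "saving"
    for _ in range(chunks):
        await asyncio.sleep(0.01)  # cooperative wait, stands in for network I/O
        images[image_id]["size"] += 1
    images[image_id]["status"] = "active"

async def add_image(image_id, background):
    """Register the image and start the copy without waiting for it."""
    images[image_id] = {"status": "queued", "size": 0}
    task = asyncio.create_task(fake_download(image_id, chunks=5))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return images[image_id]["status"]  # returns before the copy has even started

async def main():
    background = set()
    status_at_return = await add_image("f16", background)
    await asyncio.gather(*background)  # wait for the background copy to finish
    return status_at_return, images["f16"]

result = asyncio.run(main())
print(result)  # ('queued', {'status': 'active', 'size': 5})
```

`add_image` reports `queued` because the background task has not run yet when it returns; the copy only advances once the caller yields to the event loop. That hand-off is what breaks if the background work blocks instead of yielding.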

Unfortunately, there appears to be a greenthread scheduling problem that still leaves glance-api unresponsive while the image is downloaded.

[rbryant@f17-openstack-test-day ~]$ time glance add name=f16 is_public=true disk_format=qcow2 container_format=bare copy_from=http://berrange.fedorapeople.org/images/2012-02-29/f16-x86_64-openstack-sda.qcow2
Added new image with ID: 9717656c-2564-4b15-812c-8706ca038d2c

real 0m0.715s
user 0m0.117s
sys 0m0.033s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                 213581824

real 1m11.992s
user 0m0.109s
sys 0m0.022s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                 213581824

real 1m1.287s
user 0m0.124s
sys 0m0.061s

(ran 'time glance index' 4 more times, each taking between 1 and 1.5 minutes)

[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                 213581824

real 0m24.255s
user 0m0.125s
sys 0m0.034s
[rbryant@f17-openstack-test-day ~]$ time glance index
ID                                   Name                           Disk Format          Container Format     Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
9717656c-2564-4b15-812c-8706ca038d2c f16                            qcow2                bare                 213581824

real 0m0.443s
user 0m0.110s
sys 0m0.017s

(all further instances return quickly like this)

description: updated
tags: added: folsom-rc-potential
Brian Waldon (bcwaldon) on 2012-09-18
Changed in glance:
assignee: nobody → Eoghan Glynn (eglynn)
milestone: none → grizzly-1
status: New → In Progress
importance: Undecided → Critical
tags: removed: folsom-rc-potential
Eoghan Glynn (eglynn) wrote:

Recently the glance copy-from logic was made asynchronous, so that the API call returns immediately and the download from the remote location proceeds on a greenthread.

This change was the obvious suspect for the blocking of subsequent API calls.

However, even with copy-from reverted to its original synchronous form, or with just a slow direct upload, concurrent API calls are still blocked.

So there's something more fundamental awry on the glance dispatch path.

Continuing to investigate ...
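The symptom described above is what any cooperative scheduler shows when one task performs non-cooperative blocking I/O: the hub cannot switch to other requests until that call returns. A minimal sketch of the effect, using `asyncio` as a stand-in for eventlet; the functions `background_copy`, `index_request` and `measure` are hypothetical, and `time.sleep` plays the role of an unpatched, blocking socket read:

```python
import asyncio
import time

async def background_copy(blocking, seconds=0.3):
    """Simulated long-lived I/O: either blocks the loop or yields to it."""
    if blocking:
        time.sleep(seconds)           # non-cooperative: nothing else runs meanwhile
    else:
        await asyncio.sleep(seconds)  # cooperative: the loop can serve other tasks

async def index_request():
    """A cheap request, like `glance index`, that should return quickly."""
    await asyncio.sleep(0)
    return "ok"

async def measure(blocking):
    """Time a cheap request issued while a background copy is in flight."""
    start = time.perf_counter()
    copy = asyncio.create_task(background_copy(blocking))  # scheduled first
    index = asyncio.create_task(index_request())           # scheduled second
    await index
    latency = time.perf_counter() - start
    await copy
    return latency

blocked = asyncio.run(measure(blocking=True))
cooperative = asyncio.run(measure(blocking=False))
print(f"blocking copy:    index took {blocked:.2f}s")
print(f"cooperative copy: index took {cooperative:.2f}s")
```

With the blocking copy, the cheap request waits for the whole simulated transfer, much as `glance index` stalled until the image download finished; with the cooperative copy it returns almost immediately.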

Thierry Carrez (ttx) on 2012-09-19
no longer affects: glance/grizzly
no longer affects: glance/folsom
Eoghan Glynn (eglynn) on 2012-09-19
summary: - glance-api unresponsive while downloading an image with copy_from
+ glance-api unresponsive during long-lived I/O-bound operations

Reviewed: https://review.openstack.org/13279
Committed: http://github.com/openstack/glance/commit/8f42dacecd713970c31d59a81fe1f9056e28196c
Submitter: Jenkins
Branch: master

commit 8f42dacecd713970c31d59a81fe1f9056e28196c
Author: Eoghan Glynn <email address hidden>
Date: Wed Sep 19 11:37:28 2012 +0100

    Ensure glance-api application is "greened"

    Fixes bug 1052640

    Avoid unresponsiveness during long-lived I/O-bound operations by
    ensuring that the standard socket libraries are monkey-patched in
    all code-paths.

    Change-Id: If672c26f2b462d1abcfc86e20256957f73f98fde
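The commit message says only that the standard socket libraries are now monkey-patched "in all code-paths". As a general illustration of the mechanism, not glance's actual change, the stdlib-only sketch below shows what monkey-patching does and its main pitfall: replacing a module attribute affects callers that look it up at call time, but not references captured before the patch. This is why eventlet's documentation recommends calling `eventlet.monkey_patch()` as early as possible. The module `net` and the functions `blocking_recv` and `green_recv` are hypothetical.

```python
import types

# Hypothetical stand-in for a blocking stdlib module such as socket.
net = types.ModuleType("net")

def blocking_recv():
    return "blocking"

def green_recv():
    # In eventlet, the patched version would yield to the hub instead of blocking.
    return "cooperative"

net.recv = blocking_recv

# One module binds the function early, as `from net import recv` would.
early_recv = net.recv

def late_caller():
    # Another module looks the attribute up at call time, as `net.recv()` would.
    return net.recv()

# Monkey-patch the module attribute after early_recv was already bound.
net.recv = green_recv

print(late_caller())  # cooperative
print(early_recv())   # blocking
```

Any code path that captured the original function before patching keeps using it, so patching has to happen before such references are taken, on every entry point.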

Changed in glance:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/13350
Committed: http://github.com/openstack/glance/commit/235c5a3b573333b826705e2c5115f0485f28acf6
Submitter: Jenkins
Branch: milestone-proposed

commit 235c5a3b573333b826705e2c5115f0485f28acf6
Author: Eoghan Glynn <email address hidden>
Date: Wed Sep 19 11:37:28 2012 +0100

    Ensure glance-api application is "greened"

    Fixes bug 1052640

    Avoid unresponsiveness during long-lived I/O-bound operations by
    ensuring that the standard socket libraries are monkey-patched in
    all code-paths.

    Change-Id: If672c26f2b462d1abcfc86e20256957f73f98fde

Changed in glance:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-09-27
Changed in glance:
milestone: folsom-rc2 → 2012.2