OpenStack Image Registry and Delivery Service (Glance)

Glance does not recover database connection automatically after DB restart

Reported by Unmesh Gurjar on 2012-03-14
This bug affects 1 person
Affects: Glance | Importance: Low | Assigned to: Unmesh Gurjar
Affects: glance (Ubuntu) | Importance: Undecided | Assigned to: Unassigned
Affects: glance (Ubuntu Precise) | Importance: Undecided | Assigned to: Unassigned

Bug Description

Scenario: Configure Glance Registry to use MySQL for metadata storage. While Glance Registry is running, restart the MySQL service. Then execute the 'glance index' command to fetch the image list from Glance.

Actual Response: HTTP 500 Internal Server Error, with the following error in the stack trace: "OperationalError: (OperationalError) (2006, 'MySQL server has gone away')".

Expected Response: HTTP 200 (with list of registered images in response body).

Branch: milestone-proposed

This typically happens on the first database connection after the database service restarts; subsequent requests work fine.
Once the database server comes back up, Glance should re-establish the database connection and not return an error.

Tags: ntt
Changed in glance:
status: New → Confirmed
assignee: nobody → Unmesh Gurjar (unmesh-gurjar)
description: updated
Jay Pipes (jaypipes) wrote :

Setting to low priority since the subsequent requests work fine...

Changed in glance:
importance: Undecided → Low

Fix proposed to branch: master
Review: https://review.openstack.org/5552

Changed in glance:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/5552
Committed: http://github.com/openstack/glance/commit/3f462fbf9a2b9cfe6ac39d87fe5b327ba82e9dac
Submitter: Jenkins
Branch: master

commit 3f462fbf9a2b9cfe6ac39d87fe5b327ba82e9dac
Author: Unmesh Gurjar <email address hidden>
Date: Tue Mar 20 12:07:39 2012 +0530

    Fixed db conn recovery issue. Fixes bug 954971.

    This implementation wraps the db method calls and retries the method if an attempt fails due to db connectivity issues. Defines 2 new configuration parameters, sql_max_retries and sql_retry_interval (values default to 10 and 1 respectively).
    This implementation does not have any overhead in the normal case, as opposed to the MySQLPingListener implementation, which does a 'select 1' query every time a connection is checked out from the pool.

    Change-Id: I24240f22bca445b9ab76a1594631b5eaca393b4d
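
For illustration, the retry approach described in the commit message could be sketched roughly as follows. This is not the code from the commit: DBConnectionError, retry_on_db_error, and FlakyRegistry are hypothetical names, and only the two parameters (mirroring sql_max_retries and sql_retry_interval, with the defaults 10 and 1 stated above) come from the report.

```python
import functools
import time


class DBConnectionError(Exception):
    """Hypothetical stand-in for a 'MySQL server has gone away' style error."""


def retry_on_db_error(max_retries=10, retry_interval=1, sleep=time.sleep):
    """Retry the wrapped call when it fails with DBConnectionError.

    max_retries and retry_interval mirror the sql_max_retries and
    sql_retry_interval options named in the commit message. The sleep
    argument is an assumption added here so the wait can be stubbed out.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            remaining = max_retries
            while True:
                try:
                    # Normal case: one call, no extra query or overhead.
                    return func(*args, **kwargs)
                except DBConnectionError:
                    if remaining <= 0:
                        raise  # retries exhausted, surface the error
                    remaining -= 1
                    sleep(retry_interval)
        return wrapper
    return decorator


class FlakyRegistry:
    """Hypothetical data source that fails a fixed number of times, then recovers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def list_images(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise DBConnectionError("MySQL server has gone away")
        return ["image-1", "image-2"]
```

Wrapping FlakyRegistry(failures=2).list_images with retry_on_db_error(max_retries=3) would return the image list on the third attempt. A wrapper of this kind only helps if the connectivity error actually propagates through the wrapped call, which is the point raised in the later comments.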

Changed in glance:
status: In Progress → Fix Committed
Brian Waldon (bcwaldon) on 2012-03-27
Changed in glance:
milestone: none → essex-rc2

Reviewed: https://review.openstack.org/5889
Committed: http://github.com/openstack/glance/commit/98f53473008a04c113e35f9c61f0361fa9328420
Submitter: Jenkins
Branch: milestone-proposed

commit 98f53473008a04c113e35f9c61f0361fa9328420
Author: Unmesh Gurjar <email address hidden>
Date: Tue Mar 20 12:07:39 2012 +0530

    Fixed db conn recovery issue. Fixes bug 954971.

    This implementation wraps the db method calls and retries the method if an attempt fails due to db connectivity issues. Defines 2 new configuration parameters, sql_max_retries and sql_retry_interval (values default to 10 and 1 respectively).
    This implementation does not have any overhead in the normal case, as opposed to the MySQLPingListener implementation, which does a 'select 1' query every time a connection is checked out from the pool.

    Change-Id: I24240f22bca445b9ab76a1594631b5eaca393b4d

Changed in glance:
status: Fix Committed → Fix Released
Adam Gandelman (gandelman-a) wrote :

I'm not sure this actually fixed the issue, and it may have regressed if the MySQLPingListener did in fact reconnect after servers had 'gone away'. As a test, I'm starting glance, running 'glance index', and restarting the MySQL server. The next run of 'glance index' produces a traceback on both the client and server side, dumping the "MySQL server has gone away" error. I believe the functions in get_session + get_engine are not enough to catch the error when executed against the database connection. If someone else can confirm, this bug should be reopened.

Adam Gandelman (gandelman-a) wrote :

Turns out there are some differences in exception behavior between python-sqlalchemy 0.6.8-1 (oneiric) and 0.7.4-1 (precise).

AFAICS, the exceptions being caught in the wrap_db() handling are not being raised in the same way in the newer version. On oneiric, we were catching the sqlalchemy.exc.OperationalError exception out of /usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py:1726 in _execute_and_instances(). This code has changed in some recent version, and now there is no exception to be caught there. The one that ends up being raised and causing the traceback comes from the mysqldb back-end dialect, and it would be a bad idea to start worrying about catching that kind of thing from glance/nova. I'm not sure if this is a bug in sqlalchemy, or if we should not rely on wrapping + catching this kind of thing and should instead use the MySQLPingListener that was used before, as it seems to at least be portable.

Adam Gandelman (gandelman-a) wrote :

Correction: the issue is more that _execute_and_instances() was previously calling session.execute(), which we're wrapping in glance. Newer versions call execute() directly on the connection. If we wished to wrap this as well, we'd need to look out for exceptions that are specific to the many backend connections supported by SQLAlchemy.
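
For comparison, the ping-on-checkout idea behind the MySQLPingListener mentioned in this thread can be sketched with the standard library alone. This is an assumption-laden illustration, not the Glance or SQLAlchemy code: PingingPool is a hypothetical single-connection pool, sqlite3 stands in for MySQL, and closing the connection simulates the server having gone away.

```python
import sqlite3


class PingingPool:
    """Hypothetical one-connection pool that pings before handing out a connection."""

    def __init__(self, connect):
        self._connect = connect  # factory that opens a new connection
        self._conn = None
        self.pings = 0           # number of 'SELECT 1' probes issued
        self.connects = 0        # number of connections opened

    def checkout(self):
        if self._conn is not None:
            try:
                # The per-checkout cost the commit message refers to.
                self.pings += 1
                self._conn.execute("SELECT 1")
            except sqlite3.Error:
                # Stale connection: discard it and reconnect below.
                self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self.connects += 1
        return self._conn
```

The trade-off matches the one debated above: every checkout pays for one extra query, but a dead connection is detected before the caller's statement runs, so recovery does not depend on which exception type the ORM or driver raises.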

Thierry Carrez (ttx) on 2012-04-05
Changed in glance:
milestone: essex-rc2 → 2012.1
Chuck Short (zulcss) on 2012-05-28
Changed in glance (Ubuntu Precise):
status: New → Fix Released
Changed in glance (Ubuntu):
status: New → Fix Released