CLI will fail one time after restarting DB

Bug #1389985 reported by Song Li
This bug affects 1 person
Affects                        Status         Importance  Assigned to   Milestone
Ceilometer                     Fix Committed  Undecided   Lan Qi song
Juno                           Fix Committed  Undecided   Unassigned
Glance                         In Progress    Undecided   Louis Taylor
OpenStack Compute (nova)       Incomplete     Undecided   Unassigned
OpenStack Identity (keystone)  Incomplete     Undecided   Unassigned
oslo.db                        New            Undecided   Unassigned

Bug Description

After the database is restarted, the first command fails. For example:
restart the database and wait a few minutes.
Then run heat stack-list. The result looks like this:

ERROR: Remote error: DBConnectionError (OperationalError) ibm_db_dbi::OperationalError: SQLNumResultCols failed: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.11.1.14". Communication function detecting the error: "send". Protocol specific error code(s): "2", "*", "*". SQLSTATE=08001 SQLCODE=-30081 'SELECT stack.status_reason AS stack_status_reason, stack.created_at AS stack_created_at, stack.deleted_at AS stack_deleted_at, stack.action AS stack_action, stack.status AS stack_status, stack.id AS stack_id, stack.name AS stack_name, stack.raw_template_id AS stack_raw_template_id, stack.username AS stack_username, stack.tenant AS stack_tenant, stack.parameters AS stack_parameters, stack.user_creds_id AS stack_user_creds_id, stack.owner_id AS stack_owner_id, stack.timeout AS stack_timeout, stack.disable_rollback AS stack_disable_rollback, stack.stack_user_project_id AS stack_stack_user_project_id, stack.backup AS stack_backup, stack.updated_at AS stack_updated_at \nFROM stack \nWHERE stack.deleted_at IS NULL AND stack.owner_id IS NULL AND stack.tenant = ? ORDER BY stack.created_at DESC, stack.id DESC' ('a3a14c6f82bd4ce88273822407a0829b',)
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n incoming.message))\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/service.py", line 69, in wrapped\n return func(self, ctx, *args, **kwargs)\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/service.py", line 490, in list_stacks\n return [api.format_stack(stack) for stack in stacks]\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/stack.py", line 264, in load_all\n show_deleted, show_nested) or []\n', u' File "/usr/lib/python2.6/site-packages/heat/db/api.py", line 130, in stack_get_all\n show_deleted, show_nested)\n', u' File "/usr/lib/python2.6/site-packages/heat/db/sqlalchemy/api.py", line 368, in stack_get_all\n marker, sort_dir, filters).all()\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2241, in all\n return list(self)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2353, in __iter__\n return self._execute_and_instances(context)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2368, in _execute_and_instances\n result = conn.execute(querycontext.statement, self._params)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 662, in execute\n params)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement\n compiled_sql, distilled_params\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context\n context)\n', u' File 
"/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 125, in _handle_dbapi_exception\n six.reraise(type(newraise), newraise, sys.exc_info()[2])\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 102, in _handle_dbapi_exception\n per_fn = fn(ctx)\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/exc_filters.py", line 323, in handler\n context.is_disconnect)\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/exc_filters.py", line 263, in _is_db_connection_error\n raise exception.DBConnectionError(operational_error)\n', u'DBConnectionError: (OperationalError) ibm_db_dbi::OperationalError: SQLNumResultCols failed: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.11.1.14". Communication function detecting the error: "send". Protocol specific error code(s): "2", "*", "*". SQLSTATE=08001 SQLCODE=-30081 \'SELECT stack.status_reason AS stack_status_reason, stack.created_at AS stack_created_at, stack.deleted_at AS stack_deleted_at, stack.action AS stack_action, stack.status AS stack_status, stack.id AS stack_id, stack.name AS stack_name, stack.raw_template_id AS stack_raw_template_id, stack.username AS stack_username, stack.tenant AS stack_tenant, stack.parameters AS stack_parameters, stack.user_creds_id AS stack_user_creds_id, stack.owner_id AS stack_owner_id, stack.timeout AS stack_timeout, stack.disable_rollback AS stack_disable_rollback, stack.stack_user_project_id AS stack_stack_user_project_id, stack.backup AS stack_backup, stack.updated_at AS stack_updated_at \\nFROM stack \\nWHERE stack.deleted_at IS NULL AND stack.owner_id IS NULL AND stack.tenant = ? ORDER BY stack.created_at DESC, stack.id DESC\' (\'a3a14c6f82bd4ce88273822407a0829b\',)\n'].

Running heat stack-list or any other command after that succeeds.

Clint Byrum (clint-fewbar) wrote :

Seems like this is an oslo.db issue, not specific to any of its consumers.

Song Li (lisong-cruise) wrote :

Yes, thanks Clint. I will check as soon as possible whether this is an oslo.db issue, and if it is I will move the bug to oslo. Thanks.

Ai Jie Niu (niuaj) wrote :

Hi Clint, yes. I think that when oslo.db loses its connection to DB2, it cannot reconnect automatically; instead it reports an error the first time it finds that the connection is lost.
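
The behaviour described here is typical of a pooled connection that went stale while the service was idle: the pool hands it out, and only the first statement discovers that the socket is dead. Below is a minimal sketch of that failure mode and of a "ping on checkout" mitigation. `ConnectionLost`, `FakeConnection`, `Pool`, `restart_database`, and `run` are hypothetical stand-ins written for illustration; this is not oslo.db or ibm_db code.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver-level disconnect error."""


class FakeConnection(object):
    """Toy connection that dies when the 'server' generation changes."""

    def __init__(self, server):
        self.server = server
        self.generation = server["generation"]

    def execute(self, sql):
        if self.generation != self.server["generation"]:
            raise ConnectionLost("communication error on send")
        return "ok: " + sql


class Pool(object):
    """Single-slot pool that can ping a connection before handing it out."""

    def __init__(self, server, pre_ping=False):
        self.server = server
        self.pre_ping = pre_ping
        self._conn = FakeConnection(server)

    def checkout(self):
        if self.pre_ping:
            try:
                self._conn.execute("SELECT 1")
            except ConnectionLost:
                # Stale connection: replace it before the caller sees it.
                self._conn = FakeConnection(self.server)
        return self._conn

    def invalidate(self):
        self._conn = FakeConnection(self.server)


def restart_database(server):
    """Simulate a restart: every existing connection becomes stale."""
    server["generation"] += 1


def run(pool, sql):
    conn = pool.checkout()
    try:
        return conn.execute(sql)
    except ConnectionLost:
        # The error is reported once; the next checkout gets a fresh connection.
        pool.invalidate()
        raise


# Without the ping, the first call after a restart fails, the second works.
server = {"generation": 0}
plain_pool = Pool(server)
restart_database(server)
try:
    run(plain_pool, "list stacks")
    first_call_failed = False
except ConnectionLost:
    first_call_failed = True
second_call = run(plain_pool, "list stacks")

# With the ping, the stale connection is replaced at checkout.
pinging_pool = Pool(server, pre_ping=True)
restart_database(server)
pinged_call = run(pinging_pool, "list stacks")
```

The ping costs one extra round trip per checkout. SQLAlchemy's pooling documentation describes this trade-off as pessimistic disconnect handling.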

Louis Taylor (kragniz) wrote :

There is a patch under review to fix this in glance: https://review.openstack.org/#/c/122114/

This could be the wrong approach to fixing the problem, but it just extends the current method of handling deadlocks to also deal with connection errors.
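
As a rough illustration of that idea, a retry decorator that already handles deadlocks can take a tuple of exception types, so that connection errors are retried in the same way. The sketch below is not the code from the glance review: `DBDeadlock` and `DBConnectionError` are local stand-in classes, and `retry_on_db_error`, `max_retries`, and `delay` are names assumed for this example.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for a deadlock error (assumption for this sketch)."""


class DBConnectionError(Exception):
    """Local stand-in for a lost-connection error (assumption for this sketch)."""


def retry_on_db_error(max_retries=3, delay=0.0,
                      retry_on=(DBDeadlock, DBConnectionError)):
    """Retry the wrapped call when it raises one of the listed errors."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    attempt += 1
                    if attempt > max_retries:
                        raise  # give up and surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


# Demo: the first call hits a dead connection, the retry succeeds.
calls = {"n": 0}


@retry_on_db_error(max_retries=2)
def list_images():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DBConnectionError("connection lost after DB restart")
    return ["image-1"]


result = list_images()
```

A retry of this kind is only safe for operations that can be repeated without side effects, or that run inside a transaction rolled back on failure.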

Changed in glance:
status: New → Confirmed
assignee: nobody → Louis Taylor (kragniz)
status: Confirmed → In Progress
Song Li (lisong-cruise)
Changed in heat:
assignee: nobody → Song Li (lisong-cruise)
Morgan Fainberg (mdrnstm) wrote :

Wasn't this issue already addressed in oslo.db? This looks an awful lot like https://bugs.launchpad.net/keystone/+bug/1374497

Changed in keystone:
status: New → Incomplete
Joe Gordon (jogo)
Changed in nova:
status: New → Incomplete
Song Li (lisong-cruise) wrote :

@Morgan Fainberg

Thank you very much for the reminder. I have looked into this issue:
https://bugs.launchpad.net/keystone/+bug/1374497

The two are indeed very similar. I will confirm that with the owner and then mark our issue as a duplicate of 1374497.

Thanks again :)

Song Li (lisong-cruise) wrote :

I have tried the patch in https://bugs.launchpad.net/keystone/+bug/1374497
It resolves our issue. Thanks :)

no longer affects: heat
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135186

Changed in ceilometer:
assignee: nobody → Lan Qi song (lqslan)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/135186
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=8c6841d3c00931204eaba0e9058707629120c1da
Submitter: Jenkins
Branch: master

commit 8c6841d3c00931204eaba0e9058707629120c1da
Author: lqslan <email address hidden>
Date: Tue Nov 18 15:38:51 2014 +0800

    Retry to connect database when DB2 or mongodb is restarted

    The patch https://review.openstack.org/#/c/122387 works fine
    with operations with get, record and update functions.
    But exception would still occured with the operation of
    db.collection.find() function.

    This patch can give some benefit to tolerate DB restart
    with find() function.
    This patch also removes "test_mongo_find" test case since
    it doesn't raise AutoReconnect exception at all.

    Change-Id: Ia0474726960ce2b4b611fda0a1c304bb8ad96922
    Closes-Bug: #1389985

Changed in ceilometer:
status: In Progress → Fix Committed
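
The commit message notes that find() could still raise after the earlier patch. One possible explanation, which the message itself does not state, is that a lazy cursor contacts the server while it is iterated rather than when find() returns, so a retry wrapped around find() alone never sees the error. The sketch below illustrates retrying at the fetch step instead. It is not the code from the ceilometer commit: `AutoReconnect` here is a local stand-in class, and `FlakyCursor` and `iterate_with_retry` are hypothetical names.

```python
class AutoReconnect(Exception):
    """Local stand-in for a driver's reconnect error (assumption)."""


class FlakyCursor(object):
    """Toy lazy cursor: it touches the 'server' only when a row is fetched."""

    def __init__(self, rows, fail_at=None):
        self._rows = rows
        self._pos = 0
        self._fail_at = fail_at  # position at which one failure occurs

    def next_row(self):
        if self._pos == self._fail_at:
            self._fail_at = None  # the failure happens only once
            raise AutoReconnect("connection reset")
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row


def iterate_with_retry(cursor, max_retries=2):
    """Yield rows, retrying a failed fetch instead of aborting the scan."""
    while True:
        for attempt in range(max_retries + 1):
            try:
                row = cursor.next_row()
                break
            except AutoReconnect:
                if attempt == max_retries:
                    raise  # retries exhausted
            except StopIteration:
                return  # no more rows
        yield row


# Demo: the fetch of the second row fails once and is retried.
rows = list(iterate_with_retry(FlakyCursor(["a", "b", "c"], fail_at=1)))
```

The toy cursor keeps its position across the failure. A real cursor may need to be re-created or restored before the fetch can be retried.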
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/140223

OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/juno)

Reviewed: https://review.openstack.org/140223
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=529b4aaf50d34881ce0869b74688200adf462ea0
Submitter: Jenkins
Branch: stable/juno

commit 529b4aaf50d34881ce0869b74688200adf462ea0
Author: lqslan <email address hidden>
Date: Tue Nov 18 15:38:51 2014 +0800

    Retry to connect database when DB2 or mongodb is restarted

    The patch https://review.openstack.org/#/c/122387 works fine
    with operations with get, record and update functions.
    But exception would still occured with the operation of
    db.collection.find() function.

    This patch can give some benefit to tolerate DB restart
    with find() function.
    This patch also removes "test_mongo_find" test case since
    it doesn't raise AutoReconnect exception at all.

    Closes-Bug: #1389985
    Change-Id: Ia0474726960ce2b4b611fda0a1c304bb8ad96922
    (cherry-picked from commit 8c6841d3c00931204eaba0e9058707629120c1da)

tags: added: in-stable-juno