CLI will fail one time after restarting DB

Bug #1389985 reported by Song Li on 2014-11-06
This bug affects 1 person
Affects                         Status         Importance  Assigned to   Milestone
Ceilometer                      Fix Committed  Undecided   Lan Qi song
  Juno                          Fix Committed  Undecided   Unassigned
Glance                                         Undecided   Louis Taylor
OpenStack Compute (nova)                       Undecided   Unassigned
OpenStack Identity (keystone)                  Undecided   Unassigned
oslo.db                                        Undecided   Unassigned

Bug Description

After the database is restarted, the first CLI command fails. For example:
restart the database and wait a few minutes,
then run heat stack-list. The result looks like this:

ERROR: Remote error: DBConnectionError (OperationalError) ibm_db_dbi::OperationalError: SQLNumResultCols failed: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.11.1.14". Communication function detecting the error: "send". Protocol specific error code(s): "2", "*", "*". SQLSTATE=08001 SQLCODE=-30081 'SELECT stack.status_reason AS stack_status_reason, stack.created_at AS stack_created_at, stack.deleted_at AS stack_deleted_at, stack.action AS stack_action, stack.status AS stack_status, stack.id AS stack_id, stack.name AS stack_name, stack.raw_template_id AS stack_raw_template_id, stack.username AS stack_username, stack.tenant AS stack_tenant, stack.parameters AS stack_parameters, stack.user_creds_id AS stack_user_creds_id, stack.owner_id AS stack_owner_id, stack.timeout AS stack_timeout, stack.disable_rollback AS stack_disable_rollback, stack.stack_user_project_id AS stack_stack_user_project_id, stack.backup AS stack_backup, stack.updated_at AS stack_updated_at \nFROM stack \nWHERE stack.deleted_at IS NULL AND stack.owner_id IS NULL AND stack.tenant = ? ORDER BY stack.created_at DESC, stack.id DESC' ('a3a14c6f82bd4ce88273822407a0829b',)
[u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n incoming.message))\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/service.py", line 69, in wrapped\n return func(self, ctx, *args, **kwargs)\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/service.py", line 490, in list_stacks\n return [api.format_stack(stack) for stack in stacks]\n', u' File "/usr/lib/python2.6/site-packages/heat/engine/stack.py", line 264, in load_all\n show_deleted, show_nested) or []\n', u' File "/usr/lib/python2.6/site-packages/heat/db/api.py", line 130, in stack_get_all\n show_deleted, show_nested)\n', u' File "/usr/lib/python2.6/site-packages/heat/db/sqlalchemy/api.py", line 368, in stack_get_all\n marker, sort_dir, filters).all()\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2241, in all\n return list(self)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2353, in __iter__\n return self._execute_and_instances(context)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2368, in _execute_and_instances\n result = conn.execute(querycontext.statement, self._params)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 662, in execute\n params)\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement\n compiled_sql, distilled_params\n', u' File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context\n context)\n', u' File 
"/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 125, in _handle_dbapi_exception\n six.reraise(type(newraise), newraise, sys.exc_info()[2])\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 102, in _handle_dbapi_exception\n per_fn = fn(ctx)\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/exc_filters.py", line 323, in handler\n context.is_disconnect)\n', u' File "/usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/exc_filters.py", line 263, in _is_db_connection_error\n raise exception.DBConnectionError(operational_error)\n', u'DBConnectionError: (OperationalError) ibm_db_dbi::OperationalError: SQLNumResultCols failed: [IBM][CLI Driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.11.1.14". Communication function detecting the error: "send". Protocol specific error code(s): "2", "*", "*". SQLSTATE=08001 SQLCODE=-30081 \'SELECT stack.status_reason AS stack_status_reason, stack.created_at AS stack_created_at, stack.deleted_at AS stack_deleted_at, stack.action AS stack_action, stack.status AS stack_status, stack.id AS stack_id, stack.name AS stack_name, stack.raw_template_id AS stack_raw_template_id, stack.username AS stack_username, stack.tenant AS stack_tenant, stack.parameters AS stack_parameters, stack.user_creds_id AS stack_user_creds_id, stack.owner_id AS stack_owner_id, stack.timeout AS stack_timeout, stack.disable_rollback AS stack_disable_rollback, stack.stack_user_project_id AS stack_stack_user_project_id, stack.backup AS stack_backup, stack.updated_at AS stack_updated_at \\nFROM stack \\nWHERE stack.deleted_at IS NULL AND stack.owner_id IS NULL AND stack.tenant = ? ORDER BY stack.created_at DESC, stack.id DESC\' (\'a3a14c6f82bd4ce88273822407a0829b\',)\n'].

Running heat stack-list or any other command a second time then succeeds.

Clint Byrum (clint-fewbar) wrote :

Seems like this is an oslo.db issue, not specific to any of its consumers.

Song Li (lisong-cruise) wrote :

Yes, thanks Clint. I will confirm as soon as possible whether this is an oslo.db issue, and if so I will move the bug to oslo. Thanks

Ai Jie Niu (niuaj) wrote :

Hi Clint, yes. I think that when oslo.db loses its connection to DB2, it cannot reconnect automatically; instead it reports an error the first time it finds the connection lost.

Louis Taylor (kragniz) wrote :

There is a patch under review to fix this in glance: https://review.openstack.org/#/c/122114/

This could be the wrong approach to fixing the problem, but it just extends the current method of handling deadlocks to also deal with connection errors.
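The approach described above (reusing the deadlock retry path for connection errors) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the Glance review: the exception classes are local stand-ins named after oslo.db's DBDeadlock and DBConnectionError, and retry_on_db_error and FlakyBackend are hypothetical names invented for the example.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for a deadlock error (assumption, for illustration)."""


class DBConnectionError(Exception):
    """Local stand-in for a lost-connection error (assumption)."""


def retry_on_db_error(max_retries=3, delay=0.0,
                      retry_on=(DBDeadlock, DBConnectionError)):
    """Retry the wrapped call when it raises one of the `retry_on` errors.

    Hypothetical helper: a deadlock-only retry would list just DBDeadlock;
    adding DBConnectionError extends it to stale connections.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise  # retries exhausted, surface the error
                    time.sleep(delay)
        return wrapper
    return decorator


class FlakyBackend:
    """Simulates a backend whose first call after a DB restart fails."""

    def __init__(self):
        self.calls = 0

    def stack_list(self):
        self.calls += 1
        if self.calls == 1:
            raise DBConnectionError("communication error on stale connection")
        return ["stack-a", "stack-b"]


backend = FlakyBackend()


@retry_on_db_error(max_retries=2)
def list_stacks():
    return backend.stack_list()


result = list_stacks()  # first attempt fails, second succeeds
```

With a wrapper like this, the failure on the first call after a restart is absorbed by the retry instead of reaching the CLI user. Whether retrying at this layer is appropriate depends on whether the wrapped operation is safe to repeat.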

Changed in glance:
status: New → Confirmed
assignee: nobody → Louis Taylor (kragniz)
status: Confirmed → In Progress
Song Li (lisong-cruise) on 2014-11-14
Changed in heat:
assignee: nobody → Song Li (lisong-cruise)
Morgan Fainberg (mdrnstm) wrote :

Wasn't this issue already addressed in oslo.db? This looks an awful lot like https://bugs.launchpad.net/keystone/+bug/1374497

Changed in keystone:
status: New → Incomplete
Joe Gordon (jogo) on 2014-11-14
Changed in nova:
status: New → Incomplete
Song Li (lisong-cruise) wrote :

@Morgan Fainberg

Thank you very much for the reminder. I have looked into the issue:
https://bugs.launchpad.net/keystone/+bug/1374497

They are indeed very similar. I will confirm with the owner and then mark our issue as a duplicate of 1374497.

Thanks again :)

Song Li (lisong-cruise) wrote :

I have tried the patch from https://bugs.launchpad.net/keystone/+bug/1374497
and it resolves our issue. Thanks :)

no longer affects: heat

Fix proposed to branch: master
Review: https://review.openstack.org/135186

Changed in ceilometer:
assignee: nobody → Lan Qi song (lqslan)
status: New → In Progress

Reviewed: https://review.openstack.org/135186
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=8c6841d3c00931204eaba0e9058707629120c1da
Submitter: Jenkins
Branch: master

commit 8c6841d3c00931204eaba0e9058707629120c1da
Author: lqslan <email address hidden>
Date: Tue Nov 18 15:38:51 2014 +0800

    Retry to connect database when DB2 or mongodb is restarted

    The patch https://review.openstack.org/#/c/122387 works fine
    with operations with get, record and update functions.
    But exception would still occured with the operation of
    db.collection.find() function.

    This patch can give some benefit to tolerate DB restart
    with find() function.
    This patch also removes "test_mongo_find" test case since
    it doesn't raise AutoReconnect exception at all.

    Change-Id: Ia0474726960ce2b4b611fda0a1c304bb8ad96922
    Closes-Bug: #1389985
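
One plausible reason find() needs separate handling is that a PyMongo cursor is lazy: find() returns immediately and the server is only contacted when the cursor is iterated, so a retry wrapped around the find() call itself never sees the error. The sketch below illustrates retrying at iteration time instead. It is not the code from this commit: AutoReconnect is a local stand-in for pymongo.errors.AutoReconnect so the example runs without PyMongo, and RetryingCursor and FakeCursor are hypothetical names.

```python
class AutoReconnect(Exception):
    """Local stand-in for pymongo.errors.AutoReconnect (assumption)."""


class RetryingCursor:
    """Hypothetical wrapper that retries each fetch from a lazy cursor."""

    def __init__(self, cursor, max_retries=2):
        self._cursor = cursor
        self._max_retries = max_retries

    def __iter__(self):
        return self

    def __next__(self):
        for attempt in range(self._max_retries + 1):
            try:
                # StopIteration passes through untouched and ends iteration.
                return next(self._cursor)
            except AutoReconnect:
                if attempt == self._max_retries:
                    raise  # retries exhausted, surface the error


class FakeCursor:
    """Simulates a cursor whose first fetch fails after a DB restart."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._failed_once = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._failed_once:
            self._failed_once = True
            raise AutoReconnect("connection reset")
        if not self._docs:
            raise StopIteration
        return self._docs.pop(0)


docs = list(RetryingCursor(FakeCursor([{"_id": 1}, {"_id": 2}])))
```

Here the first fetch raises the stand-in AutoReconnect, the wrapper retries, and iteration then proceeds normally.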

Changed in ceilometer:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/140223
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=529b4aaf50d34881ce0869b74688200adf462ea0
Submitter: Jenkins
Branch: stable/juno

commit 529b4aaf50d34881ce0869b74688200adf462ea0
Author: lqslan <email address hidden>
Date: Tue Nov 18 15:38:51 2014 +0800

    Retry to connect database when DB2 or mongodb is restarted

    The patch https://review.openstack.org/#/c/122387 works fine
    with operations with get, record and update functions.
    But exception would still occured with the operation of
    db.collection.find() function.

    This patch can give some benefit to tolerate DB restart
    with find() function.
    This patch also removes "test_mongo_find" test case since
    it doesn't raise AutoReconnect exception at all.

    Closes-Bug: #1389985
    Change-Id: Ia0474726960ce2b4b611fda0a1c304bb8ad96922
    (cherry-picked from commit 8c6841d3c00931204eaba0e9058707629120c1da)

tags: added: in-stable-juno