OSProfiler does not catch DB error events

Bug #1706299 reported by Ilya Shakhat
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
osprofiler
Fix Released
Undecided
Ilya Shakhat

Bug Description

Consider the following scenario: user traces OpenStack command and an error occurs in database layer (e.g. connection is closed). This leads to 2 issues:

1. OSProfiler does not catch 'db-stop' event, nor any other errors from DB

2. All further events are mis-aligned, causing wrong interpretation of the trace.

Example
-------

Keystone trace usually have the following events:

wsgi-start -> POST/identity/v2.0/tokens
db-start -> SELECT 1
db-stop
db-start -> SELECT user...
db-stop
....
wsgi-stop

If DB error occurs during execution of 'SELECT user...' then the trace looks as following:

wsgi-start -> POST/identity/v2.0/tokens
db-start -> SELECT 1
db-stop
db-start -> SELECT user...
db-stop

The last 'db-stop' event is actually emitted from WSGI interceptor, but it has wrong name because the name is calculated based on stack status (https://github.com/openstack/osprofiler/blob/master/osprofiler/profiler.py#L393).

Possible solutions
------------------

a) Subscribe to SQLAlchemy events

There are 2 types of events related to errors:
 * 'dbapi_error' intercepts a raw DB error, but it is deprecated for a long time
 * 'handle_error' intercepts exceptions processed by SQLAlchemy connection. Oslo.db is already subscribed to this event, the handler maps low-level exceptions and raises custom instead (https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L431-L478). However this behavior makes all other handlers to be ignored (according to http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=context#sqlalchemy.events.ConnectionEvents.handle_error if custom handler raises exception then all other handlers are bypassed).

b) Force WSGIMiddleware to ignore Profiler's stack.
Middleware works like interceptor and OSProfiler can control when the trace starts and when it stops, there is no need to rely on stack state. However this solution just fixes the alignment, but DB error will still be missing.

Tovin Seven (tovin07)
Changed in osprofiler:
status: New → Confirmed
Revision history for this message
Ilya Shakhat (shakhat) wrote :

Related change into oslo.db: https://review.openstack.org/#/c/486980/

Revision history for this message
Tovin Seven (tovin07) wrote :

Thanks Ilya for the patch.

Hi Ilya, did related change in oslo.db fix this?

Revision history for this message
Ilya Shakhat (shakhat) wrote :

Hi Tovin, change in oslo.db only allows to have more than one sqlalchemy error handler. The remaining work items are:
 1. wait for a new release of oslo.db
 2. bump oslo.db version in requirements
 3. add to osprofiler a handler for 'handle_error' events.
I will make a patch for 3) once 1) and 2) are ready.

Revision history for this message
Tovin Seven (tovin07) wrote :

Thanks Ilya, I got it! :D

Ilya Shakhat (shakhat)
Changed in osprofiler:
assignee: nobody → Ilya Shakhat (shakhat)
Changed in osprofiler:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to osprofiler (master)

Reviewed: https://review.openstack.org/514587
Committed: https://git.openstack.org/cgit/openstack/osprofiler/commit/?id=70dabd508dfcce9f7afa968dec4fbf169e37274e
Submitter: Zuul
Branch: master

commit 70dabd508dfcce9f7afa968dec4fbf169e37274e
Author: Ilya Shakhat <email address hidden>
Date: Tue Oct 24 12:02:02 2017 +0200

    Handle and report SQLAlchemy errors

    Add handling of SQLAlchemy errors into OSProfiler. For errors
    profiler will emit stop event with information about error
    (Python exception).

    Closes bug: #1706299

    Change-Id: Ib9c1ea2ee3fb0450b8591ff1164f3e7153c5472a

Changed in osprofiler:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/osprofiler 1.14.0

This issue was fixed in the openstack/osprofiler 1.14.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.