DB retry not triggered when a failure happens in an AFTER_CREATE notification

Bug #1687913 reported by Wim De Clercq
This bug affects 1 person
Affects: neutron
Status: New
Importance: Undecided
Assigned to: Anusha K
Milestone: (none)

Bug Description

Note:
- The specific use case can no longer happen on master (due to a couple of commits), so the description below applies to releases before Ocata.
- The bug was seen on a Newton setup.

During high-concurrency testing (with router:external networks), the following deadlock may occur:
http://paste.openstack.org/show/608690/

Deadlocks are normally acceptable because the DB retry mechanism retries the request. In this specific case, however, it did not.

The issue happens here:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L769

- The code runs inside a transaction.
- The external_net_db code sends a notification with AFTER_CREATE.
- The deadlock happens while the AFTER_CREATE event is being processed.

The problem is that errors raised while processing an AFTER_CREATE event are not propagated; they are only logged.
But the notification IS inside a transaction, and the failure did make the session invalid.
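
To make the failure mode concrete, here is a simplified, hypothetical model of a callback registry in which failures in BEFORE_* and PRECOMMIT_* callbacks are raised to the caller while AFTER_* failures are only logged. The class names, event strings, and method signatures are illustrative assumptions, not neutron's actual registry API.

```python
import logging

LOG = logging.getLogger(__name__)

# Event names loosely modeled on neutron's BEFORE_/PRECOMMIT_/AFTER_ prefixes.
BEFORE_CREATE = "before_create"
PRECOMMIT_CREATE = "precommit_create"
AFTER_CREATE = "after_create"


class CallbackFailure(Exception):
    """Hypothetical error carrying every exception raised by callbacks."""

    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


class CallbackRegistry:
    """Toy registry: only BEFORE_* and PRECOMMIT_* failures propagate."""

    def __init__(self):
        self._callbacks = {}

    def subscribe(self, callback, event):
        self._callbacks.setdefault(event, []).append(callback)

    def notify(self, event, **kwargs):
        errors = []
        for callback in self._callbacks.get(event, []):
            try:
                callback(event, **kwargs)
            except Exception as exc:
                # Every failure is logged and collected, whatever the event.
                LOG.exception("Error during notification for %s", event)
                errors.append(exc)
        # AFTER_* failures stop here: logged above, never raised.
        if errors and event.startswith(("before_", "precommit_")):
            raise CallbackFailure(errors)
```

With this model, a callback that hits a deadlock during AFTER_CREATE leaves `notify()` returning normally, so the caller keeps going with a session that may already be unusable.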

So the code continues and tries to commit the invalid session, which raises:

sqlalchemy.exc.InvalidRequestError - This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: ...
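
The sketch below is a minimal, hypothetical model of why the caller never sees the deadlock itself: once a flush fails, the session remembers the original error, and a later commit raises a different exception type until rollback() is called. `ToySession`, `DBDeadlock`, and `InvalidRequestError` here are stand-ins assumed for illustration, not the real SQLAlchemy or oslo.db classes.

```python
class InvalidRequestError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.InvalidRequestError."""


class DBDeadlock(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


class ToySession:
    """Minimal model of a session whose transaction is poisoned by a failed flush."""

    def __init__(self):
        self._failed_with = None  # original flush error, if any
        self.committed = False

    def flush(self, fail_with=None):
        if self._failed_with is not None:
            raise InvalidRequestError("transaction already rolled back")
        if fail_with is not None:
            # Remember the original error: the transaction is now unusable.
            self._failed_with = fail_with
            raise fail_with

    def commit(self):
        if self._failed_with is not None:
            # The caller sees a new error type, not the original deadlock.
            raise InvalidRequestError(
                "This Session's transaction has been rolled back due to a "
                "previous exception during flush. Original exception was: %r"
                % (self._failed_with,))
        self.committed = True

    def rollback(self):
        # Clearing the failed state is the only way to use the session again.
        self._failed_with = None
```

If the DBDeadlock from flush() is swallowed (as an AFTER_CREATE callback error is), the subsequent commit() raises InvalidRequestError instead, and the deadlock survives only inside the message text.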

Since this exception type is not among the exceptions that the DB retry mechanism treats as retriable, no retry happens and the request fails.
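
A minimal sketch of an exception-type-based retry decorator shows the consequence. It assumes a retriable set that contains only a deadlock stand-in; the names `retry_db_errors`, `RETRIABLE`, and both exception classes are hypothetical here, and neutron's real retry mechanism differs in detail.

```python
import functools


class DBDeadlock(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


class InvalidRequestError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.InvalidRequestError."""


# Hypothetical retriable set: only deadlocks trigger another attempt.
RETRIABLE = (DBDeadlock,)


def retry_db_errors(max_retries=3):
    """Re-run the wrapped function when it raises a RETRIABLE error.

    Any other exception type propagates on the first attempt.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except RETRIABLE:
                    if attempt == max_retries:
                        raise  # out of retries, surface the deadlock
        return wrapper

    return decorator
```

A function that raises DBDeadlock is retried, but one that raises InvalidRequestError (the error produced by committing the invalidated session) fails on its first call, which matches the behavior described in this report.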

While this use case is very specific, some action may be needed to keep this from happening elsewhere, because any database error that occurs inside an event notification other than BEFORE_x or PRECOMMIT behaves the same way: it corrupts the session object, nothing is raised, and the subsequent error is not retriable.

(To reproduce easily on a test setup, add the following

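    # Imports assumed for a Newton-era tree, if not already present:
    #   from oslo_db import exception as db_exc
    #   from neutron.db import models_v2
    if event == events.AFTER_CREATE:
        try:
            # 256 chars is assumed to exceed the Network.name column length
            context.session.add(models_v2.Network(name=256*'g'))
            context.session.flush()  # this makes the session invalid
        except Exception:
            # Surface the failure as a deadlock, normally a retriable error
            raise db_exc.DBDeadlock()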

to _ensure_external_network_default_value_callback in neutron/services/auto_allocate/db.py
and create a router:external network.

At first sight this should trigger the retry mechanism, but it does not.)

Anusha K (anusha25)
Changed in neutron:
assignee: nobody → Anusha K (anusha25)