Comment 8 for bug 1837635

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/679431
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b7bf8363333bcd39705f01b7c50bdfeddbf1c836
Submitter: Zuul
Branch: stable/stein

commit b7bf8363333bcd39705f01b7c50bdfeddbf1c836
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 11:17:19 2019 +0000

    Refactor the L3 agent batch notifier

    This patch is the first of a series of patches improving how the L3
    agents report the router HA state to the Neutron server.

    This patch partially reverts the previous patch [1]. When the batch
    notifier sends events, it calls the callback method passed during
    initialization, in this case AgentMixin.notify_server. The batch
    notifier spawns a new thread in charge of sending the notifications
    and then waits for the specified "batch_interval" time. If the
    callback method is not synchronous with the notify thread execution
    (which is what [1] implemented), the thread can finish while the RPC
    client is still sending the HA router states. If another HA state
    update is received, both updates can be executed at the same time,
    and a newer router state can be overwritten by an older one that has
    not yet been sent or processed.
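
    As a purely illustrative sketch (not Neutron code; names and delays
    are invented for the example), the failure mode is two detached
    threads racing to deliver states for the same router, with nothing
    guaranteeing that the newer value lands last:

        import threading
        import time

        server_state = {}   # stands in for the Neutron server's view

        def notify_server(router_id, state, delay):
            # The artificial delay plays the role of RPC latency; with
            # detached threads the older update can land last.
            time.sleep(delay)
            server_state[router_id] = state

        # The earlier (now stale) "standby" notification is slow, the newer
        # "active" one is fast: the server ends up keeping the stale value.
        t1 = threading.Thread(target=notify_server, args=("router-1", "standby", 0.2))
        t2 = threading.Thread(target=notify_server, args=("router-1", "active", 0.0))
        t1.start(); t2.start()
        t1.join(); t2.join()
        print(server_state)   # {'router-1': 'standby'}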

    The batch notifier is refactored to improve what was initially
    implemented in [2] and then updated in [3]. Currently, each new event
    can be added to the "pending_events" list, and a new thread is
    spawned to process this event list. This thread decouples the event
    processing from the calling thread, making it non-blocking.

    But with the current implementation, each new event spawns a new
    thread, synchronized with the previous and following ones (using a
    synchronized decorator). That means that, during the batch interval
    time, the system can have as many threads waiting as events received.
    Those threads end sequentially, each one finishing when the previous
    thread's batch interval sleep time expires.
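
    A rough sketch of that pre-patch pattern, using plain Python
    threading in place of Neutron's eventlet and synchronized decorator
    (all names here are invented for the example): every event spawns its
    own worker, and the workers queue up behind one lock, each sleeping
    the batch interval in turn.

        import threading
        import time

        _lock = threading.Lock()   # plays the role of the synchronized decorator
        _pending_events = []

        def queue_event(event, callback, batch_interval=2):
            """Pre-patch style: every event gets its own synchronized thread."""
            _pending_events.append(event)

            def synced_send():
                with _lock:                     # workers serialize here...
                    batch = list(_pending_events)
                    del _pending_events[:]
                    if batch:
                        callback(batch)
                    time.sleep(batch_interval)  # ...and each sleeps in turn, so N
                                                # queued events keep N threads alive

            threading.Thread(target=synced_send, daemon=True).start()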

    Instead of this, this patch enqueues each new event and allows only
    one thread to be alive while processing the event list. If new events
    have been queued by the end of the processing loop, the same thread
    processes them.
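
    A minimal sketch of that "single worker drains the queue" idea,
    assuming plain Python threading and queue instead of Neutron's
    eventlet-based notifier (class and method names are illustrative,
    not the actual Neutron API):

        import queue
        import threading
        import time

        class BatchNotifier:
            """Batches queued events and sends them from at most one worker."""

            def __init__(self, batch_interval, callback):
                self.batch_interval = batch_interval  # seconds between batches
                self.callback = callback              # receives a list of events
                self._pending = queue.Queue()
                self._mutex = threading.Lock()

            def queue_event(self, event):
                """Enqueue the event; start a worker only if none is running."""
                self._pending.put(event)
                if self._mutex.locked():
                    # A worker is already draining the queue and will pick
                    # this event up on its next pass.
                    return
                threading.Thread(target=self._process, daemon=True).start()

            def _process(self):
                # Keep draining until the queue is really empty, so events
                # queued while we slept are sent by this same thread instead
                # of by newly spawned ones.
                while not self._pending.empty():
                    with self._mutex:
                        while not self._pending.empty():
                            batch = []
                            while not self._pending.empty():
                                batch.append(self._pending.get())
                            self.callback(batch)
                            time.sleep(self.batch_interval)

    With this shape, a burst of queue_event() calls during the sleep
    leaves only the original worker alive; it loops once more and sends
    the accumulated batch, which matches the behaviour described above.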

    [1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
    [2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
    [3] I82f403441564955345f47877151e0c457712dd2f

    Partial-Bug: #1837635

    Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
    (cherry picked from commit 8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e)