Bug #1837635 “HA router state change from “standby” to “master” ...” : Bugs : neutron

Rodolfo Alonso (rodolfo-alonso-hernandez) on 2019-07-23

Changed in neutron:
assignee:	nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)

Boden R (boden) on 2019-07-23

tags:

added: l3-ha

LIU Yulong (dragon889) wrote on 2019-07-24:

#1

In the following bug I noticed some similar logs. They are not related to the higher IP, but they may cause the same problem as here:
https://bugs.launchpad.net/neutron/+bug/1798475/comments/14
https://bugs.launchpad.net/neutron/+bug/1798475/comments/15
https://bugs.launchpad.net/neutron/+bug/1798475/comments/16
https://bugs.launchpad.net/neutron/+bug/1798475/comments/17

OpenStack Infra (hudson-openstack) wrote on 2019-07-24: Fix proposed to neutron (master)

#2

Fix proposed to branch: master
Review: https://review.opendev.org/672533

Changed in neutron:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2019-07-24:

#3

Fix proposed to branch: master
Review: https://review.opendev.org/672568

OpenStack Infra (hudson-openstack) wrote on 2019-08-05: Fix merged to neutron (master)

#4

Reviewed: https://review.opendev.org/672533
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e
Submitter: Zuul
Branch: master

commit 8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 11:17:19 2019 +0000

Refactor the L3 agent batch notifier

    This patch is the first one of a series of patches improving how the L3
    agents update the router HA state to the Neutron server.

    This patch partially reverts the previous patch [1]. When the batch
    notifier sends events, it calls the callback method passed during the
    initialization, in this case AgentMixin.notify_server. The batch
    notifier spawns a new thread in charge of sending the notifications and
    then waits the specified "batch_interval" time. If the callback method
    is not synchronous with the notify thread execution (which is what [1]
    implemented), the thread can finish while the RPC client is still
    sending the HA router states. If another HA state update is received,
    then both updates can be executed at the same time. It is then possible
    for a new router state to be overwritten by an old one that has not yet
    been sent or processed.

    The batch notifier is refactored to improve what initially was
    implemented [2] and then updated [3]. Currently, each new event thread
    can update the "pending_events" list. Then, a new thread is spawned to
    process this event list. This thread decouples the current execution
    from the calling thread, making the event processing a non-blocking
    process.

    But with the current implementation, each new process will spawn a new
    thread, synchronized with the previous and new ones (using a
    synchronized decorator). That means that, during the batch interval
    time, the system can have as many threads waiting as new events
    received. Those threads will end sequentially when the previous threads
    end the batch interval sleep time.

    Instead of this, this patch receives and enqueues each new event and
    allows only one thread to be alive while processing the event list. If
    at the end of the processing loop new events are stored, the thread will
    process them.

    [1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
    [2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
    [3] I82f403441564955345f47877151e0c457712dd2f

    Partial-Bug: #1837635

    Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
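
The single-worker idea described in this commit message can be illustrated with a minimal sketch. This is not the code merged in the review: the class name SingleWorkerBatchNotifier, the join() helper and the use of the standard library "threading" module are assumptions made only for illustration. It shows events being enqueued, at most one worker thread alive at a time, and the callback invoked synchronously inside that worker.

```python
import threading
import time


class SingleWorkerBatchNotifier:
    """Illustrative sketch only; names here are hypothetical."""

    def __init__(self, batch_interval, callback):
        self.batch_interval = batch_interval
        self.callback = callback
        self._lock = threading.Lock()
        self._pending = []
        self._worker = None

    def queue_event(self, event):
        """Enqueue an event; start a worker only if none is alive."""
        with self._lock:
            self._pending.append(event)
            if self._worker is None:
                self._worker = threading.Thread(target=self._run, daemon=True)
                self._worker.start()

    def _run(self):
        while True:
            with self._lock:
                if not self._pending:
                    # Nothing left: retire under the lock so that a later
                    # queue_event() reliably starts a fresh worker.
                    self._worker = None
                    return
                batch, self._pending = self._pending, []
            # The callback runs synchronously in the worker, so two batches
            # are never sent at the same time.
            self.callback(batch)
            time.sleep(self.batch_interval)

    def join(self, timeout=None):
        """Wait for the current worker, if any, to drain the queue."""
        with self._lock:
            worker = self._worker
        if worker is not None:
            worker.join(timeout)
```

Events queued while the worker sleeps are picked up on its next loop iteration, so the number of threads does not grow with the number of events.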

Bernard Cafarelli (bcafarel) on 2019-08-23

tags:

added: neutron-proactive-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2019-08-29:

#5

Reviewed: https://review.opendev.org/672568
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3f022a193f66fde3bfd945af1119a60dfe91cb91
Submitter: Zuul
Branch: master

commit 3f022a193f66fde3bfd945af1119a60dfe91cb91
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 16:32:02 2019 +0000

Delay HA router transition from "backup" to "master"

    As described in the bug, when an HA router transitions from "master" to
    "backup", "keepalived" processes will set the virtual IP in all other
    HA routers. Each HA router will then advertise it and "keepalived" will
    decide, according to a trivial algorithm (higher interface IP), which
    one should be "master". At this point, the "keepalived" processes
    running in the other servers will remove the HA router virtual IP
    assigned an instant before.

    To avoid transitioning some routers from "backup" to "master" and then
    back to "backup" in a very short period, this patch delays the "backup"
    to "master" transition, waiting for a possible new "backup" state. If
    the L3 agent receives a new "backup" HA state during the waiting period
    before setting the HA state to "master" (set to the HA VRRP advert
    time, 2 seconds by default), the L3 agent does nothing.

    Closes-Bug: #1837635

    Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
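
The delay described in this commit message can be sketched roughly as follows. This is not the merged L3 agent code: the class DelayedMasterTransition, its apply_state callback and the use of threading.Timer are hypothetical names and choices made for illustration. The sketch holds a "master" transition for a configurable delay and drops it if a "backup" state arrives in the meantime.

```python
import threading


class DelayedMasterTransition:
    """Illustrative sketch only; names here are hypothetical."""

    def __init__(self, delay, apply_state):
        self.delay = delay              # e.g. the VRRP advert interval
        self.apply_state = apply_state  # called with the state to apply
        self._lock = threading.Lock()
        self._timer = None
        self._token = None

    def on_state_change(self, state):
        with self._lock:
            had_pending = self._timer is not None
            if had_pending:
                # A newer event supersedes the pending "master" transition.
                self._timer.cancel()
                self._timer = None
                self._token = None
            if state == "master":
                token = object()
                self._token = token
                self._timer = threading.Timer(
                    self.delay, self._fire, args=(token,))
                self._timer.daemon = True
                self._timer.start()
                return
            if had_pending:
                # "backup" arrived during the wait: the router never became
                # "master" from the agent's point of view, so do nothing.
                return
        self.apply_state(state)

    def _fire(self, token):
        with self._lock:
            if self._token is not token:
                return  # cancelled or superseded while the timer was firing
            self._timer = None
            self._token = None
        self.apply_state("master")
```

A short "backup" to "master" to "backup" flap is absorbed without any state being applied, while a "master" state that persists for the whole delay is applied once.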

Changed in neutron:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-08-30: Fix proposed to neutron (stable/stein)

#6

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/679431

OpenStack Infra (hudson-openstack) wrote on 2019-08-30:

#7

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/679438

OpenStack Infra (hudson-openstack) wrote on 2019-09-10: Fix merged to neutron (stable/stein)

#8

Reviewed: https://review.opendev.org/679431
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b7bf8363333bcd39705f01b7c50bdfeddbf1c836
Submitter: Zuul
Branch: stable/stein

commit b7bf8363333bcd39705f01b7c50bdfeddbf1c836
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 11:17:19 2019 +0000

Refactor the L3 agent batch notifier

    This patch is the first one of a series of patches improving how the L3
    agents update the router HA state to the Neutron server.

    This patch partially reverts the previous patch [1]. When the batch
    notifier sends events, it calls the callback method passed during the
    initialization, in this case AgentMixin.notify_server. The batch
    notifier spawns a new thread in charge of sending the notifications and
    then waits the specified "batch_interval" time. If the callback method
    is not synchronous with the notify thread execution (which is what [1]
    implemented), the thread can finish while the RPC client is still
    sending the HA router states. If another HA state update is received,
    then both updates can be executed at the same time. It is then possible
    for a new router state to be overwritten by an old one that has not yet
    been sent or processed.

    The batch notifier is refactored to improve what initially was
    implemented [2] and then updated [3]. Currently, each new event thread
    can update the "pending_events" list. Then, a new thread is spawned to
    process this event list. This thread decouples the current execution
    from the calling thread, making the event processing a non-blocking
    process.

    But with the current implementation, each new process will spawn a new
    thread, synchronized with the previous and new ones (using a
    synchronized decorator). That means that, during the batch interval
    time, the system can have as many threads waiting as new events
    received. Those threads will end sequentially when the previous threads
    end the batch interval sleep time.

    Instead of this, this patch receives and enqueues each new event and
    allows only one thread to be alive while processing the event list. If
    at the end of the processing loop new events are stored, the thread will
    process them.

    [1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
    [2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
    [3] I82f403441564955345f47877151e0c457712dd2f

    Partial-Bug: #1837635

    Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
    (cherry picked from commit 8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e)

tags:

added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote on 2019-09-12:

#9

Reviewed: https://review.opendev.org/679438
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=adac5d9b7a72b4edeba5357c6a47e7e528fcf775
Submitter: Zuul
Branch: stable/stein

commit adac5d9b7a72b4edeba5357c6a47e7e528fcf775
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 16:32:02 2019 +0000

Delay HA router transition from "backup" to "master"

    As described in the bug, when an HA router transitions from "master" to
    "backup", "keepalived" processes will set the virtual IP in all other
    HA routers. Each HA router will then advertise it and "keepalived" will
    decide, according to a trivial algorithm (higher interface IP), which
    one should be "master". At this point, the "keepalived" processes
    running in the other servers will remove the HA router virtual IP
    assigned an instant before.

    To avoid transitioning some routers from "backup" to "master" and then
    back to "backup" in a very short period, this patch delays the "backup"
    to "master" transition, waiting for a possible new "backup" state. If
    the L3 agent receives a new "backup" HA state during the waiting period
    before setting the HA state to "master" (set to the HA VRRP advert
    time, 2 seconds by default), the L3 agent does nothing.

    Closes-Bug: #1837635

    Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
    (cherry picked from commit 3f022a193f66fde3bfd945af1119a60dfe91cb91)

OpenStack Infra (hudson-openstack) wrote on 2019-09-19: Fix included in openstack/neutron 15.0.0.0b1

#10

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote on 2019-10-22: Fix included in openstack/neutron 14.0.3

#11

This issue was fixed in the openstack/neutron 14.0.3 release.

Edward Hope-Morley (hopem) on 2019-11-11

tags:

added: sts

OpenStack Infra (hudson-openstack) wrote on 2020-04-15: Fix merged to neutron (stable/rocky)

#12

Reviewed: https://review.opendev.org/719968
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7682d2fa77108b148ede651525458babc1b30d8d
Submitter: Zuul
Branch: stable/rocky

commit 7682d2fa77108b148ede651525458babc1b30d8d
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 16:32:02 2019 +0000

Delay HA router transition from "backup" to "master"

    As described in the bug, when an HA router transitions from "master" to
    "backup", "keepalived" processes will set the virtual IP in all other
    HA routers. Each HA router will then advertise it and "keepalived" will
    decide, according to a trivial algorithm (higher interface IP), which
    one should be "master". At this point, the "keepalived" processes
    running in the other servers will remove the HA router virtual IP
    assigned an instant before.

    To avoid transitioning some routers from "backup" to "master" and then
    back to "backup" in a very short period, this patch delays the "backup"
    to "master" transition, waiting for a possible new "backup" state. If
    the L3 agent receives a new "backup" HA state during the waiting period
    before setting the HA state to "master" (set to the HA VRRP advert
    time, 2 seconds by default), the L3 agent does nothing.

    Conflicts:
        neutron/agent/l3/agent.py

    Closes-Bug: #1837635

    Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
    (cherry picked from commit 3f022a193f66fde3bfd945af1119a60dfe91cb91)
    (cherry picked from commit adac5d9b7a72b4edeba5357c6a47e7e528fcf775)

OpenStack Infra (hudson-openstack) wrote on 2020-04-20: Fix proposed to neutron (stable/rocky)

#13

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/721243

OpenStack Infra (hudson-openstack) wrote on 2020-04-22: Fix merged to neutron (stable/rocky)

#14

Reviewed: https://review.opendev.org/721243
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2d849c6fee4fdf14e0ecc5242f6c9cc12aae8cbc
Submitter: Zuul
Branch: stable/rocky

commit 2d849c6fee4fdf14e0ecc5242f6c9cc12aae8cbc
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 11:17:19 2019 +0000

Refactor the L3 agent batch notifier

    This patch is the first one of a series of patches improving how the L3
    agents update the router HA state to the Neutron server.

    This patch partially reverts the previous patch [1]. When the batch
    notifier sends events, it calls the callback method passed during the
    initialization, in this case AgentMixin.notify_server. The batch
    notifier spawns a new thread in charge of sending the notifications and
    then waits the specified "batch_interval" time. If the callback method
    is not synchronous with the notify thread execution (which is what [1]
    implemented), the thread can finish while the RPC client is still
    sending the HA router states. If another HA state update is received,
    then both updates can be executed at the same time. It is then possible
    for a new router state to be overwritten by an old one that has not yet
    been sent or processed.

    The batch notifier is refactored to improve what initially was
    implemented [2] and then updated [3]. Currently, each new event thread
    can update the "pending_events" list. Then, a new thread is spawned to
    process this event list. This thread decouples the current execution
    from the calling thread, making the event processing a non-blocking
    process.

    But with the current implementation, each new process will spawn a new
    thread, synchronized with the previous and new ones (using a
    synchronized decorator). That means that, during the batch interval
    time, the system can have as many threads waiting as new events
    received. Those threads will end sequentially when the previous threads
    end the batch interval sleep time.

    Instead of this, this patch receives and enqueues each new event and
    allows only one thread to be alive while processing the event list. If
    at the end of the processing loop new events are stored, the thread will
    process them.

    [1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
    [2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
    [3] I82f403441564955345f47877151e0c457712dd2f

    Partial-Bug: #1837635

    Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
    (cherry picked from commit 8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e)

tags:

added: in-stable-rocky

OpenStack Infra (hudson-openstack) wrote on 2020-04-23: Fix merged to neutron (stable/queens)

#15

Reviewed: https://review.opendev.org/721244
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3f8ee68f116fd38fd18d6a5832f1934d212bc321
Submitter: Zuul
Branch: stable/queens

commit 3f8ee68f116fd38fd18d6a5832f1934d212bc321
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 11:17:19 2019 +0000

Refactor the L3 agent batch notifier

    This patch is the first one of a series of patches improving how the L3
    agents update the router HA state to the Neutron server.

    This patch partially reverts the previous patch [1]. When the batch
    notifier sends events, it calls the callback method passed during the
    initialization, in this case AgentMixin.notify_server. The batch
    notifier spawns a new thread in charge of sending the notifications and
    then waits the specified "batch_interval" time. If the callback method
    is not synchronous with the notify thread execution (which is what [1]
    implemented), the thread can finish while the RPC client is still
    sending the HA router states. If another HA state update is received,
    then both updates can be executed at the same time. It is then possible
    for a new router state to be overwritten by an old one that has not yet
    been sent or processed.

    The batch notifier is refactored to improve what initially was
    implemented [2] and then updated [3]. Currently, each new event thread
    can update the "pending_events" list. Then, a new thread is spawned to
    process this event list. This thread decouples the current execution
    from the calling thread, making the event processing a non-blocking
    process.

    But with the current implementation, each new process will spawn a new
    thread, synchronized with the previous and new ones (using a
    synchronized decorator). That means that, during the batch interval
    time, the system can have as many threads waiting as new events
    received. Those threads will end sequentially when the previous threads
    end the batch interval sleep time.

    Instead of this, this patch receives and enqueues each new event and
    allows only one thread to be alive while processing the event list. If
    at the end of the processing loop new events are stored, the thread will
    process them.

    [1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
    [2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
    [3] I82f403441564955345f47877151e0c457712dd2f

    Partial-Bug: #1837635

    Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
    (cherry picked from commit 8b7d2c8a93fdf69a828f14bd527d8f132b27bc6e)

tags:

added: in-stable-queens

OpenStack Infra (hudson-openstack) wrote on 2020-04-25:

#16

Reviewed: https://review.opendev.org/719978
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5b45f947fd7a25cc9b58e3b94f189ad50523a0b4
Submitter: Zuul
Branch: stable/queens

commit 5b45f947fd7a25cc9b58e3b94f189ad50523a0b4
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Jul 24 16:32:02 2019 +0000

Delay HA router transition from "backup" to "master"

    As described in the bug, when an HA router transitions from "master" to
    "backup", "keepalived" processes will set the virtual IP in all other
    HA routers. Each HA router will then advertise it and "keepalived" will
    decide, according to a trivial algorithm (higher interface IP), which
    one should be "master". At this point, the "keepalived" processes
    running in the other servers will remove the HA router virtual IP
    assigned an instant before.

    To avoid transitioning some routers from "backup" to "master" and then
    back to "backup" in a very short period, this patch delays the "backup"
    to "master" transition, waiting for a possible new "backup" state. If
    the L3 agent receives a new "backup" HA state during the waiting period
    before setting the HA state to "master" (set to the HA VRRP advert
    time, 2 seconds by default), the L3 agent does nothing.

    Conflicts:
        neutron/agent/l3/agent.py
        neutron/agent/l3/ha_router.py

    Closes-Bug: #1837635

    Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
    (cherry picked from commit 3f022a193f66fde3bfd945af1119a60dfe91cb91)
    (cherry picked from commit adac5d9b7a72b4edeba5357c6a47e7e528fcf775)
    (cherry picked from commit 7682d2fa77108b148ede651525458babc1b30d8d)

Chris MacNaughton (chris.macnaughton) wrote on 2020-07-20:

#17

This seems to be in the released Neutron at Stein, so I'm marking it as Fix-Released

Changed in cloud-archive:
status:	New → Invalid

OpenStack Infra (hudson-openstack) wrote on 2022-11-18: Fix included in openstack/neutron queens-eol

#18

This issue was fixed in the openstack/neutron queens-eol release.

OpenStack Infra (hudson-openstack) wrote on 2022-11-18: Fix included in openstack/neutron rocky-eol

#19

This issue was fixed in the openstack/neutron rocky-eol release.

Affects	Status	Importance	Assigned to
Ubuntu Cloud Archive	Invalid	Undecided	Unassigned
Queens	Fix Released	Undecided	Unassigned
Rocky	Fix Released	Undecided	Unassigned
Stein	Fix Released	Undecided	Unassigned
neutron	Fix Released	Undecided	Rodolfo Alonso

HA router state change from "standby" to "master" should be delayed