[dvr][ha][dataplane down] router_gateway port binding host goes wrong after the 'master' host down/up

Bug #1793529 reported by LIU Yulong
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | Undecided | Unassigned |

Bug Description

ENV:
master
devstack multinode install:
1 controller node
2 compute nodes -> dvr_no_external (compute1, compute2)
2 network nodes -> dvr_snat (network1, network2)

Problem:
For an L3 DVR HA router, when the network node hosting the `master` router goes down and comes back up,
the router port with `device_owner = network:router_gateway` is bound back to the host that went down.

How to reproduce:
1. Create a DVR HA router connecting the user's private network and the public external network.
2. Create a VM in the private network on a dvr_no_external compute node.
3. Create a floating IP and associate it with the VM port.
4. Reboot the host where the `master` router is located (network1).

Some testing output:

(1) before reboot:
the router `master` is on network1
http://paste.openstack.org/show/730423/
(2) during reboot:
http://paste.openstack.org/show/730424/

(3) after reboot:
router `master` is in network2
http://paste.openstack.org/show/730422/
The network:router_gateway port is bound back to network1.
The network:router_centralized_snat port is correctly bound to network2.
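
The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of Neutron: the function name, the input layout, and the sample data are assumptions made for illustration. It takes port records (using the `device_owner` and `binding:host_id` attributes) and a per-router map of host to HA state, and reports gateway ports bound to a host other than the current `master`.

```python
GATEWAY_OWNER = "network:router_gateway"


def misbound_gateway_ports(ports, ha_state_by_router):
    """Return (port_id, bound_host, master_host) for each gateway port
    whose binding host differs from the router's HA master host."""
    mismatches = []
    for port in ports:
        if port.get("device_owner") != GATEWAY_OWNER:
            continue
        states = ha_state_by_router.get(port["device_id"], {})
        masters = [host for host, state in states.items() if state == "master"]
        if len(masters) != 1:
            # No single master known (e.g. mid-failover): nothing to compare.
            continue
        bound = port.get("binding:host_id")
        if bound != masters[0]:
            mismatches.append((port["id"], bound, masters[0]))
    return mismatches


# Hypothetical data mirroring the "after reboot" state in this report.
ports = [
    {"id": "gw-port", "device_id": "router-1",
     "device_owner": "network:router_gateway",
     "binding:host_id": "network1"},
    {"id": "snat-port", "device_id": "router-1",
     "device_owner": "network:router_centralized_snat",
     "binding:host_id": "network2"},
]
ha_state = {"router-1": {"network1": "backup", "network2": "master"}}

print(misbound_gateway_ports(ports, ha_state))
```

With the sample data, only the gateway port is reported, because it is still bound to network1 while the `master` is on network2.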

LIU Yulong (dragon889)
description: updated
summary: - [dvr_no_external][ha] router_gateway port binding host goes wrong after
- the HA state change
+ [dvr][ha] router_gateway port binding host goes wrong after the HA state
+ change
summary: - [dvr][ha] router_gateway port binding host goes wrong after the HA state
- change
+ [dvr][ha] router_gateway port binding host goes wrong after the 'master'
+ host down/up
LIU Yulong (dragon889)
summary: - [dvr][ha] router_gateway port binding host goes wrong after the 'master'
- host down/up
+ [dvr][ha][dataplane down] router_gateway port binding host goes wrong
+ after the 'master' host down/up
tags: added: l3-dvr-backlog
LIU Yulong (dragon889) wrote :

This may also cause centralized floating IP connections to go down, especially when the public (external) network uses the VLAN provider network type.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

This looks like a possible bug. I need to triage it.

LIU Yulong (dragon889)
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/606384

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/606384
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1973a037c29b0fc8bf6347771ed930726d1648f5
Submitter: Zuul
Branch: master

commit 1973a037c29b0fc8bf6347771ed930726d1648f5
Author: LIU Yulong <email address hidden>
Date: Fri Sep 28 18:33:28 2018 +0800

    Fix dvr ha router gateway goes wrong host

    During L3 agent restart, the dvr ha router gateway port
    binding host may change because the multiple ha router
    scheduled hosts.

    After this patch, we return the 'master' ha binding host
    directly during the gateway port create. And do not let
    the original 'master' (current is backup) host override
    the gateway port binding host.

    Closes-Bug: #1793529
    Change-Id: Icb2112c7f0bd42c4f4b1cf32d6b83b6d97f85ef7
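
The behaviour described in the commit message above can be sketched roughly as follows. This is an illustration only, not the actual Neutron patch: the function names, the binding record layout, and the fallback when no `master` is known are all assumptions.

```python
def master_binding_host(ha_bindings):
    """Return the host of the HA binding in 'master' state, or None."""
    for binding in ha_bindings:
        if binding["state"] == "master":
            return binding["host"]
    return None


def may_update_gateway_binding(requesting_host, ha_bindings):
    """Decide whether a host may (re)bind the gateway port.

    Only the current master may set the binding host. A host that was
    master before a restart but is now backup must not override it.
    Assumption: if no master is known yet, the update is allowed.
    """
    master = master_binding_host(ha_bindings)
    if master is None:
        return True
    return requesting_host == master


# Hypothetical state after network1 rebooted and network2 took over.
bindings = [
    {"host": "network1", "state": "backup"},
    {"host": "network2", "state": "master"},
]

print(master_binding_host(bindings))
print(may_update_gateway_binding("network1", bindings))
print(may_update_gateway_binding("network2", bindings))
```

In this sketch the restarted agent on network1 is refused, so the gateway port stays bound to network2.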

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/612452

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/612453

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/612456

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/612453
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3578ab87a2b1d1872e1334314aba20f08d52dd5b
Submitter: Zuul
Branch: stable/queens

commit 3578ab87a2b1d1872e1334314aba20f08d52dd5b
Author: LIU Yulong <email address hidden>
Date: Fri Sep 28 18:33:28 2018 +0800

    Fix dvr ha router gateway goes wrong host

    During L3 agent restart, the dvr ha router gateway port
    binding host may change because the multiple ha router
    scheduled hosts.

    After this patch, we return the 'master' ha binding host
    directly during the gateway port create. And do not let
    the original 'master' (current is backup) host override
    the gateway port binding host.

    Closes-Bug: #1793529
    Change-Id: Icb2112c7f0bd42c4f4b1cf32d6b83b6d97f85ef7
    (cherry picked from commit 1973a037c29b0fc8bf6347771ed930726d1648f5)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/612452
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b411b5ff59ef182b1e392d0a41dfa02d5e0ecc6b
Submitter: Zuul
Branch: stable/rocky

commit b411b5ff59ef182b1e392d0a41dfa02d5e0ecc6b
Author: LIU Yulong <email address hidden>
Date: Fri Sep 28 18:33:28 2018 +0800

    Fix dvr ha router gateway goes wrong host

    During L3 agent restart, the dvr ha router gateway port
    binding host may change because the multiple ha router
    scheduled hosts.

    After this patch, we return the 'master' ha binding host
    directly during the gateway port create. And do not let
    the original 'master' (current is backup) host override
    the gateway port binding host.

    Closes-Bug: #1793529
    Change-Id: Icb2112c7f0bd42c4f4b1cf32d6b83b6d97f85ef7
    (cherry picked from commit 1973a037c29b0fc8bf6347771ed930726d1648f5)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/612456
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=44ed7e5eb1fa7345450305352d0d5d5ee8a4a321
Submitter: Zuul
Branch: stable/pike

commit 44ed7e5eb1fa7345450305352d0d5d5ee8a4a321
Author: LIU Yulong <email address hidden>
Date: Fri Sep 28 18:33:28 2018 +0800

    Fix dvr ha router gateway goes wrong host

    During L3 agent restart, the dvr ha router gateway port
    binding host may change because the multiple ha router
    scheduled hosts.

    After this patch, we return the 'master' ha binding host
    directly during the gateway port create. And do not let
    the original 'master' (current is backup) host override
    the gateway port binding host.

    Closes-Bug: #1793529
    Change-Id: Icb2112c7f0bd42c4f4b1cf32d6b83b6d97f85ef7

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.2

This issue was fixed in the openstack/neutron 13.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.5

This issue was fixed in the openstack/neutron 12.0.5 release.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b1

This issue was fixed in the openstack/neutron 14.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/640451

Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: LIU Yulong (dragon889) → nobody
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ocata)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: stable/ocata
Review: https://review.opendev.org/640451
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
