Enable keepalived VRRP health check again

Bug #1825966 reported by Hua Zhang on 2019-04-23
This bug affects 2 people
Affects: OpenStack neutron-gateway charm (Importance: Undecided, Assigned to: Unassigned)
Affects: OpenStack neutron-openvswitch charm (Importance: Undecided, Assigned to: Unassigned)

Bug Description

If you want VRRP to watch the external networking interface today, the option ha_vrrp_health_check_interval [1] can help: when the health check fails it re-triggers the election process, so the system recovers automatically. We should therefore enable it.

In fact, we tried to enable it before [2], but then had to revert it [3] due to instability issues [4] in previous releases of OpenStack.

The previous instability issue [4] may have been caused by another keepalived issue mentioned in comment [5]. Today I tested this option again using the detailed steps below, and it works.

# First create a neutron L3 HA test environment, then run:
git clone https://github.com/openstack/charm-neutron-gateway.git neutron-gateway
cd neutron-gateway/
git fetch https://review.opendev.org/openstack/charm-neutron-gateway refs/changes/33/601533/1 && git format-patch -1 --stdout FETCH_HEAD > lp1732154.patch
git checkout master
patch -p1 < lp1732154.patch
juju upgrade-charm neutron-gateway --path $PWD

# Install the script check_router_vrrp_transitions.sh on the two neutron-gateway test nodes:
wget https://gist.githubusercontent.com/dosaboy/cf8422f16605a76affa69a8db47f0897/raw/8e045160440ecf0f9dc580c8927b2bff9e9139f6/check_router_vrrp_transitions.sh
chmod +x check_router_vrrp_transitions.sh
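
Before running the test, it may be worth confirming that the upgraded charm actually rendered the option. The following is only a sketch: it assumes the L3 agent configuration is an INI-style file such as /etc/neutron/l3_agent.ini on the gateway units and that a value greater than 0 enables the check. The helper names health_check_interval and health_check_enabled are made up for illustration.

```shell
# Hypothetical helper: print the configured ha_vrrp_health_check_interval
# from an l3_agent.ini-style file, or 0 if the option is not set.
health_check_interval() {
    local conf="$1" value
    # Keep the last uncommented assignment of the option, if there is one.
    value=$(sed -n 's/^[[:space:]]*ha_vrrp_health_check_interval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1)
    echo "${value:-0}"
}

# Hypothetical helper: succeed only if the health check is enabled (> 0).
health_check_enabled() {
    [ "$(health_check_interval "$1")" -gt 0 ]
}
```

One way to use it is to copy the file from a unit to a local path and run health_check_enabled against that copy; the exact path on a given deployment is an assumption and should be checked.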

This is the test result. I no longer see the instability issue [4].

$ date; neutron l3-agent-list-hosting-router $(neutron router-show provider-router -c id -f value); juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; juju ssh neutron-gateway/1 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; sleep 40; date; neutron l3-agent-list-hosting-router $(neutron router-show provider-router -c id -f value); juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh; juju ssh neutron-gateway/1 -- bash /home/ubuntu/check_router_vrrp_transitions.sh;
Tue Apr 23 03:11:28 UTC 2019
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=16716, first=Apr-23-01:48:20, last=Apr-23-01:57:05) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=24269, first=Apr-23-02:22:16, last=Apr-23-02:22:28) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=29188, first=Apr-23-02:46:03, last=Apr-23-02:46:03) had 1 transition(s) (state=BACKUP)
Done.
Connection to 10.5.0.42 closed.
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=31249, first=Apr-23-01:48:26, last=Apr-23-02:21:53) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=6187, first=Apr-23-02:22:29, last=Apr-23-02:45:33) had 2 transition(s) (state=MASTER)
Done.
Connection to 10.5.0.36 closed.
Tue Apr 23 03:12:12 UTC 2019
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Auth plugin requires parameters which were not given: auth_url
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=16716, first=Apr-23-01:48:20, last=Apr-23-01:57:05) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=24269, first=Apr-23-02:22:16, last=Apr-23-02:22:28) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=29188, first=Apr-23-02:46:03, last=Apr-23-02:46:03) had 1 transition(s) (state=BACKUP)
Done.
Connection to 10.5.0.42 closed.
Analysing keepalived vrrp transitions...1 active vrouters found (total 1):
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=false, vrid=VR_1, pid=31249, first=Apr-23-01:48:26, last=Apr-23-02:21:53) had 2 transition(s)
router=b8d4435b-bd83-46fd-a828-6d8a0b52d23a (current=true, vrid=VR_1, pid=6187, first=Apr-23-02:22:29, last=Apr-23-02:45:33) had 2 transition(s) (state=MASTER)
Done.
Connection to 10.5.0.36 closed.
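
The two snapshots above are identical: the same state and the same transition count on each node 40 seconds apart. Comparing them by eye gets tedious, so here is a sketch of a filter that reduces the script output to one line per current keepalived instance. It assumes the output format shown above stays the same; the function name summarise_current is hypothetical.

```shell
# Hypothetical helper: read check_router_vrrp_transitions.sh output on stdin
# and print "router state transitions" for each entry with current=true.
# Carriage returns are stripped because output fetched over ssh may carry them.
summarise_current() {
    tr -d '\r' |
        sed -n 's/^router=\([^ ]*\) (current=true,.*) had \([0-9]*\) transition(s) (state=\([A-Z]*\))$/\1 \3 \2/p'
}

# Possible use (not run here): take two summaries some time apart and
# compare them as strings.
#   before=$(juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh | summarise_current)
#   sleep 40
#   after=$(juju ssh neutron-gateway/0 -- bash /home/ubuntu/check_router_vrrp_transitions.sh | summarise_current)
#   [ "$before" = "$after" ] && echo "no new transitions"
```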

So I suggest that we add VRRP health check support back to the charms, so that the gateway address is pinged and the southbound network is monitored as well.
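
For illustration, the kind of check this amounts to can be sketched as a small function that pings the gateway and reports failure through its exit status, which is what a keepalived track script relies on. This is not the script Neutron generates: the function name gateway_reachable, the PING_CMD override and the 192.0.2.1 address are all assumptions made for the example.

```shell
# Hypothetical sketch, not Neutron's generated script.
# PING_CMD can be overridden (for example in tests) to avoid real traffic.
PING_CMD="${PING_CMD:-ping}"

# Succeed if the gateway answers one ICMP echo request within one second.
gateway_reachable() {
    local gateway="$1"
    "$PING_CMD" -c 1 -w 1 "$gateway" >/dev/null 2>&1
}

# A track script built on this would exit non-zero when the ping fails,
# so that keepalived can treat the instance as faulty and fail over, e.g.:
#   gateway_reachable 192.0.2.1 || exit 1
```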

[1] https://docs.openstack.org/ocata/networking-guide/deploy-ovs-ha-vrrp.html#keepalived-vrrp-health-check
[2] https://review.opendev.org/#/c/601533/
[3] https://review.opendev.org/#/c/603347/
[4] https://bugs.launchpad.net/neutron/+bug/1793102
[5] https://bugs.launchpad.net/neutron/+bug/1793102/comments/5

Tags: sts
Hua Zhang (zhhuabj) wrote :

Submitted the patch for review: https://review.opendev.org/#/c/657719/

Dmitrii Shcherbakov (dmitriis) wrote :

Could you submit a patch for charm-neutron-openvswitch as well?

It supports the enable-dvr-snat=True option, which allows running gateway components on compute nodes (as of the 19.04 charms; see https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1808045).

Hua Zhang (zhhuabj) wrote :

@dmitriis, thank you. This is the patch for charm-neutron-openvswitch: https://review.opendev.org/#/c/657774

Hua Zhang (zhhuabj) on 2019-05-08
Changed in charm-neutron-gateway:
status: New → In Progress
Changed in charm-neutron-openvswitch:
status: New → In Progress
Hua Zhang (zhhuabj) on 2019-05-16
tags: added: sts

Reviewed: https://review.opendev.org/657719
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=4c150529b5725e25bfe53d7db15f2f11410c6111
Submitter: Zuul
Branch: master

commit 4c150529b5725e25bfe53d7db15f2f11410c6111
Author: Zhang Hua <email address hidden>
Date: Wed May 8 09:52:12 2019 +0800

    Enable keepalived VRRP health check

    If you want to have vrrp watch the external networking interface
    today, use the option ha_vrrp_health_check_interval [1]: when it
    detects a failure it re-triggers the transitional change, which
    works if the external physical interface fails because the ping
    will fail.

    In fact, we've tried to enable it before [2], but then we had to
    revert it [3] due to instability issues [4] in previous releases of
    OpenStack. Maybe the previous instability issue [4] was caused by
    another keepalived issue mentioned in the comment [5]. I have now
    tested this option again, and it works.

    This is how neutron allows monitoring southbound network today, so
    I would suggest we add this capability into the charm again.

    [1] https://docs.openstack.org/ocata/networking-guide/ \
            deploy-ovs-ha-vrrp.html#keepalived-vrrp-health-check
    [2] https://review.opendev.org/#/c/601533/
    [3] https://review.opendev.org/#/c/603347/
    [4] https://bugs.launchpad.net/neutron/+bug/1793102
    [5] https://bugs.launchpad.net/neutron/+bug/1793102/comments/5

    Change-Id: If2947e7640545cb9a48215afb9b2439fdc33c645
    Closes-Bug: 1825966

Changed in charm-neutron-gateway:
status: In Progress → Fix Committed

Change abandoned by Zhang Hua (<email address hidden>) on branch: stable/19.04
Review: https://review.opendev.org/660574
Reason: Hi Alex and icey, I'm going to abandon this change, thanks for the review
