We should cleanup ipv4 address if keepalived is dead

Bug #1833653 reported by Yang Li
This bug affects 2 people
Affects: neutron
Status: Won't Fix
Importance: Medium
Assigned to: zhengyong
Milestone: (none listed)

Bug Description

If a router's keepalived process dies (for example after kill -9 <pid>, or after kill -HUP <pid> is sent so many times that the process terminates) while this node holds the master role, the result is a split brain. When neutron-l3-agent is then restarted, the IPv6 addresses are cleaned up but the IPv4 addresses remain. I think we should also clean up the IPv4 addresses before enabling keepalived.
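To make the proposed cleanup concrete, here is a minimal sketch of the commands it could amount to. It is not the change proposed for neutron: the function name build_ipv4_cleanup_commands is hypothetical, the choice of which addresses count as keepalived-managed is an assumption, and the code only builds `ip addr del` command lines without running them.

```python
from typing import Dict, List


def build_ipv4_cleanup_commands(
    namespace: str, addresses_by_device: Dict[str, List[str]]
) -> List[List[str]]:
    """Return one `ip addr del` argv list per address (nothing is executed)."""
    commands = []
    for device, cidrs in addresses_by_device.items():
        for cidr in cidrs:
            commands.append(
                ["ip", "netns", "exec", namespace,
                 "ip", "addr", "del", cidr, "dev", device]
            )
    return commands


# Addresses copied from the node-1 output in this report. Treating exactly
# these as keepalived-managed is an assumption made for illustration; the
# primary HA address 169.254.192.1/18 is deliberately left out.
stale = {
    "ha-66734f93-5e": ["169.254.0.1/24"],
    "qr-4f77a86c-c2": ["192.168.100.1/24"],
    "qg-e28d50e4-ba": ["172.16.10.133/24", "172.16.10.134/32"],
}

for argv in build_ipv4_cleanup_commands(
        "qrouter-a8f9d4e4-622d-4ebf-b7a9-4818f63ef502", stale):
    print(" ".join(argv))
```

Building the command lines separately from running them keeps the sketch safe to try on any machine and makes it easy to review what would be removed.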
The current state:
original master:
[root@node-1 ~]# ip netns exec qrouter-a8f9d4e4-622d-4ebf-b7a9-4818f63ef502 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
28: ha-66734f93-5e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:fd:49:a1 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-66734f93-5e
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-66734f93-5e
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fefd:49a1/64 scope link
       valid_lft forever preferred_lft forever
29: qr-4f77a86c-c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:26:98:8f brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 scope global qr-4f77a86c-c2
       valid_lft forever preferred_lft forever
46: qg-e28d50e4-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:c4:3e:43 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.133/24 scope global qg-e28d50e4-ba
       valid_lft forever preferred_lft forever
    inet 172.16.10.134/32 scope global qg-e28d50e4-ba
       valid_lft forever preferred_lft forever

current master:
[root@node-2 ~]# ip netns exec qrouter-a8f9d4e4-622d-4ebf-b7a9-4818f63ef502 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
32: ha-606f2f23-5f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:8e:dc:c4 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-606f2f23-5f
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-606f2f23-5f
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8e:dcc4/64 scope link
       valid_lft forever preferred_lft forever
33: qr-4f77a86c-c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:26:98:8f brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 scope global qr-4f77a86c-c2
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe26:988f/64 scope link nodad
       valid_lft forever preferred_lft forever
50: qg-e28d50e4-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:c4:3e:43 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.133/24 scope global qg-e28d50e4-ba
       valid_lft forever preferred_lft forever
    inet 172.16.10.134/32 scope global qg-e28d50e4-ba
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fec4:3e43/64 scope link nodad
       valid_lft forever preferred_lft forever

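The command-line output shows that the original master's IPv6 addresses were cleaned up: on node-1 the qr- and qg- interfaces no longer have an inet6 link-local address. The IPv4 addresses 169.254.0.1/24, 192.168.100.1/24, 172.16.10.133/24 and 172.16.10.134/32 are still configured on both nodes.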
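The comparison above can also be done mechanically. The following is a rough sketch, not part of neutron: parse_ip_a and shared_ipv4 are hypothetical helper names, and the sample text is trimmed from the two outputs in this report. It parses `ip a` output and lists the IPv4 addresses that are configured on both nodes at once.

```python
import re
from typing import Dict, List, Set

IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")   # e.g. "28: ha-66734f93-5e: <...>"
ADDR_RE = re.compile(r"^\s+(inet6?)\s+(\S+)")  # e.g. "    inet 172.16.10.133/24 ..."


def parse_ip_a(output: str) -> Dict[str, Dict[str, Set[str]]]:
    """Map each device name to its sets of inet and inet6 addresses."""
    result: Dict[str, Dict[str, Set[str]]] = {}
    current = None
    for line in output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            current = match.group(1)
            result[current] = {"inet": set(), "inet6": set()}
            continue
        match = ADDR_RE.match(line)
        if match and current is not None:
            result[current][match.group(1)].add(match.group(2))
    return result


def shared_ipv4(node_a: str, node_b: str) -> List[str]:
    """Return IPv4 addresses present on both nodes, ignoring loopback."""
    def ipv4(output: str) -> Set[str]:
        parsed = parse_ip_a(output)
        return {addr for dev, fams in parsed.items() if dev != "lo"
                for addr in fams["inet"]}
    return sorted(ipv4(node_a) & ipv4(node_b))


# Sample input trimmed from the node-1 and node-2 outputs in this report.
NODE_1 = """\
28: ha-66734f93-5e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-66734f93-5e
    inet 169.254.0.1/24 scope global ha-66734f93-5e
29: qr-4f77a86c-c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    inet 192.168.100.1/24 scope global qr-4f77a86c-c2
46: qg-e28d50e4-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 172.16.10.133/24 scope global qg-e28d50e4-ba
    inet 172.16.10.134/32 scope global qg-e28d50e4-ba
"""
NODE_2 = """\
32: ha-606f2f23-5f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-606f2f23-5f
    inet 169.254.0.1/24 scope global ha-606f2f23-5f
33: qr-4f77a86c-c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450
    inet 192.168.100.1/24 scope global qr-4f77a86c-c2
    inet6 fe80::f816:3eff:fe26:988f/64 scope link nodad
50: qg-e28d50e4-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 172.16.10.133/24 scope global qg-e28d50e4-ba
    inet 172.16.10.134/32 scope global qg-e28d50e4-ba
"""

print(shared_ipv4(NODE_1, NODE_2))
```

Addresses are compared across all devices rather than per device because the ha- interface has a different name on each node, so a per-device comparison would miss the duplicated 169.254.0.1/24.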

Tags: l3-ha
tags: added: l3-dvr-backlog
tags: added: l3-ha
YAMAMOTO Takashi (yamamoto) wrote :

While I'm not sure whether the suggested cleanup on restart is good enough, it seems something needs to be done.

Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/667071

Changed in neutron:
assignee: nobody → Yang Li (yang-li)
status: Confirmed → In Progress
Yang Li (yang-li)
description: updated
tags: removed: l3-dvr-backlog
Changed in neutron:
assignee: Yang Li (yang-li) → zhengyong (zhengy23)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/667071

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: In Progress → Won't Fix