Activity log for bug #1958149

Date Who What changed Old value New value Message
2022-01-17 14:31:57 Maximilian Stinsky bug added bug
2022-01-17 14:40:20 OpenStack Infra neutron: status New In Progress
2022-01-17 15:34:59 Jakub Libosvar tags l3-ha
2022-01-17 15:35:36 Jakub Libosvar neutron: importance Undecided Medium
2022-01-17 18:28:45 Brian Haley bug added subscriber Brian Haley
2022-01-24 12:20:44 Oleg Bondarev neutron: assignee Maximilian Stinsky (mstinsky)
2023-09-12 09:29:56 Maximilian Stinsky description

Old value:

When we restart the neutron-l3-agent we observe that backup routers start accepting router advertisements. This leads to routes inside the router namespace which expire.

e.g.:
$ ip netns exec qrouter-a5f7fb32-3e30-4e15-89f9-4ae888c2cac6 ip -6 r
x:x:1002:1::/64 dev qr-72f85121-ce proto kernel metric 256 expires 86355sec pref medium
x:x:1002:1::/64 dev qr-4e84792f-aa proto kernel metric 256 expires 86355sec pref medium
fe80::/64 dev ha-9d085c9d-15 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-4e84792f-aa proto ra metric 1024 expires 255sec hoplimit 64 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-72f85121-ce proto ra metric 1024 expires 255sec hoplimit 64 pref medium

When we now failover to such a backup router, the kernel does not create the necessary directly attached routes because they already exist. The problem is that those routes expire and because we are now a master router the routes do not refresh from the router advertisement anymore and expire after 24h which breaks ipv6 for those routers.

After we dug a bit deeper into this issue we found that the function [1] that disables the accept_ra on the backup routers always returns false. So backup routers never get their router advertisement disabled.

master router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1

backup router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha_router.py#L318

New value:

When we restart the neutron-l3-agent we observe that backup routers start accepting router advertisements. This leads to routes inside the router namespace which expire.

e.g.:
$ ip netns exec qrouter-a5f7fb32-3e30-4e15-89f9-4ae888c2cac6 ip -6 r
x:x:1002:1::/64 dev qr-72f85121-ce proto kernel metric 256 expires 86355sec pref medium
x:x:1002:1::/64 dev qr-4e84792f-aa proto kernel metric 256 expires 86355sec pref medium
fe80::/64 dev ha-9d085c9d-15 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-4e84792f-aa proto ra metric 1024 expires 255sec hoplimit 64 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-72f85121-ce proto ra metric 1024 expires 255sec hoplimit 64 pref medium

When we now failover to such a backup router, the kernel does not create the necessary directly attached routes because they already exist. The problem is that those routes expire and because we are now a master router the routes do not refresh from the router advertisement anymore and expire after 24h which breaks ipv6 for those routers.

After we dug a bit deeper into this issue we found that the function [1] that disables the accept_ra on the backup routers always returns false. So backup routers never get their router advertisement disabled.

master router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1

backup router:
$ ip netns exec qrouter-92ed5c1f-c705-4ab9-a0e1-56e905d43abd sysctl net.ipv6.conf.qr-c7eb60ab-f1.accept_ra
net.ipv6.conf.qr-c7eb60ab-f1.accept_ra = 1

[1] https://github.com/openstack/neutron/blob/stable/train/neutron/agent/l3/ha_router.py#L318
2023-12-06 19:43:19 Dr. Jens Harbott bug added subscriber Dr. Jens Harbott
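
The description logged on 2023-09-12 identifies affected routers by the "expires" lifetime on routes in the qrouter namespace. As a rough diagnostic sketch (not part of neutron, and not the fix for this bug), the following Python parses captured "ip -6 r" output and lists every route that carries a lifetime. The function name find_expiring_routes and the SAMPLE constant are hypothetical names chosen for illustration; the sample text is copied from the bug description, including its redacted prefixes.

```python
import re

# Sample `ip -6 r` output copied from the bug description.
SAMPLE = """\
x:x:1002:1::/64 dev qr-72f85121-ce proto kernel metric 256 expires 86355sec pref medium
x:x:1002:1::/64 dev qr-4e84792f-aa proto kernel metric 256 expires 86355sec pref medium
fe80::/64 dev ha-9d085c9d-15 proto kernel metric 256 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-4e84792f-aa proto ra metric 1024 expires 255sec hoplimit 64 pref medium
default via fe80::f816:3eff:fed3:3fa6 dev qr-72f85121-ce proto ra metric 1024 expires 255sec hoplimit 64 pref medium
"""

# Lifetimes are printed as "<seconds>sec" right after the "expires" keyword.
_LIFETIME = re.compile(r"^(\d+)sec$")


def _value_after(tokens, keyword):
    """Return the token following `keyword`, or None if it is absent."""
    if keyword not in tokens:
        return None
    idx = tokens.index(keyword) + 1
    return tokens[idx] if idx < len(tokens) else None


def find_expiring_routes(route_output):
    """Return (destination, device, seconds_left) for each route with a lifetime."""
    expiring = []
    for line in route_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        device = _value_after(tokens, "dev")
        lifetime = _value_after(tokens, "expires")
        match = _LIFETIME.match(lifetime) if lifetime else None
        if device is None or match is None:
            continue  # permanent route, or a line this sketch cannot parse
        expiring.append((tokens[0], device, int(match.group(1))))
    return expiring


if __name__ == "__main__":
    for dest, dev, secs in find_expiring_routes(SAMPLE):
        print(f"{dest} on {dev} expires in {secs}s")
```

On a live host the input would be the output of the "ip netns exec <namespace> ip -6 r" command shown in the description. Any route reported on a qr- interface suggests that the router accepted router advertisements at some point.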