ha router goes to standby mode only when they are created by heat

Bug #1826644 reported by Dilip Renkila
This bug affects 1 person
Affects: neutron
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi all,

I am running l3-ha with linuxbridge. When routers are created by Heat for Magnum k8s cluster provisioning, they end up in standby mode on every L3 agent in my environment. When I create routers manually, they look fine.

root@ctrl1:~# neutron l3-agent-list-hosting-router a26bc4c4-e834-479e-9b41-90a909811140
+--------------------------------------+-------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+-------+----------------+-------+----------+
| 7f3fc3e4-be9a-479b-9571-88a96c13c966 | ctrl3 | True | :-) | standby |
| a0f5ccd4-e811-438d-be63-7bc4db1193d9 | ctrl2 | True | :-) | standby |
| cc7b7efe-6fb2-4add-8629-5470beaf8d43 | ctrl1 | True | :-) | standby |
+--------------------------------------+-------+----------------+-------+----------+
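A minimal sketch of how the output above could be checked automatically, assuming the table text has been captured from the CLI. The helper names `parse_cli_table` and `has_active_agent` are hypothetical and not part of any OpenStack client; the check only looks for an agent whose `ha_state` is `active`.

```python
def parse_cli_table(text):
    """Parse a '+---+' / '| a | b |' style CLI table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def has_active_agent(rows):
    """Return True if at least one hosting agent reports ha_state 'active'."""
    return any(row.get("ha_state") == "active" for row in rows)


# Table copied from the report above.
OUTPUT = """
+--------------------------------------+-------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+-------+----------------+-------+----------+
| 7f3fc3e4-be9a-479b-9571-88a96c13c966 | ctrl3 | True | :-) | standby |
| a0f5ccd4-e811-438d-be63-7bc4db1193d9 | ctrl2 | True | :-) | standby |
| cc7b7efe-6fb2-4add-8629-5470beaf8d43 | ctrl1 | True | :-) | standby |
+--------------------------------------+-------+----------------+-------+----------+
"""

rows = parse_cli_table(OUTPUT)
print([row["host"] for row in rows], has_active_agent(rows))
```

For the router in this report the check reports no active agent, which is the symptom being described.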

I have looked at https://bugs.launchpad.net/neutron/+bug/1823314 but in my case I'm only creating a single router.

I am running neutron 14.0.0~b1~git2019013137.7484700deb.

The following are the logs from one of the network nodes:

https://etherpad.openstack.org/p/t4xDU3sRo3

Errors of this kind are easy to reproduce in my setup when I provision a Kubernetes cluster through Magnum.

Restarting the l3-agent does not help.

Tags: l3-ha
description: updated
LIU Yulong (dragon889) wrote :

Hi Dilip,
Could you please follow this guide and add some information about your test, environment, and version?
https://docs.openstack.org/neutron/latest/contributor/policies/bugs.html#bug-report-template

Slawek Kaplonski (slaweq) wrote :

Hi,

Can you also check which vr_id is assigned to each of those routers? It is in the DB table "router_extra_attributes".

Slawek Kaplonski (slaweq) wrote :

Also, can you check whether it still happens with patch https://review.openstack.org/651495 applied?

Dilip Renkila (dilip-renkila278) wrote :

Hi Slawek,

Below are the router_extra_attributes:

MariaDB [neutron_prod]> select * from router_extra_attributes;
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| router_id | distributed | service_router | ha | ha_vr_id | availability_zone_hints |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| 00eb57be-1737-4d0b-8cc4-da5846c88420 | 0 | 0 | 1 | 1 | [] |
| 077a2576-535a-4851-a5b7-3c9b66b3e257 | 0 | 0 | 1 | 1 | [] |
| 085d6596-93bf-45c8-a4f5-7c7d6c43882d | 0 | 0 | 1 | 1 | [] |
| 1ac4ac96-5af2-4ef2-a158-d9ed76ba63c5 | 0 | 0 | 1 | 1 | [] |
| 1bee8603-d3ec-4736-bf76-ed28fe0fd6ca | 0 | 0 | 1 | 1 | [] |
| 2233a211-4eaa-483b-a360-f68af7ac4e4b | 0 | 0 | 1 | 1 | [] |
| 2ad5d5b0-53b4-450b-a6c6-73e646608947 | 0 | 0 | 1 | 1 | [] |
| 37f98339-a9c5-4d3c-a1fb-08325e993458 | 0 | 0 | 1 | 2 | [] |
| 3997e8b7-aa8a-4608-abbb-e65d60e6eec0 | 0 | 0 | 1 | 1 | [] |
| 3d53e780-f45f-439d-9d22-f1cc382196df | 0 | 0 | 1 | 1 | [] |
| 3da6fd20-0240-4769-8dc8-688424180aa2 | 0 | 0 | 1 | 1 | [] |
| 421dfec9-b33d-43dd-90e3-a0db7301d3fd | 0 | 0 | 1 | 1 | [] |
| 42ded4a2-de26-4867-b9ab-7f21d972009e | 0 | 0 | 1 | 1 | [] |
| 4b1552dd-7b6e-4b32-8cff-6d11cfd755f2 | 0 | 0 | 1 | 1 | [] |
| 4f080054-462b-46a6-9e0f-377e35123f7c | 0 | 0 | 1 | 1 | [] |
| 5008ad0f-ec77-459a-9195-2c34b3bc4680 | 0 | 0 | 1 | 1 | [] |
| 55e7fe24-53b3-4883-b764-b83ffd480a28 | 0 | 0 | 1 | 1 | [] |
| 5e3bda2a-2469-469d-82ab-09b4d1aed790 | 0 | 0 | 1 | 1 | [] |
| 5e966e1e-08cd-49b4-8392-83b7023ee53b | 0 | 0 | 1 | 3 | [] |
| 634f2dad-4f78-43ea-8cf0-9b42de11ef8b | 0 | 0 | 1 | 1 | [] |
| 6fd23e32-763e-44a3-b257-0a975edefa3f | 0 | 0 | 1 | 1 | [] |
| 701582e5-cf92-423c-840a-64894f1d95d6 | 0 | 0 | 1 | 1 | [] |
| 73e816a9-973f-4a3b-8fd8-63944b145ff9 | 0 | 0 | 1 | 1 | [] |
| 7c64619c-b2a1-4813-bbdb-19d934de8eab | 0 | 0 | 1 | 1 | [] |
| 7e4352...

Dilip Renkila (dilip-renkila278) wrote :

Hi Slawek,

These are the router_extra_attributes for two routers that belong to the same tenant. In my case they do not have the same vr_id.

MariaDB [neutron_prod]> select * from router_extra_attributes where router_id="a26bc4c4-e834-479e-9b41-90a909811140" ;
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| router_id | distributed | service_router | ha | ha_vr_id | availability_zone_hints |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| a26bc4c4-e834-479e-9b41-90a909811140 | 0 | 0 | 1 | 2 | [] |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
1 row in set (0.00 sec)

MariaDB [neutron_prod]> select * from router_extra_attributes where router_id="b1853f6f-dd7a-4a0e-8941-0c50d1bbbe56" ;
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| router_id | distributed | service_router | ha | ha_vr_id | availability_zone_hints |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| b1853f6f-dd7a-4a0e-8941-0c50d1bbbe56 | 0 | 0 | 1 | 1 | [] |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+

summary: - ha router goes to standby mode only when it's created by heat
+ ha router goes to standby mode only when they are created by heat
tags: added: l3-ha
description: updated
Slawek Kaplonski (slaweq) wrote :

Thanks for checking that.
So if two routers in the same tenant have different vr_id values configured, then it is definitely not the same bug as the one that https://review.openstack.org/651495 is trying to fix.
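The reasoning here (two routers of one tenant sharing a vr_id would point at the other bug) can be sketched as a small check. This is only an illustration: `find_vrid_collisions` is a hypothetical helper, and the project name `tenant-a` is a placeholder, since router_extra_attributes itself does not show which tenant owns a router and that mapping has to come from elsewhere.

```python
from collections import defaultdict


def find_vrid_collisions(routers):
    """Group HA routers by (project, vr_id); return groups holding more than one router.

    `routers` is an iterable of (router_id, project_id, ha_vr_id) tuples.
    """
    groups = defaultdict(list)
    for router_id, project_id, vr_id in routers:
        groups[(project_id, vr_id)].append(router_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Router IDs and vr_id values are taken from the queries in this thread.
# "tenant-a" is a placeholder for the real project ID (an assumption).
routers = [
    ("a26bc4c4-e834-479e-9b41-90a909811140", "tenant-a", 2),
    ("b1853f6f-dd7a-4a0e-8941-0c50d1bbbe56", "tenant-a", 1),
]

print(find_vrid_collisions(routers))
```

An empty result for a tenant matches the conclusion above that the duplicate-vr_id problem is not what is happening here.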

Can you then check the logs of the L3 agents that handle those routers for errors? Checking the logs and configs of keepalived and neutron-keepalived-state-change may also help to understand the issue.

Please also check, in the qrouter namespaces on the network nodes, whether all other IPs on the HA network are reachable from every namespace.
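One way to organize that reachability check is to generate, for every node, a ping command toward each other node's HA address. The sketch below only builds the command lines; it does not run them. `build_ping_commands` is a hypothetical helper, and the router ID and the two addresses are the ones that appear in the `ip a` output posted later in this thread (the ctrl3 address is not visible there, so it is left out).

```python
import itertools


def build_ping_commands(router_id, ha_ip_by_host, count=3):
    """Return (source_host, argv) pairs: one ping per ordered pair of hosts.

    Each command is meant to be run as root on `source_host`, inside the
    router's qrouter namespace.
    """
    commands = []
    pairs = itertools.permutations(ha_ip_by_host.items(), 2)
    for (src_host, _src_ip), (_dst_host, dst_ip) in pairs:
        argv = ["ip", "netns", "exec", f"qrouter-{router_id}",
                "ping", "-c", str(count), dst_ip]
        commands.append((src_host, argv))
    return commands


ROUTER_ID = "6e84cf52-ef66-455c-b89c-d9725057e131"
HA_IPS = {"ctrl1": "169.254.194.27", "ctrl2": "169.254.192.242"}

for host, argv in build_ping_commands(ROUTER_ID, HA_IPS):
    print(f"{host}: {' '.join(argv)}")
```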

Dilip Renkila (dilip-renkila278) wrote :

Let's take a failed router:

root@ctrl1:~# neutron l3-agent-list-hosting-router 6e84cf52-ef66-455c-b89c-d9725057e131
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+-------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+-------+----------------+-------+----------+
| 7f3fc3e4-be9a-479b-9571-88a96c13c966 | ctrl3 | True | :-) | standby |
| a0f5ccd4-e811-438d-be63-7bc4db1193d9 | ctrl2 | True | :-) | standby |
| cc7b7efe-6fb2-4add-8629-5470beaf8d43 | ctrl1 | True | :-) | standby |
+--------------------------------------+-------+----------------+-------+----------+

on ctrl1

root@ctrl1:~# ip netns exec qrouter-6e84cf52-ef66-455c-b89c-d9725057e131 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ha-7b94f8d7-e7@if2158: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:34:4e:3d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.194.27/18 brd 169.254.255.255 scope global ha-7b94f8d7-e7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe34:4e3d/64 scope link
       valid_lft forever preferred_lft forever
3: qg-60ecaa0d-46@if2159: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f3:a6:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qr-3e967e78-45@if2160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:92:52:bb brd ff:ff:ff:ff:ff:ff link-netnsid 0

on ctrl2
root@ctrl2:~# ip netns exec qrouter-6e84cf52-ef66-455c-b89c-d9725057e131 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ha-ef796792-c1@if1786: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:7f:49:a3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.192.242/18 brd 169.254.255.255 scope global ha-ef796792-c1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe7f:49a3/64 scope link
       valid_lft forever preferred_lft forever
3: qr-3e967e78-45@if1787: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:92:52:bb brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qg-60ecaa0d-46@if1788: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f3:a6:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 0

on ctrl3

root@ctrl3:~# ip netns exec qrouter-...

Dilip Renkila (dilip-renkila278) wrote :

Hi Slawek,

Ping works fine from the namespaces. You asked me to check the logs of l3-agent, neutron-keepalived-state-change and keepalived.

Here are the logs from one of the L3 agents:

2019-05-05 12:56:18.014 2734608 DEBUG neutron.agent.l3.agent [req-e5d957eb-2bbc-4024-97d3-86bdc2943432 1f7840b022f84efeaffbd9df5361923e a1dd449c9ce64345af2a7fb05c4aa21f - - -] Got routers updated notification :['6e84cf52-ef66-455c-b89c-d9725057e131'] routers_updated /usr/lib/python3/dist-packages/neutron/agent/l3/agent.py:444
2019-05-05 12:56:18.016 2734608 DEBUG neutron.agent.l3.agent [-] Starting router update for 6e84cf52-ef66-455c-b89c-d9725057e131, action 3, priority 1 _process_router_update /usr/lib/python3/dist-packages/neutron/agent/l3/agent.py:547
2019-05-05 12:56:19.203 2734608 DEBUG neutron.agent.l3.agent [req-e5d957eb-2bbc-4024-97d3-86bdc2943432 1f7840b022f84efeaffbd9df5361923e a1dd449c9ce64345af2a7fb05c4aa21f - - -] Got routers updated notification :['6e84cf52-ef66-455c-b89c-d9725057e131'] routers_updated /usr/lib/python3/dist-packages/neutron/agent/l3/agent.py:444
2019-05-05 12:56:22.557 2737537 DEBUG oslo.privsep.daemon [-] privsep: request[139694637118880]: (3, 'neutron.privileged.agent.linux.ip_lib.create_netns', ('qrouter-6e84cf52-ef66-455c-b89c-d9725057e131',), {}) loop /usr/lib/python3/dist-packages/oslo_privsep/daemon.py:443
2019-05-05 12:56:22.642 2734608 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-6e84cf52-ef66-455c-b89c-d9725057e131', 'sysctl', '-w', 'net.ipv4.conf.all.promote_secondaries=1'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:87
2019-05-05 12:56:23.068 2737537 DEBUG oslo.privsep.daemon [-] privsep: request[139694637118880]: (3, 'neutron.privileged.agent.linux.ip_lib.set_link_attribute', ('lo', 'qrouter-6e84cf52-ef66-455c-b89c-d9725057e131'), {'state': 'up'}) loop /usr/lib/python3/dist-packages/oslo_privsep/daemon.py:443
2019-05-05 12:56:23.103 2734608 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-6e84cf52-ef66-455c-b89c-d9725057e131', 'sysctl', '-w', 'net.ipv4.ip_forward=1'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:87
2019-05-05 12:56:23.502 2734608 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-6e84cf52-ef66-455c-b89c-d9725057e131', 'sysctl', '-w', 'net.ipv4.conf.all.arp_ignore=1'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:87
2019-05-05 12:56:23.981 2734608 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-6e84cf52-ef66-455c-b89c-d9725057e131', 'sysctl', '-w', 'net.ipv4.conf.all.arp_announce=2'] create_process /usr/lib/python3/dist-packages/neutron/agent/linux/utils.py:87
2019-05-05 12:56:24.402 2734608 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', '...
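Since the visible part of this log is all DEBUG output, a small filter can help when scanning the full file for anything more serious about one router. This is a rough sketch that assumes each entry starts with "date time pid LEVEL module" as in the lines above; `problem_lines` is a hypothetical helper, and continuation lines such as tracebacks are ignored because they do not match the pattern.

```python
import re

# Matches the "date time pid LEVEL module message" prefix seen in the log above.
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) (?P<msg>.*)$"
)


def problem_lines(lines, router_id, quiet_levels=("DEBUG", "INFO")):
    """Return (timestamp, level, module) for entries that mention router_id
    and whose level is not one of quiet_levels."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or router_id not in match.group("msg"):
            continue  # not a log header line, or about another router
        if match.group("level") not in quiet_levels:
            hits.append((match.group("ts"), match.group("level"), match.group("module")))
    return hits


ROUTER_ID = "6e84cf52-ef66-455c-b89c-d9725057e131"
# Two lines copied verbatim from the log above.
SAMPLE = [
    "2019-05-05 12:56:18.014 2734608 DEBUG neutron.agent.l3.agent [req-e5d957eb-2bbc-4024-97d3-86bdc2943432 1f7840b022f84efeaffbd9df5361923e a1dd449c9ce64345af2a7fb05c4aa21f - - -] Got routers updated notification :['6e84cf52-ef66-455c-b89c-d9725057e131'] routers_updated /usr/lib/python3/dist-packages/neutron/agent/l3/agent.py:444",
    "2019-05-05 12:56:18.016 2734608 DEBUG neutron.agent.l3.agent [-] Starting router update for 6e84cf52-ef66-455c-b89c-d9725057e131, action 3, priority 1 _process_router_update /usr/lib/python3/dist-packages/neutron/agent/l3/agent.py:547",
]

print(problem_lines(SAMPLE, ROUTER_ID))
```

In practice the lines would be read from the agent's log file rather than from a list in the script.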

Slawek Kaplonski (slaweq) wrote :

Hi,

Sorry for the late answer, I have been busy recently.
Can you also check, for router 6e84cf52-ef66-455c-b89c-d9725057e131 on ctrl1, ctrl2 and ctrl3, whether the VIP address is configured in the router's namespace on one of the hosts?
Also, please check (and paste here) the keepalived config files for this router from each of the controller nodes, and check the journal logs for any errors or warnings from keepalived or neutron-keepalived-state-change related to this router.

Can you also tell me whether restarting the L3 agents, or restarting the keepalived processes and then the L3 agents, fixes this issue?
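A rough way to answer the VIP question from saved `ip a` output is to list the IPv4 addresses per interface and see whether the qr-/qg- ports carry any. The sketch assumes that on the active node keepalived adds the router's addresses to those ports, so a node where they have none is not acting as master. `ipv4_by_interface` is a hypothetical helper, and the sample is the ctrl1 output posted earlier in this thread.

```python
import re


def ipv4_by_interface(ip_a_output):
    """Map each interface name in `ip a` output to its list of IPv4 addresses."""
    result = {}
    current = None
    for line in ip_a_output.splitlines():
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)  # e.g. 'qr-3e967e78-45' without '@ifNNN'
            result[current] = []
            continue
        inet = re.match(r"^\s+inet\s+(\S+)", line)  # 'inet6' lines do not match
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result


# ctrl1 output from earlier in this thread.
CTRL1 = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ha-7b94f8d7-e7@if2158: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:34:4e:3d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.194.27/18 brd 169.254.255.255 scope global ha-7b94f8d7-e7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe34:4e3d/64 scope link
       valid_lft forever preferred_lft forever
3: qg-60ecaa0d-46@if2159: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f3:a6:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qr-3e967e78-45@if2160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:92:52:bb brd ff:ff:ff:ff:ff:ff link-netnsid 0
"""

addresses = ipv4_by_interface(CTRL1)
ports_without_ipv4 = [
    name for name, addrs in addresses.items()
    if name.startswith(("qg-", "qr-")) and not addrs
]
print(addresses)
print(ports_without_ipv4)
```

Running the same check on the output from each controller shows at a glance whether any node holds the router's addresses.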
