HA router interfaces in standby state
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | New | Undecided | Unassigned | | |
Bug Description
Hello,
I ran into an issue where floating IPs stopped working for one particular OpenStack project because its HA interfaces are in the wrong state.
I have an OpenStack-Ansible setup with 3 Neutron containers. Floating IPs and the creation of router interfaces in other OpenStack projects work fine at the same time.
Debugging showed that all HA interfaces belonging to the router in this project have the status "standby". Neutron CLI command output:
neutron l3-agent-
+------
| id | host | admin_state_up | alive | ha_state |
+------
| 44738018-
| c1d95367-
| c7023dd7-
+------
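For anyone repeating this check on several routers, the "every agent is standby" condition can be detected from the CLI table text. The following is a minimal Python sketch, not part of Neutron; the function names and the sample values (ids, hosts) are hypothetical placeholders, and it assumes the table has an `ha_state` column as in the listing above.

```python
def parse_cli_table(text):
    """Parse an ASCII table as printed by the OpenStack CLIs into a list of dicts.

    Border lines (starting with '+') are skipped; the first '|' row is the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def has_no_active_agent(rows):
    """Return True when there is at least one agent row and none is 'active'."""
    return bool(rows) and all(row.get("ha_state") != "active" for row in rows)


# Hypothetical sample data; the real ids and hosts are truncated in this report.
SAMPLE = """
+----+-------+----------------+-------+----------+
| id | host  | admin_state_up | alive | ha_state |
+----+-------+----------------+-------+----------+
| a1 | node1 | True           | :-)   | standby  |
| b2 | node2 | True           | :-)   | standby  |
+----+-------+----------------+-------+----------+
"""

print(has_no_active_agent(parse_cli_table(SAMPLE)))
```

A healthy HA router is expected to report exactly one agent as active, so a `True` result here corresponds to the broken state described in this report.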
root@infra1-
+------
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+------
| 10227cbb-
| 49e60bd7-
| 81b8f9ab-
| bbe4833f-
| cab8cacb-
+------
The router namespace inside the containers has no floating IPs, router IP addresses, or internal network gateway (172.16.0.1) assigned:
root@infra1-
1: lo: <LOOPBACK,
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ha-bbe4833f-
link/ether fa:16:3e:71:66:85 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.192.5/18 brd 169.254.255.255 scope global ha-bbe4833f-f6
valid_lft forever preferred_lft forever
inet6 fe80::f816:
valid_lft forever preferred_lft forever
3: qr-cab8cacb-
link/ether fa:16:3e:f2:f4:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qg-10227cbb-
link/ether fa:16:3e:7d:f8:da brd ff:ff:ff:ff:ff:ff link-netnsid 0
Keepalived process isn't launched for router id "c71008d3-
root@infra1-
neutron 90394 0.0 0.1 166660 72136 ? S Oct05 0:00 /openstack/
root 103561 0.0 0.0 11284 928 ? S+ 14:14 0:00 grep c71008d3-
Neutron launches Keepalived from configuration folder "/var/lib/
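The `ps | grep` check above also matches the `grep` itself and any helper whose command line merely mentions the router id. A minimal Python sketch of a stricter check follows; the function name `keepalived_pids` is hypothetical, and it assumes a Linux `ps` that supports `-eo pid,args` and that the keepalived command line contains the router id (for example in its configuration path).

```python
import os
import subprocess


def keepalived_pids(router_id, ps_output=None):
    """Return PIDs of processes whose executable is 'keepalived' and whose
    command line mentions router_id.

    ps_output may be supplied for testing; otherwise `ps -eo pid,args` is run.
    """
    if ps_output is None:
        ps_output = subprocess.run(
            ["ps", "-eo", "pid,args"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout

    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        # Compare the executable name exactly, so that other programs with
        # "keepalived" somewhere in their arguments are not counted.
        executable = os.path.basename(args.split()[0])
        if executable == "keepalived" and router_id in args:
            pids.append(int(pid))
    return pids
```

Calling `keepalived_pids("<router id>")` inside each Neutron container would return an empty list on every node in the situation described here.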
I can't provide step-by-step reproduction steps because the trigger of this problem is unclear to me. According to my research, this error can be fixed by recreating the router, but I would rather not do that because it would not address the root cause. Neutron log output is attached.
I suspect the problem may be caused by wrong records in the Neutron database, but I wasn't able to find which script generates "keepalived.conf". Please let me know which script/task does this so that I can continue debugging.
Thanks for your attention.
Software description:
OpenStack was deployed via OpenStack-Ansible playbook, Pike 16.0.1, commit ebe2bc8734845b4
OpenStack services run inside LXC containers. Neutron server, API, agent, and scheduler are placed in one container.
Linux OS - Ubuntu 16.04.4 LTS, kernel - 4.4.0-134-generic
neutron-
neutron-l3-agent 11.0.2.dev2
neutron-server 11.0.2.dev2
neutron CLI - 6.5.0
Thanks Andrii for reporting this issue.
In fact it's hard to say why it happened like that.
About your question where keepalived.conf is prepared, it's in https://github.com/openstack/neutron/blob/master/neutron/agent/linux/keepalived.py which is used later by this class: https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha_router.py#L60