Comment 21 for bug 1744062

Edward Hope-Morley (hopem) wrote :

Ok I have now completed testing the bionic-proposed keepalived package with OpenStack Queens and am happy that it resolves the problem: keepalived now tears down routes, vips, evips etc. when it comes back up and transitions from master to backup. My test consisted of deploying Queens with 3 gateways, creating 100 users/projects each with 1 router, creating some instances with floating ips, then forcibly killing both the keepalived and neutron-keepalived-state-change processes associated with a particular router for which I have an instance with a fip. I then observed that the qrouter ns interfaces for that router were definitely unconfigured and the vrrp transition happened as expected. This is in contrast to e.g. keepalived 1:1.2.19-1ubuntu0.2, available with all Xenial releases of OpenStack, for which I consistently see the qrouter interfaces remain configured on > 1 gateway.

For completeness (although it has no bearing on the keepalived fix), I still see the other issue on bionic whereby neutron lists the router as being active on > 1 host, e.g.

(truncating so that it will display properly)
+-//---------------------------+---------+----------------+-------+----------+
| // id                        | host    | admin_state_up | alive | ha_state |
+-//---------------------------+---------+----------------+-------+----------+
| //901-4edd-86fb-8dbfe7373255 | crustle | True           | :-)   | active   |
| //961-4318-9743-775ebc9b0067 | chespin | True           | :-)   | active   |
| //628-4c2e-8e91-c309e4477c75 | orgen   | True           | :-)   | standby  |
+-//---------------------------+---------+----------------+-------+----------+

The reason for this is simple, and the good news is that with the fixed keepalived it is also benign. Neutron detects state changes by running ip monitor on the qrouter interfaces, and since my test involved killing both neutron-keepalived-state-change (which runs ip monitor) and keepalived, the vrrp transition appears to have happened before neutron had ip monitor running again. Looking at the l3-agent logs I see:

2018-07-25 10:19:33.636 14018 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 75d24bfb-9807-4216-af4a-3aac37cf2417
2018-07-25 10:19:33.638 14018 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-75d24bfb-9807-4216-af4a-3aac37cf2417', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/
2018-07-25 10:19:33.886 14018 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-75d24bfb-9807-4216-af4a-3aac37cf2417', 'neutron-keepalived-state-change', '--router_id=75d24

i.e. neutron starts keepalived BEFORE neutron-keepalived-state-change, so if the transition and teardown happen before the latter comes up and launches ip monitor, it never sees the changes and has nothing to report to neutron.
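
To make the ordering problem concrete, here is a toy Python model of it. This is only a sketch and not neutron or keepalived code: `Namespace`, `remove_vip` and `run` are names made up for illustration. The one thing it assumes is what is described above, namely that an event-based watcher such as ip monitor only reports changes that happen while it is running.

```python
# Toy model only: none of these names come from neutron or keepalived.


class Namespace:
    """Stands in for a qrouter namespace that only notifies live listeners."""

    def __init__(self):
        self.listeners = []
        self.has_vip = True  # VIP left behind by the killed keepalived

    def remove_vip(self):
        """Tear down the VIP and tell whoever is listening right now."""
        self.has_vip = False
        for notify in self.listeners:
            notify("vip removed")


def run(monitor_first):
    """Return (has_vip, events_seen) for the given start-up order."""
    ns = Namespace()
    seen = []
    steps = [
        lambda: ns.listeners.append(seen.append),  # state-change monitor starts
        ns.remove_vip,                             # keepalived goes to backup
    ]
    if not monitor_first:
        steps.reverse()  # keepalived acts before the monitor is listening
    for step in steps:
        step()
    return ns.has_vip, seen


print(run(monitor_first=True))
print(run(monitor_first=False))
```

In both orderings the VIP ends up removed, which matches the interfaces being unconfigured with the fixed keepalived. Only when the monitor starts first does it record the event; in the other ordering nothing is ever reported, which is consistent with neutron still listing the router as active on that host.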