Comment 4 for bug 1798475

Slawek Kaplonski (slaweq) wrote :

I tried to trace one example to understand why this failover sometimes happens.

I based this on the test result at http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/testr_results.html.gz

Two "hosts": host-3f3dad1b and host-6d630618

Router id: 3d3c2c83-234a-4b63-bd6b-c450da34a7d2

When the router was first created:
* host-3f3dad1b was backup, the router transitioned to backup at 03:37:35.482
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-37-05-357461.txt.gz#_2018-11-30_03_37_35_482

* host-6d630618 was active, the router first transitioned to backup at 03:37:32.245
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-37-05-693500.txt.gz#_2018-11-30_03_37_32_245
 and later transitioned to master at 03:37:47.489
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-37-05-693500.txt.gz#_2018-11-30_03_37_47_489

Restarts of agents:
* First restart of backup agent (host-3f3dad1b) at 03:37:50.546
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost.txt.gz#_2018-11-30_03_37_50_546
 Pinging the gateway IP address from the external VM for 1 minute works fine.
* A new process is started on this host and the router transitions to backup at 03:37:59.021:
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-37-50-590709.txt.gz#_2018-11-30_03_37_59_388

* Then the master agent (host-6d630618) is restarted at 03:38:50.909
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost.txt.gz#_2018-11-30_03_38_50_909
 The router then transitions to active on host-3f3dad1b at 03:39:02.322
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-37-50-590709.txt.gz#_2018-11-30_03_39_02_322
 On the restarted host-6d630618 it transitions to backup at 03:39:03.522:
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-38-50-946978.txt.gz#_2018-11-30_03_39_03_522
 On host-6d630618 it transitions to backup once again at 03:39:19.314
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-38-50-946978.txt.gz#_2018-11-30_03_39_19_314
 And finally it transitions back to active on host-6d630618 at 03:39:36.339
 http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/dsvm-fullstack-logs/TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost/neutron-l3-agent--2018-11-30--03-38-50-946978.txt.gz#_2018-11-30_03_39_36_339
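
To make the gaps between these events easier to see, here is a minimal Python sketch that sorts the timestamps quoted above and computes the time between any two of them. The EVENTS list and the helper names (parse, timeline, seconds_between) are only illustrative and are not part of neutron or of the test; the labels treat "active" and "master" as the same state.

```python
from datetime import datetime

# Events copied by hand from the timeline above: (time, host, state or action).
# "active" is used for both "active" and "master" (assumed to be the same state).
EVENTS = [
    ("03:37:32.245", "host-6d630618", "backup"),
    ("03:37:35.482", "host-3f3dad1b", "backup"),
    ("03:37:47.489", "host-6d630618", "active"),
    ("03:37:50.546", "host-3f3dad1b", "agent restart"),
    ("03:37:59.021", "host-3f3dad1b", "backup"),
    ("03:38:50.909", "host-6d630618", "agent restart"),
    ("03:39:02.322", "host-3f3dad1b", "active"),
    ("03:39:03.522", "host-6d630618", "backup"),
    ("03:39:19.314", "host-6d630618", "backup"),
    ("03:39:36.339", "host-6d630618", "active"),
]


def parse(ts):
    """Parse an HH:MM:SS.mmm timestamp (all events are on the same day)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def seconds_between(start, end):
    """Seconds elapsed from timestamp `start` to timestamp `end`."""
    return round((parse(end) - parse(start)).total_seconds(), 3)


def timeline(events):
    """Return (time, host, state, seconds since previous event), sorted by time."""
    ordered = sorted(events, key=lambda e: parse(e[0]))
    rows = []
    prev = None
    for ts, host, state in ordered:
        delta = 0.0 if prev is None else seconds_between(prev, ts)
        rows.append((ts, host, state, delta))
        prev = ts
    return rows


if __name__ == "__main__":
    for ts, host, state, delta in timeline(EVENTS):
        print(f"{ts}  {host}  {state:<13}  +{delta:.3f}s")
```

For example, seconds_between("03:38:50.909", "03:39:02.322") gives about 11.4 seconds between the restart of the master agent and the transition to active on host-3f3dad1b, and the router is active again on host-6d630618 about 34 seconds after that.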

I still don't know why some VRRP packets are (probably) lost, which makes keepalived switch this VIP.
The related keepalived logs can be found in http://logs.openstack.org/09/608909/20/check/neutron-fullstack/c7b6401/logs/journal.log around 03:39:00