Comment 2 for bug 1810583

Ben Hollins (bhollins) wrote :

Hi Karl.
I can confirm this issue as well. We hit it this morning on a two-node keepalived cluster consisting of two VMware VMs running Ubuntu 18.04.1. In our case, a daily update task restarted udev, which in turn restarted systemd-networkd. When that service restarted, the virtual IP on the MASTER node's NIC was lost, but keepalived did not detect the loss and the IP was never restored on either MASTER or BACKUP. This caused an outage of the services hosted on the virtual IP.

When we investigated, we found that the MASTER and BACKUP nodes each had only their own primary IP address and neither held the virtual IP, so the virtual IP was unreachable. keepalived had not performed a failover.

We restarted keepalived on both nodes, which caused the virtual IP to reappear on the MASTER node's NIC. We can reproduce this on demand by manually restarting systemd-networkd, which causes the virtual IP to vanish. The only way to get it back is to manually restart keepalived.
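
For anyone who wants to check for the same symptom, here is a minimal Python sketch that reports whether an address is currently assigned to an interface by parsing the output of `ip -o -4 addr show dev <interface>`. The function names, the interface name and the addresses are hypothetical placeholders, not values from our cluster, and this is only one possible way to check.

```python
import subprocess


def has_address(ip_output: str, address: str) -> bool:
    """Return True if `address` is listed as an inet address in the
    one-line-per-address output of `ip -o -4 addr show`."""
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."
        if "inet" not in fields:
            continue
        pos = fields.index("inet")
        if pos + 1 >= len(fields):
            continue
        # Strip the prefix length and compare the address exactly.
        if fields[pos + 1].split("/")[0] == address:
            return True
    return False


def vip_present(interface: str, vip: str) -> bool:
    """Run `ip` for one interface and check whether the VIP is assigned.
    Both arguments are supplied by the caller (assumed values)."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_address(result.stdout, vip)
```

Calling something like `vip_present("ens160", "192.0.2.100")` on the MASTER node before and after restarting systemd-networkd should show whether the address has been dropped; substitute your own interface name and virtual IP.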

Notably, when this problem occurs, keepalived logs nothing to syslog at all. That suggests it does not recognise either the restart of systemd-networkd or the loss of the virtual IP, and therefore never announces it to the BACKUP node.

There is a good discussion of this on the Ubuntu Forums, where someone has confirmed that downgrading the keepalived package to the previous version resolves the behaviour, so it does look like the patch in the latest package version may have introduced it.

Here is the thread for reference:
https://ubuntuforums.org/showthread.php?t=2406400&p=13819524#post13819524

I'm happy to test anything required on a VM if necessary. We haven't taken any action to work around this yet.