Keepalived Loses VIP on DHCP Renewal

Bug #1863174 reported by Matthew Roark
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
keepalived (Ubuntu)  New     Undecided   Unassigned

Bug Description

Ubuntu Release: 16.04.6 LTS
Keepalived Package Version: 1.2.24-1ubuntu0.16.04.2

-- /etc/keepalived/keepalived.conf --
vrrp_script chk_apiserver {
        script "curl https://127.0.0.1:443/healthz --cacert ca.crt --key request.key --cert request.crt --fail > /dev/null 2>&1"
        interval 10
        fall 6
        rise 2
}

vrrp_instance K8S_APISERVER {
    interface ens3
    state BACKUP
    virtual_router_id 118
    nopreempt
    dont_track_primary

    authentication {
        auth_type AH
        auth_pass **REDACTED**
    }

    virtual_ipaddress {
        10.128.233.23
    }
    track_script {
        chk_apiserver
    }

}

Expected Behavior: Upon DHCP renewal, Keepalived should maintain its VIP on the designated interface. If the VIP is lost for reasons outside of Keepalived's control, it should fail over to another VRRP instance.

Actual Behavior: The VIP disappeared from the designated interface and did not fail over to any other VRRP instance until Keepalived was restarted.

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:8c:d2:e1 brd ff:ff:ff:ff:ff:ff
    inet 172.20.34.50/27 brd 172.20.34.63 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8c:d2e1/64 scope link
       valid_lft forever preferred_lft forever
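
A minimal sketch of how the missing VIP could be detected from interface output like the above. It assumes the text comes from something like `ip addr show ens3`; the helper names `addresses_on_interface` and `vip_present` are hypothetical and not part of keepalived.

```python
import ipaddress
import re

# Matches lines such as "    inet 172.20.34.50/27 brd ..." but not "inet6 ...".
INET_LINE = re.compile(r"\s*inet\s+(\d+\.\d+\.\d+\.\d+)(?:/\d+)?")


def addresses_on_interface(ip_addr_output: str) -> set:
    """Collect the IPv4 addresses listed in the interface output."""
    found = set()
    for line in ip_addr_output.splitlines():
        match = INET_LINE.match(line)
        if match:
            found.add(ipaddress.ip_address(match.group(1)))
    return found


def vip_present(ip_addr_output: str, vip: str) -> bool:
    """Return True if the given VIP is configured on the interface."""
    return ipaddress.ip_address(vip) in addresses_on_interface(ip_addr_output)


# Interface state copied from this report.
SAMPLE = """\
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:8c:d2:e1 brd ff:ff:ff:ff:ff:ff
    inet 172.20.34.50/27 brd 172.20.34.63 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8c:d2e1/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    # The VIP from virtual_ipaddress in keepalived.conf.
    print("VIP present:", vip_present(SAMPLE, "10.128.233.23"))
```

Run against the captured state, the check reports that only the DHCP-assigned address remains on ens3.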

-- /var/log/syslog --
Feb 5 21:13:17 dhclient[839]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 3 (xid=0x4cfc595e)
Feb 5 21:13:17 dhclient[839]: DHCPREQUEST of 172.20.34.50 on ens3 to 255.255.255.255 port 67 (xid=0x5e59fc4c)
Feb 5 21:13:17 dhclient[839]: DHCPOFFER of 172.20.34.50 from 172.20.34.36
Feb 5 21:13:17 dhclient[839]: DHCPACK of 172.20.34.50 from 172.20.34.36
Feb 5 21:13:17 dhclient[839]: bound to 172.20.34.50 -- renewal in 40846 seconds.
Feb 5 21:13:19 ntpd[19295]: Deleting interface #34 ens3, 172.20.34.40#123, interface stats: received=0, sent=0, dropped=0, active_time=150821 secs

-- /tmp/keepalived.stats --
VRRP Instance: K8S_APISERVER
  Advertisements:
    Received: 10722
    Sent: 153463
  Became master: 1
  Released master: 0
  Packet Errors:
    Length: 0
    TTL: 0
    Invalid Type: 0
    Advertisement Interval: 0
    Address List: 0
  Authentication Errors:
    Invalid Type: 0
    Type Mismatch: 0
    Failure: 15
  Priority Zero:
    Received: 0
    Sent: 0
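
For comparing dumps across hosts, the counters above can be pulled into a dictionary. This is a rough sketch that assumes a single VRRP instance and the two-level indentation shown in this dump; `parse_stats` is a hypothetical helper. One transition into MASTER and none out of it is consistent with Keepalived still considering itself MASTER while the VIP was gone, though that is an inference from the dump and not something confirmed in this report.

```python
def parse_stats(text: str) -> dict:
    """Flatten an indented 'Key: value' stats dump into a dict.

    Nested counters are stored as 'Section/Key'. Assumes one instance per dump.
    """
    counters = {}
    section = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if indent == 0:
            counters["instance"] = value      # "VRRP Instance: <name>"
        elif value == "":
            section = key                     # header such as "Advertisements"
        elif indent <= 2:
            counters[key] = int(value)        # top-level counter
            section = None
        else:
            counters[f"{section}/{key}"] = int(value)
    return counters


# Stats dump copied from this report.
SAMPLE_STATS = """\
VRRP Instance: K8S_APISERVER
  Advertisements:
    Received: 10722
    Sent: 153463
  Became master: 1
  Released master: 0
  Packet Errors:
    Length: 0
    TTL: 0
    Invalid Type: 0
    Advertisement Interval: 0
    Address List: 0
  Authentication Errors:
    Invalid Type: 0
    Type Mismatch: 0
    Failure: 15
  Priority Zero:
    Received: 0
    Sent: 0
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE_STATS)
    held = stats["Became master"] - stats["Released master"]
    print(stats["instance"], "master transitions not yet released:", held)
```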

Note: networking.service was *not* restarted during this timeframe; however, I have been able to reproduce the issue by restarting it.

Additionally, I have not been able to reproduce this issue by hand, e.g. with 'dhclient -v -r ens3'. It seemingly occurs only when the lease has expired; that is, at least, the only time we can observe ntpd detecting that the interface has disappeared (hence the 'Deleting interface' message).
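
The expired-lease case can be told apart from an ordinary renewal in the dhclient log. A DHCP client only sends DHCPDISCOVER when it starts over without a valid lease, whereas a plain renewal is a DHCPREQUEST/DHCPACK exchange. The sketch below assumes syslog lines in the format shown in this report; `dhcp_messages` and `lease_was_reacquired` are hypothetical helper names.

```python
import re

# Captures the DHCP message type from lines logged by dhclient.
DHCP_MSG = re.compile(r"dhclient\[\d+\]: (DHCP[A-Z]+)\b")


def dhcp_messages(syslog_text: str) -> list:
    """Return DHCP message types in the order they were logged."""
    return [m.group(1) for m in DHCP_MSG.finditer(syslog_text)]


def lease_was_reacquired(syslog_text: str) -> bool:
    """True if the client sent a DHCPDISCOVER, i.e. it did not simply renew."""
    return "DHCPDISCOVER" in dhcp_messages(syslog_text)


# Log lines copied from this report.
SAMPLE_LOG = """\
Feb 5 21:13:17 dhclient[839]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 3 (xid=0x4cfc595e)
Feb 5 21:13:17 dhclient[839]: DHCPREQUEST of 172.20.34.50 on ens3 to 255.255.255.255 port 67 (xid=0x5e59fc4c)
Feb 5 21:13:17 dhclient[839]: DHCPOFFER of 172.20.34.50 from 172.20.34.36
Feb 5 21:13:17 dhclient[839]: DHCPACK of 172.20.34.50 from 172.20.34.36
Feb 5 21:13:17 dhclient[839]: bound to 172.20.34.50 -- renewal in 40846 seconds.
Feb 5 21:13:19 ntpd[19295]: Deleting interface #34 ens3, 172.20.34.40#123, interface stats: received=0, sent=0, dropped=0, active_time=150821 secs
"""

if __name__ == "__main__":
    print(dhcp_messages(SAMPLE_LOG))
    print("lease reacquired from scratch:", lease_was_reacquired(SAMPLE_LOG))
```

A check like this could be run over a longer syslog window to see whether each VIP loss lines up with a DHCPDISCOVER.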

Matthew Roark (mroark) wrote :

Note: this is not a duplicate of https://bugs.launchpad.net/netplan/+bug/1815101, since that bug is only relevant to Ubuntu 18.04 (Bionic) and later, whereas this report concerns Ubuntu 16.04.

The same change as outlined in https://bugs.launchpad.net/netplan/+bug/1815101/comments/4 _may_ still be relevant to also address this manifestation of the issue, though.

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Matthew, actually this might be a duplicate of bug 1815101, because that fix was not backported to Xenial. I can include Xenial in that list and study efforts to backport the systemd-networkd fix to Xenial (checking whether it's a go or no-go).

I'm marking it as a duplicate AND adding Xenial to the list of systemd-networkd affected versions. Let me know if you would like to add anything. I'll provide comments about the Xenial backport there.

Matthew Roark (mroark) wrote :

Rafael,

Thank you. One thing I should note, though - the host on which this occurred is not leveraging systemd-networkd.

$ systemctl list-units -t service --all | grep systemd-networkd
  systemd-networkd-resolvconf-update.service loaded inactive dead Update resolvconf for networkd DNS
  systemd-networkd-wait-online.service loaded inactive dead Wait for Network to be Configured
  systemd-networkd.service loaded inactive dead Network Service

$ systemctl list-units -t service | grep network
cloud-init-local.service loaded active exited Initial cloud-init job (pre-networking)
networking.service loaded active exited Raise network interfaces

The patch for keepalived (mentioned in https://bugs.launchpad.net/netplan/+bug/1815101/comments/30) _does_ still seem to be relevant regardless (correct me if I'm wrong). In that case, I will keep a close eye on efforts to backport that package specifically, as there doesn't appear to be any proactive workaround that can be implemented in the meantime.
