keepalived loses VIP on netplan apply or systemd restart/upgrade

Bug #1817343 reported by Craig McIntyre
Affects: OpenStack-Ansible
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Environment:
Underlay o/s: Ubuntu Bionic 18.04.1
Networking configured with Netplan with networkd renderer

$ cat /etc/openstack-release
# Ansible managed

DISTRIB_ID="OSA"
DISTRIB_RELEASE="18.1.3"
DISTRIB_CODENAME="Rocky"
DISTRIB_DESCRIPTION="OpenStack-Ansible"

Issue:
On a HAProxy host with keepalived installed, upgrading or restarting systemd, or running netplan apply, drops the VIPs from the interfaces, so services become unavailable. Because keepalived doesn't appear to monitor the VIPs, VRRP failover never occurs when they disappear.
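
To illustrate the missing check, here is a minimal sketch of how a dropped VIP could be detected from the output of `ip -o -4 addr show`. The function name `vip_present` and the sample addresses are hypothetical and not part of the report. Such a check could in principle be called from a keepalived vrrp_script, but that wiring is an assumption and is not shown.

```python
import re


def vip_present(ip_addr_output: str, vip: str) -> bool:
    """Return True if `vip` (address without prefix length) is listed as an
    inet address in the text output of `ip -o -4 addr show`."""
    # Every `ip -o` line carries "inet <address>/<prefix>"; collect the addresses.
    found = re.findall(r"\binet\s+(\d+\.\d+\.\d+\.\d+)/\d+", ip_addr_output)
    # Exact membership test, so 172.29.236.1 does not match 172.29.236.11.
    return vip in found


# Hypothetical sample output: host address plus VIP on the same bridge.
before = (
    "2: br-mgmt    inet 172.29.236.11/22 brd 172.29.239.255 scope global br-mgmt"
    "\\       valid_lft forever preferred_lft forever\n"
    "2: br-mgmt    inet 172.29.236.9/32 scope global br-mgmt"
    "\\       valid_lft forever preferred_lft forever\n"
)
# The same interface after the VIP has been dropped.
after = before.splitlines(keepends=True)[0]

print(vip_present(before, "172.29.236.9"))
print(vip_present(after, "172.29.236.9"))
```

In practice the text would come from running the `ip` command on the HAProxy host rather than from a hard-coded sample.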

Workaround:
A workaround has been documented at https://chr4.org/blog/2019/01/21/make-keepalived-play-nicely-with-netplan-slash-systemd-network/ which involves creating dummy interfaces with systemd-networkd and assigning the VIPs to them.

Problem with workaround:
Currently the keepalived override variables haproxy_keepalived_external_interface and haproxy_keepalived_internal_interface are each used in two places in keepalived.conf:

vrrp_instance internal {
  interface <haproxy_keepalived_internal_interface>
  virtual_ipaddress {
    <haproxy_keepalived_internal_vip_cidr> dev <haproxy_keepalived_internal_interface>
  }
}
vrrp_instance external {
  interface <haproxy_keepalived_external_interface>
  virtual_ipaddress {
    <haproxy_keepalived_external_vip_cidr> dev <haproxy_keepalived_external_interface>
  }
}

However, for the workaround to work, the 'interface' knob of each stanza must still reference a physical interface, while the 'dev' part of the virtual_ipaddress entry references the dummy interface. The dummy interfaces must also not be in the DOWN state (manually bringing them up puts them in an UNKNOWN state).

Request:
In order to utilise the workaround and make VIPs persist through a netplan apply and/or systemd upgrade/restart, there would need to be a pair of configurable variables per interface: one to bind at the 'interface' knob and one to bind at the 'virtual_ipaddress' knob.
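
A minimal sketch of what the requested pair of variables would produce, written as a small Python renderer. The function `render_vrrp_instance`, the interface names `br-mgmt` and `vip-int`, and the address are hypothetical. A real keepalived stanza also needs settings such as state, virtual_router_id and priority, which are left out here.

```python
from typing import Optional


def render_vrrp_instance(name: str, interface: str, vip_cidr: str,
                         vip_interface: Optional[str] = None) -> str:
    """Render a stripped-down vrrp_instance stanza in which the VRRP
    interface and the device holding the VIP can differ."""
    # Without a separate VIP device, reuse the VRRP interface, which
    # reproduces the current single-variable behaviour.
    dev = vip_interface or interface
    return (
        f"vrrp_instance {name} {{\n"
        f"  interface {interface}\n"
        f"  virtual_ipaddress {{\n"
        f"    {vip_cidr} dev {dev}\n"
        f"  }}\n"
        f"}}\n"
    )


# Physical/bridge interface for VRRP, dummy interface for the VIP.
print(render_vrrp_instance("internal", "br-mgmt", "172.29.236.9/32",
                           vip_interface="vip-int"))
```

Leaving `vip_interface` unset keeps today's output, so existing deployments would not change.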

Jean-Philippe Evrard (jean-philippe-evrard) wrote :

We can:
1) Add a new feature by adding an extra var in this line:
https://github.com/openstack/openstack-ansible/blob/442d53a4d58d2a58d91813dee9ff96b51ef5063e/inventory/group_vars/haproxy/keepalived.yml#L60

replacing:
dev {{ haproxy_keepalived_external_interface | default(management_bridge) }}

with:
dev {{ haproxy_keepalived_external_vip_interface | default(haproxy_keepalived_external_interface | default(management_bridge)) }}
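
The nested default() chain above resolves in a fixed order. As a rough illustration only, the same lookup can be written in plain Python; the function name `resolve_external_vip_interface` and the sample values are hypothetical, while the variable names are the ones from the proposed template line.

```python
def resolve_external_vip_interface(variables: dict) -> str:
    """Pick the device for the external VIP using the same precedence as
    the proposed template: VIP-specific variable, then the existing
    interface variable, then management_bridge."""
    # Like Jinja's default(), fall back only when a variable is undefined,
    # not when it is defined but empty.
    if "haproxy_keepalived_external_vip_interface" in variables:
        return variables["haproxy_keepalived_external_vip_interface"]
    if "haproxy_keepalived_external_interface" in variables:
        return variables["haproxy_keepalived_external_interface"]
    return variables["management_bridge"]


# Hypothetical deployer variables: only the new override decides the device.
print(resolve_external_vip_interface({
    "management_bridge": "br-mgmt",
    "haproxy_keepalived_external_interface": "bond0",
    "haproxy_keepalived_external_vip_interface": "vip-ext",
}))
```

Deployers who set neither variable would keep getting management_bridge, so the change is backwards compatible.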

2) Add a release note pointing to a known issue with netplan, telling users to set said variable and, until a fix is released in the keepalived packages, also proposing a global group_var/extra var override for users in the same reno.

Craig McIntyre (ceemac) wrote :

That would be ideal.

Would it be worth adding a documentation note on the requirement / process for setting up the dummy interfaces for this change to work correctly?

Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm afraid we haven't managed to do either of these options. And at the moment the issue is resolved with keepalived 2.0, which Ubuntu 20.04 ships out of the box.

And it makes little sense to do this now as we've just dropped bionic support for master...

However, it's super easy to override the keepalived_instances stanza for older releases on bionic in user_variables.yml to make things work as you need them.

For keepalived_instances we provide a reasonable default that the deployer can easily change at any time. And I can't say the workaround with a fake interface is a neat one; IMO it's far easier to just drop netplan in favor of old ifupdown or systemd-networkd, since you would still end up using the latter to create the fake device (and netplan uses it as well).

So now it's really too late for a release note, and I don't think this will be fixed in any way :(

Changed in openstack-ansible:
status: New → Won't Fix