Keepalived < 2.0.x in Ubuntu 18.04 LTS not compatible with systemd-networkd

Bug #1819074 reported by cdmiller on 2019-03-07
This bug affects 7 people
Affects                  Importance  Assigned to
keepalived (Ubuntu)      Medium      Unassigned
  Bionic                 Medium      Unassigned
netplan.io (Ubuntu)      Undecided   Unassigned
  Bionic                 Undecided   Unassigned
systemd (Ubuntu)         Undecided   Unassigned
  Bionic                 Undecided   Unassigned

Bug Description

Systemd-networkd clobbers VIPs placed by other daemons whenever a reconfiguration triggers a systemd-networkd restart (netplan apply, for example). Keepalived < 2.0.x will not restore a VIP lost in this fashion, which breaks high availability on Ubuntu 18.04 LTS. A backport of keepalived >= 2.0.x should fix the issue.

(or at least related)

Adding other components involved.

Is this a duplicate of bug 1815101?

cdmiller (cdmiller) wrote :

Bug 1815101 is a symptom of this bug.

I do not think bug 1810583 is related, nor do I think a backport of keepalived >= 2.0.x would fix it.

Thanks,

- cameron

cdmiller (cdmiller) wrote :

After reading the comments on bug 1810583 more closely, I see that it is also a symptom of this bug, and keepalived >= 2.0.x would fix it (restore the VIPs).

Thanks,

- cameron

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in keepalived (Ubuntu):
status: New → Confirmed
Changed in netplan.io (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed

Will anything be done here? This basically breaks keepalived for bionic. I.e., will keepalived >= 2.0.x be made available for bionic?

Bryce Harrington (bryce) wrote :

Given SRU and backport team policies, having a newer version of keepalived in bionic seems pretty unlikely. SRU policy favors backports of specific fixes rather than entire package backports, while the backports team generally discourages backports of libraries or services since they can randomly break other software using them.

So, the most practical route forward would be to identify the patch(es) needed to fix the particular issue at hand, and go through the regular SRU process. I am, unfortunately, completely unfamiliar with keepalived, but attached is a list of upstream commits mentioning "VIP" since the v1.3.9 release, which I generated like this:

    git log --grep="VIP" v1.3.9.. > /tmp/vip_commits.txt

The next step would be for someone more familiar with keepalived than me to review the list and identify 1 or 2 patches worth testing, then apply each patch to the bionic keepalived package and test for a fix. Once we know which patch is needed, an SRU request can be filed to have it released for all users.

Changed in keepalived (Ubuntu):
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

Oh, one other thing that will be necessary to file an SRU for this would be a series of "paint by number" steps to reproduce the issue and to verify the fix. For example, something akin to: https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1810583/comments/12
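As a starting point for such steps, here is a minimal sketch of a check that could be run on the node currently holding the VIP. The interface name, the VIP address, and the 5-second wait are placeholders chosen for illustration, not values from this report, and the helper names `has_vip` and `reproduce` are hypothetical.

```shell
#!/bin/sh
# Sketch only: IFACE and VIP are placeholders, not values from this bug report.
IFACE="${IFACE:-eth0}"
VIP="${VIP:-192.0.2.10}"

# has_vip ADDRESS
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDRESS
# appears in the address field (field 4, in ADDRESS/PREFIX form).
has_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# reproduce
# Run as root on the keepalived MASTER. Returns 0 if the VIP survives
# `netplan apply`, 1 if it is lost, 2 if it was not present to begin with.
reproduce() {
    if ! ip -o -4 addr show dev "$IFACE" | has_vip "$VIP"; then
        echo "VIP $VIP not present on $IFACE before the test" >&2
        return 2
    fi
    netplan apply      # triggers the systemd-networkd restart
    sleep 5            # arbitrary wait; adjust to the advert interval in use
    if ip -o -4 addr show dev "$IFACE" | has_vip "$VIP"; then
        echo "VIP $VIP still present on $IFACE"
    else
        echo "VIP $VIP lost from $IFACE"
        return 1
    fi
}
```

Calling `reproduce` before and after installing a candidate package would give a simple pass/fail signal for the SRU verification; the function is deliberately not invoked automatically so the file can be sourced safely.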

Bryce Harrington (bryce) on 2019-06-20
Changed in keepalived (Ubuntu Bionic):
status: New → Triaged
Changed in keepalived (Ubuntu Cosmic):
status: New → Triaged
Changed in keepalived (Ubuntu):
importance: Undecided → Medium
Changed in keepalived (Ubuntu Bionic):
importance: Undecided → Medium
Changed in keepalived (Ubuntu Cosmic):
importance: Undecided → Medium
Bryce Harrington (bryce) on 2019-06-20
tags: added: server-next
tags: added: server-triage-discuss

Fixed in later versions. Rafael will take a look at this from an HA point of view after feature freeze (FF) in Eoan.

no longer affects: systemd (Ubuntu Cosmic)
no longer affects: netplan.io (Ubuntu Cosmic)
no longer affects: keepalived (Ubuntu Cosmic)
Changed in keepalived (Ubuntu):
status: Triaged → Fix Released
Changed in netplan.io (Ubuntu):
status: Confirmed → Fix Released
Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
tags: removed: server-triage-discuss
Wes (wes234234) wrote :

Medium? I've had this take a few prod clusters offline at highly inconvenient times; it effectively makes keepalived on 18.04 unfit for purpose. We've had to build our own keepalived packages to keep things running. I think you need to give this a bit more priority, please.

Edward Hope-Morley (hopem) wrote :

Looks like this has been fixed in keepalived 2.x (detection of a missing VIP) - https://github.com/acassen/keepalived/issues/836 - but the patch is bundled with a whole load of others that were merged at once, so it might be hard to backport.

The following 3 bugs:

https://bugs.launchpad.net/bugs/1815101
https://bugs.launchpad.net/bugs/1819074
https://bugs.launchpad.net/bugs/1810583

have the same root cause: systemd-networkd interferes with secondary IP addresses on the NICs it manages.

I'm marking all other cases as a duplicate of LP: #1815101.

TODO here is the following:

- There are mainly 2 "fixes" for this issue:

1) keepalived is able to recognize systemd-networkd changes and change cluster status in order to reconfigure the managed NICs (keepalived >= 2.0.x).

2) systemd-networkd implements a new option (KeepConfiguration=) in its .network configuration files, which fixes this behavior not only for keepalived but for all HA-related software that manages secondary IPs and/or aliases on NICs managed by systemd-networkd.

I think the most appropriate approach would be to make sure both features work together in Eoan, and then make sure the SRUs are done for Disco and Bionic. One problem with item (2) is that netplan will also have to support the new "KeepConfiguration=" option, but fix (2) is more appropriate for all the other HA-related software controlling virtual IPs (CTDB, Pacemaker, and so on).
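For illustration, option (2) might look roughly like the sketch below on a system whose systemd already understands KeepConfiguration= (upstream documents it as a [Network] setting in .network files). The helper name `write_keepconfig_dropin`, the drop-in file name, and the example path are assumptions made for this sketch. Whether `static` is the right value for a given HA setup should be verified against the systemd.network man page for the installed version.

```shell
#!/bin/sh
# Sketch only: writes a systemd-networkd drop-in asking networkd to keep
# existing static addresses when it starts. Requires a systemd version
# that supports KeepConfiguration=.

# write_keepconfig_dropin DIR
# DIR is a "<name>.network.d" drop-in directory for the interface's
# .network file; it is created if missing.
write_keepconfig_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/keep-configuration.conf" <<'EOF'
[Network]
KeepConfiguration=static
EOF
}

# Example (assumed path: netplan is assumed to generate
# 10-netplan-<iface>.network, so the drop-in directory would be):
#   write_keepconfig_dropin /etc/systemd/network/10-netplan-eth0.network.d
#   systemctl restart systemd-networkd
```

A drop-in is used here, rather than editing the generated file, because netplan rewrites its generated .network files on every `netplan apply`.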

tags: removed: server-next
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu Bionic):
status: New → Confirmed
Changed in systemd (Ubuntu Bionic):
status: New → Confirmed