Ubuntu
keepalived package

Daily cron restarts network on unattended updates but keepalived .service is not restarted as a dependency

Bug #1810583 reported by Tom Scholten on 2019-01-05

This bug report is a duplicate of: Bug #1815101: [master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted).

This bug affects 13 people

Affects		Status	Importance	Assigned to	Milestone
	keepalived (Ubuntu)	Triaged	High	Karl Stenerud
	networkd-dispatcher (Ubuntu)	Invalid	Undecided	Unassigned

Bug Description

[Impact]

If systemd-networkd is restarted, any VRRP virtual IP addresses configured by keepalived are removed and not restored.

[Test Case]

multipass launch daily:bionic --name tester && multipass exec tester -- sudo su

apt update && apt dist-upgrade -y && apt install -y keepalived &&
echo "vrrp_instance VI_1 {
    virtual_router_id 33
    state MASTER
    interface ens3

    virtual_ipaddress {
        $(ip addr | grep 'inet ' | grep global | head -1 | sed 's/.*inet \([0-9]*\.[0-9]*\.[0-9]*\)\..*/\1.3/g')
    }
}" >/etc/keepalived/keepalived.conf &&
service keepalived start &&

# There will be a new IP address x.x.x.3/32 added to ens3
ip addr

# Restart networkd. The IP address won't come back
systemctl restart systemd-networkd
ip addr

# Restart keepalived. The IP address will come back
systemctl restart keepalived
ip addr
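
For readers puzzling over the sed expression in the test case: it is meant to capture the first three octets of the VM's first global IPv4 address and append a fixed last octet. A minimal sketch of that step in isolation, written as a helper that reads `ip addr` output from standard input (the name `derive_vip` is made up for this illustration and is not part of the bug report):

```shell
#!/bin/sh
# derive_vip SUFFIX
# Read `ip addr` output on stdin, take the first global IPv4 address,
# and print it with its last octet replaced by SUFFIX.
derive_vip() {
    grep 'inet ' | grep global | head -1 |
        sed 's/.*inet \([0-9]*\.[0-9]*\.[0-9]*\)\..*/\1.'"$1"'/'
}

# Usage on a live system:
#   ip addr | derive_vip 3
```

The `\(` and `\)` pair forms the capture group that `\1` refers to; without the group the substitution cannot produce an address.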

[Regression Potential]

TODO

[Original Description]

Description: Ubuntu 18.04.1 LTS
Release: 18.04
ii keepalived 1:1.3.9-1ubuntu0.18.04.1 amd64 Failover and monitoring daemon for LVS clusters

(From unanswered https://answers.launchpad.net/ubuntu/+source/keepalived/+question/676267)

Two weeks ago we lost our keepalived VRRP address on one of our systems; closer inspection reveals that this was due to the daily cron job. Apparently something triggered a udev reload (and the same seemed to happen last week), which obviously triggers a network restart.

Are we right in assuming that the patch below is the correct fix (and shouldn't it be part of the keepalived systemd unit in the default install)?

/etc/systemd/system/multi-user.target.wants/keepalived.service:
--- keepalived.service.orig 2018-11-20 09:17:06.973924706 +0100
+++ keepalived.service 2018-11-20 09:05:55.984773226 +0100
@@ -4,6 +4,7 @@
Wants=network-online.target
# Only start if there is a configuration file
ConditionFileNotEmpty=/etc/keepalived/keepalived.conf
+PartOf=systemd-networkd.service

Accompanying syslog:
Nov 20 06:34:33 ourmachine systemd[1]: Starting Daily apt upgrade and clean activities...
Nov 20 06:34:42 ourmachine systemd[1]: Reloading.
Nov 20 06:34:44 ourmachine systemd[1]: message repeated 2 times: [ Reloading.]
Nov 20 06:34:44 ourmachine systemd[1]: Starting Daily apt download activities...
Nov 20 06:34:44 ourmachine systemd[1]: Stopping udev Kernel Device Manager...
Nov 20 06:34:44 ourmachine systemd[1]: Stopped udev Kernel Device Manager.
Nov 20 06:34:44 ourmachine systemd[1]: Starting udev Kernel Device Manager...
Nov 20 06:34:44 ourmachine systemd[1]: Started udev Kernel Device Manager.
Nov 20 06:34:45 ourmachine systemd[1]: Reloading.
Nov 20 06:34:45 ourmachine systemd[1]: Reloading.
Nov 20 06:35:13 ourmachine systemd[1]: Reexecuting.
Nov 20 06:35:13 ourmachine systemd[1]: Stopped Wait for Network to be Configured.
Nov 20 06:35:13 ourmachine systemd[1]: Stopping Wait for Network to be Configured...
Nov 20 06:35:13 ourmachine systemd[1]: Stopping Network Service..

Karl Stenerud (kstenerud) wrote on 2019-01-07:

Hi Tom, thanks for bringing up this issue!

As this package needs to work with both server and desktop editions, I'm not sure how this would work with networkmanager...

Would you be able to put together a simple VM test case that demonstrates the issue and fix, and ensures things still work as a whole?

Changed in keepalived (Ubuntu):
status:	New → Confirmed

Ben Hollins (bhollins) wrote on 2019-01-11:

Hi Karl.
I can confirm this issue as well; we encountered it this morning on a two-node keepalived cluster consisting of two VMware Ubuntu 18.04.1 VMs. In our case, a daily update task had restarted udev, which in turn restarted systemd-networkd. When this service restarted, the virtual IP on the MASTER node's NIC was lost, but keepalived did not notice and the IP was never restored on either MASTER or BACKUP. This caused an outage of services hosted on the virtual IP.

When we investigated, we found that both MASTER and BACKUP nodes only had their own primary IP addresses, and neither node had the virtual IP. The virtual IP was unreachable. No managed failover by keepalived had occurred.

We restarted keepalived on both nodes, which caused the virtual IP to reappear on the MASTER node's NIC. We can reproduce this on demand right now by manually restarting systemd-networkd, which causes the virtual IP to vanish. The only way to get it to return is to then manually restart keepalived.

Notably, when this problem occurs, nothing is logged by keepalived in syslog at all, which suggests it's not recognising the restart of networkd, or the loss of the virtual ip, and therefore not announcing it to the BACKUP node.
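
Because keepalived logs nothing in this situation, an external check is one way to notice the loss. The following is only a sketch, assuming the virtual IP is known in advance; `vip_present` is a hypothetical helper, and it parses the one-line-per-address output of `ip -o -4 addr show`, where the fourth field is the address with its prefix length:

```shell
#!/bin/sh
# vip_present ADDRESS
# Read `ip -o -4 addr show` output on stdin and succeed (exit 0) if
# ADDRESS, given without a prefix length, is assigned to any interface.
vip_present() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live system (192.0.2.3 is a placeholder address):
#   ip -o -4 addr show | vip_present 192.0.2.3 ||
#       logger -t vip-check "virtual IP missing"
```

Run from a timer or monitoring agent, a check like this only reports the condition; it does not restore the address.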

There is a good discussion on the ubuntu forums about this, and someone has confirmed that downgrading the keepalived package to the previous one resolves this behaviour, so it does look like the patch in the latest package version has potentially introduced this.

Here is the thread for ref:
https://ubuntuforums.org/showthread.php?t=2406400&p=13819524#post13819524

I'm happy to test anything required on a VM if necessary. We haven't taken any action to workaround this yet.

Tom Scholten (snowtom) wrote on 2019-01-13:

I don't have a desktop edition ready at the moment, but would be willing to pick that up if time allows. I concur with the findings of Ben, we seem to hit the same, although we 'patched' the systemd unit file.

Looking around a bit, I'm not sure what the best way would be to make it work with both systemd-networkd and NetworkManager. According to the documentation (https://www.freedesktop.org/software/systemd/man/systemd.unit.html#PartOf), it looks like PartOf is not a hard requirement, so both could be listed.
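
A sketch of what listing both could look like, written as a drop-in rather than an edit to the packaged unit file. Everything here is an assumption for illustration: the helper name `write_partof_dropin`, the drop-in file name, and the idea that naming a network daemon that is not installed is harmless. PartOf= only propagates stop and restart from the listed units to keepalived; it does not pull those units in.

```shell
#!/bin/sh
# write_partof_dropin [DIR]
# Write a drop-in that makes keepalived follow stops and restarts of
# both network daemons. DIR defaults to the usual drop-in location;
# pass a scratch directory to inspect the result without touching systemd.
write_partof_dropin() {
    dir="${1:-/etc/systemd/system/keepalived.service.d}"
    mkdir -p "$dir" &&
    printf '[Unit]\nPartOf=systemd-networkd.service NetworkManager.service\n' \
        > "$dir/10-partof-network.conf"
}

# After writing to the real location, systemd has to re-read its units:
#   write_partof_dropin && systemctl daemon-reload
```

A drop-in survives package upgrades, whereas a patched copy of keepalived.service may be overwritten.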

Ben Hollins (bhollins) wrote on 2019-01-14:

Just to add, we also attempted to work around this by adding a systemd override to netplan to recycle the keepalived service whenever network management was restarted. While it corrected the issue, it also created another problem whereby the system hung on startup after a reboot waiting endlessly for the network daemon to start. I had to revert this change in light of this.

For now, I've disabled ubuntu auto update task completely and hopefully this will avoid any network service restarts until the issue is resolved within the package.

Sebastien Bacher (seb128) on 2019-01-14

tags:	added: rls-dd-incoming

Dimitri John Ledkov (xnox) wrote on 2019-01-15:

There are three cases:
- upgrades from xenial with ifupdown
- fresh installs with netplan/systemd-networkd
- fresh installs with network-manager

I think the right way to integrate this with networkd is to ship a networkd-dispatcher script to do the right thing w.r.t. keepalived

http://manpages.ubuntu.com/manpages/bionic/man8/networkd-dispatcher.8.html
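
A rough sketch of what such a dispatcher script might look like. This is not what any package ships; it assumes the script is installed under a state directory such as /etc/networkd-dispatcher/routable.d/, that the VRRP interface is ens3 as in the test case, and that restarting (rather than reloading) keepalived is acceptable. The names `VRRP_IFACE`, `RUN_HOOK` and `keepalived_action` are made up here; `IFACE` is the variable networkd-dispatcher exports for the interface that changed state.

```shell
#!/bin/sh
# Interface named in keepalived.conf (assumption: ens3, as in the test case).
VRRP_IFACE="${VRRP_IFACE:-ens3}"

# keepalived_action IFACE
# Print the command to run when IFACE is the VRRP interface, else nothing.
keepalived_action() {
    if [ "$1" = "$VRRP_IFACE" ]; then
        # try-restart only acts if keepalived is already running
        echo "systemctl try-restart keepalived.service"
    fi
}

# RUN_HOOK is a hypothetical safety switch so that the file can be sourced
# and inspected without side effects; a real hook would set it to 1.
if [ "${RUN_HOOK:-0}" = "1" ]; then
    cmd=$(keepalived_action "${IFACE:-}")
    if [ -n "$cmd" ]; then
        $cmd
    fi
fi
```

Whether a restart on every state change is too disruptive for a given cluster would need testing; the thread only establishes that a restart brings the address back.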

Andreas Hasenack (ahasenack) wrote on 2019-01-15:

Do all of you have daily network restarts? What's the reason? Or was this a one-off update that just by chance had a package upgrade that required such a restart?

That being said, I of course agree that losing the virtual IP in such a situation is bad.

Julian Andres Klode (juliank) wrote on 2019-01-15:

I guess you want to systemctl reload keepalived on most state changes in networkd, but I'm not sure. Probably not on off and no-carrier, though, since no traffic is possible in those states.

That said, I do wonder why you need to do this in the first place. keepalived really should listen to netlink and figure out interface status on its own.

Sebastien Bacher (seb128) wrote on 2019-01-15:

> something triggered a udev reload (and last week the same seemed to happen) which obviously triggers a network restart

why would an udev reload trigger a network restart? just as a random side note, snapd does interact with udev rules and can trigger reload (or did in the past) so it's not impossible it could be the one triggering the event

Ben Hollins (bhollins) wrote on 2019-01-15:

Andreas, in our case this was a one off. The system had been running for 2 months without any issues, and this sudden network restart due to a daily update check was not expected. We did a lot of testing different failover events (disconnecting vNIC, powering off a single node, stopping keepalived service etc), but we never specifically tested a restart of the networkd service. This bug has potentially gone unnoticed for some time because of this aspect, and the frequency of this event occurring (in our case), is low.

Just for visibility, the specific workaround I attempted, which recycled keepalived on network restart, was to add an override to the networkd unit file using the following commands. This fixes the immediate issue (keepalived restarts as desired), but prevents the network daemon from starting up after a reboot, causing the system to become stuck in a wait loop. I had to boot into recovery mode and remove the override file again to restore functionality.

---------------------------------
sudo systemctl edit systemd-networkd

then, in the override file (opened in nano):

[Service]
ExecStartPost=!/bin/systemctl restart keepalived
---------------------------------

Launchpad Janitor (janitor) wrote on 2019-01-24:

#10

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in networkd-dispatcher (Ubuntu):
status:	New → Confirmed

Ben Hollins (bhollins) wrote on 2019-02-20:

#11

We had this happen again this morning, causing an outage. Same issue, apt daily leads to a udev restart, which in turn restarted the network service and caused VRRP address to be lost on both haproxy nodes. I am going to try and completely disable the apt daily scheduled job while this bug remains.

Julian Andres Klode (juliank) on 2019-02-20

Changed in networkd-dispatcher (Ubuntu):
status:	Confirmed → Opinion
status:	Opinion → Invalid

Robie Basak (racb) on 2019-02-23

tags:	added: server-triage-discuss
Changed in keepalived (Ubuntu):
status:	Confirmed → Triaged
importance:	Undecided → High

Robie Basak (racb) on 2019-02-27

tags:	removed: server-triage-discuss

Karl Stenerud (kstenerud) on 2019-03-19

Changed in keepalived (Ubuntu):
assignee:	nobody → Karl Stenerud (kstenerud)
description:	updated
description:	updated

Karl Stenerud (kstenerud) wrote on 2019-03-20:

#12

There is a fix upstream for this issue in keepalived 2.0. I'm looking into what would be required to backport the fix. In the meantime, there is a workaround that I hope will be sufficient for your needs, as discovered by https://chr4.org/blog/2019/01/21/make-keepalived-play-nicely-with-netplan-slash-systemd-network/

You'll need to create a dummy interface, and then assign the virtual IP to that. Here's an example using a VM, which will generate a virtual ip of x.y.z.3. You can set your own last quad by changing the last part of the sed command '\1.3/g' to .4 or .215 or whatever:

multipass launch daily:bionic --name tester && multipass exec tester -- sudo su

Inside the VM:

apt update && apt dist-upgrade -y && apt install -y keepalived &&
echo "vrrp_instance VI_1 {
    virtual_router_id 33
    state MASTER
    interface ens3

    virtual_ipaddress {
        $(ip addr | grep 'inet ' | grep global | head -1 | sed 's/.*inet \([0-9]*\.[0-9]*\.[0-9]*\)\..*/\1.3/g') dev keepalived0
    }
}" >/etc/keepalived/keepalived.conf &&
echo "[NetDev]
Name=keepalived0
Kind=dummy" >/lib/systemd/network/90-keepalived.netdev &&
service systemd-networkd restart &&
service keepalived start

# There will be a new IP address x.y.z.3/32 added to keepalived0
ip addr

# Restart networkd. The IP address doesn't get destroyed like it did in the bug report
systemctl restart systemd-networkd
ip addr

# Restart keepalived. The IP address gets rebuilt the same as before
systemctl restart keepalived
ip addr
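
The example above writes the .netdev file under /lib/systemd/network, which is normally reserved for packaged files; local configuration conventionally goes in /etc/systemd/network. A small sketch of generating the same file there follows. The helper name `write_dummy_netdev` and its arguments are invented for this illustration; the file contents match the workaround.

```shell
#!/bin/sh
# write_dummy_netdev [NAME] [DIR]
# Create a systemd-networkd .netdev file describing a dummy interface.
# NAME defaults to keepalived0, DIR to the local configuration directory.
write_dummy_netdev() {
    name="${1:-keepalived0}"
    dir="${2:-/etc/systemd/network}"
    mkdir -p "$dir" &&
    printf '[NetDev]\nName=%s\nKind=dummy\n' "$name" > "$dir/90-$name.netdev"
}

# Usage on a live system, followed by the same restarts as in the workaround:
#   write_dummy_netdev keepalived0 &&
#       systemctl restart systemd-networkd && systemctl restart keepalived
```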

Ben Hollins (bhollins) wrote on 2019-03-21:

#13

Thanks Karl. This solution from Chris Aumann seems perfect, and I've just deployed it onto our HAPROXY pair. Just restarted udev and networkd, and everything survives as expected now. Much appreciated.

Robert Kirscht (robotic1) wrote on 2019-09-04:

#14

Nice one keepalived crew for this excellent little app! Any news on a fix of this bug for the 1.x branch?

Chris Stone (cjstone707) wrote on 2019-09-04:

#15

Thank you Karl, this one bit us too this morning. Will there be a fix soon?

Tom Scholten (snowtom) wrote on 2019-09-04:

#16

Yup, us too :(

I just re-applied the original fix from when I filed this issue to the systemd service, and made it persistent (for now) through our automation tooling.

> On 4 Sep 2019, at 19:57, Chris Stone <email address hidden> wrote:
>
> Thank you Karl, this one bit us too this morning. Will there be a fix
> soon?

Rafael David Tinoco (rafaeldtinoco) wrote on 2019-09-13:

#17

The following 3 bugs:

https://bugs.launchpad.net/bugs/1815101
https://bugs.launchpad.net/bugs/1819074
https://bugs.launchpad.net/bugs/1810583

Have the same root cause: systemd-networkd interferes with secondary IP addresses on the NICs it manages.

I'm marking all other cases as a duplicate of LP: #1815101.

TODO here is the following:

- There are mainly 2 "fixes" for this issue:

1) keepalived is able to recognize systemd-networkd changes and change cluster status in order to reconfigure managed NICs (keepalived (> 2.0.x)).

2) systemd-networkd implements a new setting (KeepConfiguration=) in its .network files, in order to fix this behavior not only for keepalived but for all HA-related software that manages secondary IPs and/or aliases on NICs managed by systemd-networkd.

I think the most appropriate course would be to make sure both features work together in Eoan, and then make sure the SRUs are done for Disco and Bionic. One problem with item (2) is that netplan will also have to support the new "KeepConfiguration=" setting; however, fix (2) is the more appropriate one for all the other HA-related software that controls virtual IPs (CTDB, Pacemaker, and so on).
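
As a rough illustration of fix (2) on a system whose systemd already provides KeepConfiguration= (the version shipped in Bionic does not), the setting could be added through a drop-in for the relevant .network file. This is a sketch under several assumptions: the helper name `write_keepconfig_dropin` is invented, the example file name follows the pattern netplan uses for generated files, and `static` is assumed to be the value that makes networkd leave addresses added by keepalived alone.

```shell
#!/bin/sh
# write_keepconfig_dropin NETWORK_FILE [BASE_DIR]
# Write a drop-in for NETWORK_FILE (e.g. 10-netplan-ens3.network) that sets
# KeepConfiguration=static in its [Network] section.
write_keepconfig_dropin() {
    netfile="$1"
    base="${2:-/etc/systemd/network}"
    mkdir -p "$base/$netfile.d" &&
    printf '[Network]\nKeepConfiguration=static\n' \
        > "$base/$netfile.d/10-keep-configuration.conf"
}

# Usage on a live system (file name is an assumption, check /run/systemd/network):
#   write_keepconfig_dropin 10-netplan-ens3.network &&
#       systemctl restart systemd-networkd
```

Using a drop-in sidesteps the netplan limitation mentioned above only as a stopgap; native netplan support would still be the cleaner route.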
