[master] Restarting systemd-networkd breaks keepalived, heartbeat, corosync, pacemaker (interface aliases are restarted)

Bug #1815101 reported by Leroy Tennison on 2019-02-07
This bug affects 26 people
Affects               Importance  Assigned to
Keepalived Charm      Undecided   Unassigned
netplan               Undecided   Rafael David Tinoco
heartbeat (Ubuntu)    Low         Unassigned
keepalived (Ubuntu)   Medium      Rafael David Tinoco
  Bionic              Medium      Rafael David Tinoco
  Disco               Medium      Rafael David Tinoco
  Eoan                Medium      Rafael David Tinoco
systemd (Ubuntu)      Medium      Rafael David Tinoco
  Bionic              Medium      Rafael David Tinoco
  Disco               Medium      Rafael David Tinoco
  Eoan                Medium      Rafael David Tinoco

Bug Description

[impact]

- ALL related HA software has a small problem if interfaces are being managed by systemd-networkd: NIC restarts/reconfigurations always wipe all interface aliases when the HA software is not expecting it (there is no coordination between them).

- keepalived, smb ctdb, and pacemaker all suffer from this. Pacemaker is smarter in this case because it has a service monitor that restarts the virtual IP resource, on the affected node and NIC, before considering it a real failure, but other HA services might treat it as a real failure when it is not.

[test case]

- comment #14 is a full test case: set up a 3-node pacemaker cluster, as in that example, and restart the networkd service; this triggers a failure of the virtual IP resource monitor.

- another example is given in the original description for keepalived. Both suffer from the same issue (as does other HA software).

[regression potential]

- this backports KeepConfiguration parameter, which adds some significant complexity to networkd's configuration and behavior, which could lead to regressions in correctly configuring the network at networkd start, or incorrectly maintaining configuration at networkd restart, or losing network state at networkd stop.

- Any regressions are most likely to occur during networkd start, restart, or stop, and most likely to involve missing or incorrect ip address(es).

- the change is based on upstream patches that add the exact feature we need to fix this issue, and it will be integrated with a netplan change that adds the needed stanza (KeepConfiguration=) to the systemd NIC configuration file.

[other info]

original description:
---

Configure netplan for interfaces, for example (a working config with IP addresses obfuscated)

network:
    ethernets:
        eth0:
            addresses: [192.168.0.5/24]
            dhcp4: false
            nameservers:
              search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
              addresses: [10.22.11.1]
        eth2:
            addresses:
              - 12.13.14.18/29
              - 12.13.14.19/29
            gateway4: 12.13.14.17
            dhcp4: false
            nameservers:
              search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
              addresses: [10.22.11.1]
        eth3:
            addresses: [10.22.11.6/24]
            dhcp4: false
            nameservers:
              search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
              addresses: [10.22.11.1]
        eth4:
            addresses: [10.22.14.6/24]
            dhcp4: false
            nameservers:
              search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
              addresses: [10.22.11.1]
        eth7:
            addresses: [9.5.17.34/29]
            dhcp4: false
            optional: true
            nameservers:
              search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
              addresses: [10.22.11.1]
    version: 2

Configure keepalived (again, a working config with IP addresses obfuscated)

global_defs # Block id
{
notification_email {
        <email address hidden>
}
        notification_email_from <email address hidden>
        smtp_server 10.22.11.7 # IP
        smtp_connect_timeout 30 # integer, seconds
        router_id system3 # string identifying the machine,
                                     # (doesn't have to be hostname).
        vrrp_mcast_group4 224.0.0.18 # optional, default 224.0.0.18
        vrrp_mcast_group6 ff02::12 # optional, default ff02::12
        enable_traps # enable SNMP traps
}
vrrp_sync_group collection {
        group {
                wan
                lan
                phone
        }
}
vrrp_instance wan {
        state MASTER
        interface eth2
        virtual_router_id 77
        priority 150
        advert_int 1
        smtp_alert
        authentication {
                auth_type PASS
                auth_pass BlahBlah
        }
        virtual_ipaddress {
        12.13.14.20
        }
}
vrrp_instance lan {
        state MASTER
        interface eth3
        virtual_router_id 78
        priority 150
        advert_int 1
        smtp_alert
        authentication {
                auth_type PASS
                auth_pass MoreBlah
        }
        virtual_ipaddress {
                10.22.11.13/24
        }
}
vrrp_instance phone {
        state MASTER
        interface eth4
        virtual_router_id 79
        priority 150
        advert_int 1
        smtp_alert
        authentication {
                auth_type PASS
                auth_pass MostBlah
        }
        virtual_ipaddress {
                10.22.14.3/24
        }
}

At boot the affected interfaces have:
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet 10.22.14.3/24 scope global secondary eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
       valid_lft forever preferred_lft forever
7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
    inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.22.11.13/24 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
       valid_lft forever preferred_lft forever
9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
    inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.13.14.20/32 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
       valid_lft forever preferred_lft forever

Run 'netplan try' (didn't even make any changes to the configuration) and the keepalived addresses disappear never to return, the affected interfaces have:
5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
       valid_lft forever preferred_lft forever
7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
    inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
       valid_lft forever preferred_lft forever
9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
    inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
       valid_lft forever preferred_lft forever
    inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
       valid_lft forever preferred_lft forever
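
The before/after comparison above can be automated. What follows is a minimal sketch, not part of the original report: list_inet and lost_addrs are hypothetical helper names, and the input is assumed to be saved `ip addr show` output in the same format as the listings above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report).

# list_inet: read `ip addr show` output on stdin and print one
# "<interface> <address/prefix>" pair per IPv4 address.
list_inet() {
    awk '
        # Interface header line, e.g. "5: eth4: <BROADCAST,...> mtu 1500 ..."
        /^[0-9]+: / { iface = $2; sub(/:$/, "", iface) }
        # IPv4 address line, e.g. "    inet 10.22.14.3/24 scope global secondary eth4"
        $1 == "inet" { print iface, $2 }
    '
}

# lost_addrs BEFORE AFTER: print the pairs present in file BEFORE
# but missing from file AFTER.
lost_addrs() {
    list_inet < "$1" | while read -r pair; do
        list_inet < "$2" | grep -qxF -- "$pair" || echo "$pair"
    done
}
```

One possible use: save `ip addr show` to a file, run 'netplan try', save the output again, and call `lost_addrs before.txt after.txt`. Any address that keepalived added and networkd removed should then be listed.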


This isn't netplan, it's systemd-networkd. Netplan only writes configuration for the chosen renderer (in this case, systemd-networkd).

Either systemd needs to not wipe out foreign addresses (I believe there is a PR in git for that) or keepalived should somehow interface with systemd so they can collaborate on setting and keeping up the IP addresses.

Reassigning.

no longer affects: ubuntu
Changed in netplan:
status: New → Invalid
Changed in keepalived (Ubuntu):
status: New → Incomplete
Changed in systemd (Ubuntu):
status: New → Triaged

Kept a task for keepalived (Incomplete) in case it turns out there's something we can do there.

Also added a task for systemd, since that would definitely require development work.

Marked Invalid for netplan: since netplan only translates config from the YAML to what networkd or NetworkManager require, there isn't really anything I see we can do in netplan directly. Applying absolutely does need to 'poke' the renderer somehow for the configuration to be applied; but if it turns out there's something to change in netplan we can update the task.

Turns out there isn't really a PR about foreign addresses handling; though two are somewhat relevant:

https://github.com/systemd/systemd/pull/9956
and
https://github.com/systemd/systemd/pull/7403

But neither will completely address the problem: systemd-networkd expects to be authoritative on the network setup, which is somewhat counter to its use in conjunction with keepalived.

As a workaround, for now, one can use /etc/network/interfaces (and/or no configuration in netplan for the interfaces handled by keepalived) to configure the network.

Leroy Tennison (ltennison) wrote :

I am trying ifupdown. Do I need to do anything else or is what I've done adequate?

cdmiller (cdmiller) wrote :

Newer keepalived (> 2.0.x) addresses the systemd-networkd behavior. From keepalived 2.0.0 release notes: "Transition to backup state if a VIP or eVIP is removed When we next transition to master the addresses will be restored. If nopreempt is not set, that will be almost immediately."

Any chance of a keepalived 2.0.x backport package for Ubuntu 18.04?

Leroy Tennison (ltennison) wrote :

I note this bug is marked Incomplete, meaning that information is missing. What else is needed?

Might I ask: how closely is this bug related to, or is it a duplicate of, bug 1819074?

Andreas Hasenack (ahasenack) wrote :

Seems a dupe to me.

For the bionic case, with keepalived < 2.0, is there some keepalived script that can be run to restore the vip, after networkd removed it? We could run it as a network-dispatcher hook then. Has this been considered?

summary: - netplan removes keepalived configuration
+ Restarting systemd-networkd breaks keepalived clusters
summary: - Restarting systemd-networkd breaks keepalived clusters
+ [master] Restarting systemd-networkd breaks keepalived clusters

If I understand the keepalived > 2.0.x behavior referred to by cdmiller above (see the 2019-03-07 comment), that is not the appropriate response to the problem. Granted, it mitigates the consequences but doesn't address the underlying issue. An issue originating in systemd should not cause keepalived failover, since failover is designed to address system or hardware failure, not the bad behavior of other system software. systemd needs to be made to cooperate with other software rather than assuming it is the only authority on the system.

Robie Basak (racb) wrote :

It looks like there is some clear and actionable work in keepalived here (even if as a workaround and the real fix ends up being in systemd), so I'm marking it as Triaged.

FTR, the Ubuntu Server Team is aware of this as a high level issue and it is high up in our list of priorities to determine how to address it properly.

Changed in keepalived (Ubuntu):
status: Incomplete → Triaged
Bryce Harrington (bryce) wrote :

The aforementioned link shows there's been work towards a fix in systemd. Can't say if that suggests what can be done to improve keepalived, but I've tagged this "server-next" to get it on the Ubuntu Server Team's high priority list, as per Robie's earlier comment.

tags: added: server-next

The following 3 bugs:

https://bugs.launchpad.net/bugs/1815101
https://bugs.launchpad.net/bugs/1819074
https://bugs.launchpad.net/bugs/1810583

Have the same root cause: the fact that systemd-networkd messes with secondary IP addresses on NICs managed by systemd.

I'm marking all other cases as a duplicate of LP: #1815101.

TODO here is the following:

- There are mainly 2 "fixes" for this issue:

1) keepalived is able to recognize systemd-networkd changes and change cluster status in order to reconfigure managed NICs (keepalived (> 2.0.x)).

2) systemd-networkd implements a new stanza (KeepConfiguration=) in its .network configuration files in order to fix not only this behavior but also all HA-related software that manages secondary IPs and/or aliases on NICs managed by systemd-networkd.

I think the most appropriate approach is to make sure both features work together in Eoan, and then make sure the SRUs are done to Disco and Bionic. One problem with item (2) is that netplan will also have to support the new "KeepConfiguration=" .network file stanza, but fix (2) is more appropriate for all other HA-related software controlling virtual IPs (CTDB, Pacemaker, and so on).

Changed in netplan:
status: Invalid → Confirmed
Changed in keepalived (Ubuntu):
status: Triaged → Confirmed
Changed in systemd (Ubuntu):
status: Triaged → Confirmed
Changed in keepalived (Ubuntu Bionic):
status: New → Confirmed
Changed in keepalived (Ubuntu Disco):
status: New → Confirmed

Based on comment #12, and other comments from other duplicate cases, I'll summarize here, in a better and consolidated way, how to reproduce the issue, how to mitigate it using the dummy workaround, and how to fix it (with the backports/merge requests). At the end I might provide a PPA asking for feedback.

Changed in systemd (Ubuntu Bionic):
status: New → Confirmed
Changed in systemd (Ubuntu Disco):
status: New → Confirmed
Changed in keepalived (Ubuntu Bionic):
importance: Undecided → Medium
Changed in keepalived (Ubuntu Disco):
importance: Undecided → Medium
Changed in keepalived (Ubuntu Eoan):
importance: Undecided → Medium
Changed in systemd (Ubuntu Bionic):
importance: Undecided → Medium
Changed in systemd (Ubuntu Disco):
importance: Undecided → Medium
Changed in systemd (Ubuntu Eoan):
importance: Undecided → Medium
Changed in keepalived (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in keepalived (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in keepalived (Ubuntu Eoan):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in systemd (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in systemd (Ubuntu Eoan):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in netplan:
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in systemd (Ubuntu Eoan):
status: Confirmed → In Progress
Changed in keepalived (Ubuntu Eoan):
status: Confirmed → In Progress
Changed in heartbeat (Ubuntu Bionic):
importance: Undecided → Medium
status: New → Triaged
Changed in heartbeat (Ubuntu Disco):
importance: Undecided → Medium
status: New → Triaged
Changed in heartbeat (Ubuntu Eoan):
importance: Undecided → Low
status: New → Triaged
Changed in heartbeat (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in heartbeat (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in heartbeat (Ubuntu Eoan):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

Alright,

As this is a problem that affects not only keepalived but all cluster-like software dealing with aliases on any existing interface, managed or not by systemd, I have tested the same test case in a pacemaker-based cluster with 3 nodes, having 1 virtual IP + a lighttpd instance running in the same resource group:

----

(k)inaddy@kcluster01:~$ crm config show
node 1: kcluster01
node 2: kcluster02
node 3: kcluster03
primitive fence_kcluster01 stonith:fence_virsh \
 params ipaddr=192.168.100.205 plug=kcluster01 action=off login=stonithmgr passwd=xxxx use_sudo=true delay=2 \
 op monitor interval=60s
primitive fence_kcluster02 stonith:fence_virsh \
 params ipaddr=192.168.100.205 plug=kcluster02 action=off login=stonithmgr passwd=xxxx use_sudo=true delay=4 \
 op monitor interval=60s
primitive fence_kcluster03 stonith:fence_virsh \
 params ipaddr=192.168.100.205 plug=kcluster03 action=off login=stonithmgr passwd=xxxx use_sudo=true delay=6 \
 op monitor interval=60s
primitive virtual_ip IPaddr2 \
 params ip=10.0.3.1 nic=eth3 \
 op monitor interval=10s
primitive webserver systemd:lighttpd \
 op monitor interval=10 timeout=60
group webserver_virtual_ip webserver virtual_ip
location l_fence_kcluster01 fence_kcluster01 -inf: kcluster01
location l_fence_kcluster02 fence_kcluster02 -inf: kcluster02
location l_fence_kcluster03 fence_kcluster03 -inf: kcluster03
property cib-bootstrap-options: \
 have-watchdog=true \
 dc-version=2.0.1-9e909a5bdd \
 cluster-infrastructure=corosync \
 cluster-name=debian \
 stonith-enabled=true \
 stonith-action=off \
 no-quorum-policy=stop

----

(k)inaddy@kcluster01:~$ cat /etc/netplan/cluster.yaml
network:
    version: 2
    renderer: networkd
    ethernets:
        eth1:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.1.2/24]
        eth2:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.2.2/24]
        eth3:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.3.2/24]
        eth4:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.4.2/24]
        eth5:
            dhcp4: no
            dhcp6: no
            addresses: [10.0.5.2/24]

----

And the virtual IP failed right after netplan acted on the systemd-managed network interface.

(k)inaddy@kcluster03:~$ sudo netplan apply
(k)inaddy@kcluster03:~$ ping 10.0.3.1
PING 10.0.3.1 (10.0.3.1) 56(84) bytes of data.
From 10.0.3.4 icmp_seq=1 Destination Host Unreachable
From 10.0.3.4 icmp_seq=2 Destination Host Unreachable
From 10.0.3.4 icmp_seq=3 Destination Host Unreachable
From 10.0.3.4 icmp_seq=4 Destination Host Unreachable
From 10.0.3.4 icmp_seq=5 Destination Host Unreachable
From 10.0.3.4 icmp_seq=6 Destination Host Unreachable
64 bytes from 10.0.3.1: icmp_seq=7 ttl=64 time=0.088 ms
64 bytes from 10.0.3.1: icmp_seq=8 ttl=64 time=0.076 ms

--- 10.0.3.1 ping statistics ---
8 packets transmitted, 2 received, +6 errors, 75% packet loss, time 7128ms
rtt min/avg/max/mdev = 0.076/0.082/0.088/0.006 ms, pipe 4

As explained in this bug description. With that, virtual_ip_monitor, from pacemaker, realized the virtual IP was gone and restarted it on the same node:

----

(k)inaddy@k...


summary: - [master] Restarting systemd-networkd breaks keepalived clusters
+ [master] Restarting systemd-networkd breaks keepalived, heartbeat,
+ corosync, pacemaker (interface aliases are restarted)

The commits below implement support to "keep configuration":

commit 1e498853a39b46155cb89b5c9e74ecb27aaba3ed
Author: Yu Watanabe <email address hidden>
Date: Mon Jun 3 01:21:13 2019

    test-network: add tests for KeepConfiguration=

commit c98d78d32abba6aadbe89eece7acf0742f59047c
Author: Yu Watanabe <email address hidden>
Date: Mon Jun 3 03:37:25 2019

    man: add documentation about KeepConfiguration

commit db51778f85cb076e9ed1fe7f7e29cc740365c245
Author: Yu Watanabe <email address hidden>
Date: Mon Jun 3 00:33:13 2019

    network: make KeepConfiguration=static drop DHCP addresses and routes

    Also, KeepConfiguration=dhcp drops static foreign addresses and routes.

commit 95355a281c06c5970b7355c38b066910c3be4958
Author: Yu Watanabe <email address hidden>
Date: Mon Jun 3 14:05:26 2019

    network: add KeepConfiguration=dhcp-on-stop

    The option prevents to drop lease address on stop.
    By setting this, we can safely restart networkd.

commit 7da377ef16a2112a673247b39041a180b07e973a
Author: Susant Sahani <email address hidden>
Date: Mon Jun 3 00:31:13 2019

    networkd: add support to keep configuration

for systemd-networkd.

IMO, we should rely on setting the keep configuration flag for the interfaces to be managed by third-party software (adding/removing aliases for virtual networks, VRRP interfaces, etc).

Edward Hope-Morley (hopem) wrote :

Thanks Rafael/Christian,

I see that all those patches are in 243 and Eoan is currently on 242 (albeit -6, but I don't think any are already backported), so we'll need to get this backported all the way down to Bionic.

max@power:~/git/systemd$ _c=( 7da377e 95355a2 db51778 c98d78d 1e49885 )
max@power:~/git/systemd$ for c in ${_c[@]}; do git tag --contains $c| egrep -v "\-rc"; done| sort -u
v243

Do we have a feel for if/when the keepalived fix(es) will be backportable to B (1.x) as well? Since those fixes already exist in Disco (2.0.10), it might be easier to start with those?

I will add the charm-keepalived to this LP since it will need support for the networkd/netplan fix once that is available.

@ed,

I just finished the backport to Eoan; it was straightforward. I'll finish tests tomorrow with HA-related software and networkd-enabled HA clusters. After that I'll give you a better estimate for Disco and Bionic.

This is the total size of changes (systemd-networkd-tests.py is not so great to backport, will review that):

$ cat *.patch | diffstat
 man/systemd.network.xml | 27 +-
 src/network/networkd-dhcp4.c | 8
 src/network/networkd-link.c | 57 +++++-
 src/network/networkd-link.h | 2
 src/network/networkd-manager.c | 2
 src/network/networkd-network-gperf.gperf | 3
 src/network/networkd-network.c | 44 ++++
 src/network/networkd-network.h | 26 ++
 test/fuzz/fuzz-network-parser/directives.network | 1
 test/test-network/conf/24-keep-configuration-static.network | 5
 test/test-network/conf/dhcp-client-keep-configuration-dhcp-on-stop.network | 4
 test/test-network/conf/dhcp-client-keep-configuration-dhcp.network | 7
 test/test-network/systemd-networkd-tests.py | 94 +++++++++-
 13 files changed, 235 insertions(+), 45 deletions(-)

The good thing is that the logic is not drastically changed for this feature to exist. Sorry for the delay here; because of the freeze we were running to close out some urgent issues for Eoan.

Test Case:

(k)rafaeldtinoco@kcluster03:~$ crm status
Stack: corosync
Current DC: kcluster02 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Thu Oct 10 17:13:19 2019
Last change: Thu Oct 10 17:11:48 2019 by root via cibadmin on kcluster01

3 nodes configured
5 resources configured

Online: [ kcluster01 kcluster02 kcluster03 ]

Full list of resources:

 fence_kcluster01 (stonith:fence_virsh): Started kcluster02
 fence_kcluster02 (stonith:fence_virsh): Started kcluster01
 fence_kcluster03 (stonith:fence_virsh): Started kcluster01
 Resource Group: webserver_virtual_ip
     webserver (systemd:lighttpd): Started kcluster03
     virtual_ip (ocf::heartbeat:IPaddr2): Started kcluster03

(k)rafaeldtinoco@kcluster03:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:c3:06 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:c306/64 scope link
       valid_lft forever preferred_lft forever

(k)rafaeldtinoco@kcluster03:~$ systemctl restart systemd-networkd

(k)rafaeldtinoco@kcluster03:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:c3:06 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:c306/64 scope link
       valid_lft forever preferred_lft forever

<wait for resource monitor timeout, pacemaker starts virtual_ip again>

(k)rafaeldtinoco@kcluster03:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:c3:06 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:c306/64 scope link
       valid_lft forever preferred_lft forever
(k)rafaeldtinoco@kcluster03:~$

Pacemaker logs:

Oct 10 17:14:37 kcluster03 IPaddr2(virtual_ip)[6901]: INFO: IP status = no, IP_CIP=
Oct 10 17:14:37 kcluster03 pacemaker-controld[1266]: notice: Result of stop operation for virtual_ip on kcluster03: 0 (ok)
Oct 10 17:14:37 kcluster03 IPaddr2(virtual_ip)[6951]: INFO: Adding inet address 10.0.3.1/24 with broadcast address 10.0.3.255 to device eth3
Oct 10 17:14:37 kcluster03 IPaddr2(virtual_ip)[6956]: INFO: Bringing device eth3 up
Oct 10 17:14:37 kcluster03 IPaddr2(virtual_ip)[6961]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-10.0.3.1 eth3 10.0.3.1 auto not_used not_used
Oct 10 17:14:37 kcluster03 pacemaker-controld[1266]: notice: Result of start operation for virtual_ip on kcluster03: 0 (ok)

for the operation.

(k)rafaeldtinoco@kcluster01:~$ sudo vi /etc/systemd/network/10-netplan-eth3.network

<add KeepConfiguration=static to .network file>

(k)rafaeldtinoco@kcluster01:~$ systemctl restart systemd-networkd

(k)rafaeldtinoco@kcluster01:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:f0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:f003/64 scope link
       valid_lft forever preferred_lft forever

(k)rafaeldtinoco@kcluster01:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:f0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:f003/64 scope link
       valid_lft forever preferred_lft forever

(k)rafaeldtinoco@kcluster01:~$ systemctl restart systemd-networkd

(k)rafaeldtinoco@kcluster01:~$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:f0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:f003/64 scope link
       valid_lft forever preferred_lft forever

<interface does NOT restart the aliases>

Voila. Needs better testing with KeepConfiguration=dhcp.
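
The manual edit of the .network file in the test above could also be done with a drop-in, so that the generated file itself stays untouched. This is only a sketch under assumptions that the original test did not verify: write_keepconf_dropin is a hypothetical helper name, the file naming follows the 10-netplan-<iface>.network pattern seen above, and the installed systemd is assumed to carry the KeepConfiguration= patches and to honour <name>.network.d/*.conf drop-in directories.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write a drop-in that
# sets KeepConfiguration=static for one netplan-generated .network file.
# Assumes a systemd with the KeepConfiguration= patches and drop-in support.
write_keepconf_dropin() {
    netdir=$1   # directory holding the .network files, e.g. /etc/systemd/network
    iface=$2    # interface name, e.g. eth3
    dropin_dir="$netdir/10-netplan-$iface.network.d"
    mkdir -p "$dropin_dir" || return 1
    # The [Network] section is where the test above placed the stanza.
    printf '[Network]\nKeepConfiguration=static\n' \
        > "$dropin_dir/keepconfiguration.conf"
}
```

Calling `write_keepconf_dropin /etc/systemd/network eth3` as root and then restarting systemd-networkd should, under those assumptions, have the same effect as the manual edit shown above.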

tags: added: sts
Dan Streetman (ddstreet) on 2019-11-07
description: updated
description: updated

Hello Leroy, or anyone else affected,

Accepted systemd into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/242-7ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Eoan):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-eoan

All autopkgtests for the newly accepted systemd (242-7ubuntu3.2) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.42.1-1ubuntu1 (amd64)
systemd/242-7ubuntu3.2 (ppc64el)
ndctl/unknown (armhf)
casper/1.427 (amd64)
netplan.io/0.98-0ubuntu1 (ppc64el)
munin/unknown (armhf)
linux-oem-osp1/5.0.0-1026.29 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!


(k)rafaeldtinoco@kcluster01:~$ dpkg -l | grep "ii systemd "
ii systemd 243-3ubuntu1 amd64 system and service manager

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "dpkg -l | grep systemd "; done | grep "ii systemd "

ii systemd 243-3ubuntu1 amd64 system and service manager
ii systemd 243-3ubuntu1 amd64 system and service manager
ii systemd 243-3ubuntu1 amd64 system and service manager
----

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "cat /etc/systemd/network/10-netplan-eth3.network"; done
[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.2/24
KeepConfiguration=static
[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.3/24
KeepConfiguration=static
[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.4/24
KeepConfiguration=static
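
Instead of reading the three files by eye, the presence of the stanza can be checked with a small script. A minimal sketch follows; has_keepconf is a hypothetical helper name, and it only accepts the exact `KeepConfiguration=<value>` spelling inside the [Network] section, as in the files above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# has_keepconf FILE VALUE: exit 0 if FILE contains the line
# "KeepConfiguration=VALUE" inside its [Network] section, else exit 1.
has_keepconf() {
    awk -v want="$2" '
        # Track which section we are in; only [Network] counts.
        /^\[/ { in_network = ($0 == "[Network]") }
        in_network && $0 == ("KeepConfiguration=" want) { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

On each node this could be run as `has_keepconf /etc/systemd/network/10-netplan-eth3.network static && echo ok`.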

----

(k)rafaeldtinoco@kcluster01:~$ crm status
Stack: corosync
Current DC: kcluster01 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Nov 19 16:38:15 2019
Last change: Mon Nov 18 12:41:14 2019 by root via crm_resource on kcluster01

3 nodes configured
5 resources configured

Online: [ kcluster01 kcluster02 kcluster03 ]

Full list of resources:

 fence_kcluster01 (stonith:fence_virsh): Started kcluster02
 fence_kcluster02 (stonith:fence_virsh): Started kcluster01
 fence_kcluster03 (stonith:fence_virsh): Started kcluster01
 Resource Group: webserver_virtual_ip
     webserver (systemd:lighttpd): Started kcluster01
     virtual_ip (ocf::heartbeat:IPaddr2): Started kcluster01

----

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "hostname ; ip addr show eth3"; done

kcluster01
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:a0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:a003/64 scope link
       valid_lft forever preferred_lft forever
kcluster02
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1d:1a:cc brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.3/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1d:1acc/64 scope link
       valid_lft forever preferred_lft forever
kcluster03
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:13:16 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:1316/64 scope link
       valid_lft forever preferred_lft forever

----

in parallel:

(k)rafaeldtinoco@kcluster01:~$ journalctl -f -u pacemaker

and check if events are generated (vip monitor detects changes)
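
The "check if events are generated" step can be made less manual by counting the relevant log lines. This is a rough sketch: count_vip_readds is a hypothetical helper name, and the matched message text is copied from the Pacemaker logs quoted earlier in this report, so other resource-agents versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# count_vip_readds RESOURCE: read log lines on stdin and print how many
# times the IPaddr2 agent logged re-adding an address for RESOURCE.
# The pattern is taken from the log excerpt earlier in this bug.
count_vip_readds() {
    grep -c "IPaddr2($1).*INFO: Adding inet address"
}
```

For example, `journalctl -u pacemaker | count_vip_readds virtual_ip` could be run before and after restarting systemd-networkd, assuming the agent's messages show up under the pacemaker unit as the command above suggests. If KeepConfiguration=static works as intended, the count should not increase.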

----

(k)rafaeldtinoco@kcluster01:~$ systemctl restart sy...


tags: added: verification-done verification-done-eoan
removed: verification-needed verification-needed-eoan

Flagging this as Won't Fix, as heartbeat is already being kept just for historical reasons (and systemd-networkd can work around that with the fix we're backporting to it: the KeepConfiguration .network file stanza).

Changed in heartbeat (Ubuntu Eoan):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in heartbeat (Ubuntu Disco):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in heartbeat (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in heartbeat (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
no longer affects: heartbeat (Ubuntu Eoan)
no longer affects: heartbeat (Ubuntu Disco)
no longer affects: heartbeat (Ubuntu Bionic)
Changed in heartbeat (Ubuntu):
status: Triaged → Won't Fix