Undercloud - changing os-net-config conf kills undercloud_[admin, public]_host IPs

Bug #1791238 reported by Harald Jensås
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   High         Harald Jensås

Bug Description

In the containerized undercloud, re-running the 'openstack undercloud install' command removes the undercloud_admin_host and undercloud_public_host IP addresses if the os-net-config configuration has changed (for example, if the undercloud_nameservers option is changed in undercloud.conf).

The br-ctlplane interface is restarted by os-net-config, and this removes the undercloud_admin_host and undercloud_public_host IP addresses set up by keepalived. The install/update operation fails later on because services cannot connect to an IP that is no longer there.

Reproduce:

1. Deploy undercloud with the following configuration

[DEFAULT]

enable_routed_networks = false
enable_tempest = false
enable_ui = false
inspection_interface = br-ctlplane
ipxe_enabled = true
local_interface = eth1
local_ip = 172.20.0.200/26
local_mtu = 1500
local_subnet = ctlplane-subnet
overcloud_domain_name = localdomain
scheduler_max_attempts = 3
subnets = ctlplane-subnet
undercloud_admin_host = 172.20.0.201
undercloud_debug = true
undercloud_hostname = container-undercloud.lab.example.com
undercloud_nameservers = 172.20.0.254
undercloud_ntp_servers = 0.se.pool.ntp.org
undercloud_public_host = 172.20.0.203

[ctlplane-subnet]
cidr = 172.20.0.192/26
dhcp_start = 172.20.0.210
dhcp_end = 172.20.0.219
inspection_iprange = 172.20.0.220,172.20.0.229
gateway = 172.20.0.254
masquerade = true

2. Change the undercloud_nameservers option in undercloud.conf

sed -i 's/undercloud_nameservers = 172\.20\.0\.254/undercloud_nameservers = 192.168.122.1/g' /home/stack/undercloud.conf

3. Re-run undercloud install

openstack undercloud install
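
As an alternative to the sed one-liner in step 2, the option can be changed programmatically. This is only a minimal sketch using Python's standard configparser; the function name set_nameservers is made up for illustration and is not part of any TripleO tooling.

```python
import configparser
import io


def set_nameservers(conf_text, nameservers):
    """Return conf_text with undercloud_nameservers in [DEFAULT] replaced.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["DEFAULT"]["undercloud_nameservers"] = nameservers
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Note that configparser drops comments when it rewrites a file, so for a hand-edited undercloud.conf the sed command is probably the safer choice.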

RESULTS:

1. The os-net-config config.json is updated with the new DNS server.

Every 5.0s: diff -aur /etc/os-net-config/config.json /tmp/os-net-config.json.orig Fri Sep 7 08:51:26 2018

--- /etc/os-net-config/config.json 2018-09-07 08:45:39.054174371 +0200
+++ /tmp/os-net-config.json.orig 2018-09-07 08:17:38.597808977 +0200
@@ -1 +1 @@
-{"network_config": [{"addresses": [{"ip_netmask": "172.20.0.200/26"}], "dns_servers": ["192.168.122.1"], "members": [{"mtu": 1500, "name": "eth1", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}
+{"network_config": [{"addresses": [{"ip_netmask": "172.20.0.200/26"}], "dns_servers": ["172.20.0.254"], "members": [{"mtu": 1500, "name": "eth1", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}

2. After os-net-config applies the config, the keepalived VIPs are gone:

Every 2.0s: ip addr show br-ctlplane Fri Sep 7 08:51:08 2018

47: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:7a:f6:c5 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.200/26 brd 172.20.0.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe7a:f6c5/64 scope link
       valid_lft forever preferred_lft forever

3. The upgrade is stuck on starting the containers:

TASK [Start containers for step 3] **********************************************

4. Logs show that services are failing to connect to the database via the keepalived VIPs:
/var/log/containers/nova/nova-compute.log:2018-09-07 08:52:47.462 6 ERROR oslo_service.periodic_task RemoteError: Remote error: DBConnectionError (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.20.0.201' ([Errno 113] EHOSTUNREACH)") (Background on this error at: http://sqlalche.me/e/e3q8)
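
The missing-VIP condition from result 2 can be detected from the text that 'ip addr show br-ctlplane' prints. The following is a rough sketch, not something taken from the fix; the helper names are hypothetical, and it only looks at IPv4 'inet' lines.

```python
import re


def addresses_on_interface(ip_addr_output):
    """Collect IPv4 addresses (without prefix length) from `ip addr show` text."""
    # Matches lines such as "    inet 172.20.0.200/26 brd ..." but not "inet6 ...".
    pattern = r"^\s*inet (\d+\.\d+\.\d+\.\d+)/\d+"
    return set(re.findall(pattern, ip_addr_output, re.MULTILINE))


def missing_vips(ip_addr_output, expected_vips):
    """Return the expected VIPs that are not configured, in the order given."""
    present = addresses_on_interface(ip_addr_output)
    return [vip for vip in expected_vips if vip not in present]
```

Fed the output shown in result 2 and the two addresses from undercloud.conf (172.20.0.201 and 172.20.0.203), missing_vips reports both as absent.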

description: updated
description: updated
Revision history for this message
Harald Jensås (harald-jensas) wrote :

Restarting the keepalived container makes the VIPs reappear:

[root@container-undercloud mysql]# docker container list | grep keepalived
bfe1c1cd76e3 docker.io/tripleomaster/centos-binary-keepalived:9ad93affedba8870315dd72c714770875ce24759_b72f0c42 "/usr/local/bin/ko..." 12 hours ago Up 7 minutes keepalived

[root@container-undercloud ~]# docker container restart bfe1c1cd76e3
bfe1c1cd76e3

[root@container-undercloud ~]# ip addr show br-ctlplane
47: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:7a:f6:c5 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.200/26 brd 172.20.0.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 172.20.0.201/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 172.20.0.203/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe7a:f6c5/64 scope link
       valid_lft forever preferred_lft forever

Once the VIPs are back up, TASK [Start containers for step 3] completes and the undercloud install completes successfully.
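
The manual restart above could be scripted roughly as follows. This is a sketch under assumptions: the container is addressed by the name keepalived (the last column of the 'docker container list' output) rather than by ID, and the function name and the injectable run parameter are made up for illustration.

```python
import subprocess


def restart_keepalived(container_cli="docker", container="keepalived",
                       run=subprocess.run):
    """Restart the keepalived container and return the command that was run.

    `run` is injectable so the command can be checked without a container
    runtime; by default it is subprocess.run.
    """
    cmd = [container_cli, "container", "restart", container]
    # check=True raises CalledProcessError if the restart fails.
    run(cmd, check=True)
    return cmd
```

Passing container_cli="podman" yields the equivalent podman command on hosts that use podman instead of docker.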

NOTE: If we had a recent enough keepalived we could probably have enabled the ``dynamic_interfaces`` option. But this is not implemented in the version used.

  """# Allow configuration to include interfaces that don't exist at startup.
           # This allows keepalived to work with interfaces that may be deleted and restored
           # and also allows virtual and static routes and rules on VMAC interfaces.
           dynamic_interfaces
  """

Revision history for this message
Harald Jensås (harald-jensas) wrote :

Recent versions of keepalived support 'dynamic_interfaces', which looks like it would solve this problem. We would have to package keepalived 2.0.x in RDO?

 # Allow configuration to include interfaces that don't exist at startup.
 # This allows keepalived to work with interfaces that may be deleted and restored
 # and also allows virtual and static routes and rules on VMAC interfaces.
   dynamic_interfaces

I built keepalived-2.0.6-1.el7.x86_64.rpm on CentOS 7 using the SRPM[1] from Fedora Rawhide. (The RPM builds with only a small tweak.)

Enabling dynamic_interfaces and using 2.0.6 version of keepalived in the keepalived container fixes this issue.
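
For illustration, a keepalived configuration with the option enabled might be rendered like this. It is only a sketch: it assumes dynamic_interfaces belongs in the global_defs block (as the keepalived.conf man page lists it), and the instance name, virtual_router_id and priority values are placeholders, not what the undercloud actually generates.

```python
def render_keepalived_conf(interface, vips, router_id=51,
                           instance="undercloud_vips"):
    """Render a minimal keepalived.conf with dynamic_interfaces enabled.

    instance, router_id and the priority below are hypothetical values.
    """
    lines = [
        "global_defs {",
        "    dynamic_interfaces",
        "}",
        "",
        f"vrrp_instance {instance} {{",
        "    state MASTER",
        f"    interface {interface}",
        f"    virtual_router_id {router_id}",
        "    priority 100",
        "    virtual_ipaddress {",
    ]
    # One /32 address per VIP, bound to the same interface.
    lines += [f"        {vip}/32 dev {interface}" for vip in vips]
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"
```

Calling render_keepalived_conf("br-ctlplane", ["172.20.0.201", "172.20.0.203"]) produces a file covering the two VIPs from this report.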

Suggest we package keepalived 2.0.x and place this in the OSP repositories.

[1] https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/development/rawhide/Everything/source/tree/Packages/k/keepalived-2.0.6-1.fc29.src.rpm

Revision history for this message
Harald Jensås (harald-jensas) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/603587

Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/603587
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b766e253f4df4bb61247640850c3490b988c36d0
Submitter: Zuul
Branch: master

commit b766e253f4df4bb61247640850c3490b988c36d0
Author: Harald Jensås <email address hidden>
Date: Wed Sep 19 09:28:16 2018 +0200

    Undercloud - Restart keepalived on update

    instack-undercloud had a workaround (30-reload-keepalived)
    in place to always restart keepalived on install/upgrade.
    This is required to ensure VIP's are present in case the
    network config was changed and os-net-config restarts
    the network interface. When containerizing the undercloud
    this workaround was missed.

    This change adds a similar workaround. A pre_deploy
    NodeExtraconfig script will restart the keepalived
    container when the undercloud installer is (re-)run.

    NOTE: We can remove this workaround once keepalived
          v2.0.6 or later is available.

    Closes-Bug: #1791238
    Change-Id: I8cada7be57cd50c54ca5f2f38ec010062512ae06

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/605604

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/rocky)

Change abandoned by Alex Schultz (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/605604
Reason: http://lists.openstack.org/pipermail/openstack-dev/2018-September/135224.html

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.openstack.org/605604
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=7c52f9489b00d36b8907827221ac71259da3bad4
Submitter: Zuul
Branch: stable/rocky

commit 7c52f9489b00d36b8907827221ac71259da3bad4
Author: Harald Jensås <email address hidden>
Date: Wed Sep 19 09:28:16 2018 +0200

    Undercloud - Restart keepalived on update

    instack-undercloud had a workaround (30-reload-keepalived)
    in place to always restart keepalived on install/upgrade.
    This is required to ensure VIP's are present in case the
    network config was changed and os-net-config restarts
    the network interface. When containerizing the undercloud
    this workaround was missed.

    This change adds a similar workaround. A pre_deploy
    NodeExtraconfig script will restart the keepalived
    container when the undercloud installer is (re-)run.

    NOTE: We can remove this workaround once keepalived
          v2.0.6 or later is available.

    Closes-Bug: #1791238
    Change-Id: I8cada7be57cd50c54ca5f2f38ec010062512ae06
    (cherry picked from commit b766e253f4df4bb61247640850c3490b988c36d0)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.0.0

This issue was fixed in the openstack/tripleo-heat-templates 10.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/623093

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/623093
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=be61d8a2b5537e4ea3374f5245afaa299972a03e
Submitter: Zuul
Branch: master

commit be61d8a2b5537e4ea3374f5245afaa299972a03e
Author: Emilien Macchi <email address hidden>
Date: Wed Dec 5 17:45:52 2018 -0500

    Re-implement keepalived restart without pre_deploy

    ... and use host_prep_tasks from config-download.
    We are trying to HostPrepConfig resource that use OS::Heat::SoftwareConfig
    and the old fashion to run Ansible, for more native config-download.
    undercloud_pre is the only service that needs HostPrepConfig now, so
    let's switch to config-download.

    It restarts keepalived container at each undercloud install & upgrade.
    Also it adds support for podman as it uses container_cli variable.

    Note: the workaround can still be removed once we have Keepalived 2.0.6
    but it won't happen before CentOS8 probably.

    Change-Id: I7454013c2e37058b5010a2a6cacfae0d0f873744
    Related-Bug: #1791238

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.1.0

This issue was fixed in the openstack/tripleo-heat-templates 9.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/718492

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/718492
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f3d4eaef225098d4323cc44f852762d617a10f54
Submitter: Zuul
Branch: master

commit f3d4eaef225098d4323cc44f852762d617a10f54
Author: Emilien Macchi <email address hidden>
Date: Wed Apr 8 11:33:53 2020 -0400

    Deprecate KeepalivedRestart

    KeepalivedRestart is deprecated and has no effect. The workaround isn't
    needed anymore since we now deploy keepalived-2.0.10-4.
    This version has support for 'dynamic_interfaces' which is required when
    the network config was changed and os-net-config restarts the network
    interface.

    Related-Bug: #1791238
    Change-Id: I14c51106ad1ee40a6edfa520d330d1ea0a52edee

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/718679

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/train)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/718679
Reason: keepalived on centos7 is shipped on 1.3.5-16.el7
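
The reasoning behind the abandoned backport can be expressed as a version check. This sketch takes the 2.0.6 threshold from the commit messages in this report (it has not been checked against the keepalived changelog), and the function names are hypothetical.

```python
import re

# Threshold quoted in the commit messages above; treat it as an assumption.
MIN_DYNAMIC_INTERFACES = (2, 0, 6)


def parse_version(pkg_version):
    """Extract (major, minor, patch) from a string like '1.3.5-16.el7'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", pkg_version)
    if match is None:
        raise ValueError(f"unrecognized version: {pkg_version!r}")
    return tuple(int(part) for part in match.groups())


def needs_restart_workaround(pkg_version):
    """True if this keepalived is older than the threshold."""
    return parse_version(pkg_version) < MIN_DYNAMIC_INTERFACES
```

With the versions mentioned in this report, 1.3.5-16.el7 still needs the restart workaround while 2.0.10-4 does not.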
