M/N upgrade: undercloud upgrade fails when using ssl.

Bug #1640213 reported by Sofer Athlan-Guyot
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sofer Athlan-Guyot
Milestone: (none listed)

Bug Description

Hi,

this bug was first opened as
https://bugs.launchpad.net/tripleo/+bug/1638029, but now that I have a
better grip on the problem I prefer to start a new one for clarity.

When using SSL, all the important services listen on the
keepalived-managed IPs that haproxy binds to.

During an M/N upgrade, the file os-net-config/config.json is updated
(mtu is added to the parameters).

Then we run puppet. os-net-config is run unconditionally during the
pre-main stage (before everything else). The Python script catches the
diff and re-applies the configuration. In the process, all the keepalived
IPs are removed:

    Nov 08 09:50:48 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 09:50:48 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 10:01:53 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d removed
    Nov 08 10:01:53 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.1 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP 192.0.2.1 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.3 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP 192.0.2.3 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.2 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP 192.0.2.2 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP fe80::a016:1dff:fe4b:201b removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP fe80::a016:1dff:fe4b:201b removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 10.0.0.1 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP 10.0.0.1 removed
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Kernel is reporting: interface br-ctlplane DOWN
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(51) Entering FAULT STATE
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(51) removing protocol VIPs.
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: error: No such device, type=(21), seq=1478616560, pid=0
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(51) Now in FAULT state
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Kernel is reporting: interface br-ctlplane DOWN
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(52) Entering FAULT STATE
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(52) removing protocol VIPs.
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: error: No such device, type=(21), seq=1478616561, pid=0
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: VRRP_Instance(52) Now in FAULT state
    Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink: filter function error
    Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink: filter function error
    Nov 08 10:02:01 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d added
    Nov 08 10:02:01 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d added
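
To make the relevant lines easier to spot in a journal like the one above, here is a minimal Python sketch that pulls out the addresses keepalived reported as removed. It is only an illustration: the function name `removed_ips` and the `SAMPLE` excerpt are assumptions made for this example, and the pattern matches nothing beyond the "Netlink reflector" wording quoted in this report.

```python
import re

# Matches the keepalived "Netlink reflector" removal lines quoted in this report.
REMOVED_RE = re.compile(r"Netlink reflector reports IP (\S+) removed")


def removed_ips(log_text):
    """Return the addresses reported as removed, first-seen order, no duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = REMOVED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Short excerpt of the log above, used only as sample input.
SAMPLE = """\
Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.1 removed
Nov 08 10:01:55 instack.localdomain Keepalived_healthcheckers[1558]: Netlink reflector reports IP 192.0.2.1 removed
Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.3 removed
Nov 08 10:01:55 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP 192.0.2.2 removed
Nov 08 10:02:01 instack.localdomain Keepalived_vrrp[1559]: Netlink reflector reports IP fe80::219:a2ff:fe59:cd1d added
"""

if __name__ == "__main__":
    for address in removed_ips(SAMPLE):
        print(address)
```

Both keepalived processes log the same event, so the sketch reports each address once.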

Then puppet continues and fails with

   Could not evaluate: Execution of '/bin/openstack token issue
   --format value' returned 1: Unable to establish connection to
   https://192.168.0.2:13000/v3/auth/tokens (tried 22, for a total of
   170 seconds)

as services like keystone no longer respond on their HTTPS IP address,
which has disappeared.

IP addresses on br-ctlplane before os-net-config:

    br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 00:19:a2:59:cd:1d brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.0.2.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.0.2.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::219:a2ff:fe59:cd1d/64 scope link
       valid_lft forever preferred_lft forever

IP addresses after:

    br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 00:19:a2:59:cd:1d brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::219:a2ff:fe59:cd1d/64 scope link
       valid_lft forever preferred_lft forever
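
The difference between the two listings can also be computed rather than read by eye. The following Python sketch, written under the assumption that the input looks like the `ip addr` output pasted above, returns the addresses present before the os-net-config run and absent afterwards. The names `addresses`, `missing_addresses`, `BEFORE` and `AFTER` are hypothetical and exist only for this example.

```python
import re

# Captures the address/prefix that follows "inet" or "inet6" at the start of a line.
INET_RE = re.compile(r"^[ \t]*inet6?[ \t]+(\S+)", re.MULTILINE)


def addresses(ip_output):
    """Return the set of address/prefix strings found in `ip addr` style output."""
    return set(INET_RE.findall(ip_output))


def missing_addresses(before, after):
    """Return the sorted addresses present in `before` but absent from `after`."""
    return sorted(addresses(before) - addresses(after))


# Address lines copied from the two listings in this report.
BEFORE = """\
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
    inet 192.0.2.3/32 scope global br-ctlplane
    inet 192.0.2.2/32 scope global br-ctlplane
    inet6 fe80::219:a2ff:fe59:cd1d/64 scope link
"""

AFTER = """\
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
    inet6 fe80::219:a2ff:fe59:cd1d/64 scope link
"""

if __name__ == "__main__":
    for address in missing_addresses(BEFORE, AFTER):
        print(address)
```

With the listings from this report, the two /32 addresses (192.0.2.2 and 192.0.2.3) are the ones that go missing, while the bridge's own address survives.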

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/395053

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/395053
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=b9dcee028795bb1132176df7ef3f435039cc20cb
Submitter: Jenkins
Branch: master

commit b9dcee028795bb1132176df7ef3f435039cc20cb
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Nov 8 16:44:26 2016 +0100

    Ensure keepalived is restarted when necessary.

    If os-collect-config/config.json is updated before an upgrade/update,
    then the os-net-config run will automatically erase the keepalived
    managed ips.

    This is a hackish way to ensure that keepalived is restarted during the
    next phase in order to have the ip recreated.

    It basically adds a comment line to the keepalived.conf file (making it
    different than the puppet one) if it's there. This will force a puppet
    restart of the keepalive service puting the ips back on the undercloud.

    Change-Id: I56b706ff44ba31aa87a63f870940831ce02a6e77
    Closes-Bug: #1640213
    Related-Bug: #1638029
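
The idea described in the commit message can be sketched in a few lines of Python. This is not the code of the merged change, which lives in puppet-tripleo and is not reproduced in this report. It only illustrates the technique: if the keepalived configuration file exists, append a comment line so that it differs from the file puppet generates, which should lead puppet to rewrite the file and restart the service. The function name `force_config_drift` and the marker text are hypothetical.

```python
import os
import tempfile

# Hypothetical marker text; any comment line that puppet does not generate would do.
MARKER = "# marker added before upgrade to force a keepalived restart\n"


def force_config_drift(conf_path):
    """Append a comment line to conf_path if the file exists.

    Returns True when the file was modified, False when it was absent.
    """
    if not os.path.isfile(conf_path):
        return False
    with open(conf_path, "a") as conf:
        conf.write(MARKER)
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real configuration.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "keepalived.conf")
        print(force_config_drift(path))  # file does not exist yet
        with open(path, "w") as conf:
            conf.write("vrrp_instance 51 {\n}\n")
        print(force_config_drift(path))  # file exists and gets the marker
```

Skipping the step when the file is absent mirrors the "if it's there" condition in the commit message, so a fresh install is left untouched.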

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/396731

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/396731
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3e586d0fb22cfd9c08e8f4663ab9be140af83d7b
Submitter: Jenkins
Branch: stable/newton

commit 3e586d0fb22cfd9c08e8f4663ab9be140af83d7b
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Nov 8 16:44:26 2016 +0100

    Ensure keepalived is restarted when necessary.

    If os-collect-config/config.json is updated before an upgrade/update,
    then the os-net-config run will automatically erase the keepalived
    managed ips.

    This is a hackish way to ensure that keepalived is restarted during the
    next phase in order to have the ip recreated.

    It basically adds a comment line to the keepalived.conf file (making it
    different than the puppet one) if it's there. This will force a puppet
    restart of the keepalive service puting the ips back on the undercloud.

    Change-Id: I56b706ff44ba31aa87a63f870940831ce02a6e77
    Closes-Bug: #1640213
    Related-Bug: #1638029
    (cherry picked from commit b9dcee028795bb1132176df7ef3f435039cc20cb)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.0.0

This issue was fixed in the openstack/puppet-tripleo 6.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 5.5.0

This issue was fixed in the openstack/puppet-tripleo 5.5.0 release.
