OAM IP change needs double lock/unlock controllers for IPV6 system

Bug #1895555 reported by Andy
This bug affects 1 person
Affects   | Status       | Importance | Assigned to  | Milestone
StarlingX | Fix Released | Medium     | Andre Kantek |

Bug Description

Brief Description
-----------------
On an IPv6 two-node system, an OAM IP change requires locking/unlocking the controllers twice before the new IP takes effect on the OAM network IF. (On an IPv4 system, a single lock/unlock of the controllers is enough.)

Severity
--------
Major

Steps to Reproduce
------------------
On a multi-node system with two controllers:
1. Record the OAM IP addresses before making the change:
[root@controller-0 sysadmin(keystone_admin)]# system oam-show
+-----------------+--------------------------------------+
| Property | Value |
+-----------------+--------------------------------------+
| created_at | 2020-09-14T03:18:54.352301+00:00 |
| isystem_uuid | 4a6501e4-d564-4426-b4f6-102bd247e899 |
| oam_c0_ip | 2620:10a:a001:a103::164 |
| oam_c1_ip | 2620:10a:a001:a103::236 |
| oam_floating_ip | 2620:10a:a001:a103::165 |
| oam_gateway_ip | 2620:10a:a001:a103::6:0 |
| oam_subnet | 2620:10a:a001:a103::/64 |
| updated_at | None |
| uuid | 6b51882b-6b2b-4677-abfa-da7c08c503d4 |
+-----------------+--------------------------------------+

On active controller (controller-0 in this test):

[root@controller-0 sysadmin(keystone_admin)]# ip addr
11: enp135s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:fd:fe:b1:29:f5 brd ff:ff:ff:ff:ff:ff
    inet6 2620:10a:a001:a103::165/64 scope global
       valid_lft forever preferred_lft forever
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
    inet6 2620:10a:a001:a103::164/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3efd:feff:feb1:29f5/64 scope link
       valid_lft forever preferred_lft forever

On standby controller:

[sysadmin@controller-1 ~(keystone_admin)]$ ip addr
2: enp134s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:15:17:c7:bf:1c brd ff:ff:ff:ff:ff:ff
    inet6 2620:10a:a001:a103::236/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::215:17ff:fec7:bf1c/64 scope link
       valid_lft forever preferred_lft forever

2. Change the OAM IP addresses:
[root@controller-0 sysadmin(keystone_admin)]# system oam-modify oam_c0_ip=2620:10a:a001:a103::166 oam_c1_ip=2620:10a:a001:a103::234 oam_floating_ip=2620:10a:a001:a103::167
+-----------------+--------------------------------------+
| Property | Value |
+-----------------+--------------------------------------+
| created_at | 2020-09-14T03:18:54.352301+00:00 |
| isystem_uuid | 4a6501e4-d564-4426-b4f6-102bd247e899 |
| oam_c0_ip | 2620:10a:a001:a103::166 |
| oam_c1_ip | 2620:10a:a001:a103::234 |
| oam_floating_ip | 2620:10a:a001:a103::167 |
| oam_gateway_ip | 2620:10a:a001:a103::6:0 |
| oam_subnet | 2620:10a:a001:a103::/64 |
| updated_at | None |
| uuid | 6b51882b-6b2b-4677-abfa-da7c08c503d4 |
+-----------------+--------------------------------------+

3. Lock/unlock the standby controller (controller-1 in this case).

4. On the standby controller (controller-1 in this case), check its OAM network IF again:

controller-1:/home/sysadmin# ip addr
2: enp134s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:15:17:c7:bf:1c brd ff:ff:ff:ff:ff:ff
    inet6 2620:10a:a001:a103::236/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::215:17ff:fec7:bf1c/64 scope link

Note that the IP address is still the old one (2620:10a:a001:a103::236/64) instead of the new one (2620:10a:a001:a103::234) set in step 2.

5. Lock/unlock the standby controller again.

6. Check the standby controller's OAM IF:

controller-1:/home/sysadmin# ip addr
2: enp134s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:15:17:c7:bf:1c brd ff:ff:ff:ff:ff:ff
    inet6 2620:10a:a001:a103::234/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::215:17ff:fec7:bf1c/64 scope link
       valid_lft forever preferred_lft forever

Note that the IP address has now changed to the new one (2620:10a:a001:a103::234/64).
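
The manual comparison in steps 4 and 6 can be scripted. The following is only a minimal sketch, assuming the `system oam-modify` table and the `ip addr` output have been captured as text. The helper names `parse_property_table` and `global_inet6_addrs` are hypothetical and are not part of any StarlingX tooling; the sample data is copied from the steps above.

```python
import ipaddress
import re


def parse_property_table(text):
    """Parse a '| Property | Value |' CLI table into a dict."""
    props = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Border lines contain no '|' separators and the header row is skipped.
        if len(cells) == 2 and cells[0] and cells[0] != "Property":
            props[cells[0]] = cells[1]
    return props


def global_inet6_addrs(ip_addr_output, ifname):
    """Return the global-scope IPv6 addresses listed under one interface."""
    addrs = []
    current = None
    for line in ip_addr_output.splitlines():
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        m = re.match(r"^\s+inet6\s+(\S+)/\d+\s+scope\s+global", line)
        if m and current == ifname:
            addrs.append(ipaddress.ip_address(m.group(1)))
    return addrs


# Sample data taken from the reproduction steps (table shortened).
OAM_TABLE = """\
+-----------------+--------------------------------------+
| Property | Value |
+-----------------+--------------------------------------+
| oam_c0_ip | 2620:10a:a001:a103::166 |
| oam_c1_ip | 2620:10a:a001:a103::234 |
| oam_floating_ip | 2620:10a:a001:a103::167 |
+-----------------+--------------------------------------+
"""

AFTER_FIRST_UNLOCK = """\
2: enp134s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    link/ether 00:15:17:c7:bf:1c brd ff:ff:ff:ff:ff:ff
    inet6 2620:10a:a001:a103::236/64 scope global
    inet6 fe80::215:17ff:fec7:bf1c/64 scope link
"""

AFTER_SECOND_UNLOCK = AFTER_FIRST_UNLOCK.replace("::236/64", "::234/64")

if __name__ == "__main__":
    expected = ipaddress.ip_address(parse_property_table(OAM_TABLE)["oam_c1_ip"])
    for label, output in (("first", AFTER_FIRST_UNLOCK),
                          ("second", AFTER_SECOND_UNLOCK)):
        present = expected in global_inet6_addrs(output, "enp134s0f0")
        print(f"after {label} lock/unlock: new OAM IP present = {present}")
```

Comparing parsed `ipaddress` objects rather than raw strings avoids false mismatches when the same IPv6 address is printed in a different but equivalent textual form.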

Expected Behavior
------------------
The standby controller's OAM network IF IP address should change to the new IP after the first lock/unlock.

Actual Behavior
----------------
The standby controller's OAM IF IP remains unchanged until a second lock/unlock.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
System with two controllers and IPv6 on OAM.

Branch/Pull Time/Commit
-----------------------
latest on master.

Last Pass
---------
Unknown

Timestamp/Logs
--------------
See steps to reproduce.

Test Activity
-------------
Developer Testing

Workaround
----------
Lock/unlock the controllers twice for the new IP to take effect.

Andy (andy.wrs) wrote :

Also note that after a swact that makes controller-1 the active controller, controller-0 also needs a double lock/unlock for the new OAM IP to take effect on its OAM IF.

Andy (andy.wrs) wrote :

The same is observed on a single-node IPv6 system: after running the OAM IP change command, the controller needs a double lock/unlock for the OAM IF to get the new IP.

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0 stx.config
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium - would be nice to fix to avoid requiring two lock/unlock cycles for the change to take effect

Bart Wensley (bartwensley) wrote :

This LP looks like it is closely related to work being done under the following story:
https://storyboard.openstack.org/#!/story/2008531

I am assigning it to Ghada so she can get her team to look at whether this issue will be fixed as part of that story.

tags: added: stx.n
tags: added: stx.networking
removed: stx.n
Ghada Khalil (gkhalil) wrote :

The work for https://storyboard.openstack.org/#!/story/2008531 is only for simplex systems.

Ghada Khalil (gkhalil) wrote :

However, I will assign it to Andre Kantek to investigate.

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Andre Kantek (akantek)
Andre Kantek (akantek) wrote :

Started investigation

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/786468

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/786468
Committed: https://opendev.org/starlingx/stx-puppet/commit/ffddc103ca66f87fb96ae02e9cfbb656d39f38ab
Submitter: "Zuul (22348)"
Branch: master

commit ffddc103ca66f87fb96ae02e9cfbb656d39f38ab
Author: Andre Fernando Zanella Kantek <email address hidden>
Date: Thu Apr 15 09:59:55 2021 -0400

    OAM IP change needs double lock/unlock controllers for IPV6 system

    Added IPv6 address fields on the list used to detect if the interface
    have changed on apply_network_config.sh. Without it was only copying
    the interface config file from /var/run/network-scripts.puppet/ to
    /etc/sysconfig/network-scripts/ which explains why it was working
    on the second reboot.

    Tested on:
    AIO-DX
    AIO-SX

    Closes-Bug: 1895555
    Signed-off-by: Andre Fernando Zanella Kantek <email address hidden>
    Change-Id: I25e60a04b4aec38c254ff3e3a7b2f0d80ce5daaf
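
The behavior described in the commit message above can be illustrated with a small sketch. This is not the code from apply_network_config.sh (which is a shell script); it is a hypothetical Python model of the idea, and the field lists `IPV4_FIELDS` and `IPV6_FIELDS` are assumptions chosen from common ifcfg keys, not the lists used by the real script. It shows how a change detector that compares only IPv4 fields reports "unchanged" for an IPv6-only address change, so the new file would only be copied and not applied.

```python
# Hypothetical field lists: the real script's lists are not shown in this report.
IPV4_FIELDS = ("BOOTPROTO", "IPADDR", "NETMASK", "GATEWAY", "MTU")
IPV6_FIELDS = ("IPV6ADDR", "IPV6_DEFAULTGW")


def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg-style file into a dict."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip('"')
    return cfg


def iface_changed(old_text, new_text, fields):
    """Return True if any of the watched fields differs between two configs."""
    old, new = parse_ifcfg(old_text), parse_ifcfg(new_text)
    return any(old.get(f) != new.get(f) for f in fields)


# Illustrative config contents using the controller-1 addresses from this report.
OLD_CFG = """\
DEVICE=enp134s0f0
IPV6INIT=yes
IPV6ADDR=2620:10a:a001:a103::236/64
MTU=1500
"""
NEW_CFG = OLD_CFG.replace("::236/64", "::234/64")

if __name__ == "__main__":
    # Watching only IPv4 fields misses the IPv6 address change.
    print("IPv4 fields only:", iface_changed(OLD_CFG, NEW_CFG, IPV4_FIELDS))
    # Adding the IPv6 fields to the watched list detects it.
    print("IPv4 + IPv6 fields:",
          iface_changed(OLD_CFG, NEW_CFG, IPV4_FIELDS + IPV6_FIELDS))
```

Under these assumptions, the first comparison misses the address change and the second detects it, which is consistent with the commit's explanation of why the new address only appeared after the second reboot.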

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

@Andre, please cherrypick this change to the r/stx.5.0 release branch once it's open for submissions.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (r/stx.5.0)

Fix proposed to branch: r/stx.5.0
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/786868

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (r/stx.5.0)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/786868
Committed: https://opendev.org/starlingx/stx-puppet/commit/5b6e5d53e3cd63e9c8904018645bc4c3ef6d87f0
Submitter: "Zuul (22348)"
Branch: r/stx.5.0

commit 5b6e5d53e3cd63e9c8904018645bc4c3ef6d87f0
Author: Andre Fernando Zanella Kantek <email address hidden>
Date: Fri Apr 16 13:25:09 2021 -0400

    OAM IP change needs double lock/unlock controllers for IPV6 system

    Added IPv6 address fields on the list used to detect if the interface
    have changed on apply_network_config.sh. Without it was only copying
    the interface config file from /var/run/network-scripts.puppet/ to
    /etc/sysconfig/network-scripts/ which explains why it was working
    on the second reboot.

    Tested on:
    AIO-DX
    AIO-SX

    Closes-Bug: 1895555
    Signed-off-by: Andre Fernando Zanella Kantek <email address hidden>
    Change-Id: I25e60a04b4aec38c254ff3e3a7b2f0d80ce5daaf

Ghada Khalil (gkhalil)
tags: added: in-r-stx50
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792029

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/792029
Committed: https://opendev.org/starlingx/stx-puppet/commit/2b026190a3cb6d561b6ec4a46dfb3add67f1fa69
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 3e3940824dfb830ebd39fd93265b983c6a22fc51
Author: Dan Voiculeasa <email address hidden>
Date: Thu May 13 18:03:45 2021 +0300

    Enable kubelet support for pod pid limit

    Enable limiting the number of pids inside of pods.

    Add a default value to protect against a missing value.
    Default to 750 pids limit to align with service parameter default
    value for most resource consuming StarlingX optional app (openstack).
    In fact any value above service parameter minimum value is good for the
    default.

    Closes-Bug: 1928353
    Signed-off-by: Dan Voiculeasa <email address hidden>
    Change-Id: I10c1684fe3145e0a46b011f8e87f7a23557ddd4a

commit 0c16d288fbc483103b7ba5dad7782e97f59f4e17
Author: Jessica Castelino <email address hidden>
Date: Tue May 11 10:21:57 2021 -0400

    Safe restart of the etcd SM service in etcd upgrade runtime class

    While upgrading the central cloud of a DC system, activation failed
    because there was an unexpected SWACT to controller-1. This was due
    to the etcd upgrade script. Part of this script runs the etcd
    manifest. This triggers a reload/restart of the etcd service. As this
    is done outside of the sm, sm saw the process failure and triggered
    the SWACT.

    This commit modifies platform::etcd::upgrade::runtime puppet class
    to do a safe restart of the etcd SM service and thus, solve the
    issue.

    Change-Id: I3381b6976114c77ee96028d7d96a00302ad865ec
    Signed-off-by: Jessica Castelino <email address hidden>
    Closes-Bug: 1928135

commit eec3008f600aeeb69a42338ed44332228a862d11
Author: Mihnea Saracin <email address hidden>
Date: Mon May 10 13:09:52 2021 +0300

    Serialize updates to global_filter in the AIO manifest

    Right now, looking at the aio manifest:
    https://review.opendev.org/c/starlingx/stx-puppet/+/780600/15/puppet-manifests/src/manifests/aio.pp
    there are 3 classes that update
    in parallel the lvm global_filter:
    - include ::platform::lvm::controller
    - include ::platform::worker::storage
    - include ::platform::lvm::compute
    And this generates some errors.

    We fix this by adding dependencies between the above classes
    in order to update the global_filter in a serial mode.

    Closes-Bug: 1927762
    Signed-off-by: Mihnea Saracin <email address hidden>
    Change-Id: If6971e520454cdef41138b2f29998c036d8307ff

commit 97371409b9b2ae3f0db6a6a0acaeabd74927160e
Author: Steven Webster <email address hidden>
Date: Fri May 7 15:33:43 2021 -0400

    Add SR-IOV rate-limit dependency

    Currently, the binding of an SR-IOV virtual function (VF) to a
    driver has a dependency on platform::networking. This is needed
    to ensure that SR-IOV is enabled (VFs created) before actually
    doing the bind.

    This dependency does not exist for configuring the VF rate-limits
    however. There is a cha...

tags: added: in-f-centos8