controller-1 fails to unlock with IPv6 when mgmt and cluster network are shared on the same vlan

Bug #1834234 reported by Ghada Khalil
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Teresa Ho

Bug Description

Brief Description
-----------------
We are trying to set up an IPv6 system with one physical NIC for the pxeboot, mgmt, cluster and oam networks. pxeboot is untagged, and the mgmt network uses a vlan that the cluster network shares (the cluster network doesn't have its own vlan). Controller-0 was unlocked successfully, but controller-1 is failing to unlock.

Note: If the cluster network is assigned its own vlan, the system is functional and controller-1 can be unlocked.

Severity
--------
Major - the above configuration should be supported

Steps to Reproduce
------------------
- Set up the first controller with IPv6. See https://bugs.launchpad.net/starlingx/+bug/1830779 for a sample ansible file
- Configure the system so that the mgmt and cluster networks share the same vlan. Example:
system host-if-modify -n pxeboot0 -c platform --networks pxeboot $controller eno1
system host-if-add -c platform --networks oam -V 2801 $controller oam0 vlan pxeboot0
system host-if-add -c platform --networks mgmt -V 970 $controller mgmt0 vlan pxeboot0
system host-if-modify $controller mgmt0 --networks cluster-host

Expected Behavior
------------------
The system is functional with the above configuration. All nodes can be unlocked.

Actual Behavior
----------------
controller-0 was unlocked successfully, but controller-1 failed to unlock.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
All configurations where IPv6 is used and the mgmt and cluster networks share the same vlan

Branch/Pull Time/Commit
-----------------------
cengn build from 2019-06-24

Last Pass
---------
never - this is the first time this specific configuration is attempted

Timestamp/Logs
--------------
From the logs, it appears that controller-1 doesn't have the cluster network ip address assigned to the interface:
controller-1:/tmp# ip a | eno1
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether f4:03:43:56:ce:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.87.72/26 brd 192.168.87.127 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::f603:43ff:fe56:ce54/64 scope link
       valid_lft forever preferred_lft forever
controller-1:/tmp# ip a | eno1.2801
12: eno1.2801@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f4:03:43:56:ce:54 brd ff:ff:ff:ff:ff:ff
    inet6 fd00:4888:2000:7f7::12/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::f603:43ff:fe56:ce54/64 scope link
       valid_lft forever preferred_lft forever
controller-1:/tmp# ip a | eno1.970
13: eno1.970@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f4:03:43:56:ce:54 brd ff:ff:ff:ff:ff:ff
    inet6 fd00:4888:2000:1090::12/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::f603:43ff:fe56:ce54/64 scope link
       valid_lft forever preferred_lft forever

cat /etc/sysconfig/network-scripts/ifcfg-eno1.970:1
# HEADER: This file is is being managed by puppet. Changes to
# HEADER: interfaces that are not being managed by puppet will persist;
# HEADER: however changes to interfaces that are being managed by puppet will
# HEADER: be overwritten. In addition, file order is NOT guaranteed.
# HEADER: Last generated at: 2019-06-24 02:15:27 +0000
BOOTPROTO=static
ONBOOT=yes
DEVICE=eno1.970:1
MTU=1500
VLAN=yes
post_up="/usr/local/bin/cgcs_tc_setup.sh eno1.970 mgmt 10000 > /dev/null"
pre_up="/sbin/modprobe -q 8021q"
IPV6INIT=yes
IPV6ADDR=fd00:4888:2000:1090::12/64

cat /etc/sysconfig/network-scripts/ifcfg-eno1.970:5
# HEADER: This file is is being managed by puppet. Changes to
# HEADER: interfaces that are not being managed by puppet will persist;
# HEADER: however changes to interfaces that are being managed by puppet will
# HEADER: be overwritten. In addition, file order is NOT guaranteed.
# HEADER: Last generated at: 2019-06-24 02:15:37 +0000
BOOTPROTO=static
ONBOOT=yes
DEVICE=eno1.970:5
MTU=1500
VLAN=yes
pre_up="/sbin/modprobe -q 8021q"
IPV6INIT=yes
IPV6ADDR=2001:db8:3::4/64

This configuration results in the system creating two interface aliases, and it seems that only the first alias is assigned an IP address. This appears to affect only IPv6: the same configuration with IPv4 was successful.
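
A quick way to confirm which address is missing is to test the ip output for the expected value. The following is a minimal sketch and not part of the original report: has_inet6 is a hypothetical helper name, and the addresses in the usage comment are the ones from the two ifcfg files quoted above.

```shell
# has_inet6 ADDR/PREFIX
# Reads `ip -6 addr show dev IFACE` output on stdin and succeeds only if
# ADDR/PREFIX is listed as an inet6 address. Hypothetical helper, for
# illustration only.
has_inet6() {
    awk -v want="$1" '$1 == "inet6" && $2 == want { found = 1 } END { exit !found }'
}

# Example, to be run on the affected controller:
#   ip -6 addr show dev eno1.970 | has_inet6 fd00:4888:2000:1090::12/64 || echo "mgmt address missing"
#   ip -6 addr show dev eno1.970 | has_inet6 2001:db8:3::4/64 || echo "cluster address missing"
```

With the output shown above, the first check would pass and the second would report the cluster address as missing.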

We also experimented with manually setting IPV6ADDR_SECONDARIES, based on the link below, and restarting networking (i.e. without using aliases). Both IP addresses came up and the network was pingable.
https://serverfault.com/questions/697720/how-to-assign-multiple-ipv6-alias-addresses-to-one-network-interface
This may be an option to investigate instead of using aliases.
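
For reference, the IPV6ADDR_SECONDARIES approach puts both addresses in a single ifcfg file for the vlan interface instead of in separate alias files. The fragment below is a sketch assembled from the two alias files quoted above, not the file that was actually tested. Which address is primary and which is secondary is an assumption, and the other settings (post_up, pre_up) are omitted for brevity.

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-eno1.970 (illustration only)
DEVICE=eno1.970
BOOTPROTO=static
ONBOOT=yes
MTU=1500
VLAN=yes
IPV6INIT=yes
# Primary IPv6 address (the mgmt address from ifcfg-eno1.970:1)
IPV6ADDR=fd00:4888:2000:1090::12/64
# Additional IPv6 addresses, space-separated (here the address from ifcfg-eno1.970:5)
IPV6ADDR_SECONDARIES="2001:db8:3::4/64"
```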

Test Activity
-------------
Other - setting up a system with the above config

Ghada Khalil (gkhalil) wrote :

Assigning to Matt to review and make a recommendation on the solution.

description: updated
tags: added: stx.networking
Changed in starlingx:
assignee: nobody → Matt Peters (mpeters-wrs)
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 gating -- starlingx should support this config

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2.0
Matt Peters (mpeters-wrs) wrote :

This limitation was found during the initial feature development. There is no specific recommendation other than fixing the issue with alias interfaces. Puppet does not support secondary interface addresses, therefore that is not an option.

Forrest Zhao (forrest.zhao) wrote :

Agree to defer to stx.3.0 and note this config as a limitation for IPv6. The workaround is to use a separate vlan for the cluster network. Should it be noted in the stx 2.0 release notes?

tags: removed: stx.2.0
Forrest Zhao (forrest.zhao) wrote :

This bug was discussed at the Aug 08, 2019 networking sub-project meeting. The conclusion is that Matt will provide input and someone will update the stx 2.0 release notes to document this limitation.

Final fix would be in STX 3.0.

Frank Miller (sensfan22)
tags: added: stx.3.0
Teresa Ho (teresaho)
Changed in starlingx:
assignee: Matt Peters (mpeters-wrs) → Teresa Ho (teresaho)
Ghada Khalil (gkhalil)
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685796

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685797

Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/685796
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=caac6ebf35cdd009b9e9eea9276dcf225fffa9ae
Submitter: Zuul
Branch: master

commit caac6ebf35cdd009b9e9eea9276dcf225fffa9ae
Author: Teresa Ho <email address hidden>
Date: Mon Sep 30 08:26:19 2019 -0400

    Fix missing IP of alias interface

    The ifup-aliases script assumes that the IPv4 address is always
    defined. If the configuration is only for IPv6, the script would
    generate an error and not process the IPv6 address of the interface.
    This commit is to bring up the IPv6 interface even if the IPv4 address
    is not defined.

    Partial-Bug: 1834234

    Change-Id: Ib0c4cbc7ec19cc0c0c485e4ad63c380aa8a49a4c
    Signed-off-by: Teresa Ho <email address hidden>
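
The idea in the commit message above, that a missing IPv4 address should not prevent the IPv6 address from being processed, can be sketched as follows. This is not the actual ifup-aliases patch: plan_alias_addresses is a hypothetical function that only prints what it would configure, and it illustrates handling each address family independently.

```shell
# plan_alias_addresses DEVICE IPADDR IPV6ADDR
# Hypothetical illustration, not code from ifup-aliases. Each address family
# is guarded separately, so an empty IPADDR skips IPv4 without aborting IPv6.
plan_alias_addresses() {
    # $1 = alias device, $2 = IPv4 address (may be empty), $3 = IPv6 address (may be empty)
    if [ -n "$2" ]; then
        echo "configure IPv4 $2 on $1"
    fi
    if [ -n "$3" ]; then
        echo "configure IPv6 $3 on $1"
    fi
    return 0
}

# IPv6-only alias, as in ifcfg-eno1.970:5 from the bug description:
plan_alias_addresses "eno1.970:5" "" "2001:db8:3::4/64"
```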

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/685797
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=b21975fc24863c0b269c681ca547e8ab73e35f22
Submitter: Zuul
Branch: master

commit b21975fc24863c0b269c681ca547e8ab73e35f22
Author: Teresa Ho <email address hidden>
Date: Mon Sep 30 08:22:49 2019 -0400

    Disable DAD on parent of IPv6 vlan interface

    To disable DAD on IPv6 vlan alias interfaces, the flag must be
    set on the parent of the alias interface instead of the alias itself.
    This commit handles this case.

    Closes-Bug: 1834234
    Depends-On: https://review.opendev.org/#/c/685796/

    Change-Id: I1c4e06132ee4f2e6a47cf61952fb73c6576afa69
    Signed-off-by: Teresa Ho <email address hidden>
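
To illustrate the point about the parent interface, the sketch below derives the parent name from an alias name and builds the corresponding per-interface kernel setting path. It is not the code from the merged change. It assumes that the parent of an alias such as eno1.970:5 is the name before the colon (eno1.970) and that the DAD flag in question is the kernel's accept_dad setting; alias_parent and dad_path are hypothetical helper names.

```shell
# alias_parent NAME
# Strips an alias suffix such as ":5"; a name without a colon is returned unchanged.
alias_parent() {
    echo "${1%%:*}"
}

# dad_path NAME
# Prints the /proc path of accept_dad for the parent of NAME.
# Assumption: accept_dad is the flag the commit message refers to.
dad_path() {
    echo "/proc/sys/net/ipv6/conf/$(alias_parent "$1")/accept_dad"
}

# Example (as root, on the affected node):
#   echo 0 > "$(dad_path eno1.970:5)"
```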

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

Yosief Gebremariam, from the WR test team, tested this configuration with the 2019-10-09_20-00-00 load and confirmed that he was able to bring up his system w/ the cluster network shared on the mgmt network w/o its own vlan.

tags: removed: stx.retestneeded