config_controller fails when cluster network set to AE (balanced mode)

Bug #1819738 reported by Chris Winnicki
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      ChenjieXu

Bug Description

Title
-----
config_controller fails when Cluster set to AE (balanced mode)

Brief Description
-----------------
config_controller fails when the cluster network is set to AE (balanced mode); see the attached config.ini file.
Configuration failed during config_controller with:
Unsupported LAG mode (2) for CLUSTER interface - use LAG mode [1, 4] instead
* Balanced mode should be supported.

Severity
--------
Major

Steps to Reproduce
------------------
Install controller-0
Run: config_controller supplying the attached config.ini

Expected Behavior
------------------
config_controller should run to completion without any errors.

Actual Behavior
----------------
config_controller fails with:
Configuration failed: Unsupported LAG mode (2) for CLUSTER interface - use LAG mode [1, 4] instead

System Configuration
--------------------
System mode: All-in-one duplex (cluster network configured on separate AE NICs in balanced mode)

Reproducibility
---------------
100%

Branch/Pull Time/Commit
-----------------------
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190308T183747Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="10"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-03-08 18:37:47 +0000"

Timestamp/Logs
--------------
n/a

Chris Winnicki (chriswinnicki) wrote :
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Teresa Ho (teresaho)
importance: Undecided → Medium
tags: added: stx.networking
Ghada Khalil (gkhalil)
summary: - config_controller fails when Cluster set to AE (balnaced mode)
+ config_controller fails when Cluster set to AE (balanced mode)
Ghada Khalil (gkhalil) wrote :

Marking as release gating; this is related to the cluster network feature.
There should be no restriction on the AE mode for a cluster network if it is not shared with the mgmt interface. For the mgmt interface, AE balanced mode is not supported.
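The rule above can be sketched as a small lookup. This is a hypothetical illustration, not StarlingX code: `supported_lag_modes` and its `shared_with_mgmt` flag are invented names. "No restriction" is taken to mean the three LAG modes discussed in this report, labelled with Linux bonding numbering (1 = active-backup, 2 = balance-xor, 4 = 802.3ad). That numbering is consistent with LAG_MODE = 2 being the balanced mode here.

```python
# Hypothetical sketch of the AE-mode rule for the cluster-host interface.
# Not the StarlingX implementation. Mode numbers assume Linux bonding numbering:
#   1 = active-backup, 2 = balance-xor ("AE balanced"), 4 = 802.3ad
ALL_LAG_MODES = (1, 2, 4)
MGMT_LAG_MODES = (1, 4)  # AE balanced (mode 2) is not supported on mgmt


def supported_lag_modes(shared_with_mgmt: bool) -> tuple:
    """Return the LAG modes allowed for a cluster-host AE interface."""
    if shared_with_mgmt:
        # An interface shared with mgmt inherits the stricter mgmt restriction.
        return MGMT_LAG_MODES
    # A dedicated cluster-host interface has no extra restriction.
    return ALL_LAG_MODES


print(supported_lag_modes(shared_with_mgmt=False))  # (1, 2, 4)
print(supported_lag_modes(shared_with_mgmt=True))   # (1, 4)
```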

Changed in starlingx:
assignee: Teresa Ho (teresaho) → Forrest Zhao (forrest.zhao)
status: New → Triaged
summary: - config_controller fails when Cluster set to AE (balanced mode)
+ config_controller fails when cluster network set to AE (balanced mode)
tags: added: stx.2019.05
Ghada Khalil (gkhalil) wrote :

Any questions during investigation should be directed to Teresa Ho. Teresa should also be included on the code inspection.

Le, Huifeng (hle2) wrote :

Chris,
While we are working on reproducing and investigating this issue, could you please help with the questions below?
1. Is this a new issue or a regression?
2. Could you please share the full failure logs (sysinv.log, puppet.log) so we can compare them with our environment?
Thanks!

Ghada Khalil (gkhalil) wrote :
Chris Winnicki (chriswinnicki) wrote :

Logs from the most recent reproduction attempt are attached: yow-cgcs-wildcat-3_20190318.154947.tar

Le, Huifeng (hle2) wrote :

Chris,

In which log file do you see the message "Unsupported LAG mode (2) for CLUSTER interface - use LAG mode [1, 4] instead"?
I cannot find /var/log/puppet/puppet.log in the attached yow-cgcs-wildcat-3_20190318.154947.tar. Could you please check?

Thanks much!

Chris Winnicki (chriswinnicki) wrote :

Huifeng,

config_controller fails before any puppet configuration is applied, so there are no puppet logs.
The best way to reproduce this issue is to run config_controller with the attached config.ini, for example:

config_controller --config-file config.ini

Le, Huifeng (hle2) wrote :

@Teresa, @Ghada

From the code, this error is raised while validating the cluster network configuration (stx-config\configutilities\configutilities\configutilities\common\validator.py), because LAG_MODE=2 is not in the supported mode list [1, 4]:

def validate_cluster(self):
    ...  # earlier checks elided
    if self.cluster_network.logical_interface.lag_interface:
        supported_lag_mode = [1, 4]
        if (self.cluster_network.logical_interface.lag_mode not in
                supported_lag_mode):
            raise ConfigFail(
                "Unsupported LAG mode (%d) for %s interface"
                " - use LAG mode %s instead" %
                (self.cluster_network.logical_interface.lag_mode,
                 cluster_prefix, supported_lag_mode))

and config.ini is configured as follows:
[CLUSTER_NETWORK]
CIDR = 192.168.206.0/24
LOGICAL_INTERFACE = LOGICAL_INTERFACE_2

[LOGICAL_INTERFACE_2]
LAG_INTERFACE = Y
LAG_MODE = 2
INTERFACE_MTU = 9216
INTERFACE_PORTS = ens787f1,ens787f2
INTERFACE_LINK_CAPACITY=10000

So is rejecting LAG_MODE = 2 by design, or is it a real bug (i.e. supported_lag_mode should be [1, 2, 4] instead of [1, 4])?
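The failure can be reproduced outside config_controller by running the quoted check against the config.ini sections above. This is a hedged sketch and makes several substitutions. It uses Python's standard configparser rather than config_controller's own parser. ConfigFail is a local stand-in exception, and validate_cluster_lag is a hypothetical helper, not part of the configutilities API.

```python
import configparser


class ConfigFail(Exception):
    """Stand-in for the ConfigFail exception raised by validator.py (assumed)."""


# The two config.ini sections quoted above.
CONFIG_INI = """
[CLUSTER_NETWORK]
CIDR = 192.168.206.0/24
LOGICAL_INTERFACE = LOGICAL_INTERFACE_2

[LOGICAL_INTERFACE_2]
LAG_INTERFACE = Y
LAG_MODE = 2
INTERFACE_MTU = 9216
INTERFACE_PORTS = ens787f1,ens787f2
INTERFACE_LINK_CAPACITY=10000
"""


def validate_cluster_lag(parser, supported_lag_mode=(1, 4), cluster_prefix="CLUSTER"):
    """Mimic the quoted check; (1, 4) is the pre-fix list from validator.py."""
    # [CLUSTER_NETWORK] names the logical interface section to inspect.
    section = parser.get("CLUSTER_NETWORK", "LOGICAL_INTERFACE")
    if parser.get(section, "LAG_INTERFACE") == "Y":
        lag_mode = parser.getint(section, "LAG_MODE")
        if lag_mode not in supported_lag_mode:
            raise ConfigFail(
                "Unsupported LAG mode (%d) for %s interface"
                " - use LAG mode %s instead"
                % (lag_mode, cluster_prefix, list(supported_lag_mode)))


parser = configparser.ConfigParser()
parser.read_string(CONFIG_INI)

try:
    validate_cluster_lag(parser)
except ConfigFail as exc:
    # Prints the same message seen in the bug report.
    print("Configuration failed: %s" % exc)
```

Calling `validate_cluster_lag(parser, supported_lag_mode=(1, 2, 4))` on the same input raises nothing. This matches the later report that config_controller succeeds once mode 2 is added to the list.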

Teresa Ho (teresaho) wrote :

The AE balanced mode should be supported for the cluster-host interface. The supported_lag_mode should be [1, 2, 4].

Le, Huifeng (hle2) wrote :

Teresa,
Is changing supported_lag_mode to [1, 2, 4] enough to fix this bug, or is more code required to support lag_mode 2?

ChenjieXu (midone) wrote :

Hi Teresa,

We have verified that config_controller can succeed after changing supported_lag_mode to [1,2,4]. Is there any more code required to support lag_mode 2?

Our Steps:
1. Install controller-0 on bare metal.
2. Change supported_lag_mode for the cluster network in the following file from [1, 4] to [1, 2, 4]:
   /usr/lib64/python2.7/site-packages/configutilities/common/validator.py
3. Run "sudo config_controller --config-file config_1819738.ini".

config_controller then completes successfully.

ChenjieXu (midone) wrote :
Teresa Ho (teresaho) wrote :

Yes, that code change is correct.

Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → ChenjieXu (midone)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/648304

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/648304
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=1207bd850e46564552957ec80be479faa47c74ec
Submitter: Zuul
Branch: master

commit 1207bd850e46564552957ec80be479faa47c74ec
Author: mid_one <email address hidden>
Date: Thu Mar 28 19:29:52 2019 +0800

    Add mode 2 to supported LAG mode for cluster interface

    LAG mode 2 should be supported because AE balanced mode
    should be supported for the cluster-host interface. Thus
    the supported_lag_mode for cluster network should be
    [1, 2, 4].

    Co-Authored-By: Huifeng Le<email address hidden>

    Change-Id: I0b81c963705820a9fec6225dac1cee2a14bbe030
    Closes-Bug: #1819738
    Story: #2004273
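A regression check for the merged change might look like the following sketch. It is illustrative only. ConfigFail is a stand-in exception, and check_cluster_lag_mode and accepted_modes are hypothetical names. Mode 3 is included only as an example of a value outside the post-fix list [1, 2, 4].

```python
class ConfigFail(Exception):
    """Stand-in for the ConfigFail exception raised by validator.py (assumed)."""


# Post-fix value from the commit above: mode 2 (AE balanced) is now allowed.
SUPPORTED_LAG_MODE = [1, 2, 4]


def check_cluster_lag_mode(lag_mode, cluster_prefix="CLUSTER"):
    """Raise ConfigFail for a LAG mode outside SUPPORTED_LAG_MODE."""
    if lag_mode not in SUPPORTED_LAG_MODE:
        raise ConfigFail(
            "Unsupported LAG mode (%d) for %s interface"
            " - use LAG mode %s instead"
            % (lag_mode, cluster_prefix, SUPPORTED_LAG_MODE))


def accepted_modes(candidates):
    """Return the candidate modes that pass the check, in input order."""
    accepted = []
    for mode in candidates:
        try:
            check_cluster_lag_mode(mode)
        except ConfigFail:
            continue  # rejected by the check
        accepted.append(mode)
    return accepted


print(accepted_modes([1, 2, 3, 4]))  # [1, 2, 4]
```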

Changed in starlingx:
status: In Progress → Fix Released
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Chris Winnicki (chriswinnicki) wrote :

Retested in: 2019-06-03_18-34-53
Verdict: Passed

tags: removed: stx.retestneeded