Stein: default cluster-host-network set on wrong i/f on controller-0

Bug #1815053 reported by Nimalini Rasa
Affects      Status        Importance  Assigned to  Milestone
StarlingX    Fix Released  Medium      Teresa Ho

Bug Description

Brief Description
-----------------
The default cluster-host network was set up on the wrong interface when the mgmt interface is tagged (VLAN) and a pxeboot interface is present.

Severity
--------
Major

Steps to Reproduce
------------------
1. Bring up a system with an untagged pxeboot interface and a tagged (VLAN) mgmt interface.
2. Run config_controller with a configuration file (config_file).

Expected Behavior
------------------
The default cluster-host network should be configured on the mgmt (VLAN) interface, not on the pxeboot interface.

Actual Behavior
----------------
The default cluster-host network is configured on the pxeboot interface (eno4 on controller-0) instead of on the mgmt VLAN interface (eno4.160).

Reproducibility
---------------
Yes

System Configuration
--------------------
AIO-DX system

Branch/Pull Time/Commit
-----------------------
JOB="STX_build_stein_master"
<email address hidden>"
BUILD_DATE="2019-02-05 16:42:41 +0000"

[wrsroot@controller-0 ~(keystone_admin)]$ system interface-network-list controller-0
+--------------+--------------------------------------+----------+--------------+
| hostname     | uuid                                 | ifname   | network_name |
+--------------+--------------------------------------+----------+--------------+
| controller-0 | 02a2184a-3cc5-4af4-b719-f802bfa10b29 | eno4.160 | mgmt         |
| controller-0 | 13d69fb4-84df-4960-b7f5-c7a601f9a2c8 | eno4     | pxeboot      |
| controller-0 | aa97d074-c6b7-4241-bc2b-3106724e899b | eno4     | cluster-host |
| controller-0 | e30f3f44-5b57-44bd-be1a-bdc25bb2f0a1 | oam0     | oam          |
+--------------+--------------------------------------+----------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system interface-network-list controller-1
+--------------+--------------------------------------+----------+--------------+
| hostname     | uuid                                 | ifname   | network_name |
+--------------+--------------------------------------+----------+--------------+
| controller-1 | 01449dd8-e55c-4e53-94c3-6a0e325509e3 | pxeboot0 | pxeboot      |
| controller-1 | 22a23061-f271-4aac-a0db-4c6d1a8129d5 | mgmt0    | mgmt         |
| controller-1 | d14e62de-2c30-4217-9839-1f00d6434b3c | mgmt0    | cluster-host |
| controller-1 | d1b50b49-1e1e-43a1-9dd4-9de83e151469 | oam0     | oam          |
+--------------+--------------------------------------+----------+--------------+
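
One way to spot this misconfiguration is to compare the ifname carrying cluster-host with the one carrying mgmt. The Python sketch below is hypothetical: the helpers `parse_interface_network_list` and `cluster_host_on_mgmt` are not part of the StarlingX CLI, and the parser assumes the `system interface-network-list` table layout shown above. Fed the two outputs from this report, it flags controller-0 (cluster-host on eno4, mgmt on eno4.160) and passes controller-1 (both on mgmt0).

```python
"""Hypothetical check: is cluster-host on the same interface as mgmt?

Assumes the four-column table layout printed by
`system interface-network-list <host>` as shown in this report.
"""
from typing import Dict


def parse_interface_network_list(text: str) -> Dict[str, str]:
    """Map network_name -> ifname from the CLI table output."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with '+'; only '|' lines are table rows.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "hostname":
            continue  # header row or malformed line
        _hostname, _uuid, ifname, network = cells
        mapping[network] = ifname
    return mapping


def cluster_host_on_mgmt(text: str) -> bool:
    """True only if both networks exist and share one interface."""
    nets = parse_interface_network_list(text)
    mgmt = nets.get("mgmt")
    return mgmt is not None and nets.get("cluster-host") == mgmt


CONTROLLER_0 = """
| hostname     | uuid                                 | ifname   | network_name |
| controller-0 | 02a2184a-3cc5-4af4-b719-f802bfa10b29 | eno4.160 | mgmt         |
| controller-0 | 13d69fb4-84df-4960-b7f5-c7a601f9a2c8 | eno4     | pxeboot      |
| controller-0 | aa97d074-c6b7-4241-bc2b-3106724e899b | eno4     | cluster-host |
| controller-0 | e30f3f44-5b57-44bd-be1a-bdc25bb2f0a1 | oam0     | oam          |
"""

CONTROLLER_1 = """
| hostname     | uuid                                 | ifname   | network_name |
| controller-1 | 01449dd8-e55c-4e53-94c3-6a0e325509e3 | pxeboot0 | pxeboot      |
| controller-1 | 22a23061-f271-4aac-a0db-4c6d1a8129d5 | mgmt0    | mgmt         |
| controller-1 | d14e62de-2c30-4217-9839-1f00d6434b3c | mgmt0    | cluster-host |
| controller-1 | d1b50b49-1e1e-43a1-9dd4-9de83e151469 | oam0     | oam          |
"""

if __name__ == "__main__":
    print(cluster_host_on_mgmt(CONTROLLER_0))  # False: the bug on controller-0
    print(cluster_host_on_mgmt(CONTROLLER_1))  # True
```

In practice the table text would come from running the command on the controller; it is inlined here so the sketch runs on its own.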

[VERSION]
RELEASE = 19.01

[PXEBOOT_NETWORK]
PXEBOOT_CIDR = 192.168.202.0/24

[MGMT_NETWORK]
VLAN = 160
CIDR = 192.168.204.0/24
MULTICAST_CIDR=239.1.1.0/28
DYNAMIC_ALLOCATION = Y
LOGICAL_INTERFACE = LOGICAL_INTERFACE_2
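
The excerpt above is INI-style, so Python's standard `configparser` can read it. This sketch only shows what the file does and does not say: it sets a mgmt VLAN of 160 and contains no section for the cluster-host network, so config_controller has to pick a default. The section name `CLUSTER_NETWORK` used for that check is an assumption, and `summarize` is a hypothetical helper, not config_controller code.

```python
import configparser

# The config_controller excerpt from this report. Sections the report
# does not show (e.g. the LOGICAL_INTERFACE_2 definition) are omitted.
CONFIG_EXCERPT = """
[VERSION]
RELEASE = 19.01

[PXEBOOT_NETWORK]
PXEBOOT_CIDR = 192.168.202.0/24

[MGMT_NETWORK]
VLAN = 160
CIDR = 192.168.204.0/24
MULTICAST_CIDR=239.1.1.0/28
DYNAMIC_ALLOCATION = Y
LOGICAL_INTERFACE = LOGICAL_INTERFACE_2
"""


def summarize(text: str) -> dict:
    """Pull out the network settings relevant to this bug."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    mgmt = cfg["MGMT_NETWORK"]
    return {
        # Option lookups are case-insensitive (optionxform lowercases keys).
        "mgmt_vlan": mgmt.getint("VLAN"),
        "mgmt_cidr": mgmt.get("CIDR"),
        "pxeboot_cidr": cfg.get("PXEBOOT_NETWORK", "PXEBOOT_CIDR"),
        # Assumed section name for an explicitly configured cluster-host network.
        "cluster_host_specified": cfg.has_section("CLUSTER_NETWORK"),
    }


if __name__ == "__main__":
    print(summarize(CONFIG_EXCERPT))
```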

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Teresa Ho (teresaho)
summary: - Stein:default cluster-host-network set on wrong i/f on controller-0
+ Stein: default cluster-host-network set on wrong i/f on controller-0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/635848

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/635848
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=e7c0133967ce5fc5ad4a3a1392976237673b8700
Submitter: Zuul
Branch: master

commit e7c0133967ce5fc5ad4a3a1392976237673b8700
Author: Teresa Ho <email address hidden>
Date: Fri Feb 8 09:20:44 2019 -0500

    Fix cluster host network when mgmt is tagged

    When management interface is tagged on pxeboot, the cluster-host
    network should be assigned to the mgmt interface by default, not
    the pxeboot interface.
    This commit is to set the vlan id of the cluster-host interface
    to that of the management interface if the cluster-host interface
    is not specified in the configuration file.

    Closes-Bug: 1815053
    Change-Id: Ia9b003af118df21eecba4d1b644c2738761e7553
    Signed-off-by: Teresa Ho <email address hidden>
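
The default rule in this commit message can be sketched in Python. This is not the stx-config patch (review 635848 holds the real change); `NetworkConfig`, `default_cluster_host_vlan` and `default_cluster_host_vlan_before_fix` are hypothetical names. The "before fix" variant only models the behaviour observed on controller-0 in this report, where cluster-host ended up untagged on the pxeboot interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkConfig:
    """Hypothetical, simplified view of the parsed network settings."""
    mgmt_vlan: Optional[int]                 # None means mgmt is untagged
    cluster_host_vlan: Optional[int] = None  # only meaningful if specified
    cluster_host_specified: bool = False


def default_cluster_host_vlan_before_fix(cfg: NetworkConfig) -> Optional[int]:
    """Behaviour seen in this report: no VLAN, so cluster-host lands on
    the untagged (pxeboot) interface."""
    if cfg.cluster_host_specified:
        return cfg.cluster_host_vlan
    return None


def default_cluster_host_vlan(cfg: NetworkConfig) -> Optional[int]:
    """Rule from the commit message: if cluster-host is not given in the
    config file, reuse the mgmt VLAN id so it shares the mgmt interface."""
    if cfg.cluster_host_specified:
        return cfg.cluster_host_vlan
    return cfg.mgmt_vlan


if __name__ == "__main__":
    cfg = NetworkConfig(mgmt_vlan=160)  # matches [MGMT_NETWORK] VLAN = 160
    print(default_cluster_host_vlan_before_fix(cfg))  # None
    print(default_cluster_host_vlan(cfg))             # 160
```

An explicitly configured cluster-host network still wins in both variants; only the unspecified case changes.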

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

Marking as release gating; issue related to recent feature

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.2019.05 stx.containers stx.networking
tags: removed: stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (f/stein)

Fix proposed to branch: f/stein
Review: https://review.openstack.org/636195

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (f/stein)

Reviewed: https://review.openstack.org/636195
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=d94e998e455ca0b8b830f314e2292fade5ea7b49
Submitter: Zuul
Branch: f/stein

commit f990ded2116383eb3075fbc9af8e40d0a8173f12
Author: Kristine Bujold <email address hidden>
Date: Mon Feb 11 10:54:12 2019 -0500

    Move cinder static config to Armada manifest

    Review https://review.openstack.org/635952 is missing changes
    required to the manifest.yaml file. This commit fixes that.

    Story: 2003909
    Task: 29419

    Change-Id: I68f79c3b0a155d5687842697bdb7babc6082ac91
    Signed-off-by: Kristine Bujold <email address hidden>

commit b5ef279bd5a93be942e28d98d41712f268854626
Author: Sun Austin <email address hidden>
Date: Mon Feb 11 09:11:11 2019 +0800

    Remove un-necessary exception log

    Closes-Bug: 1814912

    Change-Id: Ic500ca78ace5d95d2356cc06fd332384f99ac28d
    Signed-off-by: Sun Austin <email address hidden>

commit c1385750620d77590d9ca5a6c2a5eb952cf0eeb6
Author: Don Penney <email address hidden>
Date: Fri Feb 8 14:20:04 2019 +0200

    Ceph initialization on AIO is done only in 'controller' manifests

    On AIO deployments puppet is run twice with two different manifests:
    1. 'controller': to configure controller services
    2. 'worker': to configure worker services.

    Ceph is configured when 'controller' manifests are applied, there is
    no need to run them a second time, when 'worker' set is applied.

    Commit adds new puppet classes to encapsulate ceph configuration
    based on node personality and adds a check to not apply it a 2nd
    time on controllers.

    If the ceph manifests are executed a second time then we get into
    a racing issue between SM's process monitoring and 'worker' puppet
    manifests triggering a restart of ceph-mon as part of reconfiguration

    After a reboot on AIO, SM takes control of ceph-mon monitoring
    after 'controller' puppet manifests finish applying. As part of this,
    SM monitors processes death notification and gets the pid from the
    .pid file. And periodically executes '/etc/init.d/ceph status
    mon.controller' for a more advanced monitoring.

    When the 'worker' manifests are executed, they trigger a restart
    of ceph-mon through /etc/init.d/ceph restart that has two steps: 'stop'
    in which ceph-mon is stopped, and 'start' in which it is restarted.

    In the first step, stopping ceph-mon leads to the death of ceph-mon
    process and removal of its PID file. This is promptly detected by
    SM which immediately triggers a start of ceph-mon that creates a
    new pid file. Problem is that ceph-mon was already in a restart,
    and at the end of the 'stop' step the init script cleans up the
    new pid file instead of the old.

    This leads to controllers swacting a couple of times before the system
    gets rid of the rogue process.

    Change-Id: I2a0df3bab716a553e71e322e1515bee2bb2f700d
    Co-authored-by: Ovidiu Poncea <email address hidden>
    Story: 2002844
    Task: 29214
    Signed-off-by: Ovidiu Poncea <ovidiu.poncea@w...
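
The ceph-mon PID-file race described in the ceph commit above can be replayed as a toy, single-threaded Python sketch. It is not StarlingX, SM or init-script code: the `MonHost` class, the pid numbers and the strict step ordering are hypothetical simplifications of the commit message. It stops where the commit places the fault (the 'stop' step removing the new pid file), leaving a running ceph-mon that no pid file points to.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MonHost:
    """Hypothetical model of one controller's ceph-mon state."""
    running: Set[int] = field(default_factory=set)  # live ceph-mon pids
    pidfile: Optional[int] = None                   # pid stored in the .pid file
    next_pid: int = 100                             # arbitrary starting pid

    def start_mon(self) -> int:
        """Launch ceph-mon and record its pid in the pid file."""
        pid = self.next_pid
        self.next_pid += 1
        self.running.add(pid)
        self.pidfile = pid
        return pid


def replay_restart_race() -> MonHost:
    host = MonHost()
    old_pid = host.start_mon()  # started while 'controller' manifests apply

    # 'worker' manifests run '/etc/init.d/ceph restart'; 'stop' begins.
    host.running.discard(old_pid)  # old ceph-mon process dies
    host.pidfile = None            # and its pid file is removed

    # SM sees the death and immediately starts a new ceph-mon,
    # which writes a fresh pid file.
    host.start_mon()

    # End of 'stop': the init script cleans up the pid file, which now
    # belongs to the new process rather than the old one.
    host.pidfile = None
    return host


if __name__ == "__main__":
    host = replay_restart_race()
    tracked = {host.pidfile} if host.pidfile is not None else set()
    print(sorted(host.running - tracked))  # [101]: running, but untracked
```

Skipping the ceph manifests in the 'worker' pass on controllers, as the commit does, removes the restart that opens this window.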


tags: added: in-f-stein
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05