controller-0 mgmt_mac not set after ansible configuration

Bug #1828880 reported by Allain Legacy
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Tee Ngo

Bug Description

Brief Description
-----------------
After configuring the system using the Ansible playbook, configuring the remaining attributes, and unlocking controller-0, the "mgmt_mac" remains uninitialized at 00:00:00:00:00:00. This does not seem to cause any issues until the system is swacted away from controller-0 and controller-0 is rebooted. Once in this state, the node will not recover because mtce fails to enable services.

The node's current task is set to:
Service Failure, threshold reached, Lock/Unlock to retry

The mtcClient.log on the local node repeatedly reports the following, which seems to indicate a mismatch in the expected MAC address:

2019-05-13T18:24:41.473 [90749.00343] controller-0 mtcClient msg mtcCompMsg.cpp ( 129) mtc_service_command : Warn : mtcAlive req command not for this host (exp:68:05:ca:3a:18:08 det:00:00:00:00:00:00) ; ignoring ...
2019-05-13T18:24:41.473 [90749.00344] controller-0 mtcClient --- nodeBase.cpp ( 298) print_mtc_message : Info : controller-0 rx <- mtcAlive req (Mgmnt network) 1.0 6:0:0.0.0.0 [cgts mtc cmd req:00:00:00:00:00:00] 192.168.144.2
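
For triage, a log like the one above can be scanned for this warning. The following is a minimal sketch, not part of mtce: the regular expression is inferred only from the single sample line quoted above, and the names `find_mac_mismatches` and `is_uninitialized` are hypothetical helpers introduced for illustration.

```python
import re

# Pattern inferred from the one warning line quoted in this report; the exact
# wording may differ in other releases (assumption).
MISMATCH_RE = re.compile(
    r"mtcAlive req command not for this host "
    r"\(exp:(?P<expected>[0-9a-fA-F:]{17}) det:(?P<detected>[0-9a-fA-F:]{17})\)"
)

# Placeholder value that mgmt_mac holds when it was never initialized.
ZERO_MAC = "00:00:00:00:00:00"


def find_mac_mismatches(lines):
    """Yield (expected, detected) MAC pairs from mtcClient log lines."""
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            yield match.group("expected").lower(), match.group("detected").lower()


def is_uninitialized(mac):
    """Return True when the MAC is the all-zero placeholder."""
    return mac.lower() == ZERO_MAC
```

Feeding it the lines of mtcClient.log and checking `is_uninitialized()` on the `det` value would flag the condition described in this bug.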

Severity
--------
Critical, controller-0 will not recover following a reboot or lock/unlock.

Steps to Reproduce
------------------
Install a system using the Ansible playbook, manually configure the required attributes, and install/configure the remaining nodes. Once the system is completely installed and configured, observe that the mgmt_mac attribute is still incorrect and that controller-0 fails to recover automatically after a reboot.

Expected Behavior
------------------
The mgmt_mac should be set based on the MAC address of the mgmt interface once the controller-0 node is configured and unlocked.
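
As an illustration of that expectation, the sketch below compares a stored mgmt_mac against the MAC address Linux reports for an interface under `/sys/class/net/<ifname>/address`. It is not the sysinv implementation; the function names and the idea of passing the stored value in as a string are assumptions made for this example.

```python
from pathlib import Path

# Placeholder value that mgmt_mac holds when it was never initialized.
ZERO_MAC = "00:00:00:00:00:00"


def read_interface_mac(ifname, sysfs_root="/sys/class/net"):
    """Return the MAC address Linux reports for ifname, lower-cased."""
    return (Path(sysfs_root) / ifname / "address").read_text().strip().lower()


def mgmt_mac_needs_update(stored_mac, ifname, sysfs_root="/sys/class/net"):
    """Return True when the stored mgmt_mac is unset or differs from the interface MAC."""
    actual = read_interface_mac(ifname, sysfs_root)
    stored = (stored_mac or ZERO_MAC).lower()
    return stored == ZERO_MAC or stored != actual
```

The `sysfs_root` parameter exists only so the check can be exercised against a temporary directory instead of the live system.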

Actual Behavior
----------------
mgmt_mac is never initialized properly.

Reproducibility
---------------
100%

System Configuration
--------------------
Standard system (2+4)

Branch/Pull Time/Commit
-----------------------
Private load rebased from May 10.

Last Pass
---------
Unknown

Timestamp/Logs
--------------
See above

Test Activity
-------------
Developer testing

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
importance: Undecided → High
Ghada Khalil (gkhalil)
tags: added: stx.config
Ghada Khalil (gkhalil) wrote :

Marking as release gating; related to the Ansible deployment feature. System impact is high as the controller won't recover if rebooted/swacted.

tags: added: stx.2.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/659865

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to config (master)

Reviewed: https://review.opendev.org/658174
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=0dddabca4dd9c65996367f6a8e7140077f626558
Submitter: Zuul
Branch: master

commit 0dddabca4dd9c65996367f6a8e7140077f626558
Author: Tee Ngo <email address hidden>
Date: Thu May 9 14:29:25 2019 -0400

    Enable platform APIs from pods at bootstrap

    This commit enables access to platform service APIs from within
    Kubernetes pods prior to initial controller unlock. Prior to
    this change, service endpoints were only reconfigured right
    before the unlock making sysinv apis inaccessible to services
    running inside the pods as they can not reach the loopback IP
    (127.0.0.1).

    This is achieved by reconfiguring service endpoints
      a) during initial bootstrap play from loopback IP to the provided
         management and OAM IPs
      b) during subsequent replays with newly provided management
         and/or oam network config values.

    Tests performed:
      - Bootstrap with defaults, verify endpoints
      - Change management subnet value and replay, verify endpoints
      - Change oam floating IP and replay, verify endpoints
      - Configure host for unlock
      - Unlock controller

    Story: 2004695
    Task: 30914
    Related-Bug: #1828880

    Change-Id: I9ef9d30bbf8713c75206b338aefd53c3e77db0cb
    Signed-off-by: Tee Ngo <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659865
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=319b5602df48594c495489b900ddda73979b1e66
Submitter: Zuul
Branch: master

commit 319b5602df48594c495489b900ddda73979b1e66
Author: Tee Ngo <email address hidden>
Date: Fri May 17 14:23:43 2019 -0400

    Correct controller-0 mgmt mac following bootstrap

    This commit corrects the mgmt_mac of controller-0 as part
    of mgmt interface provisioning following Ansible bootstrap.

    Test:
      Bring up a standard system. Verify that after a force
      reboot of the active controller, the controller is able
      to recover successfully.

    Closes-Bug: #1828880
    Closes-Bug: #1829545
    Depends-On: I9ef9d30bbf8713c75206b338aefd53c3e77db0cb

    Change-Id: I3536202a396c47bc0cf8463505f6de48815fee02
    Signed-off-by: Tee Ngo <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released