Locking host failed when primary_reselect not specified

Bug #1928461 reported by Teresa Ho
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| StarlingX | Fix Released | Medium | Teresa Ho | |

Bug Description

Brief Description
-----------------
On a system configured with an AE interface in active/standby mode without the primary_reselect attribute specified, locking a controller causes it to be rebooted automatically and returned to the unlocked state.

Severity
--------
Critical: System/Feature is not usable after the defect

Steps to Reproduce
------------------
- Install a lab with an active/standby AE interface without specifying primary_reselect
- Lock controller-1
- Wait until the host is locked (check the host status continuously, as it may start rebooting)
- After the host is locked, check system host-show controller-1; the task changes to Booting after a few minutes
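
The status check in the last two steps can be scripted. The following is a minimal sketch, not part of the original report: it assumes the `system` CLI is on the PATH with admin credentials loaded, and the helper names (`parse_property_table`, `shows_unexpected_reboot`, `watch_host`) are hypothetical. The failure signature it looks for (administrative `unlocked` together with task `Booting`) is taken from the host-show output in the logs of this report.

```python
import subprocess
import time


def parse_property_table(text):
    """Parse the '| Property | Value |' table printed by `system host-show`."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip table borders, prompts and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Property":
            props[cells[0]] = cells[1]
    return props


def shows_unexpected_reboot(props):
    """True for the state seen in this report: still unlocked, but booting."""
    return (props.get("administrative") == "unlocked"
            and props.get("task") == "Booting")


def watch_host(hostname, interval=10, attempts=30):
    """Poll host-show; return the first snapshot showing the failure, else None.

    The interval and attempt count are arbitrary illustration values.
    """
    for _ in range(attempts):
        out = subprocess.run(
            ["system", "host-show", hostname],
            capture_output=True, text=True, check=True,
        ).stdout
        props = parse_property_table(out)
        if shows_unexpected_reboot(props):
            return props
        time.sleep(interval)
    return None
```

Calling `watch_host("controller-1")` right after `system host-lock controller-1` returns the parsed properties if the reboot is observed within the polling window, and `None` otherwise.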

Expected Behavior
------------------
After system host-lock controller-1, the host is expected to be locked and to stay in that state.

Actual Behavior
----------------
The host gets rebooted after system host-lock controller-1.

Reproducibility
---------------
100%

System Configuration
--------------------
IPv4 lab, duplex, bond interface configuration with default parameters

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2021-04-20_00-00-10"

Last Pass
---------
N/A

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$ date
Wed Apr 21 14:04:40 UTC 2021

[sysadmin@controller-0 ~(keystone_admin)]$ system host-lock controller-1
+-----------------------+--------------------------------------------+
| Property | Value |
+-----------------------+--------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | available |
| bm_ip | 128.224.64.63 |
| bm_type | dynamic |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 |
| capabilities | {u'stor_function': u'monitor'} |
| clock_synchronization | ntp |
| config_applied | 89167117-4d41-43c1-bf34-d489e048f57c |
| config_status | None |
| config_target | 89167117-4d41-43c1-bf34-d489e048f57c |
| console | ttyS0,115200n8 |
| created_at | 2021-04-20T16:00:09.329735+00:00 |
| device_image_update | None |
| hostname | controller-1 |
| id | 2 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.3 |
| mgmt_mac | 24:8a:07:58:d0:d0 |
| operational | enabled |
| personality | controller |
| reboot_needed | False |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 |
| serialid | None |
| software_load | 21.05 |
| subfunction_avail | available |
| subfunction_oper | enabled |
| subfunctions | controller,worker |
| task | Locking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2021-04-21T14:04:25.687941+00:00 |
| uptime | 73313 |
| uuid | f185f4d4-8fcb-43d4-a966-09777f790534 |
| vim_progress_status | services-enabled |
+-----------------------+--------------------------------------------+
system host-show controller-1
+-----------------------+-----------------------------------------------------------------------+
| Property | Value |
+-----------------------+-----------------------------------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | offline |
| bm_ip | 128.224.64.63 |
| bm_type | dynamic |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 |
| capabilities | {u'stor_function': u'monitor', u'Personality': u'Controller-Standby'} |
| clock_synchronization | ntp |
| config_applied | 89167117-4d41-43c1-bf34-d489e048f57c |
| config_status | None |
| config_target | 89167117-4d41-43c1-bf34-d489e048f57c |
| console | ttyS0,115200n8 |
| created_at | 2021-04-20T16:00:09.329735+00:00 |
| device_image_update | None |
| hostname | controller-1 |
| id | 2 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.3 |
| mgmt_mac | 24:8a:07:58:d0:d0 |
| operational | disabled |
| personality | controller |
| reboot_needed | False |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 |
| serialid | None |
| software_load | 21.05 |
| subfunction_avail | online |
| subfunction_oper | disabled |
| subfunctions | controller,worker |
| task | Booting |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2021-04-21T14:06:15.554826+00:00 |
| uptime | 0 |
| uuid | f185f4d4-8fcb-43d4-a966-09777f790534 |
| vim_progress_status | services-disabled |
+-----------------------+-----------------------------------------------------------------------+
Wed Apr 21 14:07:00 UTC 2021
[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | disabled | offline |
+----+--------------+-------------+----------------+-------------+--------------+

Test Activity
-------------
Feature Testing

Workaround
----------
Always specify the primaryReselect value

Teresa Ho (teresaho)
Changed in starlingx:
assignee: nobody → Teresa Ho (teresaho)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/791448

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to gui (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/gui/+/791449

Teresa Ho (teresaho)
summary: - DM sync failed when primary_reselect not specified in config model
+ Locking host failed when primary_reselect not specified
description: updated
Ghada Khalil (gkhalil) wrote :

Screening: stx.6.0 / medium priority - issue is introduced by a recent feature: https://storyboard.openstack.org/#!/story/2008706 in stx.5.0. However, there is an easy workaround.

Changed in starlingx:
importance: Undecided → High
tags: added: stx.5.0 stx.6.0 stx.networking
description: updated
Changed in starlingx:
importance: High → Medium
tags: removed: stx.5.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/791448
Committed: https://opendev.org/starlingx/config/commit/d4f82539e0d421ab8a7f1cd466bdbc269727bddc
Submitter: "Zuul (22348)"
Branch: master

commit d4f82539e0d421ab8a7f1cd466bdbc269727bddc
Author: Teresa Ho <email address hidden>
Date: Thu May 13 22:23:59 2021 -0400

    Leave parameter_reselect as null if not specified

    The sysinv API for interface returns the optional parameter
    'primary_reselect' with its default value when the attribute
    is not specified.
    This update is to leave the parameter as null if it is not
    specified.

    Closes-Bug: 1928461

    Change-Id: I67629aec1e58c26b1ed76c0cd1e37cd53e74b0b2
    Signed-off-by: Teresa Ho <email address hidden>
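
The change described in this commit message can be illustrated with a small sketch. This is not the sysinv code: the function names and the `stored`/`desired` dictionaries are hypothetical, the interface values are illustrative, and the default `"always"` is assumed from the Linux bonding driver's documented default for primary_reselect. It shows how a value filled in by the API can look like a difference to a client that compares what the operator specified against what the API reports; the report does not state that this is the exact path that led to the reboot.

```python
# Assumed default, matching the Linux bonding driver's documented default.
BONDING_DEFAULT_PRIMARY_RESELECT = "always"


def interface_view_defaulted(stored):
    """Behaviour before the fix: an unset value is reported as the default."""
    view = dict(stored)
    if view.get("primary_reselect") is None:
        view["primary_reselect"] = BONDING_DEFAULT_PRIMARY_RESELECT
    return view


def interface_view_nullable(stored):
    """Behaviour after the fix: an unset value stays None (null in JSON)."""
    view = dict(stored)
    view.setdefault("primary_reselect", None)
    return view


def changed_attributes(desired, reported):
    """Attributes whose reported value differs from what was specified."""
    keys = set(desired) | set(reported)
    return {k for k in keys if desired.get(k) != reported.get(k)}


# Operator configuration that omits primary_reselect (illustrative values).
desired = {"ifname": "bond0", "aemode": "active_standby"}

spurious = changed_attributes(desired, interface_view_defaulted(desired))
clean = changed_attributes(desired, interface_view_nullable(desired))
```

With the defaulted view, `spurious` contains `primary_reselect` even though the operator changed nothing; with the nullable view, `clean` is empty.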

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to gui (master)

Reviewed: https://review.opendev.org/c/starlingx/gui/+/791449
Committed: https://opendev.org/starlingx/gui/commit/e05e1a43531499d94cfb1e538683ee36eea92b43
Submitter: "Zuul (22348)"
Branch: master

commit e05e1a43531499d94cfb1e538683ee36eea92b43
Author: Teresa Ho <email address hidden>
Date: Thu May 13 23:04:04 2021 -0400

    Do not display primary_reselect if not specified

    If the attribute 'primary_reselect' is not specified, the sysinv API
    will leave it as null and GUI will not display the attribute.

    Closes-Bug: 1928461

    Change-Id: I5b8ef8b29fb7775dde8607bb14cd733015269f82
    Signed-off-by: Teresa Ho <email address hidden>
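
The display rule in this commit message amounts to skipping an optional attribute when the API returns null. A minimal sketch under that reading follows; `detail_rows` and its `optional` parameter are hypothetical names, not the GUI's actual code.

```python
def detail_rows(interface, optional=("primary_reselect",)):
    """Return (name, value) rows for display, omitting unset optional attributes."""
    rows = []
    for key, value in interface.items():
        if key in optional and value is None:
            continue  # attribute was not specified, so it is not shown
        rows.append((key, value))
    return rows
```

Attributes not listed in `optional` are always shown, even when their value is `None`, so only the explicitly optional ones disappear from the view.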

OpenStack Infra (hudson-openstack) wrote : Fix proposed to gui (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/gui/+/792252

OpenStack Infra (hudson-openstack) wrote : Fix merged to gui (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/gui/+/792252
Committed: https://opendev.org/starlingx/gui/commit/63d6de4701a7f21779ad9ea4060fce9ed85bc71f
Submitter: "Zuul (22348)"
Branch: f/centos8

commit e05e1a43531499d94cfb1e538683ee36eea92b43
Author: Teresa Ho <email address hidden>
Date: Thu May 13 23:04:04 2021 -0400

    Do not display primary_reselect if not specified

    If the attribute 'primary_reselect' is not specified, the sysinv API
    will leave it as null and GUI will not display the attribute.

    Closes-Bug: 1928461

    Change-Id: I5b8ef8b29fb7775dde8607bb14cd733015269f82
    Signed-off-by: Teresa Ho <email address hidden>

commit f1a4d30eca91c7a239ebd7479a56fef7870a4b2e
Author: Pablo Bovina <email address hidden>
Date: Fri May 7 16:59:50 2021 -0300

    Display DataNetworks list

    DataNetworks are listed for pci-sriov
    under Create/Edit Interface forms.

    Closes-bug: 1927782
    Signed-off-by: Pablo Bovina <email address hidden>
    Change-Id: If927bb0facdec9e587a13354bef56eca5df08785

commit 7973677a3d7d518c31757b36037373d2c4ac769c
Author: Andre Fernando Zanella Kantek <email address hidden>
Date: Thu May 6 07:32:59 2021 -0400

    In AIO-SX, interface edit rejected with Host administrative unlocked

    It was detected the edit rejection when the user, on an unlocked
    host, tries to convert an ethernet non-SRIOV to an SRIOV-PF
    interface, with the server responding "Host 'controller-0' is
    administrative 'unlocked'".

    This is caused because UpdateInterface.handle() executes first the
    datanetwork assignment and then modifies the interface. Since the
    assignment, on an unlocked host, is only possible for SRIOV
    interfaces, the order of execution matters, we need to have the
    interface modified and then assigned. The correction consists of
    altering the order (first modify then assign) to do the described.

    Tests:
    To ensure the continuous operation of the other types of conversion
    the following combinations were tested (all were done adding the
    interface to a network or datanetwork, depending on the class):

    Unlocked state:
    ethernet/[none,data,pci-pt,platform] to pci-sriov: accepted
    modify parameters of a pci-sriov: rejected
    conversion to other than pci-sriov: rejected

    Locked state:
    all conversions (with network/datanetwork assignment) are accepted

    Closes-Bug: 1925183

    Signed-off-by: Andre Fernando Zanella Kantek <email address hidden>
    Change-Id: Ib124bf7222e07966becbb81198f65f5bc55715ce

commit ddcc4fd3ccb4c02580c71414345993252b089761
Author: Enzo Candotti <email address hidden>
Date: Tue May 4 11:08:57 2021 -0300

    Enable add/edit Worker personality on DC AIO-DX's GUI

    This update is to allow the option to add a new host with Worker
    personality on Distributed Cloud mode.

    Closes-Bug: 1927107

    Signed-off-by: Enzo Candotti <email address hidden>
    Change-Id: Idfed9352c7c6467014a2ed2cf10b70f6b470c28c

commit de43c019c0b7f038d0184d10aab2bf61b6c5e147
Author: Andre Fer...

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794906

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/config/+/794906
Committed: https://opendev.org/starlingx/config/commit/75758b37a5a23c8811355b67e2a430a1713cd85b
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 9e420d9513e5fafb1df4d29567bc299a9e04d58d
Author: Bin Qian <email address hidden>
Date: Mon May 31 14:45:52 2021 -0400

    Add more logging to run docker login

    Add error log for running docker login. The new log could
    help identify docker login failure.

    Closes-Bug: 1930310
    Change-Id: I8a709fb6665de8301fbe3022563499a92b2a0211
    Signed-off-by: Bin Qian <email address hidden>

commit 31c77439d2cea590dfcca13cfa646522665f8686
Author: albailey <email address hidden>
Date: Fri May 28 13:42:42 2021 -0500

    Fix controller-0 downgrade failing to kill ceph

    kill_ceph_storage_monitor tried to manipulate a pmon
    file that does not exist in an AIO-DX environment.

    We no longer invoke kill_ceph_storage_monitor in an
    AIO SX or DX env.

    This allows: "system host-downgrade controller-0"
    to proceed in an AIO-DX environment where that second
    controller (controller-0) was upgraded.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I633853f75317736084feae96b5b849c601204c13

commit 0dc99eee608336fe01b58821ea404286371f1408
Author: albailey <email address hidden>
Date: Fri May 28 11:05:43 2021 -0500

    Fix file permissions failure during duplex upgrade abort

    When issuing a downgrade for controller-0 in a duplex upgrade
    abort and rollback scenario, the downgrade command was failing
    because the sysinv API does not have root permissions to set
    a file flag.
    The fix is to use RPC so the conductor can create the flag
    and allow the downgrade for controller-0 to get further.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I913bcad73309fe887a12cbb016a518da93327947

commit 7ef3724dad173754e40b45538b1cc726a458cc1c
Author: Chen, Haochuan Z <email address hidden>
Date: Tue May 25 16:16:29 2021 +0800

    Fix bug rook-ceph provision with multi osd on one host

    Test case:
    1, deploy simplex system
    2, apply rook-ceph with below override value
    value.yaml
    cluster:
      storage:
        nodes:
        - name: controller-0
          devices:
          - name: sdb
          - name: sdc
    3, reboot

    Without this fix, only osd pod could launch successfully after boot
    as vg start with ceph could not correctly add in sysinv-database

    Closes-bug: 1929511

    Change-Id: Ia5be599cd168d13d2aab7b5e5890376c3c8a0019
    Signed-off-by: Chen, Haochuan Z <email address hidden>

commit 23505ba77d76114cf8a0bf833f9a5bcd05bc1dd1
Author: Angie Wang <email address hidden>
Date: Tue May 25 18:49:21 2021 -0400

    Fix issue in partition data migration script

    The created partition dictonary partition_map is not
    an ordered dict so we need to sort it by its key -
    device node when iterating it to adjust the device
    nodes/paths for user created extra partitions to ensure
    the number of device node...

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460
