worker nodes failed after initial unlock

Bug #1882251 reported by Peng Peng
This bug affects 1 person
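+-----------+--------------+------------+-------------+-----------+
| Affects   | Status       | Importance | Assigned to | Milestone |
+-----------+--------------+------------+-------------+-----------+
| StarlingX | Fix Released | High       | Bob Church  |           |
+-----------+--------------+------------+-------------+-----------+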

Bug Description

Brief Description
-----------------
During system installation and configuration, both controller nodes were installed and unlocked successfully, but the compute nodes failed after unlock.

Severity
--------
Major

Steps to Reproduce
------------------
Install a multi-node system, then unlock the compute nodes.

Expected Behavior
-----------------
All compute nodes are available after unlock.

Actual Behavior
---------------
Compute nodes failed after the initial unlock.

Reproducibility
---------------
Reproducible 2 of 2

System Configuration
--------------------
Multi-node system

Lab-name: wcp_8-12

Branch/Pull Time/Commit
-----------------------
2020-06-04_20-00-00

Last Pass
---------
2020-06-03_20-00-00

Timestamp/Logs
--------------

[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | disabled    | failed       |
| 3  | compute-1    | worker      | unlocked       | disabled    | failed       |
| 4  | compute-2    | worker      | unlocked       | disabled    | failed       |
| 5  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system host-show compute-0
+-----------------------+----------------------------------------------------------------+
| Property              | Value                                                          |
+-----------------------+----------------------------------------------------------------+
| action                | none                                                           |
| administrative        | unlocked                                                       |
| availability          | failed                                                         |
| bm_ip                 | 2620:10a:a001:a102::134                                        |
| bm_type               | dynamic                                                        |
| bm_username           | root                                                           |
| boot_device           | /dev/disk/by-path/pci-0000:18:00.0-scsi-0:0:8:0                |
| capabilities          | {}                                                             |
| clock_synchronization | ntp                                                            |
| config_applied        | 6fc32a18-c718-4a2c-b1e5-7c1ba3c65e72                           |
| config_status         | None                                                           |
| config_target         | 6fc32a18-c718-4a2c-b1e5-7c1ba3c65e72                           |
| console               | ttyS0,115200n8                                                 |
| created_at            | 2020-06-05T03:33:18.372476+00:00                               |
| device_image_update   | None                                                           |
| hostname              | compute-0                                                      |
| id                    | 2                                                              |
| install_output        | text                                                           |
| install_state         | completed                                                      |
| install_state_info    | None                                                           |
| inv_state             | inventoried                                                    |
| invprovision          | provisioning                                                   |
| location              | {}                                                             |
| mgmt_ip               | face::31f0:b4b7:17d6:b129                                      |
| mgmt_mac              | 3c:fd:fe:b1:26:d8                                              |
| operational           | disabled                                                       |
| personality           | worker                                                         |
| reboot_needed         | False                                                          |
| reserved              | False                                                          |
| rootfs_device         | /dev/disk/by-path/pci-0000:18:00.0-scsi-0:0:8:0                |
| serialid              | None                                                           |
| software_load         | 20.06                                                          |
| subfunctions          | worker,lowlatency                                              |
| task                  | Configuration failure, threshold reached, Lock/Unlock to retry |
| tboot                 | false                                                          |
| ttys_dcd              | None                                                           |
| updated_at            | 2020-06-05T14:34:08.504008+00:00                               |
| uptime                | 37323                                                          |
| uuid                  | 9a7b34ac-28c9-4854-a97d-a546d5f59952                           |
| vim_progress_status   | None                                                           |
+-----------------------+----------------------------------------------------------------+
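
For sanity runs, the host-list check above could be automated rather than read by eye. The following is a minimal Python sketch and not part of the original report: the helper names parse_cli_table and failed_hosts are hypothetical, and the only assumption is that the CLI prints a pipe-delimited table with a header row, as in the log above. The sample rows are copied from that log.

```python
def parse_cli_table(text):
    """Parse a '+---+' bordered, pipe-delimited table into a list of dicts.

    Hypothetical helper: assumes the first '|' row is the header row.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts, and blank lines
        # Drop the outer pipes, then split the remaining cells.
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def failed_hosts(hosts):
    """Return hostnames whose availability column reads 'failed'."""
    return [h["hostname"] for h in hosts if h.get("availability") == "failed"]


# Rows copied from the `system host-list` output in this report.
SAMPLE = """
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | disabled | failed |
| 3 | compute-1 | worker | unlocked | disabled | failed |
| 4 | compute-2 | worker | unlocked | disabled | failed |
| 5 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
"""

print(failed_hosts(parse_cli_table(SAMPLE)))
# → ['compute-0', 'compute-1', 'compute-2']
```

Because the cells are stripped after splitting on the pipe character, the sketch works whether or not the column padding survived copy and paste.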

Test Activity
-------------
Sanity

Revision history for this message
Peng Peng (ppeng) wrote :
tags: added: stx.retestneeded
Peng Peng (ppeng)
description: updated
Yang Liu (yliu12)
summary: - compute nodes offline during system initialize
+ worker nodes failed after initial unlock
description: updated
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
Ghada Khalil (gkhalil) wrote :

The issue was introduced by the following recent commits, which merged on 2020-06-04:
https://review.opendev.org/#/c/732724/
https://review.opendev.org/#/c/732726/

tags: added: stx.
tags: added: stx.4.0 stx.containers stx.metal
removed: stx.
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Frank Miller (sensfan22) wrote :
Bob Church (rchurch)
Changed in starlingx:
status: Triaged → Fix Released
Peng Peng (ppeng) wrote :

Issue was verified on 2020-06-09_20-00-00.

tags: removed: stx.retestneeded