Compute remains on degraded after lock/unlock

Bug #1839692 reported by Cristopher Lemus
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| StarlingX | Fix Released | High | zhipeng liu | |

Bug Description

Brief Description
-----------------
On Standard Dedicated Storage (2+2+2), the compute-1 node remains in degraded status after lock/unlock.

Severity
--------
Major: The other compute remains online.

Steps to Reproduce
------------------
This is part of the sanity execution. In summary, the actions are:

- system host-lock compute-1
- system host-unlock compute-1
- Monitor progress/status with system host-show compute-1 (the node stays in degraded status).
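
The steps above could be automated roughly as follows. This is a minimal sketch, not the sanity suite's actual code: it assumes `system host-show` prints a two-column "Property | Value" table, that the relevant fields are named `administrative` and `availability`, and that their values include `locked` and `available`. The helper names, timeout, and polling interval are hypothetical.

```python
import subprocess
import time


def host_field(show_output, field):
    """Return the value of `field` from 'Property | Value' table rows, or None."""
    for line in show_output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == field:
            return cells[1]
    return None


def run_system(*args):
    """Run the `system` CLI and return its stdout (needs a StarlingX controller)."""
    result = subprocess.run(["system", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def wait_for(host, field, wanted, run, timeout_s, poll_s, sleep):
    """Poll `system host-show` until `field` equals `wanted` or the timeout expires."""
    waited = 0
    while True:
        if host_field(run("host-show", host), field) == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def lock_unlock_and_wait(host, run=run_system, timeout_s=1800, poll_s=30,
                         sleep=time.sleep):
    """Lock, unlock, then report whether the host became available in time."""
    run("host-lock", host)
    # Assumption: the lock must complete before the unlock is accepted.
    if not wait_for(host, "administrative", "locked", run, timeout_s, poll_s, sleep):
        return False
    run("host-unlock", host)
    return wait_for(host, "availability", "available", run, timeout_s, poll_s, sleep)
```

In the failure reported here, `lock_unlock_and_wait("compute-1")` would return False, because the availability field would stay at degraded until the timeout.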

Expected Behavior
------------------
Node compute-1 should be available after the lock/unlock operation.

Actual Behavior
----------------
Node stays in degraded status.

Reproducibility
---------------
Seen once. Will update if this appears with newer builds.

System Configuration
--------------------
Dedicated storage (2+2+2).

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190809T053000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="207"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-08-09 05:30:00 +0000"

Last Pass
---------
Build from 2019-08-07.

Timestamp/Logs
--------------
According to the logs, the node is in degraded status for the following reason:

| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
| 200.006 | compute-1 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress. | host=compute-1.process=pci-irq-affinity-agent | major | 2019-08-09T17:34:16.006704 |

Full collect is attached from all nodes.
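
When triaging alarms like the one above, it can help to split the entity identifier (`host=compute-1.process=pci-irq-affinity-agent`) into its host and process parts. The sketch below assumes the identifier is a dot-separated list of `key=value` pairs, which matches this alarm but is not confirmed as a general rule. `parse_entity_id` is a hypothetical helper, not part of any StarlingX tool.

```python
def parse_entity_id(entity_id):
    """Split 'host=compute-1.process=name' into {'host': ..., 'process': ...}.

    Assumption: pairs are separated by '.', and a segment without '='
    belongs to the previous value (for example a value that contains a dot).
    """
    fields = {}
    last_key = None
    for part in entity_id.split("."):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
            last_key = key
        elif last_key is not None:
            fields[last_key] += "." + part
    return fields


alarm = parse_entity_id("host=compute-1.process=pci-irq-affinity-agent")
print(alarm["host"], alarm["process"])
```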

Test Activity
-------------
Sanity.

Note that this might be related to https://bugs.launchpad.net/starlingx/+bug/1839160 (although in this case the lock was not forced).

Frank Miller (sensfan22) wrote :

Please re-test with a load built Aug 10 or later as this issue may be fixed via this commit:
https://review.opendev.org/#/c/675503/
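
Whether a given load is new enough for the re-test can be checked from its BUILD_ID. This is a minimal sketch that assumes BUILD_ID is a UTC timestamp in the form `YYYYMMDDTHHMMSSZ`, as the BUILD_DATE field in this report suggests, and that "Aug 10" means 2019-08-10. The function names are hypothetical.

```python
from datetime import datetime, timezone


def build_time(build_id):
    """Parse a BUILD_ID such as '20190809T053000Z' into an aware UTC datetime."""
    parsed = datetime.strptime(build_id, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def built_on_or_after(build_id, year, month, day):
    """Return True if the build was produced at or after midnight UTC of the date."""
    cutoff = datetime(year, month, day, tzinfo=timezone.utc)
    return build_time(build_id) >= cutoff


# The load in which the issue was seen predates the suggested cutoff.
print(built_on_or_after("20190809T053000Z", 2019, 8, 10))
```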

Changed in starlingx:
status: New → Incomplete
Ghada Khalil (gkhalil) wrote :

As noted above, this is likely a duplicate of https://bugs.launchpad.net/starlingx/+bug/1839525
which was addressed by https://review.opendev.org/#/c/675503/

Waiting for confirmation from the reporter after re-test with a load including the fix.

Cristopher Lemus (cjlemusc) wrote :

Hi Frank, Ghada,

I confirm that with BUILD_ID=20190812T033004Z this issue no longer appears.

Thanks for checking.

Ghada Khalil (gkhalil) wrote :
tags: added: stx.2.0 stx.distro.openstack
Changed in starlingx:
status: Incomplete → Fix Released
importance: Undecided → High
assignee: nobody → zhipeng liu (zhipengs)