subcloud2 FPGA update failed, but "dcmanager fw-update-strategy show" shows complete

Bug #1890521 reported by Difu Hu
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | Medium     | Teresa Ho   |

Bug Description

Brief Description
-----------------
The subcloud2 FPGA update failed, but "dcmanager fw-update-strategy show" reports the strategy state as complete.

Severity
--------
Major

Steps to Reproduce
------------------
Precondition: the subcloud2 FPGA has been flashed with a root-key image.
Run "dcmanager fw-update-strategy apply" to update the subcloud2 FPGA with an unsigned image.

Expected Behavior
------------------
The subcloud2 FPGA update fails, and "dcmanager fw-update-strategy show" reports the strategy state as failed.

Actual Behavior
----------------
The subcloud2 FPGA update fails, but "dcmanager fw-update-strategy show" reports the strategy state as complete.

Reproducibility
---------------
yes

System Configuration
--------------------
Lab-name: DC-3

Branch/Pull Time/Commit
-----------------------
2020-07-31_20-00-00

Last Pass
---------
N/A

Timestamp/Logs
--------------
### on SystemController:
[sysadmin@controller-0 ~(keystone_admin)]$ system --os-region-name SystemController device-image-show 1e43227a-13db-41f4-a9c9-0e95c9219001
+----------------+--------------------------------------------------+
| Property       | Value                                            |
+----------------+--------------------------------------------------+
| uuid           | 1e43227a-13db-41f4-a9c9-0e95c9219001             |
| bitstream_type | functional                                       |
| pci_vendor     | 8086                                             |
| pci_device     | 0b30                                             |
| bitstream_id   | 1                                                |
| key_signature  | None                                             |
| revoke_key_id  | None                                             |
| name           | None                                             |
| description    | None                                             |
| image_version  | None                                             |
| applied        | True                                             |
| applied_labels | [{u'subcloud2': u'002'}, {u'subcloud4': u'004'}] |
+----------------+--------------------------------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager fw-update-strategy show
+------------------------+----------------------------+
| Field                  | Value                      |
+------------------------+----------------------------+
| strategy type          | firmware                   |
| subcloud apply type    | None                       |
| max parallel subclouds | None                       |
| stop on failure        | False                      |
| state                  | complete                   |
| created_at             | 2020-08-06 02:54:57.357858 |
| updated_at             | 2020-08-06 03:29:32.879757 |
+------------------------+----------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager strategy-step list
+-----------+-------+----------+---------+----------------------------+----------------------------+
| cloud     | stage | state    | details | started_at                 | finished_at                |
+-----------+-------+----------+---------+----------------------------+----------------------------+
| subcloud2 | 3     | complete |         | 2020-08-06 03:01:01.264087 | 2020-08-06 03:29:24.086676 |
+-----------+-------+----------+---------+----------------------------+----------------------------+

### on subcloud2:
[sysadmin@controller-0 ~(keystone_admin)]$ system device-image-show 1e43227a-13db-41f4-a9c9-0e95c9219001
+----------------+--------------------------------------+
| Property       | Value                                |
+----------------+--------------------------------------+
| uuid           | 1e43227a-13db-41f4-a9c9-0e95c9219001 |
| bitstream_type | functional                           |
| pci_vendor     | 8086                                 |
| pci_device     | 0b30                                 |
| bitstream_id   | 1                                    |
| key_signature  | None                                 |
| revoke_key_id  | None                                 |
| name           | None                                 |
| description    | None                                 |
| image_version  | None                                 |
| applied        | True                                 |
| applied_labels | [{u'subcloud2': u'002'}]             |
+----------------+--------------------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ system device-image-state-list
+--------------+--------------------+--------------------------------------+--------+----------------------------------+----------------------------------+
| hostname     | PCI device address | Device image uuid                    | status | Update start time                | updated_at                       |
+--------------+--------------------+--------------------------------------+--------+----------------------------------+----------------------------------+
| controller-0 | 0000:b2:00.0       | 1e43227a-13db-41f4-a9c9-0e95c9219001 | failed | 2020-08-06T03:01:32.432129+00:00 | 2020-08-06T03:16:52.330096+00:00 |
+--------------+--------------------+--------------------------------------+--------+----------------------------------+----------------------------------+

[sysadmin@controller-0 ~(keystone_admin)]$ cat /var/log/kern.log | grep intel-max10
2020-08-06T03:16:51.944 controller-0 kernel: err [ 3723.365952] intel-max10 spi2.0: RSU error status: 0x02020104
2020-08-06T03:16:51.944 controller-0 kernel: err [ 3723.371703] intel-max10 spi2.0: RSU auth result: 0x00000011

### subcloud2 after the automatic lock/unlock:
[sysadmin@controller-0 ~(keystone_admin)]$ system device-image-show 1e43227a-13db-41f4-a9c9-0e95c9219001
+----------------+--------------------------------------+
| Property       | Value                                |
+----------------+--------------------------------------+
| uuid           | 1e43227a-13db-41f4-a9c9-0e95c9219001 |
| bitstream_type | functional                           |
| pci_vendor     | 8086                                 |
| pci_device     | 0b30                                 |
| bitstream_id   | 1                                    |
| key_signature  | None                                 |
| revoke_key_id  | None                                 |
| name           | None                                 |
| description    | None                                 |
| image_version  | None                                 |
| applied        | False                                |
| applied_labels | None                                 |
+----------------+--------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system device-image-state-list

[sysadmin@controller-0 ~(keystone_admin)]$
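The intel-max10 lines in kern.log are an independent signal that the update failed on subcloud2, even after the device-image-state list has gone empty. A tester could pick them out with a small Python helper such as the hypothetical `find_rsu_errors` below. It assumes the exact log layout shown in this report and only extracts the hexadecimal `RSU error status` and `RSU auth result` values; what those codes mean is not documented here, so the sketch does not try to decode them.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern, written against the kern.log lines in this report, e.g.:
# 2020-08-06T03:16:51.944 controller-0 kernel: err [ 3723.365952] intel-max10 spi2.0: RSU error status: 0x02020104
RSU_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) kernel: err \[\s*[\d.]+\] intel-max10 \S+: "
    r"RSU (?P<field>error status|auth result): (?P<value>0x[0-9a-fA-F]+)"
)


def find_rsu_errors(lines: Iterable[str]) -> List[Tuple[str, str, str, int]]:
    """Return (timestamp, host, field, value) for each intel-max10 RSU error line."""
    hits = []
    for line in lines:
        match = RSU_LINE.match(line)
        if match:
            # int() with base 16 accepts the "0x" prefix.
            hits.append((match["ts"], match["host"], match["field"], int(match["value"], 16)))
    return hits
```

Feeding it the two kern.log lines above yields one `error status` entry and one `auth result` entry for controller-0; unrelated lines are ignored.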

Test Activity
-------------
Functional Testing

Ghada Khalil (gkhalil)
description: updated
Difu Hu (difuhu) wrote :
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Teresa Ho (teresaho)
Teresa Ho (teresaho) wrote :

The device's PCI address changed after the reboot, which left the device-image-state list empty. DC Manager saw that the list was empty, concluded that there were no longer any enabled devices to update, and declared the strategy state complete.

This issue should be addressed by commit https://review.opendev.org/#/c/744684/
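To make the failure mode concrete, here is a minimal Python sketch contrasting a completion check that trusts an empty device-image-state list with one that remembers the last non-empty snapshot it observed. Everything in it is hypothetical: `naive_step_state`, `FirmwareStepMonitor`, the `StepState` values, and the `"completed"` and `"pending"` status strings are illustrative assumptions (only `"failed"` appears in the logs above), and it does not describe what commit 744684 actually changed. The input is assumed to be a dict mapping a device key (for example host plus PCI address) to its image status.

```python
from enum import Enum
from typing import Dict


class StepState(str, Enum):
    # Hypothetical strategy-step states for this sketch.
    COMPLETE = "complete"
    FAILED = "failed"
    IN_PROGRESS = "in-progress"


def _classify(states: Dict[str, str]) -> StepState:
    """Classify a non-empty snapshot of device-image statuses."""
    if any(status == "failed" for status in states.values()):
        return StepState.FAILED
    # "completed" is an assumed status string; "failed" appears in the logs.
    if all(status == "completed" for status in states.values()):
        return StepState.COMPLETE
    return StepState.IN_PROGRESS


def naive_step_state(states: Dict[str, str]) -> StepState:
    """Flawed check: an empty list is read as 'no devices left to update'."""
    if not states:
        return StepState.COMPLETE
    return _classify(states)


class FirmwareStepMonitor:
    """Hypothetical poller that keeps the last non-empty snapshot it saw."""

    def __init__(self) -> None:
        self.last_snapshot: Dict[str, str] = {}

    def poll(self, states: Dict[str, str]) -> StepState:
        if states:
            self.last_snapshot = dict(states)
            return _classify(states)
        if not self.last_snapshot:
            # Nothing was ever reported for this subcloud: nothing to do.
            return StepState.COMPLETE
        # The list vanished (e.g. a reboot changed the PCI address), so fall
        # back to the last observation instead of assuming success.
        verdict = _classify(self.last_snapshot)
        # An update that never reached "completed" is treated as failed
        # rather than left in progress forever.
        return StepState.FAILED if verdict is StepState.IN_PROGRESS else verdict
```

Replaying subcloud2's sequence (a `failed` row, then an empty list after the lock/unlock), `naive_step_state` returns `COMPLETE` for the empty list, while `FirmwareStepMonitor.poll` keeps returning `FAILED`. Treating a vanished, unfinished update as failed is a conservative choice; a real implementation might instead time out or re-query the device.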

Ghada Khalil (gkhalil)
tags: added: stx.5.0
Changed in starlingx:
importance: Undecided → Medium
Ghada Khalil (gkhalil) wrote :

Marking as Fix Released.
https://review.opendev.org/#/c/744684/ merged on 2020-08-04

Ghada Khalil (gkhalil) wrote :

@Difu, please re-test.

Changed in starlingx:
status: New → Fix Released
Difu Hu (difuhu) wrote :

Verification is blocked on build 2020-08-07_20-00-00 by another issue: https://bugs.launchpad.net/starlingx/+bug/1890915
LP-1890521 only occurred on subcloud2, whereas the new LP-1890915 occurs on all subclouds.

Ghada Khalil (gkhalil) wrote :

LP-1890521 has been addressed; the re-test can proceed.

tags: added: stx.config
Difu Hu (difuhu) wrote :

Verified on build 2020-06-27_18-35-20 with PATCH_0002.

tags: removed: stx.retestneeded