IPv6 DX Plus: 200.015 "board management controller sensor group read failures" raised after compute reboot

Bug #1847324 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald
Milestone: (none listed)

Bug Description

Brief Description
-----------------
Configure BMC addresses on an IPv6 DX plus system.
Bring up the DX plus system without any major alarms. Force reboot one compute node. After the node has recovered, check alarm-list: 200.015 alarms are raised and never cleared.

Severity
--------
Major

Steps to Reproduce
------------------
As described in the Brief Description above.

TC-name: mtc/test_ungraceful_reboot.py::test_force_reboot_host[compute]

Expected Behavior
------------------
No 200.015 alarm is raised.

Actual Behavior
----------------
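200.015 alarms are raised for every host (both controllers and all three computes), not only for the rebooted node, and are never cleared.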
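
The expected/actual check above can be sketched as a comparison of the alarm list taken before the reboot with the one taken after it. This is only an illustrative sketch, not the code of the referenced test case: the function names `new_alarms` and `unexpected_bmc_alarms` are hypothetical, and each alarm is assumed to be a dict keyed by the `fm alarm-list` column headers ("UUID", "Alarm ID").

```python
def new_alarms(before, after):
    """Return alarms present in `after` whose UUID was not in `before`."""
    seen = {alarm["UUID"] for alarm in before}
    return [alarm for alarm in after if alarm["UUID"] not in seen]


def unexpected_bmc_alarms(before, after, alarm_id="200.015"):
    """Return newly raised alarms with the given ID (200.015 by default)."""
    return [a for a in new_alarms(before, after) if a["Alarm ID"] == alarm_id]
```

Under these assumptions, the expected behavior corresponds to `unexpected_bmc_alarms(before, after)` returning an empty list once the rebooted node has recovered.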

Reproducibility
---------------
Seen once

System Configuration
--------------------
DX plus system
IPv6

Lab-name: wolfpass-8-12

Branch/Pull Time/Commit
-----------------------
master 2019-10-07_20-00-00

Last Pass
---------
2019-10-01_20-00-00

Timestamp/Logs
--------------
[2019-10-08 14:39:40,810] 311 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-10-08 14:39:42,282] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-----------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+-----------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| 9a8227ab-3bae-4927-ad0b-6074a84f611b | 100.114 | NTP address 64:ff9b::c632:eea3 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::c632:eea3 | minor | 2019-10-08T14:37:40.924752 |
| 81b822df-2946-4038-b82a-228c157c0338 | 100.114 | NTP address 64:ff9b::9538:2f3c is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::9538:2f3c | minor | 2019-10-08T14:37:40.919852 |
| f53c2442-c36b-46d7-a07e-a4b38798d9ea | 100.114 | NTP address 64:ff9b::4713:9082 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::4713:9082 | minor | 2019-10-08T14:37:40.914754 |
| 689def5e-1895-4fe6-b353-0d9be3fbaa91 | 100.114 | NTP address 64:ff9b::450a:a107 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::450a:a107 | minor | 2019-10-08T14:37:29.756240 |
| ec8a4b0e-b296-4f06-9e36-08f32b93bfef | 100.114 | NTP address 64:ff9b::45a4:d588 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::45a4:d588 | minor | 2019-10-08T14:37:29.754172 |
| fad9b0fe-074e-464e-a645-516afca425e9 | 400.003 | License key is not installed; a valid license key is required for operation | host=controller-0 | critical | 2019-10-08T14:37:06.599435 |
| 7a598885-bea6-458e-93e4-e3f70bb32eea | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2019-10-08T14:31:33.084513 |
| ce461bc5-7425-40e3-9085-5b13dc44a20a | 400.003 | License key is not installed; a valid license key is required for operation | host=controller-1 | critical | 2019-10-08T14:29:21.733696 |
| 46e22817-b280-48bd-9e63-3c1859b0084f | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-1.ntp | major | 2019-10-08T14:17:31.625496 |
+--------------------------------------+----------+-----------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
controller-1:~$

[2019-10-08 14:39:51,344] 311 DEBUG MainThread ssh.send :: Send 'hostname'
[2019-10-08 14:39:51,449] 433 DEBUG MainThread ssh.expect :: Output:
compute-2
compute-2:~$
[2019-10-08 14:39:51,449] 166 INFO MainThread host_helper.reboot_hosts:: Rebooting compute-2
[2019-10-08 14:39:51,449] 311 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'

[2019-10-08 14:45:44,219] 311 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-10-08 14:45:45,756] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-------------------------------------------------------------------------------------+--------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+-------------------------------------------------------------------------------------+--------------------------------------------------+----------+----------------------------+
| b7851233-9447-43c8-8daa-1b66c93aba40 | 200.015 | compute-0 has one or more board management controller sensor group read failures | host=compute-0.sensorgroup=server voltage | major | 2019-10-08T14:45:35.562245 |
| c0de45fe-62fb-4074-b200-acd077b57b4c | 200.015 | compute-0 has one or more board management controller sensor group read failures | host=compute-0.sensorgroup=server power | major | 2019-10-08T14:45:35.497240 |
| 7865c1dc-261f-46ce-b8b8-3c83ddeb149c | 200.015 | compute-0 has one or more board management controller sensor group read failures | host=compute-0.sensorgroup=server temperature | major | 2019-10-08T14:45:35.435249 |
| f3627bb9-e526-42ef-b23b-25a33ce2caab | 200.015 | compute-0 has one or more board management controller sensor group read failures | host=compute-0.sensorgroup=server fans | major | 2019-10-08T14:45:35.363257 |
| 0166283f-b546-488a-80a4-5713c334582b | 200.015 | compute-1 has one or more board management controller sensor group read failures | host=compute-1.sensorgroup=server voltage | major | 2019-10-08T14:45:34.296306 |
| 7784f7af-cb37-4b52-9e89-a942e390160d | 200.015 | compute-1 has one or more board management controller sensor group read failures | host=compute-1.sensorgroup=server power | major | 2019-10-08T14:45:34.230268 |
| 7e4c2b83-d1b3-483d-b30a-56431bfe838c | 200.015 | compute-1 has one or more board management controller sensor group read failures | host=compute-1.sensorgroup=server temperature | major | 2019-10-08T14:45:34.161262 |
| 311b7153-31b4-4180-89d0-4dec40243719 | 200.015 | compute-1 has one or more board management controller sensor group read failures | host=compute-1.sensorgroup=server fans | major | 2019-10-08T14:45:34.055257 |
| 7bdaed67-3960-45b3-adc4-1ca5af2e31f8 | 200.015 | controller-0 has one or more board management controller sensor group read failures | host=controller-0.sensorgroup=server voltage | major | 2019-10-08T14:45:20.814219 |
| f295c57f-22d5-4187-8cc6-63481be5fd3f | 200.015 | controller-0 has one or more board management controller sensor group read failures | host=controller-0.sensorgroup=server power | major | 2019-10-08T14:45:20.595303 |
| 4e2e064b-fb80-4a79-840a-1c1c4eb9e23e | 200.015 | controller-0 has one or more board management controller sensor group read failures | host=controller-0.sensorgroup=server temperature | major | 2019-10-08T14:45:20.501314 |
| 06bd6e12-cabd-452e-bbd3-406c4009a31a | 200.015 | controller-0 has one or more board management controller sensor group read failures | host=controller-0.sensorgroup=server fans | major | 2019-10-08T14:45:20.432218 |
| ece0368d-d8a0-4649-b54c-826f4865baec | 200.015 | compute-2 has one or more board management controller sensor group read failures | host=compute-2.sensorgroup=server voltage | major | 2019-10-08T14:44:35.759203 |
| d4608834-e34a-44ab-8a66-3c04907d540a | 200.015 | compute-2 has one or more board management controller sensor group read failures | host=compute-2.sensorgroup=server power | major | 2019-10-08T14:44:35.699359 |
| 3960329d-c3f4-40bb-b5a5-0ca0e1e6d71c | 200.015 | compute-2 has one or more board management controller sensor group read failures | host=compute-2.sensorgroup=server temperature | major | 2019-10-08T14:44:35.642312 |
| 4347c441-4505-47fe-bf92-a33958372cda | 200.015 | compute-2 has one or more board management controller sensor group read failures | host=compute-2.sensorgroup=server fans | major | 2019-10-08T14:44:35.583216 |
| 8155d09f-bb03-41c9-b237-1bb21c76a730 | 200.015 | controller-1 has one or more board management controller sensor group read failures | host=controller-1.sensorgroup=server voltage | major | 2019-10-08T14:44:25.771374 |
| b4e4b80b-515a-4e1d-88f8-b1c80609f556 | 200.015 | controller-1 has one or more board management controller sensor group read failures | host=controller-1.sensorgroup=server power | major | 2019-10-08T14:44:25.712209 |
| 95187113-5e5a-482d-b071-a21428e13884 | 200.015 | controller-1 has one or more board management controller sensor group read failures | host=controller-1.sensorgroup=server temperature | major | 2019-10-08T14:44:25.655209 |
| 926f2e17-0de4-4217-81cf-ac95427d76f4 | 200.015 | controller-1 has one or more board management controller sensor group read failures | host=controller-1.sensorgroup=server fans | major | 2019-10-08T14:44:25.597495 |
| 9a8227ab-3bae-4927-ad0b-6074a84f611b | 100.114 | NTP address 64:ff9b::c632:eea3 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::c632:eea3 | minor | 2019-10-08T14:37:40.924752 |
| 81b822df-2946-4038-b82a-228c157c0338 | 100.114 | NTP address 64:ff9b::9538:2f3c is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::9538:2f3c | minor | 2019-10-08T14:37:40.919852 |
| f53c2442-c36b-46d7-a07e-a4b38798d9ea | 100.114 | NTP address 64:ff9b::4713:9082 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::4713:9082 | minor | 2019-10-08T14:37:40.914754 |
| 689def5e-1895-4fe6-b353-0d9be3fbaa91 | 100.114 | NTP address 64:ff9b::450a:a107 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::450a:a107 | minor | 2019-10-08T14:37:29.756240 |
| ec8a4b0e-b296-4f06-9e36-08f32b93bfef | 100.114 | NTP address 64:ff9b::45a4:d588 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::45a4:d588 | minor | 2019-10-08T14:37:29.754172 |
| fad9b0fe-074e-464e-a645-516afca425e9 | 400.003 | License key is not installed; a valid license key is required for operation | host=controller-0 | critical | 2019-10-08T14:37:06.599435 |
| 7a598885-bea6-458e-93e4-e3f70bb32eea | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2019-10-08T14:31:33.084513 |
| ce461bc5-7425-40e3-9085-5b13dc44a20a | 400.003 | License key is not installed; a valid license key is required for operation | host=controller-1 | critical | 2019-10-08T14:29:21.733696 |
| 46e22817-b280-48bd-9e63-3c1859b0084f | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-1.ntp | major | 2019-10-08T14:17:31.625496 |
+--------------------------------------+----------+-------------------------------------------------------------------------------------+--------------------------------------------------+----------+----------------------------+
controller-1:~$
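
For triage, the table output above can be reduced to a per-host summary of the failing sensor groups. The following is a minimal sketch that assumes the output format shown in these logs (pipe-delimited rows, a header row first, and Entity IDs of the form `host=<name>.sensorgroup=<group>`); the helper names `parse_alarm_table` and `sensor_groups_by_host` are hypothetical and are not part of the `fm` client.

```python
from collections import defaultdict


def parse_alarm_table(text):
    """Parse `fm alarm-list` table text into a list of dicts keyed by column header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited row holds the column names
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def sensor_groups_by_host(rows, alarm_id="200.015"):
    """Map each host name to the sensor groups reported under `alarm_id`."""
    groups = defaultdict(list)
    for row in rows:
        if row["Alarm ID"] != alarm_id:
            continue
        # Entity ID looks like "host=compute-0.sensorgroup=server voltage"
        host_part, _, group = row["Entity ID"].partition(".sensorgroup=")
        host = host_part.split("=", 1)[1]
        groups[host].append(group)
    return dict(groups)
```

Applied to the second listing above, this would group four sensor groups (voltage, power, temperature, fans) under each of the five hosts, which makes it easy to see that the failures are not limited to the rebooted compute-2.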

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil)
summary: - APv6 DX Plus: 200.015 "board management controller sensor group read
+ IPv6 DX Plus: 200.015 "board management controller sensor group read
failures" raised after compute reboot
description: updated
Yang Liu (yliu12)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

As an FYI, there was an earlier bug from the same lab reporting the same issue, but it was deemed a configuration issue at the time: https://bugs.launchpad.net/starlingx/+bug/1846536

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
tags: added: stx.metal
Ghada Khalil (gkhalil) wrote :

Will wait for Eric to triage before deciding on priority / release gate

description: updated
Eric MacDonald (rocksolidmtce) wrote :

I found this issue during my Redfish Feature Customer Usability testing.

The fix for that and a few other issues is posted for review here

https://review.opendev.org/#/c/686435/

Please retest once this update is merged.

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority given that Eric confirmed that this is an issue

tags: added: stx.3.0
Changed in starlingx:
importance: Undecided → Medium
importance: Medium → High
importance: High → Medium
Ghada Khalil (gkhalil) wrote :

https://review.opendev.org/#/c/686435/ was merged on 2019-10-10

Marking this bug as Fix Released

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

Verified on:
Lab: WP_8_12
Load: 2019-10-30_20-00-00

tags: removed: stx.retestneeded