Redfish tool was not selected automatically when bmc ip address was corrected.

Bug #1853358 reported by Anujeyan Manokeran
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Eric MacDonald
Milestone:

Bug Description

Brief Description
-----------------
When the BMC IP address was changed from an incorrect IP to the correct IP address, the tool selection also changed from the redfish tool to the ipmi tool. This was discussed with Eric,
and I understood that there is no relearn mechanism when the IP address is changed. When I tried to trigger a relearn, it did not switch back to the redfish tool.

The alarm below was seen when compute-2 was provisioned with an incorrect IP address:

:~$ source /etc/platform/openrc
[sysadmin@controller-1 ~(keystone_admin)]$ fm alarm-list
+----------+---------------------------------------------------------+----------------+----------+----------------------------+
| Alarm ID | Reason Text                                             | Entity ID      | Severity | Time Stamp                 |
+----------+---------------------------------------------------------+----------------+----------+----------------------------+
| 200.010  | compute-2 access to board management module has failed. | host=compute-2 | warning  | 2019-11-20T16:10:47.777277 |
+----------+---------------------------------------------------------+----------------+----------+----------------------------+

After correcting the IP address, no files were created for compute-2 under redfishtool.
ls -lrt /var/run/bmc/redfishtool/*sensor*
-rw-r--r-- 1 root root 18143 Nov 19 18:44 /var/run/bmc/redfishtool/_compute-0_thermal_sensor_data
-rw-r--r-- 1 root root 4532 Nov 20 16:26 /var/run/bmc/redfishtool/hwmond_compute-1_power_sensor_data
-rw-r--r-- 1 root root 18143 Nov 20 16:26 /var/run/bmc/redfishtool/hwmond_compute-1_thermal_sensor_data
-rw-r--r-- 1 root root 4510 Nov 20 16:26 /var/run/bmc/redfishtool/hwmond_controller-1_power_sensor_data
-rw-r--r-- 1 root root 22750 Nov 20 16:26 /var/run/bmc/redfishtool/hwmond_controller-1_thermal_sensor_data
-rw-r--r-- 1 root root 4510 Nov 20 16:27 /var/run/bmc/redfishtool/hwmond_controller-0_power_sensor_data
-rw-r--r-- 1 root root 22750 Nov 20 16:27 /var/run/bmc/redfishtool/hwmond_controller-0_thermal_sensor_data
-rw-r--r-- 1 root root 4533 Nov 20 16:27 /var/run/bmc/redfishtool/hwmond_compute-0_power_sensor_data
-rw-r--r-- 1 root root 18143 Nov 20 16:27 /var/run/bmc/redfishtool/hwmond_compute-0_thermal_sensor_data

ls -lrt /var/run/bmc/ipmitool/*sensor*
-rw-r--r-- 1 root root 9796 Nov 19 18:38 /var/run/bmc/ipmitool/hwmond_compute-0_sensor_data
-rw-r--r-- 1 root root 9796 Nov 20 16:29 /var/run/bmc/ipmitool/hwmond_compute-2_sensor_data
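
The same check can be scripted. The following is a minimal sketch, assuming the directory layout and file naming seen in the listings above (/var/run/bmc/<tool>/hwmond_<host>_..._sensor_data). The function name tools_with_sensor_data is hypothetical and is not part of StarlingX.

```python
import glob
import os


def tools_with_sensor_data(host, base_dir="/var/run/bmc"):
    """Return the sorted names of tool directories holding sensor data for host.

    Assumption: each tool (for example redfishtool or ipmitool) has its own
    directory under base_dir, as in the listings above.
    """
    tools = []
    for tool_dir in sorted(glob.glob(os.path.join(base_dir, "*"))):
        if not os.path.isdir(tool_dir):
            continue
        # Naming assumed from the listings: hwmond_<host>_[<kind>_]sensor_data
        pattern = os.path.join(tool_dir, "hwmond_%s_*sensor_data" % host)
        if glob.glob(pattern):
            tools.append(os.path.basename(tool_dir))
    return tools
```

Given only the files listed above, tools_with_sensor_data("compute-2") would be expected to report ipmitool and not redfishtool, which matches the observation in this report.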

The ipmi tool was used to gather data:
system host-sensor-list compute-2
+--------------------------------------+------------------+-------------+---------+---------+
| uuid                                 | name             | sensortype  | state   | status  |
+--------------------------------------+------------------+-------------+---------+---------+
| 089a8fb9-349d-401e-9ebd-a3ab4474e188 | Agg Therm Mgn 1  | temperature | enabled | ok      |
| e3c6cf50-141a-4a0f-aa71-8e2b15e9afb7 | Agg Therm Mgn 2  | temperature | enabled | ok      |
| 375a7998-8869-40aa-b4c5-a94a0178627f | BB +12.0V        | voltage     | enabled | ok      |
| 1f349a5c-0db9-44b0-8e5c-44ddebe4f879 | BB +3.3V Vbat    | voltage     | enabled | ok      |
| f4590360-3607-4038-8aab-11c9a8219e2c | BB BMC Temp      | temperature | enabled | ok      |
| 37d94ddb-89d2-4d43-a076-808ad7103223 | BB Lft Rear Temp | temperature | enabled | ok      |
| 6b2db4d5-1a76-4df2-9f52-b8ba81b17370 | BB P1 VR Temp    | temperature | enabled | ok      |
| 8cbb1ab7-22c4-4834-9fa6-61160eb426d0 | BB P2 VR Temp    | temperature | enabled | ok      |
| 2d221917-58fa-4d7b-8b34-961c95393edb | BB Rt Rear Temp  | temperature | enabled | ok      |

Severity
--------
Major

Steps to Reproduce
------------------
1. Provision the BMC on a host and confirm that sensors are collected using the redfish tool.
2. Re-provision the BMC with a wrong IP address.
3. Verify the alarm for BMC connection failure.
4. Correct the IP address and verify that the alarm clears and sensor data is collected.
5. Verify which tool is used to collect sensor data. It should go back to the
    original tool, redfish.

TC-name:

Expected Behavior
------------------
The tool selection should be redfish after correcting the IP address.

Actual Behavior
----------------
As per the description: when a wrong IP address is provisioned and then corrected dynamically, the tool selection changes from redfish to ipmitool.

Reproducibility
---------------
Reproducible 100%

System Configuration
--------------------
AIO-DX+N IPv6 lab wolfpass-8-12
Lab-name:

Branch/Pull Time/Commit
-----------------------
019-11-18_20-00-00

Last Pass
---------
Never tested new feature.

Timestamp/Logs
--------------
2019-11-20T16:10:47

Test Activity
-------------
Feature testing

Tags: stx.metal
Anujeyan Manokeran (anujeyan) wrote :
Ghada Khalil (gkhalil) wrote :

Low / not gating, but would be nice to fix - issue related to handling an error condition (incorrect provisioning)

tags: added: stx.metal
Changed in starlingx:
importance: Undecided → Medium
importance: Medium → Low
status: New → Triaged
assignee: nobody → Eric MacDonald (rocksolidmtce)
Eric MacDonald (rocksolidmtce) wrote :

This is not a bug.

The hardware monitor uses whatever protocol mtcAgent tells it to.

The mtcAgent does not currently try to relearn the host's BMC protocol over a re-provisioning change, even an IP address change. I see an argument for changing that, but that is not how it is currently implemented.

Once a BMC protocol has been learned, the mtcAgent will only try to relearn it if the BMC is de-provisioned and then provisioned; not simply re-provisioned.

It was done this way for the following reasons

1. Want both Maintenance and Hardware Monitor to use the same protocol.
2. To enforce item 1 above, Maintenance tells the Hardware Monitor what
   protocol to use to create its sensor model and then monitor those sensors.
3. Don't want a situation where a sensor model is created with one protocol
   and then monitored with another. There is no guarantee that a BMC will
   publish the same sensor model exactly the same way using exactly the same
   sensor names and types between protocols.
   Could lead to false alarms or silent faults.
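
Item 3 can be illustrated with a small comparison. This is only a sketch under stated assumptions: compare_sensor_models is a hypothetical helper, not hardware monitor code, and the name "Baseboard BMC Temp" is made up to stand in for a sensor that a BMC might publish under a different name over another protocol.

```python
def compare_sensor_models(model_names, monitored_names):
    """Compare a sensor model built with one protocol against the sensor
    names reported when monitoring with another protocol."""
    model = set(model_names)
    monitored = set(monitored_names)
    return {
        # In the model but never reported: a fault here could go unnoticed.
        "missing_readings": sorted(model - monitored),
        # Reported but not in the model: readings with no matching sensor.
        "unknown_sensors": sorted(monitored - model),
    }


# Illustrative names only; real BMCs may differ in other ways as well.
ipmi_model = ["BB BMC Temp", "BB +12.0V"]
redfish_readings = ["Baseboard BMC Temp", "BB +12.0V"]
result = compare_sensor_models(ipmi_model, redfish_readings)
```

Any entry in either list means the model and the monitoring protocol disagree, which is the situation the design tries to avoid.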

If Maintenance were to allow a BMC protocol relearn on reprovisioning, and that resulted in a change in protocol, then the hardware monitor would have to delete its current sensor model and recreate it automatically; which it could do but does not now.

A design choice was made not to do that.

If, architecturally, we want to relearn the BMC protocol over a BMC IP address change, then we must accept that the sensor model would be automatically deleted and recreated over that change. That is a capability the hardware monitor already has but does not use in this case.

Therefore, this report should be treated as an enhancement or design change request ; not a bug.

Eric MacDonald (rocksolidmtce) wrote :

Just to be clear, in this case the tester accidentally provisioned the IP of a BMC that only supported ipmitool, then changed the IP to the correct server. However, mtcAgent had already learned that the usable protocol for that original host was IPMI, and the hardware monitor had created a sensor model for it based on IPMI.

If the initial IP address had simply been wrong, so that mtcAgent never connected to a BMC and therefore never learned a protocol, then once the IP was corrected mtcAgent would learn the correct protocol for the re-provisioned IP address.
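
The behaviour described in these two comments can be modelled as a tiny state machine. This is a toy sketch, not mtcAgent code: the class BmcProtocolState, its method names, and the placeholder address strings are all hypothetical, and it only reflects the behaviour as described in this thread.

```python
class BmcProtocolState:
    """Toy model of the protocol learning behaviour described above."""

    def __init__(self):
        self.ip = None
        self.protocol = None  # None means "not learned yet"

    def provision(self, ip):
        self.ip = ip

    def reprovision(self, ip):
        # An IP change alone does not clear an already learned protocol.
        self.ip = ip

    def deprovision(self):
        # De-provisioning forgets the protocol so it can be learned again.
        self.ip = None
        self.protocol = None

    def on_bmc_contact(self, detected_protocol):
        # Learn only while provisioned and only if nothing is learned yet.
        if self.ip is not None and self.protocol is None:
            self.protocol = detected_protocol
        return self.protocol


state = BmcProtocolState()
state.provision("address-of-wrong-bmc")
state.on_bmc_contact("ipmi")            # the wrong BMC answers, IPMI is learned
state.reprovision("address-of-right-bmc")
kept = state.on_bmc_contact("redfish")  # the learned protocol is kept
state.deprovision()
state.provision("address-of-right-bmc")
relearned = state.on_bmc_contact("redfish")
```

In this model, an unreachable wrong address never triggers on_bmc_contact, so the protocol stays unlearned and the first contact after the correction sets it, as the second paragraph describes.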

Changed in starlingx:
status: Triaged → In Progress
Eric MacDonald (rocksolidmtce) wrote :

Fix released in the following update.

https://review.opendev.org/#/c/697309/

Changed in starlingx:
status: In Progress → Fix Released
Yang Liu (yliu12)
tags: added: stx.retestneeded
Anujeyan Manokeran (anujeyan) wrote :

Verified in load 2020-01-13 00:14:42.

tags: removed: stx.retestneeded