Rack controller cannot access the BMC, even if accessible on the OS level on HMC Z

Bug #2070445 reported by Frank Heimes
This bug affects 1 person
Affects                              Status   Importance  Assigned to            Milestone
MAAS (status tracked in 3.6)
  3.6                                Invalid  Medium      Unassigned
Ubuntu on IBM z Systems              New      Undecided   Skipper Bug Screeners

Bug Description

I'm at the beginning of setting up MAAS v3.4 on s390x, using 'Power type' "IBM Hardware Management Console (HMC) for Z" (the BMC is called HMC here).

I'm pretty sure that the data that was specified for the Power Type is correct, since on the OS level I can netcat the HMC via its specific port:

$ nc -vz <hmc IP> 6794
Connection to <hmc IP> 6794 port [tcp/*] succeeded!

(However, ICMP is disabled, so ping is not possible; I hope that it is not needed/used:
$ sudo ping -c 3 <hmc IP>
PING <hmc IP> (<hmc IP>) 56(84) bytes of data.

--- 10.103.16.10 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2073ms )
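
The reachability check above (a TCP connect to the HMC port rather than ICMP) can also be sketched in Python. This is only an illustration of the same OS-level test that `nc -vz` performs; the function name `tcp_port_open` and the timeout value are assumptions, and it says nothing about how MAAS itself decides whether a rack controller can reach the BMC.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs a full TCP handshake, similar to `nc -vz`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (placeholder address, replace with the real HMC IP):
# print(tcp_port_open("<hmc IP>", 6794))
```

A successful result only shows that the port is reachable from the host the script runs on, which matches the observation in this report that OS-level connectivity alone was not enough.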

One thing is special on s390x: the Power driver does not connect directly to the HMC (BMC), but uses the intermediate 'zhmcclient' (Python API).

I have a brief Python script that I can run outside MAAS to check the accessibility of the HMC via the zhmcclient, and it works:
$ cat test.py
#!/usr/bin/env python3

import zhmcclient
import requests.packages.urllib3
requests.packages.urllib3.disable_warnings()

# Set these variables for your environment:
host = "<hmc IP>"
userid = "<userid>"
password = "password"
verify_cert = False

session = zhmcclient.Session(host, userid, password, verify_cert=verify_cert)
client = zhmcclient.Client(session)
console = client.consoles.console

partitions = console.list_permitted_partitions()

for part in partitions:
    cpc = part.manager.parent
    print("{} {}".format(cpc.name, part.name))
$ ./test.py
P00711B8 MAAS-RRC-01
P00711B8 MAAS-RRC-01-G01

I'm obviously using the same credentials - in the Power driver and my test script.

(I just have "verify_cert = False" in my script, and changed /usr/lib/python3/dist-packages/zhmcclient/_session.py to this too.)

(Note that the HMC has an IP address that is outside of the network the MAAS server is in, but I think that this is often the case and shouldn't matter, since connectivity is given on OS level.)

Nevertheless, whenever I try to do something with a system that I've manually enlisted (via Add Hardware --> Machine), I get error messages like these:

in the UI:

- on commissioning:
  "Error:
   No rack controllers can access the BMC of node MAASH1G1"
- on Power cycle, Power on:
  "Error:
   Action failed for 1 machine: MAASH1G1"
   (probably because it is on, but the Power status is shown as unknown)
- on Power cycle, Power off:
  "Error:
   No rack controllers can access the BMC of node MAASH1G1"

(I could provide screenshots of these error messages in the UI, but I doubt that they provide further details.)

(Btw. the system that I've manually added, an LPAR, has a normal (long) name [MAAS-RRC-01-G01] and a short name [MAASH1G1]. I wasn't sure which one to take, so I tried both cases, but no luck.
Hence one will find both in the logs ...)

Further details:

Version and build:
$ apt list maas
Listing... Done
maas/jammy,now 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all [installed]
$ dpkg -l | grep maas
ii maas 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS client and command-line interface
ii maas-common 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS server common files
ii maas-dhcp 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS DHCP server
ii maas-netmon 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 s390x MAAS Network Monitor
ii maas-proxy 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS Caching Proxy
ii maas-rack-controller 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all Rack Controller for MAAS
ii maas-region-api 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all Region controller API service for MAAS
ii maas-region-controller 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all Region Controller for MAAS
ii python3-django-maas 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 1:3.4.2-14353-g.5a5221d57-0ubuntu1~22.04.1 all MAAS server provisioning libraries (Python 3)

I've attached the relevant log files here.

I did quite a bit of investigation on this, reading LP, Discourse and other sources, and I found several cases where the message "No rack controllers can access the BMC" was mentioned, but none of them seems to be close to my case.

I hope that this is not a misleading message; I thought about that possibility when I came across this: https://discourse.maas.io/t/better-error-message-for-out-of-order-operations-in-machine-lifecycle/6504/2

(Btw. I can make this system accessible for Canonical engineers for further analysis ...)

Tags: s390x maas
Frank Heimes (fheimes) wrote :
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Frank Heimes (fheimes)
summary: - Rack controller can access the BMC, even if accessible on the OS level
+ Rack controller cannot access the BMC, even if accessible on the OS
+ level
Frank Heimes (fheimes) wrote : Re: Rack controller cannot access the BMC, even if accessible on the OS level

These are the only relevant lines in the logs that I could find:

/var/log/maas/maas.log

2024-06-26T05:49:39.966121+00:00 maas-on-z maas.rpc.rackcontrollers: message repeated 12 times: [ [info] Existing rack controller 'maas-on-z' running version 3.4.2-14353-g.5a5221d57 has connected to region 'maas-on-z'.]
2024-06-26T05:51:12.227011+00:00 maas-on-z maas.node: [info] MAASH1G1: Status transition from NEW to COMMISSIONING
2024-06-26T05:51:48.160046+00:00 maas-on-z maas.node: [warn] MAASH1G1: Could not change the power state. No rack controllers can access the BMC.
2024-06-26T05:51:48.160447+00:00 maas-on-z maas.node: [info] MAASH1G1: Aborting COMMISSIONING and reverted to NEW. Unable to power control the node. Please check power credentials.
2024-06-26T05:51:48.162483+00:00 maas-on-z maas.node: [info] MAASH1G1: Status transition from COMMISSIONING to NEW
2024-06-26T05:51:48.170507+00:00 maas-on-z maas.node: [error] MAASH1G1: Could not start node for commissioning: No rack controllers can access the BMC of node MAASH1G1
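
For anyone repeating this search, the following is a minimal sketch of pulling the warning and error lines for one node out of maas.log. The regular expression is an assumption based only on the line layout visible in the excerpt above (timestamp, host, logger, bracketed level, message); the helper name `node_problems` is hypothetical.

```python
import re

# Assumed layout, taken from the excerpt above:
# <timestamp> <host> <logger>: [<level>] <message>
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<host>\S+) (?P<logger>[\w.]+): "
    r"\[(?P<level>\w+)\] (?P<message>.*)$"
)


def node_problems(lines, node, levels=("warn", "error")):
    """Yield (time, level, message) for log lines about `node` at the given levels."""
    prefix = node + ": "
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines in another layout (e.g. "message repeated ...") are skipped.
        if not match or match["level"] not in levels:
            continue
        if match["message"].startswith(prefix):
            yield match["time"], match["level"], match["message"][len(prefix):]


# Example usage against the log file named above:
# with open("/var/log/maas/maas.log") as log:
#     for time, level, message in node_problems(log, "MAASH1G1"):
#         print(time, level, message)
```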

Frank Heimes (fheimes) wrote :

I stumbled over this (pretty old) thread, from Mike Pontillo:
https://lists.ubuntu.com/archives/maas-devel/2017-February/002395.html
https://lists.ubuntu.com/archives/maas-devel/2017-February/thread.html#2395

So the HMC/BMC in this case is also NOT directly connected to a subnet configured on the MAAS server, it is only available via routing.
$ ip route get <hmc IP>
<hmc IP> via <fabric0>.1 dev eth0 src <MAAS RRC IP> uid 1000
    cache

I've tried to get that reflected in the MAAS fabrics, but tbh I'm unsure how, and whether it's really needed.

And btw. there is just one MAAS server (region and rack controller on the same server).

Some more details on the network setup:

MAAS itself is installed in a container that runs on a s390x LPAR.
The LPAR has two interfaces:
- one to reach the LPAR directly
- and a second one that acts as a bridge, used by LXD, to which the container is attached

LPAR and container are in the same /24 network that has reserved and dynamic ranges for MAAS usage.
(As mentioned before, the HMC/BMC is not part of that network, but is available via (phys. switch) routing.)
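
The layout described above, with the HMC outside the /24 that MAAS lives in, can be illustrated with a short subnet-membership check. This is a simplified sketch and not MAAS's actual rack controller selection logic; the helper name `covering_subnets` and all addresses are hypothetical documentation-range values, not the ones from this setup.

```python
import ipaddress


def covering_subnets(address, subnets):
    """Return the subnets (as CIDR strings) that contain the given IP address."""
    ip = ipaddress.ip_address(address)
    return [cidr for cidr in subnets if ip in ipaddress.ip_network(cidr)]


# Hypothetical example values, not taken from this report:
maas_subnets = ["192.0.2.0/24"]  # subnet the MAAS server and container share
hmc_ip = "198.51.100.10"         # routed HMC address outside that /24
maas_ip = "192.0.2.5"            # address of the MAAS server itself

print(covering_subnets(hmc_ip, maas_subnets))   # → []
print(covering_subnets(maas_ip, maas_subnets))  # → ['192.0.2.0/24']
```

An empty result for the HMC address mirrors the situation here: the HMC is reachable through routing, but subnet membership alone gives no hint of that.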

Bill Wear (billwear)
Changed in maas:
status: New → Triaged
importance: Undecided → Low
importance: Low → Medium
milestone: none → 3.5.x
tags: added: bug-council
Changed in maas:
assignee: nobody → Anton Troyanov (troyanov)
Frank Heimes (fheimes)
summary: - Rack controller cannot access the BMC, even if accessible on the OS
- level
+ [HMC Z] Rack controller cannot access the BMC, even if accessible on the
+ OS level
summary: - [HMC Z] Rack controller cannot access the BMC, even if accessible on the
- OS level
+ Rack controller cannot access the BMC, even if accessible on the OS
+ level on HMC Z
Anton Troyanov (troyanov) wrote :

It seems that MAAS-RRC-01-G01 was added manually to MAAS. Without proper network configuration MAAS cannot determine connectivity to the BMC.

In order to fix the issue you need to go to the Network tab, select eth0, click Actions/Edit Physical, and select a subnet managed by MAAS there.

Credit to @skatsaounis for finding this

Frank Heimes (fheimes) wrote :

The hint in comment #4 worked for me, so thx!
The check box of the network adapter was initially grayed out for me, but after starting the commissioning and quickly aborting it (before it fails), it became editable (thx to skatsaounis).

Just winding back to the beginning:
The main reason why I added the machine manually was that, after having added the power driver, I could neither commission nor power cycle; I got "No rack controllers can access".

So my gut feeling tells me that it would be best to have a selection option for "subnet managed by MAAS" already in the "Add hardware" --> "Machine" dialog, no?

Jerzy Husakowski (jhusakowski) wrote :

The issue was caused by incorrect configuration of the subnet of the BMC interface when manually adding the LPAR to MAAS.

Regarding the feature request to specify the subnet on BMC interface when manually adding hardware - please add it to the Product Feedback board so it can be evaluated further.

no longer affects: maas/3.5
tags: removed: bug-council