[RFE] Decomposing out-of-band inspection

Bug #2049913 reported by Dmitry Tantsur
Affects: Ironic
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Even when the notion of *inspection* was introduced around 2014, it was clear
that it would take two very different shapes: invasive in-band inspection, using
a ramdisk and an extended set of Linux tools, as well as a faster but
vendor-specific out-of-band inspection. Implicit in this division was the
assumption that an operator chooses one or the other depending on what their
hardware is capable of.

The reality proved more complex than that. First, with Nova switching to
resource classes and traits, the ``vcpus``, ``memory_mb`` and ``local_gb``
properties lost their importance (``vcpus`` being completely unused). Second,
in-band inspection quickly grew a set of advanced features that out-of-band
inspection may not be able to provide: LLDP discovery or benchmarking, to name
a few. Finally, in-band inspection started depending on the out-of-band one in
a certain sense. Since virtual media inspection cannot use the "unmanaged"
mode (it relies on (i)PXE), the current implementation calls into the driver's
management interface to pre-populate the port list before the ramdisk is
booted. Essentially, the current in-band inspection is a hybrid
implementation.

This proposal was prompted by yet another multi-arch issue. Apparently, the
``linuxefi`` Grub command we use in our virtual media and UEFI PXE templates
is an Intel-ism, introduced by distributions to support UEFI secure boot. Grub
on ARM64 does not have it, causing virtual media to fail. Now that HPE Gen 11
has an ARM64 option, this becomes a pressing problem to solve. However, the
``cpu_arch`` property is optional, and so is inspection, so Ironic cannot rely
on knowing the architecture.

The idea of this proposal is to make most of out-of-band inspection automatic
instead of triggered by an operator. This is actually not a new thought. We
have a whole bunch of properties and node fields that are auto-populated:
vendor, boot mode and secure boot state. Setting these fields is, in fact,
out-of-band inspection; we've just never admitted it. This proposal picks
apart the out-of-band inspection into three parts: managing ports, updating
properties, and refreshing information.

Automated port management
-------------------------

Two new *verify steps* will be responsible for ports:

- ``populate_ports`` (enabled by default) creates ports if none exist yet
  (see ``inspect_utils.create_ports_if_not_exist``).

- ``validate_ports`` (disabled by default) makes sure that at least one port
  exists and that all ports match the list that the BMC reports (if available).
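
The two steps above could behave roughly as in the following sketch. It is not
Ironic code: the ``Node`` class, the function signatures and the way MAC
addresses reach the steps are all hypothetical, and "match" is interpreted here
as "every existing port is also reported by the BMC", which is only one
possible reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for an Ironic node (not the real object model)."""
    uuid: str
    ports: Set[str] = field(default_factory=set)  # MACs of existing ports


def _normalize(macs: List[str]) -> Set[str]:
    return {mac.lower() for mac in macs}


def populate_ports(node: Node, bmc_macs: Optional[List[str]]) -> List[str]:
    """Create ports from BMC-reported MACs, but only if the node has none."""
    if node.ports or not bmc_macs:
        return []  # never touch ports that already exist
    created = sorted(_normalize(bmc_macs))
    node.ports.update(created)
    return created


def validate_ports(node: Node, bmc_macs: Optional[List[str]]) -> None:
    """Fail if the node has no ports or has ports the BMC does not report."""
    if not node.ports:
        raise ValueError(f"node {node.uuid} has no ports")
    if bmc_macs is None:
        return  # assumption: the BMC cannot list NICs, so nothing to compare
    unknown = _normalize(list(node.ports)) - _normalize(bmc_macs)
    if unknown:
        raise ValueError(f"ports not reported by the BMC: {sorted(unknown)}")
```

Keeping ``populate_ports`` a no-op when ports already exist means that
operator-created ports are never modified, which is what makes it safe to
enable by default.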

Generic properties discovery
----------------------------

Replace the specific ``ManagementInterface.detect_vendor`` with a more generic
``ManagementInterface.detect_properties``, which will also include ``cpu_arch``
and ``memory_mb`` if the BMC reports them. This call will be invoked wherever
we currently discover vendor, boot mode and secure boot state. Just as with
vendor, ``cpu_arch`` will not be overridden if it's set by an operator.
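
The "do not override" rule can be illustrated with a small sketch. The helper
name and the plain dictionaries are hypothetical, not part of Ironic. Two
simplifications are assumed: any value already present is treated as set by an
operator, and ``memory_mb`` follows the same rule as ``cpu_arch``, which the
proposal does not state explicitly.

```python
# Keys the proposal lists for detect_properties. local_gb and vcpus are
# deliberately left out, so they are ignored even if the BMC reports them.
DETECTABLE_KEYS = ("vendor", "cpu_arch", "memory_mb")


def apply_detected_properties(properties: dict, detected: dict) -> dict:
    """Return a copy of ``properties`` with detected values filling the gaps.

    Hypothetical helper: existing values are never overwritten.
    """
    updated = dict(properties)
    for key in DETECTABLE_KEYS:
        value = detected.get(key)
        if value is None:
            continue  # the BMC did not report this property
        if updated.get(key) is None:
            updated[key] = value
    return updated
```

For example, a node with ``cpu_arch`` already set to ``x86_64`` keeps that
value even if the BMC reports ``aarch64``, while a missing ``memory_mb`` is
filled in from the BMC.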

I'm not including ``local_gb`` since its value depends on root device hints
and honestly we should stop using it because of that.

I'm not including ``vcpus`` either because Ironic has no use for it.

Future of out-of-band inspection
--------------------------------

This change leaves out-of-band inspection as the way to refresh the information
or discover additional capabilities.

Tags: needs-spec rfe
Dmitry Tantsur (divius)
Changed in ironic:
importance: Undecided → Wishlist
tags: added: needs-spec rfe
Dmitry Tantsur (divius)
Changed in ironic:
status: New → Triaged