[RFE] Power Load Shedding Capabilities

Bug #2046803 reported by Julia Kreger
This bug affects 1 person
Affects: Ironic
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Proposing an overall bucket of ideas, some of which may be able to be split out and surfaced as distinct items.

I recently drove through the Mojave Desert in California, and I was reminded just how many cell towers and related pieces of infrastructure have no "grid" connection. They are entirely powered by solar arrays, with small emergency backup generators. These locations are exceptionally remote, meaning that refueling their backup generators is also exceptionally costly, or even risky. One location I noticed seemed to be accessed by something more like a "goat path" than any sort of road.

And it occurred to me that it would be reasonable for a monitoring system or some process to declare an entire physical site, or in Ironic parlance a "conductor_group", as needing to reduce power consumption, or to begin shedding load entirely.

For example:

- We just need to reduce load by some incremental percentage to "make it until morning". This could be tuning, or outright shutdown of non-critical systems.
- It could also be that the outside air temperature is high enough that we "need to shed secondary loads" to reduce heat generation, so air conditioning equipment can keep up. It should also be noted that at higher temperatures, backup generators have less "available" generating capacity.
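As a rough illustration of the "reduce load by some incremental percentage" case, here is a minimal sketch of how a set of nodes to power off might be chosen. Everything in it is an assumption for illustration: the `Node` fields (`watts`, `critical`, `priority`) are hypothetical, Ironic does not track them today, and the power figures would have to come from an external monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a node; these fields are assumptions, not Ironic fields."""
    name: str
    watts: float     # assumed to be reported by an external monitoring system
    critical: bool   # assumed operator-assigned flag
    priority: int    # assumption: lower number means "shed this first"


def plan_shedding(nodes, reduce_by_percent):
    """Pick non-critical nodes to power off until the target reduction is met.

    Returns (names_to_power_off, watts_saved). Critical nodes are never
    chosen, so the target may not be reachable.
    """
    total = sum(n.watts for n in nodes)
    target = total * reduce_by_percent / 100.0
    chosen, saved = [], 0.0
    # Shed the lowest priority number first; within a priority, largest draw first.
    candidates = sorted(
        (n for n in nodes if not n.critical),
        key=lambda n: (n.priority, -n.watts),
    )
    for node in candidates:
        if saved >= target:
            break
        chosen.append(node.name)
        saved += node.watts
    return chosen, saved


site = [
    Node("radio-1", 400.0, critical=True, priority=0),
    Node("cache-1", 300.0, critical=False, priority=1),
    Node("batch-1", 200.0, critical=False, priority=0),
    Node("batch-2", 100.0, critical=False, priority=0),
]
to_power_off, saved_watts = plan_shedding(site, reduce_by_percent=25)
```

The sketch only produces a plan; actually issuing the power actions, and deciding when to undo them, is left to whatever drives it.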

These cases are not unique to desert conditions, but isolated communications relays and cellular network stations tend to be set up with some amount of infrastructure to meet their unique needs, or at least the unique needs at the time the site was set up. These sites also evolve as time moves onward.

In both of these cases, today a human would likely identify the situation, begin to execute a "playbook", and trigger some set of semi-automated actions. For example, they may issue a number of "power off" commands to Ironic. But doing so manually also increases risk, and it seems reasonable to be able to perform some set of actions "across a conductor group" with a filter. It could ultimately take the shape of "execute this step on all Dell hardware" and "execute this step on all HPE hardware".

This could take various shapes as well: "only needing to take a tuning action" is very different from "I've got only an hour or less of battery power", or, even worse, "I'm low on battery power, and I'm out of fuel in the backup generator".

Areas to research:
- Ways to signal to a machine that it is going to be shut down very soon (think preemptable instances), typically by using an NMI.
- Multi-conductor task execution across an entire conductor group.
- Step filtering, i.e. "execute this step if this condition applies"
- See if vendors have any "magic" here which can be brought to bear via standard interfaces such as Redfish, and see if it could be adapted.
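To make the step filtering idea a little more concrete, here is a minimal sketch of "execute this step if this condition applies" across one conductor group. It is not Ironic's step framework: `Node`, `ConditionalStep`, `plan_steps`, and the step names are all hypothetical, and the node data is held in memory rather than fetched from the API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified node record."""
    name: str
    vendor: str
    conductor_group: str


@dataclass(frozen=True)
class ConditionalStep:
    """A step name paired with a predicate deciding whether it applies to a node."""
    step: str
    condition: Callable[[Node], bool]


def plan_steps(nodes, group, steps):
    """Map each node in the conductor group to the steps whose condition matches."""
    plan = {}
    for node in nodes:
        if node.conductor_group != group:
            continue
        matched = [s.step for s in steps if s.condition(node)]
        if matched:
            plan[node.name] = matched
    return plan


# Hypothetical step names, for illustration only.
steps = [
    ConditionalStep("dell_low_power_profile", lambda n: n.vendor == "dell"),
    ConditionalStep("hpe_low_power_profile", lambda n: n.vendor == "hpe"),
]
nodes = [
    Node("n1", "dell", "site-a"),
    Node("n2", "hpe", "site-a"),
    Node("n3", "dell", "site-b"),
]
plan = plan_steps(nodes, "site-a", steps)
```

Keeping the condition as a plain predicate means the same mechanism could filter on vendor, on a trait, or on anything else a node record exposes.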

Things to likely do:
- Ensure we can call an NMI as a step easily. Note: I'm thinking of NMI as the "you're going to be shut down very soon" hint, similar to the preemptable instance work which was originally discussed in nova. It seems reasonable and generally applicable if the OS does not halt when triggered.
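A minimal sketch of the "NMI as a hint, then shut down" sequence might look like the following. The `NodeControl` protocol, `warn_then_power_off`, and `RecordingControl` are hypothetical stand-ins rather than Ironic interfaces, and the sketch assumes the operating system treats the NMI as a warning instead of halting.

```python
import time
from typing import Protocol


class NodeControl(Protocol):
    """Hypothetical control interface; a real one would call out to Ironic."""

    def inject_nmi(self, node: str) -> None: ...

    def power_off(self, node: str) -> None: ...


def warn_then_power_off(control, nodes, grace_seconds, sleep=time.sleep):
    """Send the NMI hint to every node, wait once, then power them all off."""
    for node in nodes:
        control.inject_nmi(node)
    # One shared grace period so workloads can react before power is cut.
    sleep(grace_seconds)
    for node in nodes:
        control.power_off(node)


class RecordingControl:
    """Test double that records calls instead of touching hardware."""

    def __init__(self):
        self.calls = []

    def inject_nmi(self, node):
        self.calls.append(("inject_nmi", node))

    def power_off(self, node):
        self.calls.append(("power_off", node))


control = RecordingControl()
# The sleep function is replaced so the example does not actually wait.
warn_then_power_off(
    control, ["node-a", "node-b"], grace_seconds=30, sleep=lambda seconds: None
)
```

Sending all of the hints before waiting keeps the total time bounded by a single grace period, which matters when battery time is short.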

What I'm not proposing:

I'm not proposing that Ironic be the piece of software which decides that something needs to happen. It would be awesome to enable some sort of lightweight trigger interface so that a monitoring system with a highly specific token and rights can say "hey, take this action" and "return to normal".
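One possible shape for such a trigger, purely as a sketch: a small object which holds a site's current mode and only changes it when the caller presents the expected token. `SiteMode` and `SiteTrigger` are hypothetical names, the single shared token is a simplifying assumption, and a real interface would presumably rely on the existing authentication and policy machinery instead.

```python
import hmac
from enum import Enum


class SiteMode(Enum):
    NORMAL = "normal"
    SHED_LOAD = "shed_load"


class SiteTrigger:
    """Hypothetical trigger: the monitoring system decides, this only records."""

    def __init__(self, token: str):
        self._token = token.encode()
        self.mode = SiteMode.NORMAL

    def request(self, presented_token: str, mode: SiteMode) -> bool:
        """Switch to the requested mode if the token matches; report success."""
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(self._token, presented_token.encode()):
            return False
        self.mode = mode
        return True


trigger = SiteTrigger(token="example-only-token")
rejected = trigger.request("wrong-token", SiteMode.SHED_LOAD)
accepted = trigger.request("example-only-token", SiteMode.SHED_LOAD)
mode_after_shed = trigger.mode
restored = trigger.request("example-only-token", SiteMode.NORMAL)
```

The trigger deliberately contains no decision logic: "take this action" and "return to normal" are both just mode changes requested from outside.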

A further idea which needs research: is there any BMC action we can take to request that the machine lower its power consumption?

Tags: rfe
Dmitry Tantsur (divius)
Changed in ironic:
importance: Undecided → Wishlist