Ironic

Bug #1304673
Comment #8


Julia Kreger (juliaashleykreger) wrote on 2024-05-31:

So this is still an outstanding item, and it is stalled on the question of "how to know when to resume".

So to start off, where are we:

1) We can get events! https://github.com/openstack/ironic/blob/master/ironic/api/controllers/v1/event.py#L107
2) People even already configure it! But when you look at the link above you feel sad.
3) We *still* have a sleep in place deep inside ironic's networking code.
https://github.com/openstack/ironic/blob/268b28f52782d20cd3f7bf27ead36438695b786a/ironic/dhcp/neutron.py#L160

I thought there was another sleep someplace, but we'll have to look for it.

Anyway, the issue is that neutron sometimes takes time to complete binding, and we should wait some period of time for a callback from neutron instead of sleeping blindly.

4) So, ideally, what we could do is set up a database table to append the events to, with a periodic task to delete any events older than, say, 15 minutes. We have some prior art here with the node history table.

5) We can then swap the sleep code around to look for an event in the new events table, and then unblock the flow once the event has been observed.
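To make 4) a bit more concrete, here is a rough sketch of an append-only events table with an age-based purge. This is not ironic code: it uses the stdlib sqlite3 module instead of ironic's actual database layer, and the table name, column names, helper names, and the event string are all made up for illustration.

```python
import sqlite3
import time

# "say 15 minutes" from the proposal above, in seconds.
MAX_EVENT_AGE = 15 * 60


def create_table(conn):
    # Hypothetical schema; a real table would go through ironic's DB layer.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS network_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " port_id TEXT NOT NULL,"
        " event TEXT NOT NULL,"
        " created_at REAL NOT NULL)"
    )


def record_event(conn, port_id, event, now=None):
    # Append only: events are never updated, just inserted and later purged.
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO network_events (port_id, event, created_at)"
        " VALUES (?, ?, ?)",
        (port_id, event, now),
    )


def event_exists(conn, port_id, event):
    # What the waiting code would check in place of a blind sleep.
    row = conn.execute(
        "SELECT 1 FROM network_events WHERE port_id = ? AND event = ? LIMIT 1",
        (port_id, event),
    ).fetchone()
    return row is not None


def purge_old_events(conn, now=None, max_age=MAX_EVENT_AGE):
    # Body of the periodic task: drop anything older than max_age seconds.
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM network_events WHERE created_at < ?",
        (now - max_age,),
    )
    return cur.rowcount
```

The `now` arguments only exist so the purge can be exercised without waiting 15 minutes.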

This would require an RPC change, to allow the inbound event to be submitted over RPC, as conductors are the database writers.

And the other conductor thread would just poll the events table to determine what is required.

The event would remain in the table until the periodic task purges it.

The existing sleep setting(s) would then be repurposed as the upper bound on how long to wait for network configuration to complete.
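As a sketch of what "sleep becomes an upper bound" could look like: the waiting thread polls for the event and returns as soon as it shows up, and only waits the full configured time if it never does. The function name and its parameters are hypothetical; `event_seen` stands in for whatever lookup against the events table we end up with.

```python
import time


def wait_for_event(event_seen, timeout, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll event_seen() until it returns True or timeout seconds pass.

    Returns True if the event was observed, False if we gave up.
    The old sleep value would be passed in as timeout.
    """
    deadline = clock() + timeout
    while True:
        if event_seen():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

On a False return the caller can either carry on as it does today after the sleep, or treat it as a failure; that choice is still open.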