Comment 2 for bug 1427923

Dmitry Tantsur (divius) wrote :

Let me butt in with my thoughts. I assume that the problem is not the API taking long to respond. There are two actual problems:
1. API takes an *undefined* amount of time to respond in case of a BMC lock-up
2. We block our messaging layer while we're waiting for the call to finish by using a sync AMQP call

So I don't think the solution is just to make the API asynchronous. It's a breaking change and one more step towards making our errors hard to track. We already do this with power state, but at least changing power state is an inherently long procedure. Setting the boot device should not be, unless it breaks or hangs. If you just make the API async, you'll break inspector. Let me show. Inspector currently does roughly this:

 set_boot_device(uuid, 'pxe')
 set_power_state(uuid, 'reboot')

If we allow the former to go async and silently fail, we need to make the latter fail as well. Otherwise inspector will report success, but the node won't actually even try inspection (e.g. if it was set to local boot).
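
To illustrate that failure mode, here is a toy simulation. It is not inspector's or Ironic's real code: the `FakeNode` class, its methods and the `set_fails` flag are all hypothetical, and the "async" call is modelled simply as a call that never reports an error.

```python
class FakeNode:
    """Hypothetical stand-in for a node and its BMC."""

    def __init__(self, boot_device="disk", set_fails=False):
        self.boot_device = boot_device
        self.set_fails = set_fails  # simulate a BMC that drops the request
        self.booted_from = None

    def set_boot_device_async(self, device):
        # Fire-and-forget: the caller gets no error even if nothing changed.
        if not self.set_fails:
            self.boot_device = device

    def reboot(self):
        # The node boots from whatever device is currently configured.
        self.booted_from = self.boot_device


def start_inspection(node):
    """Mimic the two-step sequence above with an async first step."""
    node.set_boot_device_async("pxe")
    node.reboot()
    return "success"  # the caller cannot see that the first step failed
```

With `set_fails=True`, `start_inspection` still returns "success" while the node boots from local disk and never starts inspection.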

So what about making this call async on the conductor level, but sync on the API level? I.e. the API waits for some kind of notification from the conductor, with some (probably configurable) timeout?
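
A minimal sketch of that idea, not Ironic's actual code: the conductor-side work runs in the background and the API layer waits for its result with a bounded timeout. All names here (`conductor_set_boot_device`, `api_set_boot_device`, `BMCTimeout`, the `delay` parameter) are hypothetical, and a local thread pool stands in for the async messaging call plus notification.

```python
import concurrent.futures
import time


class BMCTimeout(Exception):
    """Hypothetical error raised when the conductor does not answer in time."""


# Stand-in for the conductor's workers (assumption: real code would use an
# async RPC call and a notification, not a local thread pool).
_conductor_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def conductor_set_boot_device(uuid, device, delay=0.0):
    """Pretend to talk to the BMC; `delay` simulates a slow or hung BMC."""
    time.sleep(delay)
    return {"uuid": uuid, "boot_device": device}


def api_set_boot_device(uuid, device, timeout=1.0, delay=0.0):
    """Synchronous from the caller's view, but waits only `timeout` seconds."""
    future = _conductor_pool.submit(
        conductor_set_boot_device, uuid, device, delay)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The caller gets a definite error instead of an open-ended wait.
        raise BMCTimeout(
            f"set_boot_device on {uuid} timed out after {timeout}s")
```

The point of this shape is that a caller such as inspector either gets a result or a clear error within a known time, so it can skip the following `set_power_state` call on failure.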