Comment 5 for bug 1513144

If I understand this correctly, this boils down to the fact that today, if something goes wrong on the host the agent (L2 or otherwise) runs on, we can only infer a failure from the lack of a heartbeat. To be clear, though, a missing heartbeat should be interpreted as a control plane failure, not a data plane failure.
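
To make the heartbeat point concrete, here is a minimal sketch of that kind of liveness check. The function name, the record fields, and the 75-second window are illustrative assumptions on my part, not Neutron's actual code:

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how long an agent may stay silent before we
# consider it down. A real deployment would read this from configuration.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_agent_alive(last_heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True if the agent reported in within the allowed window.

    This only says whether the control plane heard from the agent.
    It says nothing about whether traffic is still flowing on the host.
    """
    return now - last_heartbeat <= down_time


now = datetime(2015, 11, 5, 12, 0, 0)
recent = now - timedelta(seconds=30)
stale = now - timedelta(seconds=300)

print(is_agent_alive(recent, now))
print(is_agent_alive(stale, now))
```

Note that a False result here only tells us the server stopped hearing from the agent; flows already programmed on the host may well keep forwarding traffic.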

I think that Carlos is saying that we really need more than what we have today. As of today, we can mark an agent's admin status down, and that takes the agent out of the scheduling fabric. The OVS agent is kinda peculiar, because Neutron is not in charge of scheduling, Nova is. If we want to take the host out of the system, nova-manage service down does the trick though. The host will not be used to schedule VMs anymore.

So the question IMO is two-fold:

1) Do we enhance the agent ([1] is the schema of the agent table) to report a host-related health status? Not necessary, but nice.
2) Do we provide the ability to disable an agent based on broken health status or any other decision? We already have that with the admin-status-up flag. However, that doesn't really work for OVS, because the scheduling is really beyond what Neutron does today.

I don't believe that 1) is really required, but it would be nice to clean up 2), and we could do that in the form of a simple bug report. Ultimately we'd make the system honor ADMIN_STATUS_UP=False for OVS, but then again, today nothing happens when we create a port: the binding is initiated by the host only at a later time (once the VM is scheduled to the host).
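
As a rough illustration of what honoring the flag could look like, here is a sketch of a check made at binding time rather than at port creation. Everything in it (the Agent record, bind_port, AgentDisabled, the admin_status_up field name) is hypothetical and only mirrors the wording of this comment; it is not how the Neutron code is structured:

```python
from dataclasses import dataclass


@dataclass
class Agent:
    host: str
    admin_status_up: bool  # hypothetical field mirroring the admin flag


class AgentDisabled(Exception):
    """Raised when a binding is requested on a host whose agent is disabled."""


def bind_port(port_id, host, agents):
    """Bind a port to a host, refusing if the host's agent is disabled.

    The check happens here, at binding time, because at port creation
    the host is not yet known (the VM has not been scheduled).
    """
    agent = agents.get(host)
    if agent is None or not agent.admin_status_up:
        raise AgentDisabled(f"agent on host {host!r} is missing or disabled")
    return {"port_id": port_id, "host": host, "status": "bound"}


agents = {
    "compute-1": Agent(host="compute-1", admin_status_up=True),
    "compute-2": Agent(host="compute-2", admin_status_up=False),
}

binding = bind_port("port-a", "compute-1", agents)

try:
    bind_port("port-b", "compute-2", agents)
    refused = False
except AgentDisabled:
    refused = True
```

The catch is the one described above: by the time this check could fire, Nova has already picked the host, so refusing the binding fails the VM boot instead of steering the VM elsewhere.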

[1] http://paste.openstack.org/show/478474/