[RFE] Add new state "ERROR" for HA routers

Bug #1824856 reported by Slawek Kaplonski
Affects: neutron
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Bug originally found by Jakub Libosvar and Assaf Muller:

When a router replica transitions from standby to active (and in other cases too), keepalived-state-change-monitor can encounter an error; in the case we saw, it was caused by a permissions issue in /var/lib/neutron. We think that under any error condition keepalived-state-change-monitor should notify the L3 agent that an error has occurred. The L3 agent would then put that router replica in the 'ERROR' state and update neutron-server, which would update the DB and the API responses. This would let the operator know that an error happened for that particular router replica and that they should investigate. Bonus points if keepalived-state-change-monitor also sends the actual error message to the agent: we would then update the RPC format between the agent and the server and add a DB field such as 'error_message' that could be displayed to the operator.

I'm proposing this as an RFE because it would add a new router state on the L3 agent, which is a user-visible change.
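
To make the proposed flow concrete, here is a minimal sketch of the idea, not actual neutron code. The names `L3AgentStub`, `ReplicaState`, `handle_transition` and `STATE_ERROR` are hypothetical and exist only for illustration; the real keepalived-state-change-monitor and L3 agent interfaces may look quite different. The only details taken from the report are the 'ERROR' state and the 'error_message' field.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical constant for the state proposed in this RFE.
STATE_ERROR = "ERROR"


@dataclass
class ReplicaState:
    """What the agent would report to neutron-server for one router replica."""
    router_id: str
    state: str
    error_message: Optional[str] = None


class L3AgentStub:
    """Stand-in for the L3 agent; it only records the last reported state."""

    def __init__(self) -> None:
        self.replicas: Dict[str, ReplicaState] = {}

    def notify(self, router_id: str, state: str,
               error_message: Optional[str] = None) -> ReplicaState:
        report = ReplicaState(router_id, state, error_message)
        self.replicas[router_id] = report
        return report


def handle_transition(agent: L3AgentStub, router_id: str, new_state: str,
                      apply_transition: Callable[[str], None]) -> ReplicaState:
    """Run the monitor's transition work and report ERROR if it fails."""
    try:
        apply_transition(new_state)
    except Exception as exc:
        # Report the failure, including the message, instead of failing silently.
        return agent.notify(router_id, STATE_ERROR,
                            f"{type(exc).__name__}: {exc}")
    return agent.notify(router_id, new_state)


def failing_write(state: str) -> None:
    # Simulates the permissions problem described in the report.
    raise PermissionError("cannot write under /var/lib/neutron")


if __name__ == "__main__":
    agent = L3AgentStub()
    print(handle_transition(agent, "router-1", "active", failing_write))
```

In this sketch the error message travels with the state report, which mirrors the "bonus points" part of the proposal; a real implementation would additionally need the RPC and DB changes mentioned above.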

Tags: rfe-approved
Miguel Lavalle (minsel)
tags: removed: l3-dvr-backlog
Changed in neutron:
importance: Low → Wishlist
Miguel Lavalle (minsel) wrote :

When you say state, do you mean status?

And the constant already exists: https://opendev.org/openstack/neutron-lib/src/branch/master/neutron_lib/constants.py#L402. Although it seems nobody uses it: http://codesearch.openstack.org/?q=ROUTER_STATUS_ERROR&i=nope&files=&repos=

In any case, it seems sensible to do this. Let's discuss it in the drivers meeting.

tags: added: rfe-triaged
removed: rfe
Slawek Kaplonski (slaweq) wrote :

Yes, I mean status :)

Miguel Lavalle (minsel) wrote :

This RFE was approved by the drivers team

tags: added: rfe-appr
removed: rfe-triaged
tags: added: rfe-approved
removed: rfe-appr
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Slawek Kaplonski (slaweq) wrote :

Unfortunately, I will not have time to work on this, but if anyone wants to work on it, feel free to take it. You can also reach out to me if you need any more information about it.

Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → nobody