neutron router shows active on a dead agent

Bug #1682145 reported by ymadhavi@in.ibm.com on 2017-04-12
This bug affects 1 person
Affects    Status          Importance   Assigned to    Milestone
neutron    Fix Released    Medium       Brian Haley

Bug Description

When a router is active on only one network node and that network node goes down for any reason, the router still shows an active status on the controller.

neutron l3-agent-list-hosting-router e5bae5bd-40ae-45b2-837d-9d00a74a1e1b
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+---------------+----------------+-------+----------+
| id                                   | host          | admin_state_up | alive | ha_state |
+--------------------------------------+---------------+----------------+-------+----------+
| db94053c-f7a7-4bf8-a7ed-aa01f7c8ef34 | netowkr-node1 | True           | xxx   | active   |
+--------------------------------------+---------------+----------------+-------+----------+

The logic at https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L578, which updates the router state based on dead agents, needs to be executed in this scenario too.
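To spot this inconsistency mechanically, a minimal sketch could parse the table printed by `neutron l3-agent-list-hosting-router` and flag rows where the agent is dead but `ha_state` is still `active`. It assumes, as in the output in the description, that `xxx` in the `alive` column marks a dead agent; `parse_table` and `active_on_dead_agent` are hypothetical helper names, not part of any neutron tool.

```python
"""Flag HA router bindings reported 'active' on a dead L3 agent.

Minimal sketch: parses the ASCII table from the bug description.
Assumption: the 'alive' column shows 'xxx' for a dead agent.
"""

SAMPLE = """\
+--------------------------------------+---------------+----------------+-------+----------+
| id                                   | host          | admin_state_up | alive | ha_state |
+--------------------------------------+---------------+----------------+-------+----------+
| db94053c-f7a7-4bf8-a7ed-aa01f7c8ef34 | netowkr-node1 | True           | xxx   | active   |
+--------------------------------------+---------------+----------------+-------+----------+
"""


def parse_table(text):
    """Return one dict per data row, keyed by the header names."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+---+' border lines
    ]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def active_on_dead_agent(rows):
    """Return hosts whose agent is dead but still reports ha_state=active."""
    return [
        row["host"]
        for row in rows
        if row["alive"] == "xxx" and row["ha_state"] == "active"
    ]


if __name__ == "__main__":
    print(active_on_dead_agent(parse_table(SAMPLE)))  # ['netowkr-node1']
```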

Trevor McCasland (twm2016) wrote :

Versioned github link from bug description: https://github.com/openstack/neutron/blob/7bd521e7ce1c4ffe5b65aad12f3a1c1394c55473/neutron/db/l3_hamode_db.py#L578

Adding a case for the scenario described in the bug description will fix the issue, but is it HA if you only have one node? "an HA cluster is a two-node cluster, since that is the minimum required to provide redundancy" (https://en.wikipedia.org/wiki/High-availability_cluster)

I think the bug could be reworded as "exit HA mode if only one network node is detected". Or do you have a use case for this?

Changed in neutron:
status: New → Opinion
tags: added: l3-ha
ymadhavi@in.ibm.com (ymadhavi) wrote :

The following is the scenario (use case) we are trying:

1. networknode1 and networknode2 exist in the environment.
2. An HA router is created; it is active on networknode1 and standby on networknode2.
3. networknode1 goes down for some reason; the router is now active on networknode2 and standby on networknode1.
4. networknode2 also goes down for some reason, so its L3 agent is dead.

But the router still shows as active on networknode2.
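The four steps above can be replayed with a toy model of the reported HA state. This is a sketch under an explicit assumption: the server-side correction (`correct_dead_agents`, a hypothetical name) demotes a dead agent to standby only when more than one agent reports active, which is one reading of the behavior described in this thread; the real neutron code may differ. Under that assumption, step 3 flips networknode1 to standby, but step 4 leaves networknode2 reported as active.

```python
"""Replay the four-step scenario with a toy model of reported HA state.

Hypothetical model: {host: {"alive": bool, "state": "active"|"standby"}}.
ASSUMPTION: dead agents are demoted to standby only when more than one
agent reports active.
"""


def correct_dead_agents(agents):
    """Demote dead 'active' agents, but only if more than one reports active."""
    active = [host for host, a in agents.items() if a["state"] == "active"]
    if len(active) > 1:
        for host in active:
            if not agents[host]["alive"]:
                agents[host]["state"] = "standby"


def fail(agents, host):
    """Kill an agent; the first surviving agent (if any) takes over as active."""
    agents[host]["alive"] = False
    # A dead agent cannot report, so its recorded state is left untouched here.
    survivors = [h for h, a in agents.items() if a["alive"]]
    if survivors and agents[host]["state"] == "active":
        agents[survivors[0]]["state"] = "active"
    correct_dead_agents(agents)


# Steps 1-2: router active on networknode1, standby on networknode2.
agents = {
    "networknode1": {"alive": True, "state": "active"},
    "networknode2": {"alive": True, "state": "standby"},
}
fail(agents, "networknode1")  # step 3: failover to networknode2
fail(agents, "networknode2")  # step 4: the last agent dies
print({host: a["state"] for host, a in agents.items()})
# {'networknode1': 'standby', 'networknode2': 'active'}
```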

Changed in neutron:
status: Opinion → New
Ann Taraday (akamyshnikova) wrote :

The original bug about this is https://bugs.launchpad.net/neutron/+bug/1461148
Actually, at first we were setting the router to standby on all dead agents, but this caused another bug (https://bugs.launchpad.net/neutron/+bug/1648242), so I had to change the logic here: https://github.com/openstack/neutron/commit/1927da1bc7c4e56162dd3704d58d3b922d4ebce9.
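A sketch of the first policy Ann describes, "set standby for all dead agents", makes one hazard visible. It assumes the server's idea of "alive" comes from agent heartbeats, which a flaky message bus (the concern named in the fix's commit message) can get wrong. This is an illustration under that assumption, not a reproduction of bug 1648242; `standby_all_dead` is a hypothetical name.

```python
"""Sketch of the original policy: report every dead agent as standby.

Hypothetical model: {host: {"alive": bool, "state": str}}, where 'alive'
is what the server believes from heartbeats, not ground truth.
"""


def standby_all_dead(agents):
    """Report every agent the server considers dead as standby."""
    for info in agents.values():
        if not info["alive"]:
            info["state"] = "standby"
    return agents


# networknode1 still forwards traffic, but a missed heartbeat makes the
# server believe its agent is dead.
agents = {
    "networknode1": {"alive": False, "state": "active"},
    "networknode2": {"alive": True, "state": "standby"},
}
standby_all_dead(agents)
print([h for h, a in agents.items() if a["state"] == "active"])  # []
```

With this policy the API reports no active router at all, even though networknode1 may still be serving traffic.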

Fix proposed to branch: master
Review: https://review.openstack.org/458814

Changed in neutron:
assignee: nobody → Drew Thorstensen (thorst)
status: New → In Progress
Ann Taraday (akamyshnikova) wrote :

In my opinion, this bug should be marked as a known issue, with a proper description in the docs.

Drew Thorstensen (thorst) wrote :

Ann - why do you say that? It does not seem to be functionally correct from my perspective.

If an L3 agent is down, the router is still active; it has failed over. The router is not in standby, it has just failed over.

If we had a more granular state of 'degraded', I could see that being useful. But that seems like a more pervasive change.
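The 'degraded' idea could look roughly like the sketch below: a summary computed from the liveness of the agents hosting a router. Neutron's `ha_state` has no such value; `RouterHealth` and `router_health` are hypothetical names used only to illustrate the three-way split being discussed.

```python
from enum import Enum


class RouterHealth(Enum):
    """Hypothetical aggregate state; not a real neutron ha_state value."""
    ACTIVE = "active"
    DEGRADED = "degraded"
    STANDBY = "standby"


def router_health(alive_flags):
    """Summarize an HA router from its hosting agents' liveness.

    alive_flags: one bool per agent hosting the router.
    """
    if not any(alive_flags):
        return RouterHealth.STANDBY  # every hosting agent is down
    if all(alive_flags):
        return RouterHealth.ACTIVE  # full redundancy
    return RouterHealth.DEGRADED  # still serving, but redundancy is lost


print(router_health([True, True]).value)    # active
print(router_health([False, True]).value)   # degraded
print(router_health([False, False]).value)  # standby
```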

Ann Taraday (akamyshnikova) wrote :

Pay attention to the links I posted in my first comment; I don't want us to go through the same cycle again.

Changed in neutron:
assignee: Drew Thorstensen (thorst) → ymadhavi@in.ibm.com (ymadhavi)
Changed in neutron:
assignee: ymadhavi@in.ibm.com (ymadhavi) → Matthew Edmonds (edmondsw)
Changed in neutron:
assignee: Matthew Edmonds (edmondsw) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Ann Taraday (akamyshnikova)
zhaobo (zhaobo6) on 2018-03-19
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: Ann Taraday (akamyshnikova) → Brian Haley (brian-haley)

Reviewed: https://review.openstack.org/458814
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b62d1bfdf71c2f8810d9b143d50127b8f3a4942d
Submitter: Zuul
Branch: master

commit b62d1bfdf71c2f8810d9b143d50127b8f3a4942d
Author: Drew Thorstensen <email address hidden>
Date: Fri Apr 21 08:02:17 2017 -0400

    Router should flip to standby if all L3 nodes down

    A HA router should always be active unless all of the agents hosting
    that router go down. In that event, the router should switch to
    standby. This behavior changed with review:
      https://review.openstack.org/#/c/411784

    That review seemed to be accounting for a flakey message bus. This
    change should account for that, but also revert to the original behavior
    of the router state only changing when its backing agent hosts are down.

    Change-Id: I89c3b2546382624f175f8de4de621c3e53adf527
    Closes-Bug: 1682145
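The rule stated in the commit message ("active unless all of the agents hosting that router go down", then standby) can be sketched as a pure function over a hypothetical `{host: {"alive", "state"}}` model. It computes the reported view rather than writing to a database, and `reported_states` is an illustrative name, not the function changed in the review. Applied to the reporter's scenario, the state after step 3 is unchanged, while after step 4 both bindings are reported as standby.

```python
"""Sketch of the fixed rule: standby only when every hosting agent is down."""


def reported_states(agents):
    """Return {host: state} as reported for one HA router (hypothetical model).

    The router keeps its recorded states unless every agent hosting it is
    down; in that case every binding is reported as standby.
    """
    if agents and all(not a["alive"] for a in agents.values()):
        return {host: "standby" for host in agents}
    return {host: a["state"] for host, a in agents.items()}


after_step3 = {
    "networknode1": {"alive": False, "state": "standby"},
    "networknode2": {"alive": True, "state": "active"},
}
after_step4 = {
    "networknode1": {"alive": False, "state": "standby"},
    "networknode2": {"alive": False, "state": "active"},
}
print(reported_states(after_step3))  # {'networknode1': 'standby', 'networknode2': 'active'}
print(reported_states(after_step4))  # {'networknode1': 'standby', 'networknode2': 'standby'}
```

Keying the decision on "all agents down" rather than on any single dead agent keeps a router that has merely failed over reported as active, which is the distinction Drew draws earlier in the thread.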

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 13.0.0.0b1 development milestone.
