Missing static routes after neutron-l3-agent restart

Bug #1930096 reported by Carlos Augusto da Silva Martins
This bug affects 1 person
Affects: neutron
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Hi guys.
There is a bug in neutron-l3-agent in HA mode.

The routers are recreated without the static routes previously defined in the Horizon GUI.

This occurs after neutron-l3-agent is restarted.

My branch is Stein.

Tags: l3-ha
affects: networking-ovn → neutron
Slawek Kaplonski (slaweq) wrote :

Can you provide the full log from neutron-l3-agent? Are there any errors in it?
Or maybe in the neutron-server log?

Also, can you try to reproduce the same issue on the master branch? Or does it happen only in Stein?

One last question: can you describe exactly how to reproduce the issue? Is something like:

1. Create HA router,
2. Add some static routes to the router,
3. Restart the L3 agent on the node where the router is active

enough to reproduce that issue?

tags: added: l3-ha
removed: critical miss neutron stein
Carlos Augusto da Silva Martins (carlos.martins) wrote :

Hi, some information to help:

The neutron-l3-agent logs contain no errors or warnings.

I can't test it on the master branch.

Steps to reproduce this issue:

1. Create HA router.
2. Add some static routes to the router.
3. Restart L3 agent on node where router is active.
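
A minimal sketch of these three steps as command lines, built in Python without executing anything. The router name, the route values, and the `neutron-l3-agent` systemd unit name are assumptions for illustration; the unit name in particular varies by distribution and deployment tool.

```python
import shlex


def reproduction_commands(router, destination, nexthop, l3_unit="neutron-l3-agent"):
    """Return the commands for the three reproduction steps.

    Nothing is executed here; the commands are only built and returned.
    """
    return [
        # 1. Create an HA router (the --ha flag normally requires admin rights).
        ["openstack", "router", "create", "--ha", router],
        # 2. Add a static route to the router.
        ["openstack", "router", "set", "--route",
         f"destination={destination},gateway={nexthop}", router],
        # 3. Restart the L3 agent on the node hosting the active instance.
        #    The unit name is an assumption, not taken from the report.
        ["sudo", "systemctl", "restart", l3_unit],
    ]


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the resulting command lines.
    for cmd in reproduction_commands("ha-router", "10.20.0.0/24", "192.0.2.1"):
        print(shlex.join(cmd))
```

The third command has to be run on the node where the router instance is active, not on the controller.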

Slawek Kaplonski (slaweq) wrote :

Ok, thx.

I was able to reproduce the issue on the master branch. It seems to me that the problem is that the interface is in the DOWN state, so when keepalived tries to install the route it gets an error that the address is unreachable.

Changed in neutron:
importance: Undecided → High
status: New → Confirmed
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

When the router instance has the GW port down, no route related to this port is written in the routing table of the namespace. This is not a problem while the router is in "backup" mode. The problem is when the router becomes "active" and keepalived tries to write the routes.

(I'm deploying an environment to check the following statement) -->
Since [1], we set the GW port to DOWN when the router instance is "backup" and set it to UP when "active". The problem is that we raise the port when keepalived has declared this router instance as "active". That is too late and keepalived was not able to set those routes: http://paste.openstack.org/show/806281/

Once we set the GW port to UP, we should also force keepalived to refresh the config (send a SIGHUP signal to the running process).
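
A minimal sketch of what "send a SIGHUP" could look like, assuming the keepalived PID is available in a PID file. The helper name and the PID file location are hypothetical; this is not how the L3 agent itself is implemented, only an illustration of the signal being proposed.

```python
import os
import signal


def reload_keepalived(pid_file):
    """Send SIGHUP to the process whose PID is stored in pid_file.

    keepalived reloads its configuration when it receives SIGHUP.
    Returns the PID that was signalled.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```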

I'll update the bug once I've confirmed that.

Regards.

[1] https://review.opendev.org/c/openstack/neutron/+/707406

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

I was half wrong in my previous statement. When a router instance goes to "active", the keepalived process tries to set the routes on the interface but cannot, as mentioned before, because the interface is DOWN and the GW IP is not reachable.

But in [1], when the interface becomes active, the method in charge of the GW port status updates the routes in the router namespace, adding those routes that keepalived couldn't add before.
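
One way to check that the routes really were restored is to compare the router's configured static routes with the routing table of its namespace. A minimal sketch, assuming the table text comes from something like `ip netns exec qrouter-<router-id> ip route show`; the helper name is hypothetical and is not part of neutron.

```python
def missing_routes(expected, ip_route_output):
    """Return the expected (destination, nexthop) pairs absent from the table.

    expected: iterable of (destination CIDR, next-hop IP) tuples.
    ip_route_output: text in the format printed by `ip route show`.
    """
    installed = set()
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Routes with a next hop look like:
        # "<destination> via <nexthop> dev <iface> ..."
        if len(fields) >= 3 and fields[1] == "via":
            installed.add((fields[0], fields[2]))
    return [route for route in expected if route not in installed]
```

An empty result some time after the router instance has become "active" would indicate that the static routes are back in the namespace.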

This is the log timeline: http://paste.openstack.org/show/806306/

I think we should close this bug.

Regards.

[1] https://github.com/openstack/neutron/blob/0bdf3b56e0d4ede2d46eed09a4bb07dd3c00807d/neutron/agent/l3/ha_router.py#L556

Brian Haley (brian-haley) wrote :

As noted in the last comment, the l3-agent with HA should eventually add the route when the interface comes back up, so let's close this bug.

Changed in neutron:
status: Confirmed → Invalid