HA router scheduled to DVR agent on compute node

Bug #1526175 reported by zhang sheng
This bug affects 1 person
Affects: neutron
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I use my company's environment to test the Neutron DVR router.

At first, the environment used 2 network nodes to provide L3 HA routers, with the following in neutron.conf:

l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2

Then I changed neutron.conf to:

l3_ha = False
router_distributed = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2

and ran the l3-agent in dvr mode on the compute nodes. Then something strange happened: all HA routers
were bound to this compute node.
If I create a new HA router and use the "neutron l3-agent-list-hosting-router" command to watch the binding:

root@controller:~# neutron l3-agent-list-hosting-router 73a5308f-dd1e-4c0e-8ccf-b9e4d2a82c5e
+--------------------------------------+----------+----------------+-------+----------+
| id                                   | host     | admin_state_up | alive | ha_state |
+--------------------------------------+----------+----------------+-------+----------+
| 0f3f65bd-9349-4f9a-af2c-7872a4fddd1f | network2 | True           | :-)   | standby  |
| b174f741-3a41-45ed-bae0-e00ef4c1b1f9 | network1 | True           | :-)   | standby  |
+--------------------------------------+----------+----------------+-------+----------+
root@controller:~# neutron l3-agent-list-hosting-router 73a5308f-dd1e-4c0e-8ccf-b9e4d2a82c5e
+--------------------------------------+----------+----------------+-------+----------+
| id                                   | host     | admin_state_up | alive | ha_state |
+--------------------------------------+----------+----------------+-------+----------+
| 95f0c274-95ec-44c4-a6e7-f7e6de4b6e25 | compute3 | True           | :-)   | standby  |
| 0f3f65bd-9349-4f9a-af2c-7872a4fddd1f | network2 | True           | :-)   | active   |
| b174f741-3a41-45ed-bae0-e00ef4c1b1f9 | network1 | True           | :-)   | standby  |
+--------------------------------------+----------+----------------+-------+----------+

The router is first bound to network1 and network2, and then to compute3.
I guess the reason is that when the dvr-mode l3-agent starts and calls sync_routers, Neutron binds the HA router to compute3.
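
For anyone trying to reproduce this, the two listings above can be compared mechanically. The following is only a rough sketch, not part of Neutron or its client: it parses the ASCII table printed by "neutron l3-agent-list-hosting-router" and reports hosts that appear in the second listing but not in the first. The function names and the two sample strings are made up for illustration; the rows are copied from the output in this report.

```python
def parse_hosting_agents(table_text):
    """Map host -> ha_state from the CLI's ASCII table output."""
    agents = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "id":
            continue  # skip the header row and anything unexpected
        _agent_id, host, _admin_up, _alive, ha_state = cells
        agents[host] = ha_state
    return agents


def newly_bound_hosts(before_text, after_text):
    """Hosts present in the second listing but not in the first."""
    before = parse_hosting_agents(before_text)
    after = parse_hosting_agents(after_text)
    return sorted(set(after) - set(before))


# Sample rows copied from the two listings in this report (hypothetical names).
FIRST_LISTING = """
| id | host | admin_state_up | alive | ha_state |
| 0f3f65bd-9349-4f9a-af2c-7872a4fddd1f | network2 | True | :-) | standby |
| b174f741-3a41-45ed-bae0-e00ef4c1b1f9 | network1 | True | :-) | standby |
"""

SECOND_LISTING = """
| id | host | admin_state_up | alive | ha_state |
| 95f0c274-95ec-44c4-a6e7-f7e6de4b6e25 | compute3 | True | :-) | standby |
| 0f3f65bd-9349-4f9a-af2c-7872a4fddd1f | network2 | True | :-) | active |
| b174f741-3a41-45ed-bae0-e00ef4c1b1f9 | network1 | True | :-) | standby |
"""

if __name__ == "__main__":
    print(newly_bound_hosts(FIRST_LISTING, SECOND_LISTING))
```

With the sample data this prints ['compute3'], i.e. the compute node that was bound after the network nodes.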

Hong Hui Xiao (xiaohhui) wrote :

Are you sure you configured DVR correctly? I can see that the current code [1] has a check that should prevent this problem. What is the agent_mode in the compute node's l3_agent configuration?

[1] https://github.com/openstack/neutron/blob/bcd383f38cded0ef87ec8f042031814ce362a5f0/neutron/db/l3_agentschedulers_db.py#L523

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/257849

Changed in neutron:
assignee: nobody → zhang sheng (langyxxl)
status: New → In Progress
zhang sheng (langyxxl) wrote :

@xiaohhui, I set the compute node's l3_agent to dvr mode.

When the compute node's l3_agent starts, it makes the sync_routers RPC call, and the current code [1] causes the HA router
to be bound to the compute node's dvr-mode l3_agent.

[1] https://github.com/openstack/neutron/blob/master/neutron/scheduler/l3_agent_scheduler.py#L153

I added a check before _schedule_ha_routers_to_additional_agent. After that, the problem disappeared in
my company's environment.
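
The idea behind such a check can be sketched as follows. This is a simplified, self-contained illustration and not the actual Neutron code or the proposed patch: the Agent class, ha_candidates, and additional_ha_hosts are hypothetical names, and the agent modes assigned to the hosts are assumptions. It only shows that if agents in plain 'dvr' mode are filtered out before free HA slots are filled, a compute node never becomes an extra HA binding.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    host: str
    agent_mode: str  # e.g. 'legacy', 'dvr_snat', or 'dvr'


def ha_candidates(agents):
    """Keep only agents that may host an HA router instance.

    Assumption for this sketch: agents in plain 'dvr' mode (compute
    nodes) are never valid targets for HA router scheduling.
    """
    return [a for a in agents if a.agent_mode != "dvr"]


def additional_ha_hosts(bound_hosts, agents, max_agents):
    """Pick extra hosts for an HA router, up to max_agents in total."""
    free_slots = max(max_agents - len(bound_hosts), 0)
    eligible = [a.host for a in ha_candidates(agents)
                if a.host not in bound_hosts]
    return eligible[:free_slots]


# Hosts from this report; the agent modes are assumed for illustration.
AGENTS = [
    Agent("network1", "dvr_snat"),
    Agent("network2", "dvr_snat"),
    Agent("compute3", "dvr"),
]

if __name__ == "__main__":
    # max_l3_agents_per_router = 3 leaves one free slot, but the only
    # unbound agent is in 'dvr' mode, so nothing is added.
    print(additional_ha_hosts(["network1", "network2"], AGENTS, 3))
```

With the filter in place the call returns an empty list. Without it, compute3 would fill the third slot, which matches the binding shown in the bug description.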

Assaf Muller (amuller)
tags: added: kilo-backport-potential l3-dvr-backlog l3-ha liberty-backport-potential
Carl Baldwin (carl-baldwin) wrote :

I think this is another bug that came up because DVR binding is explicitly made to compute nodes in the first place. I hope Oleg is successful in fixing this once and for all with his current work. We should never explicitly schedule anything to an l3 agent in 'dvr' mode.

Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/265270

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/265498

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by shengzhang (<email address hidden>) on branch: master
Review: https://review.openstack.org/265498

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/265499

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Carl Baldwin (<email address hidden>) on branch: master
Review: https://review.openstack.org/265270
Reason: Let's stick with one patch: I769b79bc7e53219cca1a416313cf4d50c1fb1b13

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Carl Baldwin (<email address hidden>) on branch: master
Review: https://review.openstack.org/265499

John Schwarz (jschwarz) wrote :

In the open fix for this patch, I769b79bc7e53219cca1a416313cf4d50c1fb1b13, Zhang Sheng mentioned he stopped working on OpenStack. Since the patch has been left untouched for about 2 months I've assigned myself to the bug and will work on it in the coming days.

Changed in neutron:
assignee: zhang sheng (langyxxl) → John Schwarz (jschwarz)
Carl Baldwin (carl-baldwin) wrote :

Is this still an issue since Oleg has changed how binding works?

Changed in neutron:
status: In Progress → Incomplete
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/257849
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Armando Migliaccio (armando-migliaccio) wrote :

If you are still working on this please resume, or allow someone else to pick this up.

Changed in neutron:
assignee: John Schwarz (jschwarz) → nobody
John Schwarz (jschwarz) wrote :

This does not reproduce. I've done the following on a 3-node setup (2 network, 1 compute):

1. Created a new HA router and made sure it was scheduled to the 2 network nodes (dvr_snat).
2. Changed the configuration as described, also moved the l3 agents to dvr mode (and not dvr_snat), and restarted the processes.
3. l3-agent-list-hosting-router doesn't show the router being scheduled to different nodes.

This was attempted with an HA router, a DVR router (checked the DB), and an HA+DVR router.

LGTM.

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired