excessive number of dvrs where vm got a fixed ip on floating network

Bug #1840579 reported by norman shen
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
norman shen

Bug Description

We are running into an unexpected situation where the number of DVR routers on a compute node grows to nearly 2000 when some instances on that node have a NIC on the floating IP network.

We are using the Queens release:

neutron-common/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
neutron-l3-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed]
neutron-metadata-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
neutron-openvswitch-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed]
python-neutron/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
python-neutron-fwaas/xenial,xenial,now 2:12.0.1-1.0~u16.04+mcp6 all [installed,automatic]
python-neutron-lib/xenial,xenial,now 1.13.0-1.0~u16.04+mcp9 all [installed,automatic]
python-neutronclient/xenial,xenial,now 1:6.7.0-1.0~u16.04+mcp17 all [installed,automatic]

Currently, my guess is that some application mistakenly invokes RPC calls like this one: https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/api/rpc/agentnotifiers/l3_rpc_agent_api.py#L166 with a DVR router associated with a floating IP address, on a host that has a fixed IP address allocated from the floating network (that is, a port whose device_owner is prefixed with compute:). Such a router will then be kept by this function: https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/db/l3_dvrscheduler_db.py#L427 because `get_subnet_ids_on_router` does not filter out router:gateway ports.

I think this is a bug because, as long as a host has no ports with the specific device owners, we should not have a DVR router on it.
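
To illustrate the point about `get_subnet_ids_on_router`, here is a minimal, self-contained sketch. It is not Neutron code: the `Port` tuple and both helper functions are hypothetical, and the device owner strings are assumed to match the values Neutron uses for gateway and distributed interface ports.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a Neutron port record.
Port = namedtuple("Port", ["device_id", "device_owner", "subnet_id"])

# Assumed device owner of a router's external gateway port.
ROUTER_GATEWAY = "network:router_gateway"


def subnet_ids_on_router(ports, router_id):
    """Collect every subnet the router has a port on, gateway included."""
    return {p.subnet_id for p in ports if p.device_id == router_id}


def interface_subnet_ids_on_router(ports, router_id):
    """Same collection, but skip the external gateway port."""
    return {
        p.subnet_id
        for p in ports
        if p.device_id == router_id and p.device_owner != ROUTER_GATEWAY
    }


ports = [
    Port("router1", ROUTER_GATEWAY, "floating-subnet"),
    Port("router1", "network:router_interface_distributed", "private-subnet"),
]

unfiltered = subnet_ids_on_router(ports, "router1")
filtered = interface_subnet_ids_on_router(ports, "router1")
print(sorted(unfiltered))  # ['floating-subnet', 'private-subnet']
print(sorted(filtered))    # ['private-subnet']
```

In this toy model the unfiltered set contains the floating subnet, so any host with a compute port on that subnet would appear to need `router1`.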

Besides, it is pretty easy to reproduce this bug.

First, create a DVR router with an external gateway on the floating network.
Then create a virtual machine with a fixed IP on the floating network.
Then call `routers_updated_on_host` manually; the DVR router will be created on the host where the VM resides, but it should not be there.

norman shen (jshen28)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/677092

Changed in neutron:
assignee: nobody → norman shen (jshen28)
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

DVR routers will be created on the host where the service port is bound.
If the VM that you are creating is bound to host A, then the DVR routers will be created on that host.

So you are creating a VM in a floatingIP network and not on the fixed IP network. So you got an IP for a VM within the same range as the floatingIP.

That is fine. Irrespective of the floatingIP, your VM has an IP now, and since that VM is bound to a host, the DVR router is supposed to be provisioned on that host.

I don't understand this part of your comment. 'Then call `routers_updated_on_host` manually, then this dvr will be created on the host where vm resides on, but actually it should be there.'

As long as your device_owner is 'compute:none', 'dhcp', or 'lbaas', you should see a router pop up.
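
The device_owner rule described above can be sketched as a small predicate. This is a simplified illustration, not Neutron's implementation: the function name `is_serviced_by_dvr` is hypothetical, and the owner strings are assumptions about the values behind 'dhcp' and 'lbaas'.

```python
# Simplified sketch; names and owner strings are assumptions for illustration.
COMPUTE_PREFIX = "compute:"
OTHER_SERVICED_OWNERS = {
    "network:dhcp",
    "neutron:LOADBALANCER",
    "neutron:LOADBALANCERV2",
}


def is_serviced_by_dvr(device_owner):
    """Return True if a port with this owner would pull a DVR router onto its host."""
    return (
        device_owner.startswith(COMPUTE_PREFIX)
        or device_owner in OTHER_SERVICED_OWNERS
    )


print(is_serviced_by_dvr("compute:none"))            # True
print(is_serviced_by_dvr("network:router_gateway"))  # False
```

Note that the predicate only looks at the port's owner; which subnets are compared against the router is a separate question.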

Revision history for this message
norman shen (jshen28) wrote :

I do not understand. The floating IP network is not attached to any router directly; it is only used as an external gateway. So I personally do not believe it is necessary to have a DVR router for it.

Revision history for this message
norman shen (jshen28) wrote :

For `routers_updated_on_host`, again, please take a look at this method: https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/db/l3_dvrscheduler_db.py#L174 It covers every port on the router, so even the router:gateway port is included, and I do not think it is necessary to check the router:gateway port.

Revision history for this message
norman shen (jshen28) wrote :

For example, let's define `floating` as a floating IP network.
Let's assume that the DVR router `router1` has an external gateway on `floating`,
and that all of the servers using this router run on compute01. Now let's create a server on `floating` on compute02.

For instance:

openstack server create --nic net-id=floating --availability-zone :compute02

Up to this point compute02 does not have the DVR router `router1`, but if I manually call

neutron.api.rpc.agentnotifiers.l3_rpc_agent_api.L3AgentNotifyAPI.routers_updated_on_host(context, ['router1'], 'compute02')
I can see `router1` created on this host, and I believe this is not necessary.

The root cause is that this method https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/db/l3_dvrscheduler_db.py#L174 also returns the gateway's subnet ID. If one looks at this code https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/db/l3_dvrscheduler_db.py#L426 one can see that `get_subnet_ids_on_router` is called for any router ID that is not in result_set. As a result, every router using `floating` as its gateway qualifies to be created on compute02, which does not make sense.
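
The compute01/compute02 scenario can be modelled with a few lines of Python. This is a toy model under stated assumptions, not the Neutron scheduler: `ROUTER_PORTS`, `HOST_COMPUTE_SUBNETS`, `routers_needed_on_host` and the `include_gateway` flag are all hypothetical names introduced for illustration.

```python
# Toy model of the scenario above; not Neutron code. All names are hypothetical.
GATEWAY_OWNER = "network:router_gateway"  # assumed gateway device owner

# router id -> (device_owner, subnet) pairs for the router's own ports
ROUTER_PORTS = {
    "router1": [
        (GATEWAY_OWNER, "floating"),
        ("network:router_interface_distributed", "private"),
    ],
}

# host -> subnets on which the host has compute ports bound
HOST_COMPUTE_SUBNETS = {
    "compute01": {"private"},
    "compute02": {"floating"},
}


def routers_needed_on_host(host, router_ids, include_gateway):
    """Return routers whose subnets overlap the host's compute port subnets."""
    needed = []
    for router_id in router_ids:
        subnets = {
            subnet
            for owner, subnet in ROUTER_PORTS[router_id]
            if include_gateway or owner != GATEWAY_OWNER
        }
        if subnets & HOST_COMPUTE_SUBNETS.get(host, set()):
            needed.append(router_id)
    return needed


# Behaviour described in this report: the gateway subnet counts as a match.
reported = routers_needed_on_host("compute02", ["router1"], include_gateway=True)
# Behaviour argued for here: only interface subnets count.
expected = routers_needed_on_host("compute02", ["router1"], include_gateway=False)
on_compute01 = routers_needed_on_host("compute01", ["router1"], include_gateway=False)
print(reported, expected, on_compute01)  # ['router1'] [] ['router1']
```

In this model, every router that shares the `floating` gateway would match compute02 in the same way, which is consistent with the router count growing with the number of routers rather than with the number of attached networks.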

tags: added: l3-dvr-backlog
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Norman:

First of all and not related directly to this bug, let me say that we are maybe mixing concepts here, between FIPs and DVR. Take the FIP part out from this bug.

If you have a network attached to a DVR router and you create a VM with a port in this network, a DVR router will be created in this host. That's what DVR is going to do: to distribute the router load between the compute nodes, creating instances of this router in the servers with ports associated to the router networks. This will distribute the routing load between servers.

If you want to have a centralized routing architecture, do not use DVR.

IMO, this bug is not valid.

Regards.

PS: https://assafmuller.com/category/dvr/

Revision history for this message
norman shen (jshen28) wrote :

Hi,

I totally disagree with your point. Again, I believe a DVR router is only necessary on a host when the router has an interface on the network in question; it should have nothing to do with which network its gateway uses. In this scenario, I use the FIP network for my instance's fixed IP (and this instance is not associated with a floating IP), and the FIP subnet does not have an interface attached to the router, so the host should not have a DVR router.

Even if I step back and accept your point, can you please tell me why this router is necessary? This DVR router does not even have a qr-xx port on the FIP subnet...

I hope you will take a look at the extra test cases I added. Thanks.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

IIUC what Norman is saying, I think he is right here.
So let me explain how I understand this issue:

1. There is a network called e.g. "public",
2. There is a DVR router R1 which has the "public" network set as its external gateway and some "private" networks plugged into it. So VMs connected to the "private" networks can have a FIP from "public" associated,
3. Now a new VM is spawned on some compute node and plugged directly into the "public" network. Here, IIUC, DVR router R1 is created on the compute node with the new VM. But it doesn't need to be there, as this VM is not connected to R1 at all.

@Norman, is my understanding of the issue correct?

Revision history for this message
norman shen (jshen28) wrote :

Thank you sir, that's exactly what I want to say ....

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/677092
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/677092
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=480b04ce04f98bae6ff4ab13cec9e01b34204134
Submitter: Zuul
Branch: master

commit 480b04ce04f98bae6ff4ab13cec9e01b34204134
Author: ushen <email address hidden>
Date: Sun Aug 18 21:54:04 2019 +0800

    Unnecessary routers should not be created

    We observe an excessive amount of routers created on
    compute node on which some virtual machines got a fixed
    ip on floating network.

    Rpc servers should filter out those unnecessary routers
    during syncing.

    Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
    Partial-Bug: #1840579

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/706641

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/706641
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3ddba37ccc134a1336802fe3693e439de6c33c34
Submitter: Zuul
Branch: stable/train

commit 3ddba37ccc134a1336802fe3693e439de6c33c34
Author: ushen <email address hidden>
Date: Sun Aug 18 21:54:04 2019 +0800

    Unnecessary routers should not be created

    We observe an excessive amount of routers created on
    compute node on which some virtual machines got a fixed
    ip on floating network.

    Rpc servers should filter out those unnecessary routers
    during syncing.

    Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
    Partial-Bug: #1840579
    (cherry picked from commit 480b04ce04f98bae6ff4ab13cec9e01b34204134)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/712208

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/712209

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/712208
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5a28141fc86f6881f9520a6b06abfcff0a5a2e9f
Submitter: Zuul
Branch: stable/stein

commit 5a28141fc86f6881f9520a6b06abfcff0a5a2e9f
Author: ushen <email address hidden>
Date: Sun Aug 18 21:54:04 2019 +0800

    Unnecessary routers should not be created

    We observe an excessive amount of routers created on
    compute node on which some virtual machines got a fixed
    ip on floating network.

    Rpc servers should filter out those unnecessary routers
    during syncing.

    Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
    Partial-Bug: #1840579
    (cherry picked from commit 480b04ce04f98bae6ff4ab13cec9e01b34204134)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/731763

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/712209
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f6401020f7ba406623d49fb43671cb359aca2372
Submitter: Zuul
Branch: stable/rocky

commit f6401020f7ba406623d49fb43671cb359aca2372
Author: ushen <email address hidden>
Date: Sun Aug 18 21:54:04 2019 +0800

    Unnecessary routers should not be created

    We observe an excessive amount of routers created on
    compute node on which some virtual machines got a fixed
    ip on floating network.

    Rpc servers should filter out those unnecessary routers
    during syncing.

    Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
    Partial-Bug: #1840579
    (cherry picked from commit 480b04ce04f98bae6ff4ab13cec9e01b34204134)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/731763
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7b36eefd1d63c24d25ae6387a88b00600730447e
Submitter: Zuul
Branch: stable/queens

commit 7b36eefd1d63c24d25ae6387a88b00600730447e
Author: ushen <email address hidden>
Date: Sun Aug 18 21:54:04 2019 +0800

    Unnecessary routers should not be created

    We observe an excessive amount of routers created on
    compute node on which some virtual machines got a fixed
    ip on floating network.

    Rpc servers should filter out those unnecessary routers
    during syncing.

    Conflicts:
        neutron/tests/unit/db/test_agentschedulers_db.py

    Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
    Partial-Bug: #1840579
    (cherry picked from commit 480b04ce04f98bae6ff4ab13cec9e01b34204134)

tags: added: in-stable-queens
Changed in neutron:
status: Fix Committed → Fix Released