2019-08-18 13:51:34
norman shen
Description
We are running into an unexpected situation where the number of DVR routers on a compute node is growing to nearly 2000; the node hosts some instances that have a NIC on the floating-IP network.
We are using the Queens release:
neutron-common/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
neutron-l3-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed]
neutron-metadata-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
neutron-openvswitch-agent/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed]
python-neutron/xenial,now 2:12.0.5-5~u16.04+mcp155 all [installed,automatic]
python-neutron-fwaas/xenial,xenial,now 2:12.0.1-1.0~u16.04+mcp6 all [installed,automatic]
python-neutron-lib/xenial,xenial,now 1.13.0-1.0~u16.04+mcp9 all [installed,automatic]
python-neutronclient/xenial,xenial,now 1:6.7.0-1.0~u16.04+mcp17 all [installed,automatic]
Currently, my guess is that some application mistakenly invokes RPC calls like the one at https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/api/rpc/agentnotifiers/l3_rpc_agent_api.py#L166 for a DVR router associated with a floating IP address, targeting a host that has a fixed IP address allocated from the floating network (i.e. a port whose device_owner starts with the compute: prefix). Such a router is then retained by the function at https://github.com/openstack/neutron/blob/490471ebd3ac56d0cee164b9c1c1211687e49437/neutron/db/l3_dvrscheduler_db.py#L427, because `get_subnet_ids_on_router` does not filter out router gateway ports.
I think this is a bug: as long as a host has no ports with the relevant device owners, it should not have a DVR router scheduled on it.
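To illustrate the suspected root cause, here is a minimal, hypothetical sketch (not the actual Neutron implementation) of how a `get_subnet_ids_on_router`-style helper could exclude external gateway ports, so that a host whose only overlap with the router's subnets is via the gateway's external subnet would not keep the DVR router. The port-dict shape and the `exclude_gateway` parameter are assumptions for illustration; `network:router_gateway` is the real Neutron device_owner value for gateway ports.

```python
# Hypothetical sketch, not Neutron's actual code: collect a router's
# subnet IDs while (optionally) skipping the external gateway port.
ROUTER_GATEWAY = "network:router_gateway"  # real Neutron device_owner value

def get_subnet_ids_on_router(ports, exclude_gateway=True):
    """Return the set of subnet IDs used by the router's ports.

    `ports` is a list of dicts shaped like Neutron ports:
    {"device_owner": ..., "fixed_ips": [{"subnet_id": ...}, ...]}
    """
    subnet_ids = set()
    for port in ports:
        if exclude_gateway and port["device_owner"] == ROUTER_GATEWAY:
            continue  # skip the router:gateway port on the external network
        for fixed_ip in port["fixed_ips"]:
            subnet_ids.add(fixed_ip["subnet_id"])
    return subnet_ids
```

With `exclude_gateway=True`, a VM port on the external (floating-IP) subnet would no longer match any of the router's subnet IDs, so the DVR router would not be kept on that compute node.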
Besides, it is pretty easy to reproduce this bug:
First, create a DVR router with an external gateway on the floating network.
Then, create a virtual machine with a fixed IP on the floating network.
Then, call `routers_updated_on_host` manually; the DVR router will then be created on the host where the VM resides, although it should not be there.
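The reproduction steps above can be sketched with the OpenStack CLI. Network, image, and flavor names below are placeholders; the final step, triggering `routers_updated_on_host`, has no CLI equivalent and would have to be invoked from neutron-server code (shown as a comment).

```shell
# Assumption: "ext-net" is the external (floating-IP) network.

# 1. Create a DVR router with an external gateway on the floating network.
openstack router create --distributed demo-dvr
openstack router set --external-gateway ext-net demo-dvr

# 2. Boot a VM with a fixed IP taken directly from the floating network.
openstack server create --network ext-net \
    --image cirros --flavor m1.tiny demo-vm

# 3. There is no CLI for routers_updated_on_host; it would be triggered
#    from neutron-server, roughly like:
#      L3AgentNotifyAPI().routers_updated_on_host(context, [router_id], host)
#    Afterwards the DVR router appears on the VM's compute node even though
#    no qualifying (compute-owned, non-gateway) port requires it there.
```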