DHCP agent on a segment of a routed provider network reports an error and does not start the dnsmasq process
======
I have a routed network based on Mellanox VMS L3 (https://community.mellanox.com/docs/DOC-1432) and compute nodes on two segments (actually 2 racks with 2 different subnets). I was following the guide on implementing routed provider networks: https://docs.openstack.org/neutron/queens/admin/config-routed-networks.html
======
Rack 1 openvswitch_agent.ini:
[ovs]
bridge_mappings = 40Grack1:br-vlan
Rack 2 openvswitch_agent.ini:
[ovs]
bridge_mappings = 40Grack2:br-vlan
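These mappings are what tie each host to a single segment: the hosts in each rack map only their own rack's physical network (40Grack1 or 40Grack2). As a rough illustration, here is a hypothetical helper (not part of Neutron) that reads the option with Python's configparser, assuming the usual comma-separated `physnet:bridge` format of `[ovs] bridge_mappings`:

```python
import configparser


def parse_bridge_mappings(ini_text):
    """Return {physical_network: bridge} from an openvswitch_agent.ini snippet.

    Hypothetical helper for illustration only; assumes the comma-separated
    "physnet:bridge" format of the [ovs] bridge_mappings option.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("ovs", "bridge_mappings", fallback="")
    mappings = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first ":" only: physnet on the left, bridge on the right.
        physnet, _, bridge = entry.partition(":")
        mappings[physnet.strip()] = bridge.strip()
    return mappings


rack1 = "[ovs]\nbridge_mappings = 40Grack1:br-vlan\n"
rack2 = "[ovs]\nbridge_mappings = 40Grack2:br-vlan\n"
print(parse_bridge_mappings(rack1))  # {'40Grack1': 'br-vlan'}
print(parse_bridge_mappings(rack2))  # {'40Grack2': 'br-vlan'}
```

A host whose result contains only one physnet can only ever be attached to the segment that uses that physnet.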
======
Reproduction steps:
openstack network create --project proj --provider-physical-network 40Grack1 --provider-network-type vlan --provider-segment 403 vsorokin-VLAN403-net
openstack network segment set --name vsorokin-VLAN403-net-rack1 17cf03cb-0165-46c4-9586-598ca2239c75
openstack subnet create --network vsorokin-VLAN403-net --network-segment vsorokin-VLAN403-net-rack1 --ip-version 4 --subnet-range 10.243.64.0/22 --gateway none --allocation-pool start=10.243.64.2,end=10.243.67.253 --host-route destination=10.243.64.0/18,gateway=10.243.67.254 vsorokin-VLAN403-subnet-rack1
At this point I can see the qdhcp-* netns created and the dnsmasq process running on the Rack 1 node.
openstack network segment create --physical-network 40Grack2 --network-type vlan --segment 403 --network vsorokin-VLAN403-net vsorokin-VLAN403-net-rack2
openstack subnet create --network vsorokin-VLAN403-net --network-segment vsorokin-VLAN403-net-rack2 --ip-version 4 --subnet-range 10.243.68.0/22 --gateway none --allocation-pool start=10.243.68.2,end=10.243.71.253 --host-route destination=10.243.64.0/18,gateway=10.243.71.254 vsorokin-VLAN403-subnet-rack2
That command causes the following error in neutron-dhcp-agent.log on the node in Rack 2 (repeating every 30 seconds):
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent [req-279d513d-652e-46dc-94ab-890d90a13235 - - - - -] Unable to enable dhcp for 99cfc13a-adec-4dc0-baeb-864437829b3d.: KeyError: u'287d9d56-1c0f-4d4b-a5cc-4718efc80436'
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 219, in enable
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     self.spawn_process()
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 446, in spawn_process
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     self._spawn_or_reload_process(reload_with_HUP=False)
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 455, in _spawn_or_reload_process
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     self._output_config_files()
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 499, in _output_config_files
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     self._output_opts_file()
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 872, in _output_opts_file
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     options, subnet_index_map = self._generate_opts_per_subnet()
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 933, in _generate_opts_per_subnet
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent     subnet_dhcp_ip = subnet_to_interface_ip[subnet.id]
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent KeyError: u'287d9d56-1c0f-4d4b-a5cc-4718efc80436'
2018-07-16 15:48:03.350 3713 ERROR neutron.agent.dhcp.agent
where 287d9d56-1c0f-4d4b-a5cc-4718efc80436 is the UUID of the subnet in Rack 1.
If I then restart the DHCP agent in Rack 1, I get the same error, this time referring to the UUID of the subnet in Rack 2.
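The traceback ends at `subnet_to_interface_ip[subnet.id]` in `_generate_opts_per_subnet`. My reading, which is an assumption based only on the names in the traceback, is that the agent maps each subnet ID to the IP its own DHCP port holds on that subnet, then walks every subnet of the network. A DHCP port on one segment only gets an address in that segment's subnet, so the lookup for the other segment's subnet fails. The simplified model below uses hypothetical names and the subnet UUIDs from this report; it reproduces the failure and shows one possible defensive variant that skips subnets the port has no address on. It is a sketch, not Neutron's code and not its upstream fix.

```python
RACK1_SUBNET = "287d9d56-1c0f-4d4b-a5cc-4718efc80436"
RACK2_SUBNET = "7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2"

# Hypothetical: the Rack 2 agent's DHCP port only has an IP on the Rack 2 subnet.
# 10.243.68.2 is an illustrative address from that subnet's allocation pool.
dhcp_port_fixed_ips = [{"subnet_id": RACK2_SUBNET, "ip_address": "10.243.68.2"}]
network_subnets = [RACK1_SUBNET, RACK2_SUBNET]


def opts_per_subnet(subnets, fixed_ips, skip_missing=False):
    """Simplified model of building per-subnet DHCP options (hypothetical)."""
    subnet_to_interface_ip = {ip["subnet_id"]: ip["ip_address"] for ip in fixed_ips}
    opts = {}
    for subnet_id in subnets:
        if skip_missing and subnet_id not in subnet_to_interface_ip:
            # Subnet lives on a segment this agent's port is not attached to.
            continue
        opts[subnet_id] = subnet_to_interface_ip[subnet_id]  # KeyError if missing
    return opts


try:
    opts_per_subnet(network_subnets, dhcp_port_fixed_ips)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: '287d9d56-1c0f-4d4b-a5cc-4718efc80436'

print(opts_per_subnet(network_subnets, dhcp_port_fixed_ips, skip_missing=True))
# {'7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2': '10.243.68.2'}
```

Swapping which subnet the port holds an address on reproduces the symmetric error seen after restarting the Rack 1 agent.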
[vsorokin@xnode12-15 ~(keystone_admin)]$ neutron subnet-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+-------------------------------+----------------------------------+-----------------+----------------------------------------------------+
| id                                   | name                          | tenant_id                        | cidr            | allocation_pools                                   |
+--------------------------------------+-------------------------------+----------------------------------+-----------------+----------------------------------------------------+
| 287d9d56-1c0f-4d4b-a5cc-4718efc80436 | vsorokin-VLAN403-subnet-rack1 | 86d919dd7c984631aefd9dddb828a5bc | 10.243.64.0/22  | {"start": "10.243.64.2", "end": "10.243.67.253"}   |
| 657be337-c316-442a-9742-082773714655 | vsorokin-priv-subnet          | 86d919dd7c984631aefd9dddb828a5bc | 10.1.1.0/24     | {"start": "10.1.1.2", "end": "10.1.1.254"}         |
| 7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2 | vsorokin-VLAN403-subnet-rack2 | 86d919dd7c984631aefd9dddb828a5bc | 10.243.68.0/22  | {"start": "10.243.68.2", "end": "10.243.71.253"}   |
| f5eeabde-8ab1-49a6-845b-2df4f860fec1 | public_subnet1                | 862f6b357fb2496ba1350628a8b08657 | 172.31.192.0/18 | {"start": "172.31.240.1", "end": "172.31.240.254"} |
+--------------------------------------+-------------------------------+----------------------------------+-----------------+----------------------------------------------------+
[vsorokin@xnode12-15 ~(keystone_admin)]$
[vsorokin@xnode12-15 ~(keystone_admin)]$ openstack network segment list --network vsorokin-VLAN403-net
+--------------------------------------+----------------------------+--------------------------------------+--------------+---------+
| ID                                   | Name                       | Network                              | Network Type | Segment |
+--------------------------------------+----------------------------+--------------------------------------+--------------+---------+
| 17cf03cb-0165-46c4-9586-598ca2239c75 | vsorokin-VLAN403-net-rack1 | 99cfc13a-adec-4dc0-baeb-864437829b3d | vlan         |     403 |
| 705ef7dd-7210-46c1-a8ca-da8e02d32d82 | vsorokin-VLAN403-net-rack2 | 99cfc13a-adec-4dc0-baeb-864437829b3d | vlan         |     403 |
+--------------------------------------+----------------------------+--------------------------------------+--------------+---------+
[vsorokin@xnode12-15 ~(keystone_admin)]$
[vsorokin@xnode12-15 ~(keystone_admin)]$ neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+-----------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                        | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-----------------------------+-------------------+-------+----------------+---------------------------+
| 22536bc5-0876-4c22-8637-66d269514eb1 | DHCP agent         | xnode12-16.pub.pic2.ibm.com | nova              | :-)   | True           | neutron-dhcp-agent        |
| 5d537319-4d51-43dd-a227-d61913e37c5b | Open vSwitch agent | xnode12-16.pub.pic2.ibm.com |                   | :-)   | True           | neutron-openvswitch-agent |
| 8dee041d-87f9-42e9-9866-21a5b7353041 | Metadata agent     | tnode2-15                   |                   | :-)   | True           | neutron-metadata-agent    |
| 8ef4aa73-4090-4e3c-a7b5-fa0e6ce26c1e | Metadata agent     | hnode1-5                    |                   | :-)   | True           | neutron-metadata-agent    |
| 98cf9262-5271-4a1c-b0a1-f60b9f049c88 | DHCP agent         | tnode2-15                   | nova              | :-)   | True           | neutron-dhcp-agent        |
| b4ea7c48-87f8-40cd-a543-2143d3b7354a | DHCP agent         | hnode1-5                    | nova              | :-)   | True           | neutron-dhcp-agent        |
| c7d263c7-f128-4618-944b-73debea1e670 | Metering agent     | xnode12-16.pub.pic2.ibm.com |                   | :-)   | True           | neutron-metering-agent    |
| ce076f13-0a99-4a40-baa2-7c2f1afb5cff | Open vSwitch agent | hnode1-5                    |                   | :-)   | True           | neutron-openvswitch-agent |
| e90db66d-9d94-437f-8d13-4d7e63ee04a9 | L3 agent           | xnode12-16.pub.pic2.ibm.com | nova              | :-)   | True           | neutron-l3-agent          |
| f11864a0-f423-4933-b6e9-74654896b80b | Metadata agent     | xnode12-16.pub.pic2.ibm.com |                   | :-)   | True           | neutron-metadata-agent    |
| f2db4ea0-af33-4499-b388-e9589bd4fe12 | Open vSwitch agent | tnode2-15                   |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-----------------------------+-------------------+-------+----------------+---------------------------+
[vsorokin@xnode12-15 ~(keystone_admin)]$ neutron net-list-on-dhcp-agent b4ea7c48-87f8-40cd-a543-2143d3b7354a
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
| id                                   | name                 | tenant_id                        | subnets                                             |
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
| 99cfc13a-adec-4dc0-baeb-864437829b3d | vsorokin-VLAN403-net | 86d919dd7c984631aefd9dddb828a5bc | 287d9d56-1c0f-4d4b-a5cc-4718efc80436 10.243.64.0/22 |
|                                      |                      |                                  | 7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2 10.243.68.0/22 |
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
[vsorokin@xnode12-15 ~(keystone_admin)]$
[vsorokin@xnode12-15 ~(keystone_admin)]$ neutron net-list-on-dhcp-agent 98cf9262-5271-4a1c-b0a1-f60b9f049c88
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
| id                                   | name                 | tenant_id                        | subnets                                             |
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
| 99cfc13a-adec-4dc0-baeb-864437829b3d | vsorokin-VLAN403-net | 86d919dd7c984631aefd9dddb828a5bc | 287d9d56-1c0f-4d4b-a5cc-4718efc80436 10.243.64.0/22 |
|                                      |                      |                                  | 7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2 10.243.68.0/22 |
+--------------------------------------+----------------------+----------------------------------+-----------------------------------------------------+
[vsorokin@xnode12-15 ~(keystone_admin)]$
As you can see, Neutron is trying to make each DHCP agent serve both subnets, which I believe is wrong.
Versions:
OpenStack controller backplane: Queens RDO/CentOS 7.4 x86-64
Nodes hosting openvswitch-agent and dhcp-agent: Queens/Ubuntu 16.04.4 4.13.0-45-generic ppc64le
======
I suppose "neutron net-list-on-dhcp-agent" simply shows the network dict built from the network ID, which means the subnets listed with it above do not tell us much in this routed network scenario. We can say that both agents serve the routed network vsorokin-VLAN403-net, which is expected, but at least from the CLI output we can't tell whether Neutron is trying to make each DHCP agent serve both subnets.
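To get the per-segment view that the CLI listing above does not show, the network's subnets can be grouped by their segment. A minimal sketch using the IDs from this report; the subnet-to-segment assignment follows the `--network-segment` options in the reproduction steps, and the dict layout is an assumption that only mirrors the subnet's `segment_id` field:

```python
from collections import defaultdict

SEGMENT_NAMES = {
    "17cf03cb-0165-46c4-9586-598ca2239c75": "vsorokin-VLAN403-net-rack1",
    "705ef7dd-7210-46c1-a8ca-da8e02d32d82": "vsorokin-VLAN403-net-rack2",
}

# Subnets of vsorokin-VLAN403-net, annotated with the segment each was created on.
subnets = [
    {"id": "287d9d56-1c0f-4d4b-a5cc-4718efc80436", "cidr": "10.243.64.0/22",
     "segment_id": "17cf03cb-0165-46c4-9586-598ca2239c75"},
    {"id": "7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2", "cidr": "10.243.68.0/22",
     "segment_id": "705ef7dd-7210-46c1-a8ca-da8e02d32d82"},
]


def subnets_by_segment(subnets):
    """Map segment ID -> list of CIDRs of the subnets on that segment."""
    grouped = defaultdict(list)
    for subnet in subnets:
        grouped[subnet["segment_id"]].append(subnet["cidr"])
    return dict(grouped)


for segment_id, cidrs in subnets_by_segment(subnets).items():
    print(SEGMENT_NAMES[segment_id], cidrs)
# vsorokin-VLAN403-net-rack1 ['10.243.64.0/22']
# vsorokin-VLAN403-net-rack2 ['10.243.68.0/22']
```

With the CLI, the same field should be visible per subnet via `openstack subnet show <subnet> -c segment_id`.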
Indeed, per the routed network config guide [1], "Unlike conventional provider networks, a DHCP agent cannot support more than one segment within a network". The current Neutron implementation should already check for, and prevent, a given agent connecting to more than one segment. While scheduling, it should also check whether a DHCP agent needs to be scheduled for each segment that has a DHCP-enabled subnet. So the issue may be caused by something less obvious.
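The two checks described above can be stated as a small model: an agent may only serve segments that carry a DHCP-enabled subnet and whose physical network its host has mapped, and it must not end up on more than one segment of the same network. This is a hypothetical sketch, not Neutron's scheduler; the host-to-physnet mapping is an illustrative assumption, since the report does not say which of hnode1-5 and tnode2-15 sits in which rack.

```python
def eligible_segments(host_physnets, segments, dhcp_segments):
    """Segments with a DHCP-enabled subnet whose physnet the host has mapped."""
    return sorted(
        seg_id
        for seg_id, physnet in segments.items()
        if seg_id in dhcp_segments and physnet in host_physnets
    )


def schedule_dhcp(agents, segments, dhcp_segments):
    """Bind each DHCP agent to at most one segment of a single network.

    agents:        {host: set of physnets from its bridge_mappings}
    segments:      {segment_id: physical_network}
    dhcp_segments: set of segment IDs that carry a DHCP-enabled subnet
    Raises ValueError when a host could reach more than one such segment,
    mirroring "a DHCP agent cannot support more than one segment".
    """
    assignment = {}
    for host, physnets in agents.items():
        candidates = eligible_segments(physnets, segments, dhcp_segments)
        if len(candidates) > 1:
            raise ValueError(f"{host} reaches {len(candidates)} segments of one network")
        assignment[host] = candidates[0] if candidates else None
    return assignment


RACK1_SEG = "17cf03cb-0165-46c4-9586-598ca2239c75"
RACK2_SEG = "705ef7dd-7210-46c1-a8ca-da8e02d32d82"
segments = {RACK1_SEG: "40Grack1", RACK2_SEG: "40Grack2"}
dhcp_segments = {RACK1_SEG, RACK2_SEG}

# Hypothetical rack placement of the two DHCP agent hosts.
agents = {"hnode1-5": {"40Grack1"}, "tnode2-15": {"40Grack2"}}
print(schedule_dhcp(agents, segments, dhcp_segments))
```

Raising on a multi-segment host is a deliberate choice in this sketch to make the violation visible; a real scheduler could instead pick a single segment.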
Could you please verify that each IPv4 subnet is associated with at least one DHCP agent? A neutron-server log would also be much appreciated. :)
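One way to do the requested check is to collect, for each DHCP agent, the subnets it actually holds an address on (for example from the fixed IPs of its DHCP port) and confirm that every IPv4 subnet appears at least once. A sketch with hypothetical per-agent data; how that data is gathered (CLI, API, or SDK) is left open here:

```python
import ipaddress


def uncovered_ipv4_subnets(subnet_cidrs, agent_subnets):
    """Return the IDs of IPv4 subnets that no DHCP agent serves.

    subnet_cidrs:  {subnet_id: cidr}
    agent_subnets: {agent_id: set of subnet IDs the agent has a DHCP address on}
    """
    served = set().union(*agent_subnets.values())
    return sorted(
        subnet_id
        for subnet_id, cidr in subnet_cidrs.items()
        if ipaddress.ip_network(cidr).version == 4 and subnet_id not in served
    )


RACK1_SUBNET = "287d9d56-1c0f-4d4b-a5cc-4718efc80436"
RACK2_SUBNET = "7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2"
subnet_cidrs = {RACK1_SUBNET: "10.243.64.0/22", RACK2_SUBNET: "10.243.68.0/22"}

# Hypothetical per-agent data keyed by the DHCP agent IDs from the agent list.
healthy = {
    "b4ea7c48-87f8-40cd-a543-2143d3b7354a": {RACK1_SUBNET},
    "98cf9262-5271-4a1c-b0a1-f60b9f049c88": {RACK2_SUBNET},
}
broken = {"b4ea7c48-87f8-40cd-a543-2143d3b7354a": {RACK1_SUBNET}}

print(uncovered_ipv4_subnets(subnet_cidrs, healthy))  # []
print(uncovered_ipv4_subnets(subnet_cidrs, broken))   # ['7ca43b64-0be1-4e75-abda-7c9a4f7aa4c2']
```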
[1] https://docs.openstack.org/neutron/queens/admin/config-routed-networks.html