With a large number of VMs, the DHCP agent eventually throws the following IndexError while reading the lease file:
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent [req-cf1c7a8e-b718-4c46-b7be-999747c7e526 afe623c9e78e47febd76617008b9138e c4f22248feb9430093858a0404b779d5 - - -] Unable to reload_allocations dhcp for ef71f918-dc0d-4a6e-8d37-0f5f0720e295.: IndexError: list index out of range
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent File "/opt/stack/venv/neutron-20180718T154642Z/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 142, in call_driver
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent File "/opt/stack/venv/neutron-20180718T154642Z/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 512, in reload_allocations
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent self._release_unused_leases()
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent File "/opt/stack/venv/neutron-20180718T154642Z/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 825, in _release_unused_leases
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent v6_leases = self._read_v6_leases_file_leases(leases_filename)
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent File "/opt/stack/venv/neutron-20180718T154642Z/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 810, in _read_v6_leases_file_leases
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent (iaid, ip, client_id) = parts[1], parts[2], parts[4]
2018-08-15 16:17:40.771 40391 ERROR neutron.agent.dhcp.agent IndexError: list index out of range
When this happens, the agent calls sync_state to fully resync its state, which is a serious problem in a scale environment with a large number of ports.
Is it possible to avoid a full resync of all ports?
@brian-haley The proposed partial fix ignores lines with an incorrect number of fields, but we are still likely to miss the entire trailing section of the file, or to read entries whose client_id field is truncated.
What are the implications of this for the dhcp agent?
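For discussion, here is a minimal sketch of the "ignore lines with the wrong number of fields" idea. It is not the actual neutron patch. The function name parse_v6_lease_lines, the assumed five-field layout (expiry, iaid, ip, hostname, client_id), and the sample data are illustrative assumptions based only on the indices in the traceback (parts[1], parts[2], parts[4]). It is written for Python 3 with the stdlib ipaddress module, whereas the traceback comes from a Python 2.7 install.

```python
import ipaddress

# Assumed layout of a lease line: expiry, iaid, ip, hostname, client_id.
# The traceback indexes parts[4], so at least 5 fields are required.
MIN_FIELDS = 5


def parse_v6_lease_lines(lines):
    """Parse lease lines, skipping malformed ones instead of raising.

    Hypothetical helper, not part of neutron. Returns a tuple
    (leases, skipped), where skipped lists the 1-based numbers of
    the lines that could not be parsed.
    """
    leases = {}
    skipped = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.strip().split()
        # Blank lines and the server "duid" line are not leases.
        if not parts or parts[0] == "duid":
            continue
        # A line cut short mid-write has too few fields: record and skip it.
        if len(parts) < MIN_FIELDS:
            skipped.append(lineno)
            continue
        iaid, ip, client_id = parts[1], parts[2], parts[4]
        ip = ip.strip("[]")
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            # The address itself was truncated or garbled.
            skipped.append(lineno)
            continue
        if addr.version != 6:
            continue  # IPv4 leases are handled elsewhere
        leases[ip] = {"iaid": iaid, "client_id": client_id}
    return leases, skipped


sample = [
    "duid 00:01:00:01:02:03:04:05:06:07:08:09:0a:0b",
    "1534349860 755752236 [2001:db8::5] host-1 00:01:00:01:aa:bb",
    "1534349860 755752237 [2001:db8::6]",  # truncated line
    "1534349860 fa:16:3e:00:00:01 192.168.0.5 host-2 01:fa:16:3e:00:00:01",
]
leases, skipped = parse_v6_lease_lines(sample)
```

Note that a field-count check like this cannot detect a line whose last field (client_id) was cut short but still present, so truncated client_id values would pass through, and any leases after the point where the file was cut off are simply not seen.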