neutron-dhcp-agent doesn't hand out leases for recently used addresses
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Aaron Rosen |
tripleo | Fix Released | High | Unassigned |
Bug Description
Hi, this has been happening for the last two days on ci-overcloud.
There are a bunch of RabbitMQ disconnect messages in the log, but they are not correlated with the fault.
Existing VMs can DHCP successfully, and the overlay network seems fine.
I've checked the DHCP agent's hosts file and it is being updated.
Restarting neutron-dhcp-agent 'fixes' the problem.
Sending SIGHUP to the dnsmasq processes (kill -HUP) doesn't seem to fix things.
-> The issue is that a prior VM gets (say) 192.168.1.25 and is then shut down.
-> We then issue 1.25 to a new VM, but the dnsmasq process still has 1.25 in its in-memory lease table as belonging to the old VM's MAC address.
- DHCPRELEASE *is* being issued, so perhaps there are sporadic failures or some other condition at play.
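To make the failure mode concrete, here is a toy Python model of the conflict described above. It is not dnsmasq's code: `ToyDhcpServer`, the MAC addresses, and the rule "an unexpired lease held by a different MAC blocks the offer" are simplifying assumptions. The point is that re-reading the hosts file (as SIGHUP does) updates the static MAC-to-IP mapping but leaves the in-memory lease table, so the new VM is refused until the old lease expires.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyDhcpServer:
    """Hypothetical, simplified model of dnsmasq-style static DHCP (not dnsmasq's code)."""

    # Static config, like --dhcp-hostsfile: MAC -> IP.
    hosts: Dict[str, str] = field(default_factory=dict)
    # In-memory lease table: IP -> (MAC, expiry time in seconds).
    leases: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    lease_time: float = 86400.0

    def reload_hosts(self, hosts: Dict[str, str]) -> None:
        # Modelled on SIGHUP: re-read the static config but keep the lease table.
        self.hosts = dict(hosts)

    def offer(self, mac: str, now: float) -> Optional[str]:
        ip = self.hosts.get(mac)
        if ip is None:
            return None  # no static entry for this MAC
        holder = self.leases.get(ip)
        if holder is not None and holder[0] != mac and holder[1] > now:
            # Assumed rule: an unexpired lease held by another MAC blocks the offer.
            return None
        self.leases[ip] = (mac, now + self.lease_time)
        return ip


server = ToyDhcpServer(hosts={"fa:16:3e:00:00:01": "192.168.1.25"})
print(server.offer("fa:16:3e:00:00:01", now=0.0))      # old VM gets 192.168.1.25
# Old VM is shut down; neutron hands 192.168.1.25 to a new port.
server.reload_hosts({"fa:16:3e:00:00:02": "192.168.1.25"})
print(server.offer("fa:16:3e:00:00:02", now=60.0))     # None: stale lease blocks it
print(server.offer("fa:16:3e:00:00:02", now=86401.0))  # 192.168.1.25 once it expires
```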
# WORKAROUNDS:
- restart the agent in a cronjob
- change reload_allocations to return self.restart(), which avoids the cached leases in dnsmasq but introduces a short DHCP interruption every time a new VM is deployed.
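The second workaround can be sketched as follows. `reload_allocations` and `restart` are the names used in the report; `ToyDnsmasqDriver` and everything else is a hypothetical stand-in (not neutron's actual driver class), with process handling replaced by counters so it runs anywhere. It assumes, as the observation that a restart "fixes" things suggests, that a freshly started dnsmasq comes up without the stale lease, whereas a HUP keeps it.

```python
from typing import Dict


class ToyDnsmasqDriver:
    """Hypothetical stand-in for a per-network dnsmasq driver (not neutron's class)."""

    def __init__(self) -> None:
        self.leases: Dict[str, str] = {}  # IP -> MAC, dnsmasq's in-memory lease table
        self.restarts = 0
        self.hups = 0

    def _spawn(self) -> None:
        # Assumption: a freshly started dnsmasq has no stale leases.
        self.leases = {}

    def restart(self) -> None:
        # Stop + start: DHCP is briefly unavailable while the process is down.
        self.restarts += 1
        self._spawn()

    def _send_hup(self) -> None:
        # SIGHUP re-reads the hosts file but leaves the lease table untouched.
        self.hups += 1

    def reload_allocations(self, use_restart_workaround: bool = False) -> None:
        if use_restart_workaround:
            # Workaround from the report: restart instead of HUP.
            return self.restart()
        return self._send_hup()
```

The trade-off is the one the report names: every port change now costs a full process restart, which is why this is a workaround rather than a fix.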
# POSSIBLE SOLUTIONS
- change dnsmasq to discard leases that were statically allocated but whose MAC address no longer matches the config (i.e. trust the config more)
- change neutron to only recycle a port's address once the port that previously held it has been deleted for longer than the lease time.
- use dnsmasq's multiple-MAC feature to tell dnsmasq that a handover is happening.[1]
- the lease change script is apparently called for all leases when SIGHUP is received, so maybe we can terminate dead leases there? [2]
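The second proposed solution (delay reuse of an address for one lease time) could look roughly like this. It is a minimal sketch assuming a simple in-process allocator; `QuarantiningAllocator` and its methods are hypothetical and not neutron's IPAM code. A released address goes into quarantine and only rejoins the free pool once a full lease time has passed since its port was deleted, by which point any lease dnsmasq still holds for the old MAC should have expired.

```python
import ipaddress
from typing import Dict, List, Optional, Set


class QuarantiningAllocator:
    """Hypothetical allocator: released IPs are reused only after lease_time seconds."""

    def __init__(self, cidr: str, lease_time: float) -> None:
        self.lease_time = lease_time
        # Usable host addresses, in order (hosts() skips network/broadcast).
        self.free: List[str] = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self.in_use: Set[str] = set()
        # IP -> time at which the port holding it was deleted.
        self.released_at: Dict[str, float] = {}

    def _reclaim(self, now: float) -> None:
        # Return addresses whose quarantine (one full lease time) has elapsed.
        for ip, deleted_at in list(self.released_at.items()):
            if now - deleted_at >= self.lease_time:
                del self.released_at[ip]
                self.free.append(ip)

    def allocate(self, now: float) -> Optional[str]:
        self._reclaim(now)
        if not self.free:
            return None  # pool exhausted; quarantined addresses are not eligible
        ip = self.free.pop(0)
        self.in_use.add(ip)
        return ip

    def release(self, ip: str, now: float) -> None:
        # Port deleted: the IP enters quarantine instead of going straight back to the pool.
        self.in_use.discard(ip)
        self.released_at[ip] = now


pool = QuarantiningAllocator("192.168.1.24/30", lease_time=86400.0)
first = pool.allocate(now=0.0)    # 192.168.1.25
second = pool.allocate(now=0.0)   # 192.168.1.26
pool.release(first, now=100.0)
print(pool.allocate(now=200.0))   # None: 192.168.1.25 is still quarantined
print(pool.allocate(now=86500.0)) # 192.168.1.25
```

The cost of this design is address pressure: in a small subnet with high VM churn, quarantined addresses can exhaust the pool even though they are no longer in use.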
[1] "As a special case, in DHCPv4, it is possible to include more than one hardware address. eg:"
[2] "When it receives a SIGHUP, dnsmasq clears its cache and then re-loads /etc/hosts and /etc/ethers and any file given by --dhcp-hostsfile, --dhcp-optsfile or --addn-hosts. The dhcp lease change script is called for all existing DHCP leases."
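For the multiple-MAC option, dnsmasq's `--dhcp-hostsfile` lines use the same syntax as the right-hand side of `--dhcp-host`, and per the man page text quoted in [1] a DHCPv4 entry may list more than one hardware address. The helper below is a hypothetical sketch: the function name, the `host-<ip>.openstacklocal` hostname pattern, and the "old MAC, new MAC, hostname, IP" field order are illustrative assumptions and should be checked against the dnsmasq man page (including its caveats on this feature) before use. When an IP that previously belonged to one MAC is now assigned to another, it emits both MACs on the same line to signal the handover.

```python
import ipaddress
from typing import Dict, List


def build_hosts_lines(current: Dict[str, str], previous: Dict[str, str],
                      domain: str = "openstacklocal") -> List[str]:
    """Render hosts-file lines from MAC -> IP maps (hypothetical helper)."""
    # Invert the previous allocation so we can ask "who held this IP before?".
    prev_by_ip = {ip: mac for mac, ip in previous.items()}
    lines = []
    # Sort numerically by IP so the output order is stable.
    for mac, ip in sorted(current.items(), key=lambda kv: ipaddress.ip_address(kv[1])):
        host = f"host-{ip.replace('.', '-')}.{domain}"
        old_mac = prev_by_ip.get(ip)
        # Handover: list the old MAC too, so dnsmasq may abandon its lease.
        macs = [old_mac, mac] if old_mac and old_mac != mac else [mac]
        lines.append(",".join(macs + [host, ip]))
    return lines


print(build_hosts_lines({"fa:16:3e:00:00:02": "192.168.1.25"},
                        {"fa:16:3e:00:00:01": "192.168.1.25"}))
```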
Changed in tripleo: | |
importance: | Undecided → Critical |
status: | New → Triaged |
description: | updated |
Changed in neutron: | |
assignee: | nobody → Aaron Rosen (arosen) |
Changed in neutron: | |
importance: | Undecided → High |
Changed in neutron: | |
status: | New → Confirmed |
Changed in neutron: | |
milestone: | juno-1 → juno-2 |
Changed in neutron: | |
status: | Incomplete → Invalid |
status: | Invalid → Fix Released |
Changed in tripleo: | |
status: | Triaged → Fix Released |
OK, so I poked at the logs...
2014-01-22 01:58:58.029 14424 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 576, in ensure
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 656, in _consume
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/kombu/connection.py", line 279, in drain_events
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 90, in drain_events
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/amqp/connection.py", line 299, in drain_events
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/amqp/connection.py", line 362, in _wait_multiple
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/amqp/connection.py", line 326, in read_timeout
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common     raise m
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-01-22 01:58:58.029 14424 TRACE neutron.openstack.common.rpc.common
2014-01-22 01:58:58.041 14424 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-01-22 01:58:58.041 14424 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-01-22 01:58:58.041 14424 TRACE neutron.openstack.common.rpc....