dnsmasq on DHCP Agent does not listen on tcp/53 after dnsmasq restart

Bug #1998621 reported by Sebastian Lohff
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
neutron   Fix Released   High         Unassigned

Bug Description

When talking to dnsmasq using DNS over TCP, dnsmasq forks child processes to handle the TCP connections. A forked process stays alive until all of its connections have been closed, so a dangling connection keeps the process alive and, with it, the tcp/53 port in the listening state. On dnsmasq restart (e.g. on network update, subnet create, ...) the parent process is killed with SIGKILL and a new process is started. This new process cannot listen on tcp/53, as the port is still in use by the old child with the dangling connection.
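
The mechanism described above can be illustrated with a small sketch. This is not dnsmasq code: it is a minimal POSIX-only Python stand-in, assuming an ephemeral port on 127.0.0.1 in place of tcp/53 and a hypothetical helper `can_bind()`. A forked child inherits the listening socket, the parent drops its own copy (as if it had been killed), and a fresh bind on the same port fails until the child is gone.

```python
import errno
import os
import signal
import socket
import time


def can_bind(port):
    """Try to bind a fresh TCP socket to 127.0.0.1:port (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        sock.close()


def demo_inherited_listener():
    """Return (blocked_while_child_alive, free_after_child_exit)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port stands in for tcp/53
    listener.listen(1)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child: inherits the listening socket and simply lingers,
        # like a forked process stuck on a dangling connection.
        time.sleep(30)
        os._exit(0)

    # Parent drops its copy of the socket, as if it had been killed.
    listener.close()

    # The child still holds the listening socket, so the port is taken.
    blocked = not can_bind(port)

    # Once the child is gone, the port becomes available again.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    free = can_bind(port)
    return blocked, free
```

Calling `demo_inherited_listener()` reports whether the port was blocked while the child was alive and whether it was free again after the child exited.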

This could be prevented by sending SIGTERM instead of SIGKILL, as dnsmasq then does a proper cleanup of its forks and all tcp/53 connections are properly closed.
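
A minimal sketch of the "SIGTERM first, SIGKILL only as a fallback" idea, assuming a process started via Python's `subprocess` module. The function name `stop_gracefully` and the timeout value are illustrative assumptions and are not taken from the neutron code base, which manages dnsmasq through its own process helpers.

```python
import signal
import subprocess


def stop_gracefully(proc, timeout=5.0):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Give the process a chance to clean up its children and sockets.
        proc.wait(timeout=timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process did not shut down in time: force it.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

The timeout is a trade-off: too short and a busy dnsmasq is still killed hard, too long and every restart is delayed when the process hangs.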

This only happens when dnsmasq is started with --bind-dynamic, as with this flag dnsmasq ignores any errors resulting from it not being able to bind on tcp/53, see here:
https://github.com/imp/dnsmasq/blob/f186bdcbc76cd894133a043b115b4510c0ee1fcf/src/network.c#L725-L726
The flag has been introduced here:
https://bugs.launchpad.net/neutron/+bug/1828473

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/866489

Changed in neutron:
status: New → In Progress
Sebastian Lohff (sebageek) wrote :

To reproduce the error I did the following:
 * connect to my network agent
 * enter the namespace of my network via "ip netns exec qdhcp-$network_id bash"
 * check with "ss -tuplen" that dnsmasq is listening on tcp/53
 * run "nc localhost 53" and keep it open
 * add/remove a subnet from the current network (or just call directory.get_plugin().agent_notifiers['DHCP agent'].network_added_to_agent() on your network_id/agent_host)
 * wait a moment
 * kill the netcat (it keeps the one dnsmasq process busy)
 * check "ss -tuplen" again and see that dnsmasq is no longer listening on tcp/53
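
As a scripted alternative to reading the "ss -tuplen" output by eye, a small TCP probe can tell whether anything accepts connections on a port. This is only a sketch: the function name `is_tcp_listening` is made up for illustration, and to check dnsmasq it would have to run inside the qdhcp namespace against port 53.

```python
import socket


def is_tcp_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, ...
        return False
```

Note that the probe itself opens a short-lived TCP connection, so it should be closed promptly (as the `with` block does) to avoid creating the kind of dangling connection this bug is about.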

Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/866489
Committed: https://opendev.org/openstack/neutron/commit/74224e79e031636018b970fac9c2aa72516eb12d
Submitter: "Zuul (22348)"
Branch: master

commit 74224e79e031636018b970fac9c2aa72516eb12d
Author: Sebastian Lohff <email address hidden>
Date: Fri Dec 2 17:36:44 2022 +0100

    Gracefully restart dnsmasq to not break tcp DNS

    When talking to dnsmasq using DNS over tcp dnsmasq will fork out for
    TCP connections. Forked processes will stay until all connections have
    been closed, meaning that dangling connections will keep the processes
    and with that will also keep the tcp/53 port in listening state. On
    dnsmasq restart (e.g. on network update, subnet create, ...) the parent
    process is killed with SIGKILL and a new process is started. This new
    process cannot listen on tcp/53, as it is still in use by the old child
    with the dangling connection.

    To prevent dangling dnsmasq connections on tcp we need to properly
    shutdown the child. This is done by first sending SIGTERM and only send
    a SIGKILL if the process is not shutting down properly. With that we
    get proper cleanup of all children and tcp will come up after a restart.

    Change-Id: Ie633148c512f5124e978648c50a4c6318c61baa8
    Closes-bug: #1998621

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.0.0rc1

This issue was fixed in the openstack/neutron 22.0.0.0rc1 release candidate.
