Cannot delete zone from designate if zone's SOA is actually hosted by a forwarder/external dns

Bug #1807464 reported by Drew Freiberger
Affects                      Status    Importance  Assigned to  Milestone
Designate                    New       Undecided   Unassigned
OpenStack Designate Charm    Triaged   Undecided   Unassigned

Bug Description

This is an upstream Designate bug based on packages installed by our bionic/queens cloud charm configs.

If you add a zone to designate that is actually owned by an upstream DNS server reachable via the forwarders configured for bind, you cannot delete the zone from designate.

You will see the logs looping with:

https://pastebin.ubuntu.com/p/vgpzCQVRbb/

The flag "RA" denotes that this is a referred answer, not an authoritative answer.

In the designate code, the check is whether the response from the backend nameserver is authoritative.

With the DNS backend network included in allowed_recursion_nets, designate-bind performs a recursive lookup upstream and returns a valid, external SOA record where the designate code expects none.
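
For illustration only, here is a minimal dnspython sketch of the distinction the designate code relies on. It is not the actual designate implementation, and the zone name and server address are made up:

# Sketch only: shows the AA/RA distinction described above using dnspython.
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

BIND_SERVER = "10.0.0.10"            # a designate-bind unit (hypothetical address)
ZONE = "foo.mydesignatedomain.com."  # hypothetical zone

query = dns.message.make_query(ZONE, dns.rdatatype.SOA)
response = dns.query.udp(query, BIND_SERVER, timeout=5)

if response.flags & dns.flags.AA:
    # bind still serves the zone from its local zone files
    print("authoritative answer (AA set)")
elif response.answer:
    # bind recursed through its forwarders and returned upstream data;
    # RA is set and AA is clear, which is the case that confuses the check
    print("recursed, non-authoritative answer (RA set, AA clear)")
else:
    print("no answer: zone not present locally or upstream")

In the scenario above, the second branch is what the deletion check keeps hitting, so it never concludes the zone is gone.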

Workaround: remove the forwarders from your charm config, let the zone deletion succeed, then re-add your forwarders.

Another workaround is to configure your dns-backend network into allowed_nets instead of allowed_recursion_nets in the charm config, to prevent designate's mdns queries from accidentally being recursed to upstream DNS.

tags: added: cpe-onsite
Revision history for this message
Drew Freiberger (afreiberger) wrote :
description: updated
Revision history for this message
James Page (james-page) wrote :

Drew - is there a bug raised against the designate project for this issue?

Changed in charm-designate:
status: New → Triaged
Revision history for this message
Drew Freiberger (afreiberger) wrote :

James, I think you may be on to something; this should be filed against upstream. If the record being looked up comes back from the bind server controlled by designate as non-authoritative, it should be assumed that the bind server is performing recursive lookups and that the zone is no longer hosted in the local zone files.

We just found a similar issue where it took TTL seconds for an entry to be deleted from the designate database, because the recursive lookup hit corporate DNS, which had a cached entry for this delegated subdomain hosted in Designate.

Imagine the scenario where you have:

End user -> corporate DNS query for a designate-hosted entry -> corporate DNS recurses to the designate-bind service, retrieves the entry, caches it, and returns a non-authoritative answer to the end user. For example, zone foo.mydesignatedomain.com, where corporate DNS delegates mydesignatedomain.com to designate-bind.

You then delete the foo.mydesignatedomain.com zone (a child of mydesignatedomain.com) from designate. Designate's mdns updates the designate-bind service and the zone is dropped from designate-bind, but when designate's mdns service then queries designate-bind, designate-bind forwards foo.mydesignatedomain.com upstream to corporate DNS (the forwarder configured in the charm for anything not hosted locally). Your corporate DNS service still has the NS/SOA records for foo.mydesignatedomain.com cached for as long as the TTL set on that zone, so designate does not acknowledge that the domain was deleted from designate-bind because it still receives a response via recursion. Only once the TTL expires upstream is the record shown as deleted. This can lead to 24-hour DNS woes if someone uses Designate maliciously.
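
As a rough sketch of that behaviour (not designate's actual mdns code; the zone name and nameserver address are hypothetical), any answer at all, even a cached non-authoritative one returned via the forwarder, keeps the zone looking "present":

# Sketch only: models the poll-until-gone behaviour described above.
import time

import dns.message
import dns.query
import dns.rcode
import dns.rdatatype

BIND_SERVER = "10.0.0.10"            # hypothetical designate-bind unit
ZONE = "foo.mydesignatedomain.com."  # hypothetical deleted zone

def zone_still_visible(server, zone):
    query = dns.message.make_query(zone, dns.rdatatype.SOA)
    response = dns.query.udp(query, server, timeout=5)
    # NXDOMAIN means the zone really is gone from this server's point of view.
    if response.rcode() == dns.rcode.NXDOMAIN:
        return False
    # Any SOA answer, authoritative or recursed/cached, keeps the zone "alive".
    return bool(response.answer)

while zone_still_visible(BIND_SERVER, ZONE):
    # With a forwarder and a cached upstream SOA, this loops until the TTL expires.
    time.sleep(30)
print("zone no longer visible; deletion can be acknowledged")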

While we can solve this in the charm, either by excluding the designate relationship's mdns endpoints from upstream recursive lookup or by configuring the recursion nets to cover only the cloud's overlay networks, I believe a check for authoritative vs. non-authoritative responses within the designate code itself would solve this much better. I do worry that there are use cases where the DNS backend (when not using designate-bind) may have a good reason not to provide an authoritative response (for example designate -> bastion DNS server, which then updates an authoritative upstream corporate server), in which case it wouldn't be solvable upstream without adding some additional configuration options.

I'd really like to see the designate-bind service configured to deny recursion specifically for the designate-mdns endpoints to solve this in the charmed deployments, because there are valid cases where the designate units' mdns endpoints sit within the same CIDR as public/overlay network IPs that do need to be able to recurse through the designate-bind service.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

This appears to be described in upstream project bug:
https://bugs.launchpad.net/designate/+bug/1802227

In the comments of that bug, it is noted that the best practice is to not have your authoritative BIND servers for designate also perform recursion.

This seems to suggest that environments using the recursive DNS chain VM -> neutron dnsmasq -> designate-bind -> corporate should move to something more along the lines of VM -> neutron dnsmasq -> a separate recursive bind service that forwards designate zones to the designate-bind servers and everything else to corporate DNS. The other option may be to stop supporting recursive DNS from designate-bind entirely and to require a cloud DNS strategy where the neutron dnsmasq services point at corporate or public DNS servers rather than using the designate-bind servers as the primary external DNS for the cloud.
