Designate

Missing check for domain ownership

Bug #1471159 reported by Florian Weimer on 2015-07-03

This bug affects 2 people

Affects     Status         Importance   Assigned to       Milestone
Designate   Fix Released   High         Kiall Mac Innes   Designate 1.0.0 "liberty"

Bug Description

Designate currently does not check if a tenant asking for domain creation actually controls the domain according to the (internal or public) DNS.

This causes at least the following issues:

(a) The first-come-first-served nature of domain ownership allows one tenant to create a domain which another tenant plans to manage with Designate. This prevents the other tenant from using Designate. If the other tenant proceeds to move the domain to Designate, the first tenant has hijacked control of the domain.

(b) Registration of second-level public suffixes such as “co.uk” is not blocked. If a tenant creates such a domain, no one else can create subdomains under it, including legitimate domain names such as “example.co.uk”.

(c) A malicious tenant may create domains for name servers whose names are currently not managed by this Designate instance and add A records. This enables domain hijacking.

Before:

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7 <<>> @10.35.6.9 example.org NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56445
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1680
;; QUESTION SECTION:
;example.org. IN NS

;; ANSWER SECTION:
example.org. 3600 IN NS ns1.unmanaged.example.
example.org. 3600 IN NS ns1.example.com.

;; Query time: 1 msec
;; SERVER: 10.35.6.9#53(10.35.6.9)
;; WHEN: Thu Jun 25 18:17:41 IDT 2015
;; MSG SIZE rcvd: 104

After the malicious tenant has created a domain “ns1.unmanaged.example.” and added an A record (192.0.2.7 in this example):

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7 <<>> @10.35.6.9 example.org NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62622
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1680
;; QUESTION SECTION:
;example.org. IN NS

;; ANSWER SECTION:
example.org. 3600 IN NS ns1.example.com.
example.org. 3600 IN NS ns1.unmanaged.example.

;; ADDITIONAL SECTION:
ns1.unmanaged.example. 3600 IN A 192.0.2.7

;; Query time: 43 msec
;; SERVER: 10.35.6.9#53(10.35.6.9)
;; WHEN: Thu Jun 25 18:19:06 IDT 2015
;; MSG SIZE rcvd: 120

If the malicious tenant controls the IP address 192.0.2.7 and runs a suitable name server there, they can hijack the domain.
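
One way to detect the situation in (c) before it happens is to compare a requested domain against the NS targets already used by other tenants' zones. The following is a minimal sketch, not Designate code: the `existing_ns` mapping, the tenant names, and the function names are all hypothetical and exist only for illustration.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and give it exactly one trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def is_subdomain(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or lies beneath it."""
    name, zone = normalize(name), normalize(zone)
    if zone == ".":
        return True  # everything is beneath the root
    return name == zone or name.endswith("." + zone)


def conflicting_ns_targets(new_zone, new_tenant, existing_ns):
    """Find NS targets of other tenants' zones that fall inside `new_zone`.

    `existing_ns` maps (tenant, zone) -> list of NS target names.
    This data model is an assumption made for the sketch.
    """
    hits = []
    for (tenant, zone), targets in existing_ns.items():
        if tenant == new_tenant:
            continue  # a tenant may serve its own name server names
        for target in targets:
            if is_subdomain(target, new_zone):
                hits.append((normalize(zone), normalize(target)))
    return sorted(hits)


existing_ns = {
    ("tenant-a", "example.org."): ["ns1.unmanaged.example.", "ns1.example.com."],
}
# tenant-b asks for the zone that tenant-a's delegation points into
print(conflicting_ns_targets("ns1.unmanaged.example.", "tenant-b", existing_ns))
```

A non-empty result means the new zone could supply address records for a name server that another tenant's zone depends on, which is the precondition for the hijack shown in the dig output above.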

(d) It is completely unsafe to use a combined authoritative/recursive implementation for the DNS server (such as BIND) because a malicious tenant can create domains they do not own and hijack domains from the point of view of this recursor. (This likely affects BIND even when recursion is disabled in the configuration because BIND will still perform recursion for its own purposes.) Separate name servers on distinct IP addresses (but perhaps hosted on the same machine) are required. Using PowerDNS addresses this issue.

I believe this results in a security vulnerability (which may affect already existing production deployments), so we need to communicate this to upstream and the OpenStack security team.

Fixing this issue appears *very* difficult. (d) points towards one approach: have completely separate name servers for each tenant, on dedicated IP addresses. However, for public DNS deployments, this may not be possible due to IPv4 address space limitations.

I believe ISPs address this by tying DNS management to registry updates: A customer can maintain control of a domain in their systems only if an automated request to register the domain at the registry level succeeds in a reasonable time frame. The “co.uk” aspect is taken care of because customers can only register domains for which there is a known registry configured in the system (because the ISP needs to have negotiated access to the registry database, otherwise they would not be able to register domains on behalf of their customers).

Florian Weimer (fweimer) wrote on 2015-07-03:

Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1235734

Kiall Mac Innes (kiall) wrote on 2015-07-03:

Hey, so we have three different things here; I'll address the first two now.

> (a) The first-come-first-served nature of domain ownership allows one tenant to create a domain which another tenant plans to manage with Designate. This prevents the other tenant from using Designate. If the other tenant proceeds to move the domain to Designate, the first tenant has hijacked control of the domain.

Correct. This is something every DNS provider struggles with; there is no mechanism for implementing this other than a manual support process, which we could include in Designate. I'm aware of one provider that has managed this, CloudFlare: they assign each zone two NS records out of a larger pool of available NS records. When the zone delegation is updated, they identify the "true" owner and activate the appropriate zone by matching the assigned NS records against the delegation.

They have a good write up on this here: https://blog.cloudflare.com/whats-the-story-behind-the-names-of-cloudflares-name-servers/
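
The scheme described above can be sketched roughly as follows. This is a simplified illustration of the idea as stated in this comment, not CloudFlare's implementation; the pool names, tenant names, and function names are hypothetical.

```python
from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case a DNS name and give it exactly one trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def assign_ns_pairs(pool, claims):
    """Give every competing claim on one zone a distinct pair of name servers.

    `pool` is the provider's list of name server names and `claims` is the
    list of tenants claiming the same zone (both hypothetical inputs).
    """
    pairs = list(combinations(sorted({normalize(n) for n in pool}), 2))
    if len(claims) > len(pairs):
        raise ValueError("name server pool too small for the number of claims")
    return {tenant: frozenset(pair) for tenant, pair in zip(claims, pairs)}


def active_owner(assignments, delegated_ns):
    """Return the tenant whose assigned pair matches the parent's delegation."""
    delegated = frozenset(normalize(n) for n in delegated_ns)
    for tenant, pair in assignments.items():
        if pair == delegated:
            return tenant
    return None  # delegation matches nobody: keep every claim inactive


pool = ["amy.ns.example.", "bob.ns.example.", "cat.ns.example."]
assignments = assign_ns_pairs(pool, ["tenant-a", "tenant-b"])
# Only the party controlling the registration can set the delegation,
# so the NS set seen at the parent identifies the legitimate claim.
print(active_owner(assignments, ["amy.ns.example.", "cat.ns.example."]))
```

Because every claim receives a different pair, at most one claim can match whatever delegation the registrant publishes at the parent zone.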

All that said, I don't believe this is something we should implement in Designate. DNS providers have dealt with this issue successfully for years as a manual process. We should however include a description of this within the docs.

> (b) Registration of second-level public suffixes such as “co.uk” is not blocked. If a tenant creates such a domain, no one else can create subdomains under it, including legitimate domain names such as “example.co.uk”.

Designate includes functionality to protect against this, where the list of TLDs (we call co.uk etc. a TLD, even if it's not technically accurate) can be imported and treated as such. We should ensure the documentation encourages deployers to keep the list of TLDs up to date, using a source such as the IANA TLD list (for true single-label TLDs), and the Mozilla-sponsored "Public Suffix List" https://publicsuffix.org/list/public_suffix_list.dat for co.uk-style "TLDs".

Beyond this, there is little we can do. We have no way of knowing if "foo.uk" will become a TLD, or a standard zone in the future.. :(
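
As an illustration of how an imported suffix list can block requests such as "co.uk", here is a minimal sketch. It is not Designate's TLD implementation: the function names and the sample data are hypothetical, and it handles only exact rules, single-level wildcard rules ("*.ck"), and exception rules ("!www.ck") from Public Suffix List style text, with no IDN handling.

```python
def parse_suffix_list(text):
    """Collect rules from Public Suffix List style text.

    Blank lines and lines starting with "//" (comments) are skipped.
    """
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        rules.add(line.split()[0].lower())
    return rules


def is_blocked_suffix(zone, rules):
    """True if `zone` is itself a listed suffix and should not be created."""
    name = zone.strip().lower().rstrip(".")
    if "!" + name in rules:
        return False  # exception rule: explicitly not a suffix
    if name in rules:
        return True  # exact rule such as "co.uk"
    parent = name.partition(".")[2]
    return bool(parent) and "*." + parent in rules  # wildcard rule


# Hypothetical excerpt in the format of the published list
sample = """
// comment line
uk
co.uk
*.ck
!www.ck
"""
rules = parse_suffix_list(sample)
print(is_blocked_suffix("co.uk.", rules), is_blocked_suffix("example.co.uk.", rules))
```

A check like this only reflects the list as it was when imported, which is why keeping the list up to date matters.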

Kiall Mac Innes (kiall) wrote on 2015-07-08:

@Florian Weimer: With regard to #3, this is an interesting variation on DNS cache poisoning. Properly configured recursive resolvers will ignore the out-of-bailiwick additional section. As this is a well-known and widely mitigated attack against DNS, I believe this should be moved to Public Security, with documentation providing recommended configurations for DNS servers to prevent the out-of-zone data from being returned.
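
The bailiwick rule mentioned here can be illustrated with a small sketch. It is a deliberately simplified model, assuming records are plain (name, type, value) tuples; real resolvers apply more elaborate trust ranking, and the function names are hypothetical.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and give it exactly one trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def in_bailiwick(owner: str, zone: str) -> bool:
    """True if the record owner name is at or below the queried zone."""
    owner, zone = normalize(owner), normalize(zone)
    if zone == ".":
        return True  # everything is beneath the root
    return owner == zone or owner.endswith("." + zone)


def filter_additional(zone, additional):
    """Drop additional-section records whose owner lies outside `zone`."""
    return [rr for rr in additional if in_bailiwick(rr[0], zone)]


additional = [
    ("ns1.unmanaged.example.", "A", "192.0.2.7"),  # from the dig output above
    ("ns1.example.org.", "A", "192.0.2.1"),        # hypothetical in-zone glue
]
print(filter_additional("example.org.", additional))
```

Applied to the response for example.org, the record for ns1.unmanaged.example. is discarded because its owner name is not beneath example.org.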

I'm attaching a screenshot of the new documentation section I'll be proposing for this. This section will be further updated to address your A and B points above.

Kiall Mac Innes (kiall) wrote on 2015-07-08:

bug-1471159.png (148.5 KiB, image/png)

Kiall Mac Innes (kiall) on 2015-07-08

Changed in designate:
status:	New → In Progress
assignee:	nobody → Kiall Mac Innes (kiall)
milestone:	none → liberty-2
importance:	Undecided → High

Florian Weimer (fweimer) wrote on 2015-07-08:

Kiall, there's a typo in the screenshot “their [z]one pointing”.

I think the description of the attack is a bit misleading. I would expect that some resolvers use the name server A record from glue. Then the attacker can serve a zone with a non-minimal reply which overrides the entire zone NS RRset, hijacking the zone. The second part is necessary to bypass the trust ranking rules. I am not sure if resolvers can actually filter out such glue records.

I'm surprised that the PowerDNS documentation says that out-of-zone-additional-processing defaults to off. This doesn't match what I saw in my experiment. I will raise this with PowerDNS upstream.

Can we delay updating this documentation until PowerDNS has had a chance to respond, please?

Kiall Mac Innes (kiall) wrote on 2015-07-09:

Yes, Graham also contacted powerdns a few minutes ago about the docs being incorrect (he was validating the suggestions from the screenshot).

I'll re-work the description a little, and attach a patch + updated screenshot here. I've also written up case #1 and #2, above.

Florian Weimer (fweimer) wrote on 2015-07-09:

PowerDNS told me that they will treat this discrepancy solely as a documentation issue, and that they will change the documentation, not the code. Disclosure does not need to be coordinated with them.

Peter van Dijk (habbie) wrote on 2015-07-09:

Graham has sent us a documentation update which we will merge and publish on doc.powerdns.com soon. If and when you publish about this issue, we'd be happy to do a proper announcement (pdns-announce+social networks) about the issue, without making it a security advisory from our end.

As I pointed out to Florian, though, the PowerDNS side of things is a non-issue because (a) the flag is irrelevant if you prevent user B from claiming a parent domain of a domain user A has (b) the flag does not help when user B claims a subdomain of a domain user A has. In the end it is all still up to the person/software filling the database.

- Peter (PowerDNS)
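
The parent and subdomain cases described in (a) and (b) of this comment could be caught when a zone is created, before anything reaches the name server database. The sketch below is one hypothetical way to do that; the `zones` mapping, the relation labels, and the function names are assumptions for illustration and are not part of Designate or PowerDNS.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and give it exactly one trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def is_subdomain(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or lies beneath it."""
    name, zone = normalize(name), normalize(zone)
    if zone == ".":
        return True  # everything is beneath the root
    return name == zone or name.endswith("." + zone)


def overlapping_zones(new_zone, new_tenant, zones):
    """List zones of other tenants that overlap with `new_zone`.

    `zones` maps zone name -> owning tenant (hypothetical data model).
    """
    conflicts = []
    for zone, tenant in zones.items():
        if tenant == new_tenant:
            continue
        if normalize(zone) == normalize(new_zone):
            conflicts.append((normalize(zone), "same"))
        elif is_subdomain(zone, new_zone):
            # the new zone would sit above another tenant's zone
            conflicts.append((normalize(zone), "parent-of-existing"))
        elif is_subdomain(new_zone, zone):
            # the new zone would sit below another tenant's zone
            conflicts.append((normalize(zone), "child-of-existing"))
    return sorted(conflicts)


zones = {"example.org.": "tenant-a"}
print(overlapping_zones("www.example.org.", "tenant-b", zones))
```

Whether an overlap should be rejected outright or handed to a manual process is a policy decision that this sketch leaves open.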

Kiall Mac Innes (kiall) wrote on 2015-07-10:

@habbie - Agreed, from the PowerDNS point of view this is likely just a doc issue. But I believe Florian's concern was that the documentation for your out-of-zone-additional-processing flag was incorrect, misleading users into believing the default configuration would prevent other users' out-of-zone additions from being loaded. This was his issue (c) above...

Kiall Mac Innes (kiall) wrote on 2015-07-10:

#10

0001-Introduce-a-Production-Guidelines-document.patch (7.1 KiB, text/plain)

@Florian: re

> I would expect that some resolvers use the name server A record from glue. Then the attacker can serve a zone with a non-minimal reply which overrides the entire zone NS RRset.

The attacker will have no control over the choice of minimal or non-minimal replies, as the provider will make this configuration change on their nameservers. At this point, Designate and the Designate-managed nameservers have resolved this issue. If an attacker sets up their own nameserver with "forged" or otherwise invalid information, we're entering the world of standard DNS cache poisoning, which we can do nothing about. E.g., I can set up a nameserver for "foo.com.", entice lots of users towards it, and serve "ns1.google.com. IN A 1.1.1.1" as an additional record, but because 99% or more of resolvers implement protection against this, I won't be hijacking google :)

I've attached another patch to the docs, and a rendered screenshot.

Kiall Mac Innes (kiall) wrote on 2015-07-10:

#11

bug-1471159-2.png (519.6 KiB, image/png)

Graham Hayes (grahamhayes) wrote on 2015-07-10:

#12

The docs section looks good.

Should we be trying to contact the other DNS Servers we support to get guidance on configuring them?

Kiall Mac Innes (kiall) on 2015-07-28

information type:	Private Security → Public

OpenStack Infra (hudson-openstack) wrote on 2015-07-28: Fix proposed to designate (master)

#13

Fix proposed to branch: master
Review: https://review.openstack.org/206441

Kiall Mac Innes (kiall) wrote on 2015-07-28:

#14

Doc update submitted as https://review.openstack.org/206441

OpenStack Infra (hudson-openstack) wrote on 2015-07-28: Fix merged to designate (master)

#15

Reviewed: https://review.openstack.org/206441
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=7cac0bd4ee8ade9c7605a7e41d588bca9a294e91
Submitter: Jenkins
Branch: master

commit 7cac0bd4ee8ade9c7605a7e41d588bca9a294e91
Author: Kiall Mac Innes <email address hidden>
Date: Wed Jul 8 15:43:34 2015 +0100

Introduce a Production Guidelines document

    This document aims to provide a location for documented production
    configurations and considerations. Including common misconfigurations,
    attack mitigation techniques, and other relavant tips.

Change-Id: Ifd5fdb2546cca90766dcfe0aa657ab8d236569e1
Closes-Bug: 1471159