Comment 8 for bug 1726017

Christian Ehrhardt  (paelzer) wrote :

Ok testing zesty on my own then, verified with three KVM guests:
dns1 192.168.122.79
dns2 192.168.122.225
zesty 192.168.122.220

# basic servers
$ sudo apt-get install bind9 bind9utils bind9-doc

/etc/bind/named.conf.local:
zone "paelzertest1.lan" {
        type master;
        file "/etc/bind/for.paelzertest1.lan";
 };
zone "1.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/rev.paelzertest1.lan";
 };

The other server (dns2) uses the same configuration, with a 2 in place of each 1.

Likewise the forward/reverse zone files, with 1 on dns1 and 2 on dns2.
/etc/bind/for.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071001 ;Serial
        3600 ;Refresh
        1800 ;Retry
        604800 ;Expire
        86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN A 192.168.1.200
@ IN A 192.168.1.201
pri IN A 192.168.1.200
test IN A 192.168.1.200

/etc/bind/rev.paelzertest1.lan:
$TTL 86400
@ IN SOA pri.paelzertest1.lan. root.paelzertest1.lan. (
        2011071002 ;Serial
        3600 ;Refresh
        1800 ;Retry
        604800 ;Expire
        86400 ;Minimum TTL
)
@ IN NS pri.paelzertest1.lan.
@ IN PTR paelzertest1.lan.
pri IN A 192.168.1.200
test IN A 192.168.1.201
200 IN PTR pri.paelzertest1.lan.
201 IN PTR test.paelzertest1.lan.
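
Since the dns2 files only differ from the dns1 ones by that 1 -> 2 swap, they can be derived mechanically. A minimal sketch, assuming the dns2 addresses live in 192.168.2.x (as the dig output for paelzertest2 suggests); the helper name to_dns2 is made up for illustration and is not part of the original test:

```shell
# Hypothetical helper: rewrite dns1 config text read on stdin
# into its dns2 counterpart (zone names, reverse zone, addresses).
to_dns2() {
    sed -e 's/paelzertest1/paelzertest2/g' \
        -e 's/192\.168\.1\./192.168.2./g' \
        -e 's/"1\.168\.192\.in-addr\.arpa"/"2.168.192.in-addr.arpa"/'
}

# Assumed usage: generate the dns2 forward zone from the dns1 one.
# to_dns2 < for.paelzertest1.lan > for.paelzertest2.lan
```

The KVM guest addresses (192.168.122.x) are left alone, because the pattern requires a dot right after "192.168.1".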

Disable recursion by adding the following to /etc/bind/named.conf.options:
allow-transfer {"none";};
allow-recursion {"none";};
recursion no;

$ sudo systemctl restart bind9

With this, dns1 answers only for test.paelzertest1.lan and dns2 refuses queries for it (and vice versa).

Example:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62119
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest1.lan. IN A

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:52 UTC 2017
;; MSG SIZE rcvd: 50

ubuntu@zesty-dnsmasq-test:~$ dig test.paelzertest2.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37335
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;test.paelzertest2.lan. IN A

;; ANSWER SECTION:
test.paelzertest2.lan. 86400 IN A 192.168.2.201

;; AUTHORITY SECTION:
paelzertest2.lan. 86400 IN NS pri.paelzertest2.lan.

;; ADDITIONAL SECTION:
pri.paelzertest2.lan. 86400 IN A 192.168.2.200

;; Query time: 0 msec
;; SERVER: 192.168.122.225#53(192.168.122.225)
;; WHEN: Tue Nov 07 07:14:56 UTC 2017
;; MSG SIZE rcvd: 100
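
The only part of these outputs that matters for the test is the status field in the header line. A small sketch to pull it out of dig output; the function name dig_status is hypothetical, and the pattern assumes the header format shown in the outputs above:

```shell
# Hypothetical helper: print the status (NOERROR, REFUSED, SERVFAIL, ...)
# from dig output read on stdin.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Assumed usage against the two bind servers:
# dig test.paelzertest1.lan @192.168.122.79  | dig_status
# dig test.paelzertest1.lan @192.168.122.225 | dig_status
```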

Now we set up dnsmasq as a DNS server, configured to forward to the two DNS servers we prepared.

$ sudo vim /etc/resolv.dnsmasq.conf
nameserver 192.168.122.79
nameserver 192.168.122.225
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries

This should give you a local dnsmasq that asks our two servers, running in the foreground with query logging enabled.
On a second console on the dnsmasq test system, use dig to query dnsmasq, which will then ask the two bind servers.

So for something that is sure to fail on both we get:
$ dig foo @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> foo @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 42311

On the server we see:
dnsmasq: query[A] foo from 127.0.0.1
dnsmasq: forwarded foo to 192.168.122.79
dnsmasq: forwarded foo to 192.168.122.225

That works for the Xenial Test.

Now this is a bit of a race: run some local requests and sometimes you get this combination:

$ dig test.paelzertest2.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest2.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 953
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

Server log:
dnsmasq: query[A] test.paelzertest2.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.79
dnsmasq: forwarded test.paelzertest2.lan to 192.168.122.225

This should not happen (and doesn't with the fix).
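
Because the bad reply only shows up some of the time, it helps to repeat the query and count the outcomes. A rough sketch, not part of the original verification; tally_status is a made-up name and the run count is arbitrary:

```shell
# Hypothetical helper: run a command N times and tally the DNS status
# found in each output (one "count STATUS" line per distinct status).
tally_status() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # Keep only the status word from the dig header line.
        "$@" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
        i=$((i + 1))
    done | sort | uniq -c
}

# Assumed usage on the dnsmasq test system:
# tally_status 20 dig test.paelzertest2.lan @127.0.0.1
```

On an affected version one would expect a mix of NOERROR and REFUSED counts; with the fix only NOERROR should remain.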

For Zesty, which already has one of the two patches, we need to provoke a "SERVFAIL" to trigger the issue.
Unfortunately this failure has to arrive faster than the valid reply to trigger the race (dnsmasq would then treat the failure as a successful answer and reply without waiting for the good one).

To get an answer at all, bind has to be running, but to return SERVFAIL instead of NXDOMAIN it needs a definition for that zone.

So copy /etc/bind/for.paelzertest1.lan and /etc/bind/rev.paelzertest1.lan from dns1 to dns2.
Then add the zone to /etc/bind/named.conf.local so that it is loaded.
Finally, "break" it intentionally, e.g. by changing the leading "$TTL" to "TTL".
That way bind still runs (one good zone) and claims the paelzertest1 namespace (the zone is registered in the config), but loading that zone fails.
The status should show something like:
  named[3534]: zone paelzertest1.lan/IN: not loaded due to errors.
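
The "$TTL" to "TTL" change can also be scripted. A minimal sketch, assuming the directive is on the first line of the zone file as in the files shown earlier; break_zone is a hypothetical name, and it prints the broken copy to stdout rather than editing in place:

```shell
# Hypothetical helper: print a copy of a zone file whose leading
# "$TTL" directive is turned into the invalid "TTL".
break_zone() {
    sed '1s/^\$TTL/TTL/' "$1"
}

# Assumed usage on dns2, followed by a restart of bind9:
# break_zone for.paelzertest1.lan | sudo tee /etc/bind/for.paelzertest1.lan
# Optionally check the result with named-checkzone (from bind9utils):
# named-checkzone paelzertest1.lan /etc/bind/for.paelzertest1.lan
```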

Now dns1 gives me NOERROR but dns2 gives SERVFAIL for:
$ dig test.paelzertest1.lan @192.168.122.225

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @192.168.122.225
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36187

Disable caching to widen the race window.
We also need to set --all-servers, otherwise it would iterate over the servers almost randomly.
$ sudo dnsmasq --resolv-file=/etc/resolv.dnsmasq.conf --no-hosts --no-daemon --log-queries --cache-size=0 --all-servers

That gives SERVFAIL when querying the dnsmasq server.
$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27511

Log from the server:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225

=> It didn't try the next server, as it considered SERVFAIL a successful answer.

Installing the version from proposed resolves that.

$ dig test.paelzertest1.lan @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> test.paelzertest1.lan @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43539

Server-log:
dnsmasq: query[A] test.paelzertest1.lan from 127.0.0.1
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.225
dnsmasq: forwarded test.paelzertest1.lan to 192.168.122.79
dnsmasq: reply test.paelzertest1.lan is 192.168.1.201
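
The difference between the broken and the fixed run is visible in how many upstream servers the query was forwarded to. A small sketch to list them from a saved log; forwarded_to is a hypothetical name, and it assumes log lines in exactly the "dnsmasq: forwarded NAME to IP" form shown above:

```shell
# Hypothetical helper: from dnsmasq query-log lines on stdin, print the
# upstream servers a given name was forwarded to, one per line.
forwarded_to() {
    awk -v name="$1" \
        '$1 == "dnsmasq:" && $2 == "forwarded" && $3 == name { print $5 }'
}

# Assumed usage, with the foreground dnsmasq output saved to dnsmasq.log:
# forwarded_to test.paelzertest1.lan < dnsmasq.log
```

For the fixed version this should list both 192.168.122.225 and 192.168.122.79 for a single query, while the broken run above only shows 192.168.122.225.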

With that, setting verification-done.