Ubuntu
unbound package

Unbound behaviour changes (wrong) when domain-insecure is set for a stub zone with multiple stub-addr(s)

Bug #1732150 reported by Richard Arends on 2017-11-14

This bug affects 1 person

Affects	Status	Importance	Assigned to
Unbound - Caching DNS Resolver	Unknown	Unknown	auto-www.nlnetlabs.nl #2882
unbound (Ubuntu)	Fix Released	Medium	Unassigned
Trusty	Won't Fix	Undecided	Unassigned
Xenial	Won't Fix	Undecided	Unassigned
Bionic	Won't Fix	Undecided	Unassigned

Bug Description

[Impact]

* DNSSEC setups with domain-insecure set fail to work properly.
The lookup processes all available servers, leading to a very long
lookup time.

* Backport upstream fix to stop checking for further trust points in that
case.

[Test Case]

* TBD: Waiting for the bug reporter to provide the initial steps that we
might refine

[Regression Potential]

* The change will make it stop iterating for further DNSSEC records in
   certain configuration cases (domain-insecure). But this is just what
   the respective configuration is meant to do (see
   http://manpages.ubuntu.com/manpages/bionic/man5/unbound.conf.5.html).
   So it should speed up certain cases where so far it still iterated
   through servers, but giving that up early is just what it should do per
   config.
   I can think of a slight behavior change due to being faster now, but
   the end result should not change due to this. With that background I
   could think of two regressions:
     a) the faster lookup surprises automation
     b) there is a condition we (and upstream) missed which would
        change the actual lookup result
   Given that the code was not reverted upstream for quite a while I'd
   think the latter is only theoretical, and the former should be of low
   risk.

[Other Info]

* n/a

---

Unbound contains a bug when domain-insecure is set for a (stub) zone. This bug is fixed upstream, see https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882. Can you please backport the fix to the Trusty package?

With regards,
Richard Arends

Joshua Powers (powersj) on 2017-11-16

Changed in unbound (Ubuntu):
status:	New → Confirmed
importance:	Undecided → Medium

Christian Ehrhardt  (paelzer) wrote on 2017-11-17:

This was fixed yesterday upstream and is not yet released anywhere.
For confidence in the change I'd prefer to wait for it to be released, in case they find issues in the fix.

The actual change is small and seems backportable even to trusty, but I had no time to check the context.

Debian is currently ~2 months behind the last upstream release; given the cadence of upstream releases, the fix will be released around mid-December.

We can't really wait until March to pick it, but should wait for the upstream release.
Until then we can try in a PPA whether it actually is backportable, and Richard can check there if the backport works.

Christian Ehrhardt  (paelzer) wrote on 2017-11-17:

Hi again,
I made a PPA available to test.
Please take a look at: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3044

@Richard I'd ask you two things:
1. please test from the PPA whether that fixes the issue for you and works in general
2. would you report a bug to Debian so that they can pick up the patch (or update to the next release in mid-December)?

Per #2 we would be able to eliminate the delta in the long run and have the next Debian and Ubuntu releases fixed.
Once #1 and #2 are complete we can consider an SRU into (all) older versions.

Richard Arends (l-lauuchpad-s) wrote on 2017-11-17:

Hi Christian,

Thank you for your fast response! I tested the package from the PPA and can confirm that it fixed the issue. I filed a bug at Debian on your request, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881999

Richard.

Christian Ehrhardt  (paelzer) wrote on 2017-11-20:

Hi Richard,
glad to hear that it helps.

I'm not "unbound expert enough" to perfectly rate the urgency of the next actions.
The official path would be to give several things time to happen:
1. Upstream to release the next version of unbound
2. Debian to pick it up
3. Ubuntu to sync this new version into the latest development release
4. Per SRU policy only then I could start the SRUs into older Ubuntu releases

That certainly needs some time to happen. I'd think your case is not the most common one, so that might be OK.

Also, the update rate on unbound in the releases is rather low, so you could just as well run with the PPA for now. Once there is an official update it will upgrade over it (I don't expect any other updates before then).

I'd like to wait on those, as an upstream release would increase the confidence that this is not something that needs to be reverted or needs follow-on fixes. OTOH, if it is more urgent, I think we could certainly take the patch and apply it rather soon to all the Ubuntu releases.
If this is a more common case than I think, could you outline that here so that I can follow, and so that the SRU team later understands why we can't wait?

So for now setting the bug to triaged on all releases (we know what we would do there) but waiting on upstream and Debian to pick it up.

Changed in unbound (Ubuntu):
status:	Confirmed → Triaged

Richard Arends (l-lauuchpad-s) wrote on 2017-11-21:

Hi Christian,

We can wait on the official update path. As you said, the PPA version works for us, and besides that we have a workaround for this bug (disable domain-insecure for those domains).

With regards,
Richard.

Andreas Hasenack (ahasenack) wrote on 2018-05-21:

Confirmed the patch from https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882#c7 is applied upstream.

Debian doesn't have it yet.

Andreas Hasenack (ahasenack) wrote on 2018-05-21:

Dropping zesty as it's EOL.

Cosmic and bionic affected.

Changed in unbound (Ubuntu Zesty):
status:	New → Won't Fix

Christian Ehrhardt  (paelzer) wrote on 2018-08-28:

FYI - final upstream fix was https://github.com/NLnetLabs/unbound/commit/52aeaf4924ec3f6689e6aafedbe41473d2bda992

Which in the meantime was released in version 1.7 and later:
$ git tag --contains 52aeaf4924ec3f6689e6aafedbe41473d2bda992 | head -n 2
release-1.7.0@4583
release-1.7.0rc1

Thereby Cosmic is already fixed.
In fact we covered steps 1-3 of my comment #4.
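
The `git tag --contains` check above can be mirrored for a given upstream version string. This is only a minimal sketch, assuming a `sort` that supports `-V` (as GNU coreutils does); `version_has_fix` is a hypothetical helper name. It compares upstream version numbers only, so a distro package that carries the patch as a backport would still be reported as unfixed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given upstream unbound version is
# 1.7.0 or later, i.e. a release that contains the fix commit.
# Assumes "sort -V" (version sort) is available, as in GNU coreutils.
version_has_fix() {
    fixed_in="1.7.0"
    # With version sort, the smaller of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "$fixed_in" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed_in" ]
}

# Example: version_has_fix 1.6.7 || echo "needs backport"
```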

That said, I have had an unbound SRU planned anyway.
Let's fix this for T/X/B, especially as we already know it works back to Trusty as-is.

I realized that you could help a lot by providing more detailed steps to reproduce the error.
I've added the rest of the SRU template, but would ask you to provide steps to reproduce here.
From your upstream bug report I can derive that it would need setting up multiple DNS servers, then unbound with domain-insecure.
Whatever you can add to make this testable for someone who will evaluate the SRU later on will help.

Note: also killing other EOL tasks.

description:	updated
no longer affects:	unbound (Ubuntu Zesty)
no longer affects:	unbound (Ubuntu Artful)
Changed in unbound (Ubuntu):
status:	Triaged → Fix Released
Changed in unbound (Ubuntu Trusty):
status:	New → Triaged
Changed in unbound (Ubuntu Xenial):
status:	New → Triaged
Changed in unbound (Ubuntu Bionic):
status:	New → Triaged

Christian Ehrhardt  (paelzer) wrote on 2018-11-29:

No feedback yet on how to test this, which is a prerequisite for the SRU process.

Changed in unbound (Ubuntu Trusty):
status:	Triaged → Incomplete
Changed in unbound (Ubuntu Xenial):
status:	Triaged → Incomplete
Changed in unbound (Ubuntu Bionic):
status:	Triaged → Incomplete

Simon Déziel (sdeziel) wrote on 2018-11-29:

#10

The steps to reproduce are mentioned in [1]. One basically needs an unbound machine and 2 DNS servers that are master for a given zone. The idea is to then simulate an outage of one of the masters and see if unbound will still try to reach that dead master even after having received an answer from the surviving master. This attempt to query both causes delays because the dead one will of course never reply, leading to a timeout. This bad behavior only manifested when domain-insecure was used.

Christian, I'd really like to help here but I don't have the time (yet).

1: https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882#c0
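
The outage test described above comes down to timing lookups through unbound while one authoritative server is down. A rough sketch of such a probe follows, assuming `dig` is installed and that cached answers are out of the picture (short TTL or flushed cache); the helper names and the threshold are illustrative only.

```shell
#!/bin/sh
# Print the "Query time" value (in msec) from dig output read on stdin.
extract_query_time() {
    awk '/^;; Query time:/ { print $4; exit }'
}

# Classify a query time in msec against a threshold in msec.
classify_query_time() {
    if [ "$1" -gt "$2" ]; then
        echo slow
    else
        echo fast
    fi
}

# Hypothetical probe loop: query the resolver a number of times and report
# whether each lookup was fast or slow. Not invoked here, since it needs
# dig and a running resolver.
# Usage: probe_resolver <name> <resolver-ip> <count> <threshold-msec>
probe_resolver() {
    i=0
    while [ "$i" -lt "$3" ]; do
        msec=$(dig "$1" "@$2" | extract_query_time)
        # An empty value means dig printed no query time (e.g. a timeout);
        # treat that as slow by substituting a large number.
        echo "query $i: ${msec:-none} msec ($(classify_query_time "${msec:-999999}" "$4"))"
        i=$((i + 1))
        sleep 1
    done
}
```

For example, `probe_resolver client.example.com 127.0.0.1 10 1000` would flag any lookup that takes longer than one second.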

Christian Ehrhardt  (paelzer) wrote on 2018-11-29:

#11

Hi Simon,
I also "come by" here only every few weeks.
So I beg your pardon, as I didn't realize it might be so clear.
OTOH there are a few steps needed to set this up for sure.
Thanks for pointing to the initial description of the nlnetlabs bug.

Let's put this back on our list then; hopefully one of us or someone in the community has the time to convert this into steps one can follow more easily and that will work for SRU verification ...

Changed in unbound (Ubuntu Bionic):
status:	Incomplete → Triaged
Changed in unbound (Ubuntu Xenial):
status:	Incomplete → Triaged
Changed in unbound (Ubuntu Trusty):
status:	Incomplete → Triaged

Christian Ehrhardt  (paelzer) wrote on 2018-11-30:

#12

To get this moving again, and after Simon's reply, I tried to set this up in a way that anyone retrying can copy and paste.
But it doesn't reproduce the error on Bionic (where it should still happen), so please help me find what is missing.

This follows:
- https://help.ubuntu.com/community/BIND9ServerHowto#Secondary_Master_Server_configuration
- https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882

I wonder:
- does it have to be powerdns?
- does it need multiple masters (is that even an allowed config)?
- does it need on top of all that also DNSSEC to trigger?
- what else?

Master:
add to /etc/bind/named.conf.local:
zone "example.com" {
     type master;
     file "/etc/bind/db.example.com";
     allow-transfer { 192.168.122.212; };
};

zone "122.168.192.in-addr.arpa" {
     type master;
     notify no;
     file "/etc/bind/db.192";
     allow-transfer { 192.168.122.212; };
};

Create your db /etc/bind/db.example.com like:
;
; BIND data file for local loopback interface
;
$TTL    604800
@       IN      SOA     example.com. root.example.com. (
                              1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
@       IN      NS      example.com.
@       IN      A       192.168.122.46
@       IN      AAAA    ::1

;test system is the client
client  IN      A       192.168.122.115

Reverse zone in /etc/bind/db.192 like:
;
; BIND reverse data file for local loopback interface
;
$TTL    604800
@       IN      SOA     ns.example.com. root.example.com. (
                              5         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
@       IN      NS      ns.
46      IN      PTR     ns.example.com.
46      IN      PTR     example.com.

;test system is the client
115     IN      PTR    client.example.com.

# restart and verify the Nameserver like:
$ systemctl restart bind9
$ dig 122.168.192.in-addr.arpa. AXFR @127.0.0.1

; <<>> DiG 9.11.3-1ubuntu1.3-Ubuntu <<>> 122.168.192.in-addr.arpa. AXFR @127.0.0.1
;; global options: +cmd
122.168.192.in-addr.arpa. 604800 IN     SOA     ns.example.com. root.example.com. 5 604800 86400 2419200 604800
122.168.192.in-addr.arpa. 604800 IN     NS      ns.
115.122.168.192.in-addr.arpa. 604800 IN PTR     client.example.com.
46.122.168.192.in-addr.arpa. 604800 IN  PTR     ns.example.com.
46.122.168.192.in-addr.arpa. 604800 IN  PTR     example.com.
122.168.192.in-addr.arpa. 604800 IN     SOA     ns.example.com. root.example.com. 5 604800 86400 2419200 604800
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 30 11:29:17 UTC 2018
;; XFR size: 6 records (messages 1, bytes 244)

# Slave
Link it to the master in /etc/bind/named.conf.local:
zone "example.com" {
     type slave;
     file "/var/cache/bind/db.example.com";
     masters { 192.168.122.46; };
};

zone "1.168.192.in-addr.arpa" {
     type slave;
     file "/var/cache/bind/db.192";
     masters { 192.168.122.46; };
};

On a restart you should see a successful transfer
$ sudo systemctl restart bind9
$ sudo systemctl status bind9
● bind9.service - BIND Domain Name Server
   Loaded: loaded (/lib/systemd/system/bind9.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-11-30 11:33:54 UTC; 1s ago
     Docs: man:named(8)
  Process: 1154 ExecStop=/usr/sbin/rndc stop (code=exited, status=0/SUCCESS)
 Main PID: 1294 (named)
    Tasks: 7 (limit: 547)
   CGroup: /system.slice/bind9.service
           └─1294 /usr/sbin/named -f -u bind

Nov 30 11:33:54 bionic-unb-slave named[1294]: checkhints: b.root-servers.net/A (192.228.79.201) extra record in hints
Nov 30 11:33:54 bionic-unb-slave named[1294]: checkhints: b.root-servers.net/AAAA (2001:500:200::b) missing from hints
Nov 30 11:33:54 bionic-unb-slave named[1294]: checkhints: b.root-servers.net/AAAA (2001:500:84::b) extra record in hints
Nov 30 11:33:54 bionic-unb-slave named[1294]: checkhints: l.root-servers.net/AAAA (2001:500:9f::42) missing from hints
Nov 30 11:33:54 bionic-unb-slave named[1294]: checkhints: l.root-servers.net/AAAA (2001:500:3::42) extra record in hints
Nov 30 11:33:55 bionic-unb-slave named[1294]: zone example.com/IN: Transfer started.
Nov 30 11:33:55 bionic-unb-slave named[1294]: transfer of 'example.com/IN' from 192.168.122.46#53: connected using 192.168.122.212#56633
Nov 30 11:33:55 bionic-unb-slave named[1294]: zone example.com/IN: transferred serial 6
Nov 30 11:33:55 bionic-unb-slave named[1294]: transfer of 'example.com/IN' from 192.168.122.46#53: Transfer status: success
Nov 30 11:33:55 bionic-unb-slave named[1294]: transfer of 'example.com/IN' from 192.168.122.46#53: Transfer completed: 1 messages, 6 records, 187 bytes, 0.003 secs (62333 bytes/sec)

Both can resolve client.example.com on 127.0.0.1 now.

Client
The client can now resolve against both servers' IPs
$ dig client.example.com @192.168.122.46
$ dig client.example.com @192.168.122.212
# both deliver 192.168.122.115 in the current example

# install unbound
$ apt install unbound
# configure unbound to the two bind9 servers in /etc/unbound/unbound.conf.d/exampl.com.conf
domain-insecure: "example.com"

stub-zone:
  name: "example.com"
  stub-addr: 192.168.122.46
  stub-addr: 192.168.122.212

$ sudo systemctl restart unbound.service
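
For anyone retrying with different zones or addresses, the fragment above could be generated instead of typed. This is a sketch only: `print_stub_conf` is a hypothetical helper, and it places `domain-insecure` under a `server:` clause on the assumption that the fragment is read on its own, since unbound.conf(5) documents it as a server option.

```shell
#!/bin/sh
# Hypothetical helper: print an unbound config fragment that marks a zone
# as domain-insecure and points a stub zone at the given addresses.
# Usage: print_stub_conf <zone> <stub-addr>...
print_stub_conf() {
    zone=$1
    shift
    # domain-insecure is a server: option, so it goes under that clause.
    printf 'server:\n'
    printf '  domain-insecure: "%s"\n\n' "$zone"
    printf 'stub-zone:\n'
    printf '  name: "%s"\n' "$zone"
    # One stub-addr line per remaining argument.
    for addr in "$@"; do
        printf '  stub-addr: %s\n' "$addr"
    done
}
```

The output of `print_stub_conf example.com 192.168.122.46 192.168.122.212` can be written to a file under /etc/unbound/unbound.conf.d/ and checked with unbound-checkconf before restarting the service.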

# you should see it running locally
$ sudo netstat -eeapn  | grep unb
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      0          22350      1422/unbound

Local resolve now goes through unbound to one of the bind9 servers.
$ dig client.example.com @127.0.0.1

; <<>> DiG 9.11.3-1ubuntu1.3-Ubuntu <<>> client.example.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38064
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;client.example.com.            IN      A

;; ANSWER SECTION:
client.example.com.     604800  IN      A       192.168.122.115

;; AUTHORITY SECTION:
example.com.            604800  IN      NS      example.com.

;; ADDITIONAL SECTION:
example.com.            604800  IN      A       192.168.122.46
example.com.            604800  IN      AAAA    ::1

;; Query time: 5 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 30 11:43:40 UTC 2018
;; MSG SIZE  rcvd: 121

But I can shut down one of the bind9 servers just fine and still get fast responses from unbound.

Changed in unbound (Ubuntu Bionic):
status:	Triaged → Incomplete

Simon Déziel (sdeziel) wrote on 2018-11-30:

#13

I'm sorry Christian, I had the terminology wrong. As you figured, you need 1 master and 1 slave.

Your setup looks sane except for the TTL of your test record, which seems way too high. I believe unbound will simply answer from its cache without trying to contact any authoritative server. To reproduce, I think the TTL must be smaller than the simulated authoritative server downtime.
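
The TTL condition above can be checked before each outage test. A minimal sketch, assuming `dig` output in its usual presentation format; `answer_ttl` and `min_downtime` are hypothetical helper names, and the safety margin is arbitrary.

```shell
#!/bin/sh
# Print the TTL of the first A record found in dig output read on stdin.
# Comment lines (starting with ";") are skipped explicitly.
answer_ttl() {
    awk '$1 !~ /^;/ && $3 == "IN" && $4 == "A" { print $2; exit }'
}

# Print how long (in seconds) an authoritative server should stay down so
# that a cached answer with the given TTL has expired, plus a margin.
# Usage: min_downtime <ttl-seconds> <margin-seconds>
min_downtime() {
    echo $(( $1 + $2 ))
}

# Example (needs dig and a running resolver, so it is not executed here):
#   ttl=$(dig +noall +answer client.example.com @127.0.0.1 | answer_ttl)
#   sleep "$(min_downtime "$ttl" 5)"
```

With the 604800 second TTL from the zone file above, the required downtime would be a full week, which is why the record has to be given a much shorter TTL for the test.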

Christian Ehrhardt  (paelzer) wrote on 2018-12-03:

#14

Thanks for the hint Simon,
I have dropped the TTL to 10 and confirmed that the dig output has it as "10" through unbound.
Also I checked that the TTL drops per second and the query gets refreshed after those 10 sec expire.

$ dig client.example.com @127.0.0.1
...
;; ANSWER SECTION:
client.example.com. 10 IN A 192.168.122.115

In addition, between the steps I flushed the client's unbound cache to be sure:
$ sudo unbound-control flush_zone example.com
ok removed 4 rrsets, 1 messages and 0 key entries

With that I shut down the slave and waited for the timeout - still fast to resolve.
I then started the slave again and stopped the master - still fast to resolve from the client.

Something is still missing - some more caching I don't know of maybe? :-/

Sergio Durigan Junior (sergiodj) wrote on 2022-02-10:

#15

Xenial and Trusty have reached end of standard support.

Changed in unbound (Ubuntu Trusty):
status:	Triaged → Won't Fix
Changed in unbound (Ubuntu Xenial):
status:	Triaged → Won't Fix

Sergio Durigan Junior (sergiodj) on 2023-09-20

Changed in unbound (Ubuntu Bionic):
status:	Incomplete → Won't Fix

Remote bug watches

debbugs #881999 [done normal]
auto-www.nlnetlabs.nl #2882

Bug watches keep track of this bug in other bug trackers.
