Unbound misbehaves when domain-insecure is set for a stub zone with multiple stub-addr entries

Bug #1732150 reported by Richard Arends
This bug affects 1 person
Affects                          Status        Importance  Assigned to
Unbound - Caching DNS Resolver   Unknown       Unknown
unbound (Ubuntu)                 Fix Released  Medium      Unassigned
  Trusty                         Won't Fix     Undecided   Unassigned
  Xenial                         Won't Fix     Undecided   Unassigned
  Bionic                         Won't Fix     Undecided   Unassigned

Bug Description

[Impact]

 * DNSSEC setups with domain-insecure set fail to work properly:
   the lookup goes through all available servers, leading to a very
   long lookup time.

 * Backport upstream fix to stop checking for further trust points in that
   case.
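For reference, a minimal sketch of the kind of unbound configuration affected here: domain-insecure plus a stub zone with more than one stub-addr. The zone name and the two addresses (master 192.168.122.46, slave 192.168.122.212) are assumptions borrowed from the BIND test setup later in this report, and `write_stub_conf` is a hypothetical helper that writes the snippet to a file of your choice.

```shell
#!/bin/sh
# Hypothetical helper: write an unbound.conf snippet for the affected setup.
# Assumptions: zone name and server addresses come from the test setup in
# this report; the target file is passed in by the caller.
write_stub_conf() {
    conf_file="$1"
    cat > "$conf_file" <<'EOF'
server:
    # Do not require a DNSSEC chain of trust for this zone.
    domain-insecure: "example.com"

stub-zone:
    name: "example.com"
    # Two authoritative servers (master and slave). Before the fix, unbound
    # could keep trying an unreachable one even after the other answered.
    stub-addr: 192.168.122.46
    stub-addr: 192.168.122.212
EOF
}
```

On Ubuntu, writing this to a file under /etc/unbound/unbound.conf.d/ (assuming the packaged unbound.conf includes that directory) and running `unbound-checkconf` before restarting unbound should be enough to try it.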

[Test Case]

 * TBD: Waiting for the bug reporter to provide the initial steps, which we
   might refine

[Regression Potential]

 * The change makes unbound stop iterating for further DNSSEC records in
   certain configurations (domain-insecure). But this is exactly what
   that configuration is meant to do (see
   http://manpages.ubuntu.com/manpages/bionic/man5/unbound.conf.5.html).
   So it should speed up certain cases where it so far still iterated
   through the servers; giving up early is exactly what it should do per
   config.
   I can think of a slight behavior change due to lookups now being
   faster, but the end result should not change. With that background I
   could think of two regressions:
     a) the faster lookup surprises some automation
     b) there is a condition we (and upstream) missed which would
        change the actual lookup result
   Given that the code has not been reverted upstream for quite a while,
   I'd think the latter is only theoretical, and the former should be of
   low risk.

[Other Info]

 * n/a

---

Unbound contains a bug when domain-insecure is set for a (stub) zone. This bug is fixed, see https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882. Can you please backport this to the Trusty package?

With regards,
Richard Arends

Joshua Powers (powersj)
Changed in unbound (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Christian Ehrhardt  (paelzer) wrote :

This was fixed yesterday upstream and is not yet released anywhere.
For confidence in the change I'd prefer to wait for it to be released, in case they find issues in the fix.

The actual change is small and seems backportable even to Trusty, but I have not had time to check the context.

Debian is currently ~2 months behind the latest upstream release; given the cadence of upstream releases, the next one should be out around mid-December.

We can't really wait until March to pick it up, but we should wait for the upstream release.
Until then we can try in a PPA whether it is actually backportable, and Richard can check there whether the backport works.

Christian Ehrhardt  (paelzer) wrote :

Hi again,
I made a PPA available to test.
Please take a look at: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3044

@Richard I'd ask you two things:
1. please test from the PPA whether it fixes the issue for you and works in general
2. would you report a bug to Debian so that they can pick up the patch (or update to the next release in mid-December)?

With #2 we would be able to eliminate the delta in the long run and have the next Debian and Ubuntu releases fixed.
Once #1 and #2 are complete we can consider an SRU into (all) older releases.

Richard Arends (l-lauuchpad-s) wrote :

Hi Christian,

Thank you for your fast response! I tested the package from the PPA and can confirm that it fixed the issue. I filed a bug at Debian on your request, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881999

Richard.

Christian Ehrhardt  (paelzer) wrote :

Hi Richard,
glad to hear that it helps.

I'm not enough of an unbound expert to perfectly rate the urgency of the next actions.
The official path would be to give several things time to happen:
1. Upstream to release the next version of unbound
2. Debian to pick it up
3. Ubuntu to sync this new version into the latest development release
4. Per SRU policy, only then could I start the SRUs into older Ubuntu releases

That will certainly take some time, but I'd think your case is not the most common one, so that might be OK.

Also, unbound is updated rather rarely in the stable releases, so you could just as well run with the PPA for now; once there is an official update, it will upgrade over it (I don't expect other updates than this one until then).

I'd like to wait for those steps, as an upstream release would increase confidence that this change will not need to be reverted or require follow-on fixes. On the other hand, if it is more urgent, I think we could certainly take the patch and apply it rather soon to all the Ubuntu releases.
If this case is more common than I think, could you outline that here, so that I can follow and the SRU team later understands why we can't wait?

So for now I'm setting the bug to Triaged on all releases (we know what we would do there), but we're waiting for upstream and Debian to pick it up.

Changed in unbound (Ubuntu):
status: Confirmed → Triaged
Richard Arends (l-lauuchpad-s) wrote :

Hi Christian,

We can wait for the official update path. As you said, the PPA version works for us, and besides that, we have a workaround for this bug (disable domain-insecure for those domains).

With regards,
Richard.

Andreas Hasenack (ahasenack) wrote :

Confirmed the patch from https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882#c7 is applied upstream.

Debian doesn't have it yet.

Andreas Hasenack (ahasenack) wrote :

Dropping Zesty as it's EOL.

Cosmic and Bionic are affected.

Changed in unbound (Ubuntu Zesty):
status: New → Won't Fix
Christian Ehrhardt  (paelzer) wrote :

FYI - final upstream fix was https://github.com/NLnetLabs/unbound/commit/52aeaf4924ec3f6689e6aafedbe41473d2bda992

It has since been released in version 1.7 and later:
$ git tag --contains 52aeaf4924ec3f6689e6aafedbe41473d2bda992 | head -n 2
release-1.7.0@4583
release-1.7.0rc1
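To check whether an upstream unbound version already contains this fix, comparing it against 1.7.0 is enough. A minimal sketch using GNU `sort -V`; `has_upstream_fix` is a hypothetical helper name. The installed version is shown by `unbound -V`. Note that a plain version check cannot detect distribution backports such as the PPA build mentioned above, which keep a lower version number.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given unbound version is 1.7.0 or
# later, i.e. an upstream release containing commit 52aeaf49.
# Relies on GNU sort's version ordering (-V).
has_upstream_fix() {
    lowest=$(printf '%s\n' "1.7.0" "$1" | sort -V | head -n 1)
    [ "$lowest" = "1.7.0" ]
}
```

For example, `has_upstream_fix 1.6.7 || echo "needs the backported patch"` prints the message, while 1.7.0 or 1.10.0 pass.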

So Cosmic is already fixed.
In fact, that covers steps 1-3 of my comment #4.

That said, I had an unbound SRU planned anyway.
Let's fix this for T/X/B, especially as we already know it works back to Trusty as-is.

I realized that you could help a lot by providing somewhat more detailed steps to reproduce the error.
I've added the rest of the SRU template, but would ask you to provide steps to reproduce here.
From your upstream bug report I gather that it would need setting up multiple DNS servers, then unbound with domain-insecure.
Anything you can add to make this testable for whoever evaluates the SRU later on will help.

Note: also killing other EOL tasks.

description: updated
no longer affects: unbound (Ubuntu Zesty)
no longer affects: unbound (Ubuntu Artful)
Changed in unbound (Ubuntu):
status: Triaged → Fix Released
Changed in unbound (Ubuntu Trusty):
status: New → Triaged
Changed in unbound (Ubuntu Xenial):
status: New → Triaged
Changed in unbound (Ubuntu Bionic):
status: New → Triaged
Christian Ehrhardt  (paelzer) wrote :

No feedback yet on how to test this, which is a prerequisite for the SRU process.

Changed in unbound (Ubuntu Trusty):
status: Triaged → Incomplete
Changed in unbound (Ubuntu Xenial):
status: Triaged → Incomplete
Changed in unbound (Ubuntu Bionic):
status: Triaged → Incomplete
Simon Déziel (sdeziel) wrote :

The steps to reproduce are mentioned in [1]. One basically needs an unbound machine and 2 DNS servers that are master for a given zone. The idea is then to simulate an outage of one of the masters and see whether unbound still tries to reach the dead master even after having received an answer from the surviving one. This attempt to query both causes delays, because the dead one will of course never reply, leading to a timeout. This bad behavior only manifested when domain-insecure was used.

Christian, I'd really like to help here but I don't have the time (yet).

1: https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882#c0
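The check described here can be scripted on the unbound machine by flushing the zone from the cache and timing a single query. A minimal sketch: `query_time_ms` and `timed_lookup` are hypothetical helpers, the zone and record names are assumptions taken from the test setup below, and the 1000 ms default threshold is an arbitrary cut-off, not a value from the upstream report.

```shell
#!/bin/sh
# Print N from dig's ";; Query time: N msec" statistics line (read on stdin).
query_time_ms() {
    sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p'
}

# Hypothetical helper: flush a zone from unbound's cache, resolve one name
# through the local resolver and compare the lookup time with a threshold.
#   $1 zone to flush, $2 name to resolve, $3 threshold in ms (default 1000)
# Returns non-zero if the lookup was slow or dig got no answer at all.
# Needs sudo and a working unbound-control setup.
timed_lookup() {
    zone="$1"; name="$2"; limit_ms="${3:-1000}"
    sudo unbound-control flush_zone "$zone" >/dev/null
    ms=$(dig +noall +stats "$name" @127.0.0.1 | query_time_ms)
    if [ -z "$ms" ]; then
        echo "$name: no answer"
        return 1
    fi
    echo "$name: $ms ms"
    [ "$ms" -lt "$limit_ms" ]
}
```

For example, run `timed_lookup example.com client.example.com` once with both servers up and once after stopping one of them; with the bug present, the second run should be noticeably slower (or get no answer) even though the other server still responds.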

Christian Ehrhardt  (paelzer) wrote :

Hi Simon,
I also only "come by" here every few weeks.
So I beg your pardon; I didn't realize it might be that clear.
On the other hand, setting this up certainly takes a few steps.
Thanks for pointing to the initial description of the nlnetlabs bug.

Let's put this back on our list then; hopefully one of us or someone in the community has the time to convert this into steps that are easy to follow and that will work for SRU verification.

Changed in unbound (Ubuntu Bionic):
status: Incomplete → Triaged
Changed in unbound (Ubuntu Xenial):
status: Incomplete → Triaged
Changed in unbound (Ubuntu Trusty):
status: Incomplete → Triaged
Christian Ehrhardt  (paelzer) wrote :

To get this moving again after Simon's reply, I tried to set this up in a form that anyone retrying can copy and paste.
But it doesn't reproduce the error on Bionic (where it should still happen), so please help me figure out what is missing.

This follows:
- https://help.ubuntu.com/community/BIND9ServerHowto#Secondary_Master_Server_configuration
- https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=2882

I wonder:
- does it have to be PowerDNS?
- does it need multiple masters (is that even an allowed config)?
- does it need on top of all that also DNSSEC to trigger?
- what else?

Master:
add to /etc/bind/named.conf.local:
zone "example.com" {
     type master;
     file "/etc/bind/db.example.com";
     allow-transfer { 192.168.122.212; };
};

zone "122.168.192.in-addr.arpa" {
     type master;
     notify no;
     file "/etc/bind/db.192";
     allow-transfer { 192.168.122.212; };
};

Create your db /etc/bind/db.example.com like:
;
; BIND data file for local loopback interface
;
$TTL 604800
@ IN SOA example.com. root.example.com. (
                              1 ; Serial
                         604800 ; Refresh
                          86400 ; Retry
                        2419200 ; Expire
                         604800 ) ; Negative Cache TTL
;
@ IN NS example.com.
@ IN A 192.168.122.46
@ IN AAAA ::1

;test system is the client
client IN A 192.168.122.115

Reverse zone in /etc/bind/db.192 like:
;
; BIND reverse data file for local loopback interface
;
$TTL 604800
@ IN SOA ns.example.com. root.example.com. (
                              5 ; Serial
                         604800 ; Refresh
                          86400 ; Retry
                        2419200 ; Expire
                         604800 ) ; Negative Cache TTL
;
@ IN NS ns.
46 IN PTR ns.example.com.
46 IN PTR example.com.

;test system is the client
115 IN PTR client.example.com.

# restart and verify the Nameserver like:
$ systemctl restart bind9
$ dig 122.168.192.in-addr.arpa. AXFR @127.0.0.1

; <<>> DiG 9.11.3-1ubuntu1.3-Ubuntu <<>> 122.168.192.in-addr.arpa. AXFR @127.0.0.1
;; global options: +cmd
122.168.192.in-addr.arpa. 604800 IN SOA ns.example.com. root.example.com. 5 604800 86400 2419200 604800
122.168.192.in-addr.arpa. 604800 IN NS ns.
115.122.168.192.in-addr.arpa. 604800 IN PTR client.example.com.
46.122.168.192.in-addr.arpa. 604800 IN PTR ns.example.com.
46.122.168.192.in-addr.arpa. 604800 IN PTR example.com.
122.168.192.in-addr.arpa. 604800 IN SOA ns.example.com. root.example.com. 5 604800 86400 2419200 604800
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 30 11:29:17 UTC 2018
;; XFR size: 6 records (messages 1, bytes 244)

# Slave
Link it to the master in /etc/bind/named.conf.local:
zone "example.com" {
     type slave;
     file "/var/cache/bind/db.example.com";
     masters { 192.168.122.46; };
};

zone "122.168.192.in-addr.arpa" {
     type slave;
     file "/var/cache/bind...


Changed in unbound (Ubuntu Bionic):
status: Triaged → Incomplete
Simon Déziel (sdeziel) wrote :

I'm sorry Christian, I had the terminology wrong; as you figured, you need 1 master and 1 slave.

Your setup looks sane except for the TTL of your test record, which seems way too high. I believe unbound will simply answer from its cache without trying to contact any authoritative server. To reproduce, I think the TTL must be smaller than the simulated downtime of the authoritative server.
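Simon's condition can be checked directly: read the TTL unbound hands out for the test record and compare it with the planned outage length. A sketch; `answer_ttl` and `ttl_shorter_than_outage` are hypothetical helpers, the 60-second outage is an example value, and the parsing assumes dig's usual `name TTL class type data` answer layout.

```shell
#!/bin/sh
# Print the TTL (second field) of the first answer line read on stdin,
# skipping dig comment lines that start with ';'.
answer_ttl() {
    awk 'NF >= 5 && $1 !~ /^;/ { print $2; exit }'
}

# Succeed if the cached record expires before the simulated outage ends,
# so unbound has to contact an authoritative server during the outage.
#   $1 TTL in seconds, $2 outage length in seconds
ttl_shorter_than_outage() {
    [ "$1" -lt "$2" ]
}
```

For example, `ttl=$(dig +noall +answer client.example.com @127.0.0.1 | answer_ttl)` followed by `ttl_shorter_than_outage "$ttl" 60 || echo "TTL outlives the outage"` flags a record like the original 604800-second one.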

Christian Ehrhardt  (paelzer) wrote :

Thanks for the hint, Simon.
I have dropped the TTL to 10 seconds and confirmed that dig through unbound reports it as "10".
I also checked that the TTL counts down every second and that the record gets refreshed once those 10 seconds expire.

$ dig client.example.com @127.0.0.1
...
;; ANSWER SECTION:
client.example.com. 10 IN A 192.168.122.115

In addition, between the steps I flushed the client's unbound cache to be sure:
  $ sudo unbound-control flush_zone example.com
  ok removed 4 rrsets, 1 messages and 0 key entries

With that, I shut down the slave and waited for the timeout: it still resolved quickly.
I then started the slave again and stopped the master: it still resolved quickly from the client.

Something is still missing - some more caching I don't know of maybe? :-/

Sergio Durigan Junior (sergiodj) wrote :

Xenial and Trusty have reached end of standard support.

Changed in unbound (Ubuntu Trusty):
status: Triaged → Won't Fix
Changed in unbound (Ubuntu Xenial):
status: Triaged → Won't Fix
Changed in unbound (Ubuntu Bionic):
status: Incomplete → Won't Fix