BIND crashes with failed assertion INSIST(dns_name_issubdomain(&fctx->name, &fctx->domain))

Bug #1896740 reported by Marian Rainer-Harbach on 2020-09-23
This bug affects 2 people
Affects          | Status        | Importance | Assigned to        | Milestone
bind9 (Ubuntu)   | Fix Released  | Undecided  | Unassigned         |
  Focal          | Fix Committed | Undecided  | Christian Ehrhardt |

Bug Description

[Impact]

 * In rare configurations and situations, bind9 reaches an INSIST()
   assertion that aborts the daemon.
   These conditions were found to occur in some real setups, where
   logging an error is preferable to aborting.

 * Backport the upstream change to help the users affected by this issue.
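A minimal sketch of the before/after pattern, in C to match BIND. This is not the upstream patch: the function check_name_below_domain(), the res_t type and its values are hypothetical, and the boolean parameter stands in for the result of dns_name_issubdomain(). It only illustrates reporting the violated invariant and failing one operation instead of calling abort().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes; BIND uses its own isc_result_t values. */
typedef enum { RES_SUCCESS = 0, RES_UNEXPECTED = 1 } res_t;

/*
 * Before (sketch): an INSIST()-style assertion. On failure named logs
 * a back trace and calls abort(), so the whole daemon exits:
 *
 *     INSIST(name_is_below_domain);
 *
 * After (sketch): report the violated invariant and fail only the
 * operation that hit it. The caller has to treat RES_UNEXPECTED as
 * "drop this lookup" rather than carry on with inconsistent state.
 */
res_t check_name_below_domain(bool name_is_below_domain,
                              const char *name, const char *domain) {
    if (!name_is_below_domain) {
        fprintf(stderr, "unexpected error: '%s' is not subdomain of '%s'\n",
                name, domain);
        return RES_UNEXPECTED;
    }
    return RES_SUCCESS;
}
```

Whether continuing after such a failure is safe depends on what the caller does with the error, which is the concern raised under [Regression Potential].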

[Test Case]

 * Sadly we have no well-isolated test case yet, but we do have an awesome
   and responsive bug reporter whose setup usually triggered the crash a few
   times over 1-2 weeks. Fortunately, the fix converts the former
   assert-abort into a logged warning. Therefore we can check the logs:
   once the message appears, we know that the bug "would have happened,
   but got fixed".
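The log check described above could be automated roughly as follows. This is a hypothetical helper, not part of BIND or of the SRU: it assumes the named log has already been read into a NUL-terminated string (e.g. from syslog or journalctl output), and it matches the two strings seen in this report, "exiting (due to assertion failure)" for the old crash and "is not subdomain of" for the new warning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of scanning a named log after installing the fix. */
typedef enum {
    VERDICT_NO_SIGNAL, /* neither string seen yet: keep waiting */
    VERDICT_FIX_HIT,   /* new warning logged, daemon kept running */
    VERDICT_CRASHED    /* old assertion abort still happens */
} verdict_t;

/* Count newline-separated lines in `text` that contain `needle`. */
size_t count_matching_lines(const char *text, const char *needle) {
    size_t count = 0;
    size_t nlen = strlen(needle);
    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        /* Search only within this line, so a match never spans two lines. */
        for (size_t i = 0; i + nlen <= len; i++) {
            if (memcmp(line + i, needle, nlen) == 0) {
                count++;
                break;
            }
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}

verdict_t classify_log(const char *text) {
    if (count_matching_lines(text, "exiting (due to assertion failure)") > 0)
        return VERDICT_CRASHED;
    if (count_matching_lines(text, "is not subdomain of") > 0)
        return VERDICT_FIX_HIT;
    return VERDICT_NO_SIGNAL;
}
```

Checking for the crash string first means a log that contains both strings is reported as a crash.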

[Regression Potential]

 * The change only touches a path that formerly aborted the daemon.
   Converting this assertion into a logged warning should only affect
   people whose daemon formerly crashed there, and for them it is a fix.
   Other use cases should be unaffected.
   Trying to think of regressions, I can only come up with e.g. CI test
   cases that expect a failure in this case but would now pass. Yet newer
   versions of bind would behave the same way, as this is not an Ubuntu
   delta but a backport.

 * [racb] Changing an abort() into log-and-continue might leave the process in a bad state, leading to a later unexplained crash, loss of data or even a security vulnerability. In this case, though, the fix is cherry-picked from upstream, so we assume that upstream is confident it is safe. Nevertheless, if there is a regression, I think this is the most likely form it would take, in the situation that originally triggered the assertion: "when the resolver is qname minimizing and forwarding at the same time".

[Other Info]

 * n/a

---

I'm running Ubuntu 20.04 and bind9 1:9.16.1-0ubuntu2.3.

BIND sometimes crashes with:
Sep 23 10:10:12 <hostname> named[1146]: timed out resolving 'www.independent.co.uk/TYPE65/IN': 2606:4700:4700::1001#53
Sep 23 10:10:12 <hostname> named[1146]: timed out resolving 'ssc.independent.co.uk/TYPE65/IN': 2606:4700:4700::1001#53
Sep 23 10:10:13 <hostname> named[1146]: timed out resolving 'www.independent.co.uk/TYPE65/IN': 1.0.0.1#53
Sep 23 10:10:13 <hostname> named[1146]: timed out resolving 'ssc.independent.co.uk/TYPE65/IN': 1.0.0.1#53
Sep 23 10:10:14 <hostname> named[1146]: timed out resolving 'www.independent.co.uk/TYPE65/IN': 2606:4700:4700::1111#53
Sep 23 10:10:14 <hostname> named[1146]: timed out resolving 'ssc.independent.co.uk/TYPE65/IN': 2606:4700:4700::1111#53
Sep 23 10:10:15 <hostname> named[1146]: timed out resolving 'www.independent.co.uk/TYPE65/IN': 1.1.1.1#53
Sep 23 10:10:16 <hostname> named[1146]: timed out resolving 'ssc.independent.co.uk/TYPE65/IN': 1.1.1.1#53
Sep 23 10:10:16 <hostname> named[1146]: resolver.c:5105: INSIST(dns_name_issubdomain(&fctx->name, &fctx->domain)) failed, back trace
Sep 23 10:10:16 <hostname> named[1146]: #0 0x558b762dce43 in ??
Sep 23 10:10:16 <hostname> named[1146]: #1 0x7f390cf32ac0 in ??
Sep 23 10:10:16 <hostname> named[1146]: #2 0x7f390d0fb12d in ??
Sep 23 10:10:16 <hostname> named[1146]: #3 0x7f390d0fe940 in ??
Sep 23 10:10:16 <hostname> named[1146]: #4 0x7f390d104888 in ??
Sep 23 10:10:16 <hostname> named[1146]: #5 0x7f390d108f21 in ??
Sep 23 10:10:16 <hostname> named[1146]: #6 0x7f390d109a66 in ??
Sep 23 10:10:16 <hostname> named[1146]: #7 0x7f390d109ec6 in ??
Sep 23 10:10:16 <hostname> named[1146]: #8 0x7f390cf59fe1 in ??
Sep 23 10:10:16 <hostname> named[1146]: #9 0x7f390ca22609 in ??
Sep 23 10:10:16 <hostname> named[1146]: #10 0x7f390c943103 in ??
Sep 23 10:10:16 <hostname> named[1146]: exiting (due to assertion failure)

The crash has occurred repeatedly on multiple hosts.

In all cases I observed, BIND logged timeouts immediately before each crash.

It seems that this has already been reported upstream in https://gitlab.isc.org/isc-projects/bind9/-/issues/1997 and was fixed in 9.16.6.

A CVE number was assigned as well, CVE-2020-8621 https://kb.isc.org/docs/cve-2020-8621.

Please pull the fix into Focal's BIND.

information type: Public → Public Security
Andreas Hasenack (ahasenack) wrote :

9.16.1-0ubuntu2.3 already has the patch for CVE-2020-8621:
bind9 (1:9.16.1-0ubuntu2.3) focal-security; urgency=medium

  * SECURITY UPDATE: A specially crafted large TCP payload can trigger an
    assertion failure
    - debian/patches/CVE-2020-8620.patch: add extra checks to
      lib/isc/netmgr/netmgr-int.h, lib/isc/netmgr/netmgr.c,
      lib/isc/netmgr/tcp.c, lib/isc/netmgr/udp.c.
    - CVE-2020-8620
  * SECURITY UPDATE: Attempting QNAME minimization after forwarding can
    lead to an assertion failure
    - debian/patches/CVE-2020-8621.patch: disable QNAME minimization in
      lib/dns/resolver.c.
    - CVE-2020-8621
...
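For readers unfamiliar with QNAME minimization (RFC 7816): instead of sending the full query name to every server, the resolver sends only as many labels as it needs to find the next zone cut. The sketch below shows one such step on plain strings. next_minimized_name() is hypothetical and far simpler than BIND's resolver.c (no escaping, no case folding, no fallback modes). In this sketch the returned name is, by construction, at or below the current domain, which is the kind of invariant the failing INSIST checks.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of one QNAME-minimization step on presentation-format
 * names (no trailing dot, no escaped dots). `domain` is the closest zone cut
 * known so far and is assumed to be a suffix of `qname` on a label boundary;
 * an empty `domain` stands for the root. Returns a pointer into `qname` to
 * the name that has exactly one more label than `domain`.
 */
const char *next_minimized_name(const char *qname, const char *domain) {
    size_t qlen = strlen(qname);
    size_t dlen = strlen(domain);
    if (dlen >= qlen)
        return qname; /* already asking for the full name */
    /* End of the part of qname above `domain`, excluding the separating
     * dot (the root has no separating dot). */
    size_t prefix_end = (dlen == 0) ? qlen : qlen - dlen - 1;
    size_t i = prefix_end;
    while (i > 0 && qname[i - 1] != '.')
        i--; /* walk back to the start of the next label */
    return qname + i;
}
```

Starting from the root and feeding each result back in as the new `domain` walks "uk", "co.uk", "independent.co.uk" and finally "www.independent.co.uk".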

Maybe this is https://gitlab.isc.org/isc-projects/bind9/-/commit/0a22024c270a38a54f0d51621a046b726df158c0 ? It was fixed in Debian too:

bind9 (1:9.16.6-3) unstable; urgency=medium

  [ Ondřej Surý ]
  * Add upstream patches to fix some rare conditions (Closes: #969448)

  [ Bernhard Schmidt ]
  * Set Restart=on-failure in systemd unit

 -- Bernhard Schmidt <email address hidden> Tue, 15 Sep 2020 00:26:14 +0200

no longer affects: bind9 (Debian)

I agree, the Debian bug #969448 looks very similar to the crashes I experienced. Is there a timeline for the Debian patches to be merged into Ubuntu's BIND?

Andreas Hasenack (ahasenack) wrote :

They already were, but only for Groovy (the Ubuntu release currently in development). This is a case for backporting them to Focal. Since I have no clear way to reproduce this crash, would you be willing to test packages from a PPA to confirm the fix?

Sure, I can test the packages from the PPA. However, I also don't have a way to reproduce the crash. I could only leave the PPA version running for some time (e.g., two weeks) and see whether it crashes or not...

tags: added: server-next

I looked at the changes Andreas pointed to, but I'm not yet sure they address the same issue.

- 0003-Print-diagnostics-on-dns_name_issubdomain-failure-in.patch
  applies as-is, but is about diagnostics rather than fixing a crash like the one reported here

- 0004-Fix-off-by-one-error-when-calculating-new-hashtable-.patch
  does not apply as-is
  Tracking this I found this would also need (at least) the backport of:
  https://github.com/isc-projects/bind9/commit/e24bc324b455d9cad7b51acd3d5c7b4e40c66187
  https://github.com/isc-projects/bind9/commit/1e043a011b9fe3f62f9f5c7a9b74b44adc03ca44

But that in turn made me realize that the bug fixed by "78543ad" was only introduced by "e24bc324".
So we won't backport "e24bc324" (which is a massive rework of the hash handling) just to fix it.

@Andreas, what do you think: might this, despite the similarity, be a different issue after all? Maybe a different off-by-one/crash in a similar place, which would make it look so similar?
If so, do we have a crash dump of the failure Marian reported that we could look at?

Andreas Hasenack (ahasenack) wrote :

Isn't the 0003-Print-diagnostics.... patch removing the crash by *replacing* it with a diagnostics output? The crash was caused by the INSIST() macro, which calls abort().

Hmm, OK, your familiarity with the context made you look deep enough :-)
Thanks Andreas!

I'll give just 0003 a try then and provide it in a PPA tomorrow.

Changed in bind9 (Ubuntu):
status: New → Fix Released
Changed in bind9 (Ubuntu Focal):
status: New → Triaged

Once the build is complete, the PPA [1] will hold a package for you to test.
Please let me know if that works for you and, after some time has passed, whether it is more stable with regard to the bug you hit before.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4283

Hi Christian,

thanks for the PPA! I installed the new version and I'm now running 1:9.16.1-0ubuntu2.4~ppa1. I'll report back if BIND crashes again or if the "..is not subdomain of..." message from the patch is logged.

Hi Christian,

I'm now seeing the warning message from the patch in my log and BIND has _not_ crashed:

Oct 06 21:30:52 <hostname> named[14587]: timed out resolving 'e4478.a.akamaiedge.net/A/IN': 2606:4700:4700::1111#53
Oct 06 21:30:53 <hostname> named[14587]: timed out resolving 'e6858.dsce9.akamaiedge.net/A/IN': 1.1.1.1#53
Oct 06 21:30:53 <hostname> named[14587]: timed out resolving 'e4478.a.akamaiedge.net/A/IN': 1.1.1.1#53
Oct 06 21:30:54 <hostname> named[14587]: resolver.c:5107: unexpected error:
Oct 06 21:30:54 <hostname> named[14587]: '_.facebook.com/A' is not subdomain of 'c10r.facebook.com'

So this looks good now :) Can you release the patch in the regular BIND package for Focal? Thanks!
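To see why this name pair trips the check: '_.facebook.com' (the '/A' in the log is the query type, not part of the name) is not at or below 'c10r.facebook.com', so the old INSIST would have aborted at this point. The following is a simplified, hypothetical version of the subdomain rule on presentation-format names. BIND's real dns_name_issubdomain() operates on dns_name_t wire-format names and handles details this sketch ignores.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: is `name` equal to `domain` or below it?
 * Names are presentation format without a trailing dot and without
 * escaped dots; an empty `domain` stands for the root. Comparison is
 * ASCII case-insensitive, as in DNS.
 */
bool name_is_subdomain(const char *name, const char *domain) {
    size_t nlen = strlen(name);
    size_t dlen = strlen(domain);
    if (dlen == 0)
        return true; /* everything is below the root */
    if (dlen > nlen)
        return false;
    const char *tail = name + (nlen - dlen);
    for (size_t i = 0; i < dlen; i++) {
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)domain[i]))
            return false;
    }
    /* The match must start on a label boundary: "a.example.com" is below
     * "example.com", but "badexample.com" is not. */
    return nlen == dlen || tail[-1] == '.';
}
```

With the names from the log above, name_is_subdomain("_.facebook.com", "c10r.facebook.com") is false, while both names are below "facebook.com".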

Hi Marian,
Thanks for the testing - yes I think I can now prepare an SRU for Focal.
You'll then (once in focal-proposed) be asked to test it again, but until then this is on me.

Changed in bind9 (Ubuntu Focal):
assignee: nobody → Christian Ehrhardt  (paelzer)

SRU Template added to the description and uploaded to Focal-unapproved.

description: updated
Robie Basak (racb) on 2020-10-14
description: updated

Hello Marian, or anyone else affected,

Accepted bind9 into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/bind9/1:9.16.1-0ubuntu2.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in bind9 (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal

Thanks, I installed bind9 1:9.16.1-0ubuntu2.4 from proposed and I'll report back in two or three weeks with the results.

Thank you Marian, looking forward to hearing back from you then.
