named crashes on REQUIRE((disp->attributes assert
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
bind9 (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | High | Unassigned |
Bug Description
[Impact]
* A race in the handling of the dispatcher can trigger a crash.
The cause is an assertion against a condition that can actually occur
(rarely, but it can).
* The fix is very small and essentially converts the assert into an early
return. Here is a quote of the added comment:
If the attribute DNS_DISPATCHATT
the dispatch is already handling a recv; return immediately.
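To illustrate the shape of that change, here is a minimal sketch in C. It is not the actual patch: `struct dispatch`, `ATTR_RECV_PENDING` and both function names are hypothetical stand-ins, and plain `assert()` stands in for BIND's `REQUIRE`. It only shows how a precondition that aborts can be turned into an early return when the "impossible" state turns out to be reachable.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; NOT the real BIND 9 names. */
#define ATTR_RECV_PENDING 0x01u

struct dispatch {
    unsigned int attributes; /* bit flags describing dispatch state */
    int recvs_started;       /* how many recvs were actually started */
};

/* Old shape: "a recv is already pending" is treated as impossible,
 * so hitting it aborts the whole process. */
static int start_recv_strict(struct dispatch *disp) {
    assert((disp->attributes & ATTR_RECV_PENDING) == 0);
    disp->attributes |= ATTR_RECV_PENDING;
    disp->recvs_started++;
    return 1;
}

/* New shape: the state is accepted as reachable (a race can produce it),
 * so the function returns early instead of aborting. */
static int start_recv_tolerant(struct dispatch *disp) {
    if ((disp->attributes & ATTR_RECV_PENDING) != 0) {
        return 0; /* already handling a recv; nothing to do */
    }
    disp->attributes |= ATTR_RECV_PENDING;
    disp->recvs_started++;
    return 1;
}
```

Calling `start_recv_tolerant()` twice on the same dispatch starts exactly one recv and reports the second call as a no-op, whereas a second call to `start_recv_strict()` would abort.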
[Test Case]
* That is the hardest part of this SRU: this is a race, and neither in the
upstream bug [1] nor here was anyone able to come up with clear repro
steps. I'm afraid we can only review the code and probably keep it in
proposed for some extra time.
[Regression Potential]
* The change is minimal and has been upstream (as well as in Ubuntu
releases) for quite some time now, so I'm confident it isn't entirely
broken.
The old code guarded against an odd condition; the new code still does,
only instead of an aborting assert it is now an early return.
The regressions I can think of are only theoretical, like someone
having a test for this and now wondering why it works, which is not
really an issue. The only real issue I can think of is the early return
triggering a bug further along the return path, e.g. because a caller
can't handle the returned null properly. But TBH that would replace one
crash (the current one) with another one, so it isn't that bad.
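The null-handling concern can be sketched as follows. This is an assumption-laden illustration, not code from bind9: `struct sock_dispatch`, `struct recv_event`, `SOCK_RECV_PENDING`, `begin_recv()` and `handle_once()` are all hypothetical names. The point is only that once a function may return early with NULL, every caller has to check before dereferencing.

```c
#include <stddef.h>

/* Hypothetical names for illustration only. */
#define SOCK_RECV_PENDING 0x01u

struct sock_dispatch {
    unsigned int attributes;
};

struct recv_event {
    int id;
};

/* Returns NULL when a recv is already in progress (the early-return
 * path); otherwise marks the recv as pending and hands back the event. */
static struct recv_event *begin_recv(struct sock_dispatch *disp,
                                     struct recv_event *slot) {
    if ((disp->attributes & SOCK_RECV_PENDING) != 0) {
        return NULL;
    }
    disp->attributes |= SOCK_RECV_PENDING;
    slot->id = 1;
    return slot;
}

/* A caller that copes with the early return: it checks for NULL before
 * touching the event. Returns the event id, or -1 if nothing started. */
static int handle_once(struct sock_dispatch *disp) {
    struct recv_event slot = {0};
    struct recv_event *ev = begin_recv(disp, &slot);
    if (ev == NULL) {
        return -1;
    }
    return ev->id;
}
```

A caller without the `ev == NULL` check would dereference a null pointer on the second call, which is the "one crash replaced by another" scenario described above.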
[Other Info]
* This isn't very frequent, at least according to the crash DB [2] (others
are :-/), but at least this one has a clearly outlined solution.
[1]: https:/
[2]: https:/
---
Ubuntu xenial 16.04, bind9 1:9.10.
Yesterday the named process started crashing frequently: 49 crashes so far, on 49 different servers around the world (one crash each!). We did run OS upgrades yesterday, but the bind9 packages were not updated at that time. This particular bind9 package version was mostly rolled out last month. Given the sudden surge of crashes and their distribution, I suspect this might be triggered remotely by an incoming packet.
Backtrace from the assert:
2019-06-
2019-06-
2019-06-
2019-06-
2019-06-
2019-06-
2019-06-
2019-06-
2019-06-
Related branches
- Andreas Hasenack: Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
Diff: 166 lines (+132/-0), 5 files modified
debian/changelog (+8/-0)
debian/patches/fix-shutdown-race.diff (+41/-0)
debian/patches/series (+3/-0)
debian/patches/ubuntu/lp-1833400-master-Remove-REQUIRE-preventing-change-4592-from-wo.patch (+33/-0)
debian/patches/ubuntu/lp-1833400-master-fix-dispatch.c-shutdown-race.patch (+47/-0)
description: updated
Changed in bind9 (Ubuntu Xenial):
status: Incomplete → Triaged
Thanks for your report. Did named crash exactly once per server, without any further crashes after the service was restarted?
Do you have a list of the packages that got updated in the upgrade you performed before the crashes happened? Is there a difference in the upgraded package set between the servers where named crashed and those where it didn't?
At the moment I don't really have enough information to tell anything, but I'd try to understand whether one of the upgraded packages is something named depends on (e.g. a shared library), and whether a restart of the service should have been triggered by the upgrade.
I'm marking this report as Incomplete for now, which is our way to mark bugs for which we asked for more information. Once you have provided it, please change the status back to New, and we'll look at it again. Thank you!