[2.3a1] named stuck on reload, DNS broken

Bug #1710278 reported by Mark Shuttleworth on 2017-08-11
This bug affects 6 people
Affects: Importance, Assigned to
BIND: Undecided, Unassigned
MAAS (status tracked in 2.7)
  2.2: Critical, Blake Rouse
  2.6: Critical, Blake Rouse
  2.7: Critical, Blake Rouse
bind9 (Ubuntu) (status tracked in Eoan)
  Xenial: Medium, Unassigned
  Bionic: Medium, Unassigned
  Disco: Medium, Unassigned
  Eoan: Medium, Unassigned
maas (Ubuntu): Undecided, Unassigned
  Bionic: Undecided, Unassigned

Bug Description

Am running an HA MAAS, but every few days named gets stuck on one of the region controllers.

systemd thinks the service is running, but it does not respond to any commands or requests. Also, it doesn't respond to signals other than kill -9. Service restarts hang, and rndc hangs.

I have attached logs and a core dump of named.

Tags: sts

Mike Pontillo (mpontillo) wrote :

Assuming the debug symbols I grabbed[1] for my install of bind9 on Xenial match yours (I have bind9 version 1:9.10.3.dfsg.P4-8ubuntu1.7 installed per "apt-cache policy bind9"), I did the following to grab a traceback:

$ sudo apt-get install bind9-dbgsym libdns162-dbgsym libisc160-dbgsym
$ gdb /usr/sbin/named core
(gdb) set pagination off
(gdb) thread apply all bt
... [2] ...

Looking at the backtrace in [2], the interesting parts to me are threads 8, 11 and 20, which are possibly involved in a deadlock[3]. Looks like one of the threads is reloading the configuration (something we would expect MAAS to do), and the other is calling dns_resolver_shutdown() via view_flushanddetach().

[1]: https://wiki.ubuntu.com/Debug%20Symbol%20Packages
[2]: http://paste.ubuntu.com/25292729/
[3]:
Thread 8 (Thread 0x7f95226aa700 (LWP 3203)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f952a351efe in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f94d4014fe8) at ../nptl/pthread_mutex_lock.c:135
#2 0x00007f952b7a0794 in dns_view_weakdetach (viewp=viewp@entry=0x7f9504389780) at ../../../lib/dns/view.c:597
#3 0x00007f952b7993de in destroy (val=0x7f9504389750) at ../../../lib/dns/validator.c:3891
#4 0x00007f952b79927b in dns_validator_destroy (validatorp=validatorp@entry=0x7f9519462628) at ../../../lib/dns/validator.c:3915
#5 0x00007f952b76b9d1 in validated (task=<optimized out>, event=0x7f95194625d0) at ../../../lib/dns/resolver.c:4722
#6 0x00007f952a9a6360 in dispatch (manager=0x7f952be3b010) at ../../../lib/isc/task.c:1130
#7 run (uap=0x7f952be3b010) at ../../../lib/isc/task.c:1302
#8 0x00007f952a34f6ba in start_thread (arg=0x7f95226aa700) at pthread_create.c:333
#9 0x00007f9529a993dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7f9520ea7700 (LWP 3206)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f952a9a516b in isc__task_beginexclusive (task0=<optimized out>) at ../../../lib/isc/task.c:1717
#2 0x0000557c34997dc1 in load_configuration (filename=<optimized out>, server=server@entry=0x7f952be44010, first_time=first_time@entry=isc_boolean_false) at ../../../bin/named/server.c:5651
#3 0x0000557c3499a826 in loadconfig (server=0x7f952be44010) at ../../../bin/named/server.c:7162
#4 0x0000557c3499ad48 in reload (server=0x7f952be44010) at ../../../bin/named/server.c:7183
#5 ns_server_reloadcommand (server=0x7f952be44010, args=args@entry=0x7f94fc120af0 "reload", text=text@entry=0x7f9520ea6590) at ../../../bin/named/server.c:7416
#6 0x0000557c34975db5 in ns_control_docommand (message=<optimized out>, text=text@entry=0x7f9520ea6590) at ../../../bin/named/control.c:102
#7 0x0000557c34978b97 in control_recvmessage (task=0x7f952be51010, event=<optimized out>) at ../../../bin/named/controlconf.c:458
#8 0x00007f952a9a6360 in dispatch (manager=0x7f952be3b010) at ../../../lib/isc/task.c:1130
#9 run (uap=0x7f952be3b010) at ../../../lib/isc/task.c:1302
#10 0x00007f952a34f6ba in start_thread (arg=0x7f9520ea7700) at pthread_create.c:333
#11 0x00007f9529a993dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

...

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
Mike Pontillo (mpontillo) wrote :

This is technically Invalid for MAAS unless there is something unsupported about how we're using BIND, but I'm marking it Triaged for now so we don't lose visibility (in case a fix in MAAS itself turns out to be required).

It would be nice if the service monitoring in MAAS detected this condition, but that feels like it should be handled in a separate bug.

tags: added: server-next
Andreas Hasenack (ahasenack) wrote :

named is being asked to reload its zones quite frequently, sometimes within the same second:
Aug 11 16:31:08 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:17 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:18 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:22 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:26 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:29 maas named[3174]: received control channel command 'reload'
Aug 11 16:31:30 maas named[3174]: received control channel command 'reload'
(...)
Aug 11 17:07:35 maas named[3174]: received control channel command 'reload'
Aug 11 17:07:35 maas named[3174]: received control channel command 'reload'

Eventually it gets stuck:
Aug 11 17:15:16 maas named[3174]: received control channel command 'reload'
Aug 11 17:15:16 maas named[3174]: loading configuration from '/etc/bind/named.conf'
Aug 11 17:15:16 maas named[3174]: reading built-in trusted keys from file '/etc/bind/bind.keys'
<stuck>

An idea to try to reproduce this would be to issue such aggressive reloads on a multi-core machine.
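
The spacing between reloads in a log excerpt like the one above can be measured with a short script. This is only a sketch: it assumes syslog-style lines in the format shown, and the function name and the `year` parameter (syslog timestamps omit the year) are made up for illustration.

```python
from datetime import datetime

RELOAD_MARKER = "received control channel command 'reload'"

def reload_gaps(lines, year=2017):
    """Return the seconds elapsed between consecutive 'reload' commands."""
    stamps = []
    for line in lines:
        if RELOAD_MARKER not in line:
            continue
        # The first three fields are the timestamp, e.g. "Aug 11 16:31:08".
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

sample = [
    "Aug 11 16:31:08 maas named[3174]: received control channel command 'reload'",
    "Aug 11 16:31:17 maas named[3174]: received control channel command 'reload'",
    "Aug 11 16:31:18 maas named[3174]: received control channel command 'reload'",
    "Aug 11 16:31:22 maas named[3174]: received control channel command 'reload'",
]
print(reload_gaps(sample))  # prints [9, 1, 4]
```

A gap of 0 or 1 seconds means two reloads were requested back to back, which is the pattern seen just before named got stuck.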

Mike Pontillo (mpontillo) wrote :

Right; to attempt to reproduce the issue, I would aggressively reload (changing the zone files each time) while at the same time sending a large number of queries to the server (for records in locally authoritative zones?).

Changed in bind9 (Ubuntu):
status: New → Triaged
importance: Undecided → High
Mark Shuttleworth (sabdfl) wrote :

In MAAS, we should:

 * throttle reloads (at least make sure a reload is complete before we
   trigger the next one)
 * monitor the actual service from the perspective of rackd's (perhaps
   have rackd's do a dig @region-controller for a name we send them
   whenever they talk to the region controller)
 * log loudly, kill and restart when the service monitoring fails

Mark

Mike Pontillo (mpontillo) wrote :

I'm +1 on throttling reloads; I think that is the most obvious and critical work item for the MAAS team to address. I have filed that as bug #1710308.

I'm also +1 on better service monitoring using actual queries; I've filed that as bug #1710310. I think something equivalent to 'dig @127.0.0.1 <test-query>' on the region should be enough to detect a deadlock condition, but I like the idea of monitoring it from the rack's perspective as well (though that feels more like a non-fatal warning, because we don't want to restart bind in the event of random firewall hiccups).
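
A query-based check along those lines might look like the sketch below. It is not what MAAS does: the function names, the default timeout, and the injectable `run` parameter (there only so the check can be exercised without a live server) are assumptions for illustration. It relies on the standard dig options `+short`, `+time=` and `+tries=`.

```python
import subprocess

def build_dig_command(server, name, timeout_s=2):
    """Build a dig invocation that gives up quickly instead of retrying."""
    return ["dig", f"@{server}", name, "+short", f"+time={timeout_s}", "+tries=1"]

def dns_is_responsive(server, name, timeout_s=2, run=subprocess.run):
    """Return True if the server answered the query with at least one record."""
    cmd = build_dig_command(server, name, timeout_s)
    try:
        # The outer timeout guards against dig itself hanging.
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s + 3)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())
```

Calling `dns_is_responsive("127.0.0.1", "<test-query>")` on the region would return False when named is deadlocked, because dig gets no reply within the timeout. The queried name has to exist in a zone the server is authoritative for, otherwise an empty answer would be misread as a failure.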

Finally, I think your last bullet requires more discussion before we can work on it. MAAS currently uses sudoers rules specific to the init system to start and stop services like bind9; we do not currently have permission to 'kill -9' arbitrary processes. I'm concerned that if we go down that road, we would open up the possibility that MAAS could erroneously (or due to a malicious attack) believe that bind9 isn't working and repeatedly kill it without good cause, or be convinced to 'kill -9' an incorrect process.

In summary, I think the most urgent thing for MAAS to do is throttle reloads. That should greatly reduce the window of opportunity for the deadlock to occur. In parallel, this should be addressed upstream in bind9.

Mike Pontillo (mpontillo) wrote :

I attempted to reproduce the bind9 issue by doing the following (in two separate sessions):

# Queue 10,000 concurrent reloads (also tried removing the & to make it less parallel)
i=0; while [ $i -lt 10000 ]; do (/usr/sbin/rndc reload&); let i=$i+1; done

# Hammer the DNS server with queries
while [ 1 ]; do dig @127.0.0.1 <maas-hostname>; done

Everything works properly when I do this by itself. But if I have parallel reload requests running *and* I make manual changes to the DNS zones in /etc/bind/maas, I have observed bind9 behaving badly, including (eventually) what seemed to be the deadlock (but my bind9 was older, so my debug symbols didn't match).[1] Then I observed a similar state where after I updated the zone file, it was as if nothing changed (bind9 was returning old data, which didn't resolve itself until I did "service bind9 restart").

It's my impression that the problem is worse when I do reloads in parallel. So this is more evidence pointing to "we should ensure MAAS never tries to reload bind9 twice in parallel".

[1]:
First observed extreme sluggishness in resolving queries, which resolved itself after several seconds.
Then observed a crash (which the system subsequently recovered from): http://paste.ubuntu.com/25293751/
Then observed a deadlock with the same symptoms.

Mark Shuttleworth (sabdfl) wrote :

On 12/08/17 01:11, Mike Pontillo wrote:
> Finally, I think your last bullet requires more discussion before we can
> work on it. MAAS currently uses sudoers rules specific to the init
> system to start and stop services like bind9; we do not currently have
> permission to 'kill -9' arbitrary processes. I'm concerned that if we go
> down that road, we would open up the possibility that MAAS could
> erroneously (or due to a malicious attack) believe that bind9 isn't
> working and repeatedly kill it without good cause, or be convinced to
> 'kill -9' an incorrect process.

This bug causes named to be unresponsive to anything other than kill -9.

MAAS installed, configured, started, and validates named's behaviour.
Assume there is no operator. Since kill -9 is necessary on occasion, it
follows that MAAS must have and must use that ability.

I could see MAAS trying it a few times and then giving up with a big
alert to the operators. But I absolutely think MAAS should treat this as
a bug in named which should be logged and managed nicely but nonetheless
handled transparently to users.

Mark

Mark Shuttleworth (sabdfl) wrote :

To avoid reloads in parallel, I think we should:

 * verify the reload happened (perhaps checking zone serial?)
 * make sure we defer any subsequent reload at least 10 seconds

Mark
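
As a rough illustration of the second point, the sketch below serializes reloads and spaces them at least 10 seconds apart. It is not the MAAS implementation: the class name, the `reload_fn` callback, and the injectable `clock` and `sleep` parameters are hypothetical, and it does not verify the zone serial.

```python
import threading
import time

class ReloadThrottle:
    """Run reloads one at a time, at least min_interval_s apart."""

    def __init__(self, reload_fn, min_interval_s=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._reload_fn = reload_fn
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_finished = None

    def reload(self):
        # Holding the lock for the whole call means a second caller cannot
        # start a reload until the previous one has returned.
        with self._lock:
            if self._last_finished is not None:
                remaining = self._min_interval_s - (self._clock() - self._last_finished)
                if remaining > 0:
                    self._sleep(remaining)
            try:
                return self._reload_fn()
            finally:
                self._last_finished = self._clock()
```

Here `reload_fn` could be something like `lambda: subprocess.run(["rndc", "reload"], check=True)`. A real implementation would probably also coalesce queued requests into a single reload rather than making every caller wait its turn.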

Andreas Hasenack (ahasenack) wrote :

Maybe dns dynamic updates could be used instead of zone reloads if just a
few IPs were added or removed.

On Aug 12, 2017 06:22, "Mark Shuttleworth" <email address hidden>
wrote:

> To avoid reloads in parallel, I think we should:
>
> * verify the reload happened (perhaps checking zone serial?)
> * make sure we defer any subsequent reload at least 10 seconds
>
> Mark

Mike Pontillo (mpontillo) wrote :

Mark, if you observe the deadlock again, can you run "systemctl stop bind9", wait a few minutes (at least 2, but maybe up to 5), and then check if bind9 successfully stops? It looks like systemd will (by default) resort to more aggressive methods to kill a service if it doesn't stop after ~90 seconds.

If the normal method of killing the bind9 service works, we can still avoid adding that scope and risk to MAAS. Rather, if we detect bind9 behaving badly, a stop/start cycle would also allow bind9 to properly shut down in most cases, and avoid any other bugs in BIND we might see as a side-effect of a "kill -9 <bind9-pid>" approach. (A human operator could troubleshoot those side effects, but it's more difficult for MAAS to anticipate, for example, why BIND might now fail to start up because of a lock file that was left on the filesystem when the 'kill -9' occurred.)

Mark Shuttleworth (sabdfl) wrote :

OK, will do, thanks :)

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
milestone: none → 2.3.0
Mark Shuttleworth (sabdfl) wrote :

OK, restarting via 'sudo service bind9 restart' does work in the end, it just takes a long time. The downside is that MAAS is not going to have an effective DNS for a few minutes, which is unacceptable.

Mike Pontillo (mpontillo) wrote :

Thanks. I wonder if we shouldn't fix this in Ubuntu by tweaking the systemd control files so that the timeout values are more acceptable for production use.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.3.0 → 2.3.0alpha2
Changed in maas:
status: Fix Committed → Fix Released
Joshua Powers (powersj) on 2019-02-19
tags: removed: server-next
Mark Shuttleworth (sabdfl) wrote :

Hold on, I think this bug is still problematic for MAAS and Ubuntu.

Joshua Powers (powersj) wrote :

Hey Mark, was cleaning up bug tags; still consider this an issue.

Sam Lee (sjwl) wrote :

Mark,

Do you have any updated repro steps?

I'm seeing this failure with MAAS v2.5.3. I suspect that when v2.5 moved the DNS logic from the region to the rack controller, some of the mitigation logic was lost, and thus this bug manifests more frequently.

When I compare our v2.5.3 install with our v2.4.2 install, the number of rndc reloads is vastly higher on v2.5.3.

[2.4.2]
journalctl -b -u bind9.service |grep received.control
Jun 22 00:22:05 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 22 00:22:08 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 22 00:22:54 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 24 16:27:06 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 25 13:53:34 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 25 13:53:41 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 25 13:54:51 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'
Jun 25 13:55:22 wdc1-p01-s01-maas-18 named[907]: received control channel command 'reload'

[2.5.3]
journalctl -b -u bind9.service |grep received.control
Jun 26 14:23:59 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:04 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:09 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:11 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:15 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:18 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:22 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:27 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:31 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:36 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:40 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'
Jun 26 14:24:42 ch31-p01-s01-maas-18 named[1041]: received control channel command 'reload'

I had to trim the 2.5.3 output because it was way too long to fit in this comment, but as you can see, 2.5.3 is spamming reloads compared to 2.4.2. 2.4.2 may reload 4 times in an _entire day_, whereas 2.5.3 is doing hundreds if not thousands a day.

Mark Shuttleworth (sabdfl) wrote :

On 6/26/19 5:12 PM, Sam Lee wrote:
> When I compare our v2.5.3 install with our v2.4.2 install, the number of
> rndc reloads is vastly higher on v2.5.3.

Hi Sam, I don't see these issues any more, on 18.04 and 2.6. I see
reloads every few minutes on a stable MAAS (i.e. without a lot of
activity). Since 2.6 is brand new (2.6.0) you might want to hold off on
upgrading unless your cluster is for test purposes, but let us know if
you still see this on 2.6 when you get there.

Mark

Sam Lee (sjwl) wrote :

Hi Mark,

Still seeing it with 18.04 and 2.6. The sweet spot seems to be when MAAS is receiving lots of DNS requests while simultaneously doing DNS reloads (as you alluded to in this case).

I'm attempting to set up a simplified repro scenario which basically will do this:

1) enlist 50+ new machines on an untagged subnet *with DNS left blank*, forcing nodes to send their DNS queries to MAAS
2) Leave each machine's PXE interface set to Autoassign IP (so every deploy/release forces a DNS reload)
3) deploy and release (repeat until error)

will report back with findings.

Mark Shuttleworth (sabdfl) wrote :

OK, interesting. I really don't like the reloading strategy but am not
sure that BIND gives us many better options. Let us know what you find.

Mark

Sam Lee (sjwl) wrote :

OK - I was able to repro again, and this time with MAAS 2.6.

Here are the steps

PREP WORK
1) Have 50 machines in Ready state with one interface enabled configured as 'Autoassign' to Default VLAN PXE subnet (auto assign so that every deploy/release causes MAAS to reload DNS)
2) Clear out any DNS entries in the PXE subnet (this forces nodes to send DNS queries to MAAS)
3) Settings-> Network Services -> DNS -> Upstream DNS -> enter valid upstream DNS IP
4) Settings-> Network Services -> DNS -> DNSSEC -> Automatic (for some reason this breaks Upstream DNS)
5) Verify that Upstream DNS is broken
   a) Rescue Mode one machine
   b) ssh to Rescue machine
   c) dig www.google.com
   d) (dig should timeout/fail)
   e) MAAS->Settings-> Network Services -> DNS -> DNSSEC -> Disable
   f) dig www.google.com
   g) (dig should succeed)
   h) MAAS->Settings-> Network Services -> DNS -> DNSSEC -> Automatic
   i) Release Rescue machine

REPRO
1) run repro.py (attached, WARNING this code will use all machines available to MAAS)
2) wait up to 3 hours, checking if bind9 is hung by regularly running `sudo rndc status` on MAAS

MONITORING (optional)
To see DNS query activity, run this in one ssh window to MAAS:
sudo tcpdump -i ens3 dst <your-rack-controller-ip> and dst port 53
To see DNS reloads, and why they happen, run this in another ssh window to MAAS:
sudo tail -f /var/log/maas/regiond.log | grep Reloaded -A 3

Sam Lee (sjwl) wrote :

repro.py attached

Sam Lee (sjwl) wrote :

repro.py attempts to trigger DNS queries during DNS Reloads.

It does so by first deploying all 50 machines.
Then one-by-one (not all at once!) release a machine, wait, deploy machine, move to next machine.

At some point a machine will be releasing (Reloads) while others are starting to deploy (DNS Queries). This is the sweet spot.

If one simply deploys all 50 machines simultaneously, then the DNS Reload would occur but without any DNS queries (because all machines have yet to PXE boot).
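
The attachment itself is not reproduced here, but the sequence described above can be outlined roughly as follows. This is a sketch of the described steps only, not the attached repro.py: the function and every callable passed to it (`deploy`, `release`, `wait`, `dns_is_hung`) are hypothetical placeholders for whatever drives MAAS.

```python
def rolling_release_deploy(machines, deploy, release, wait, dns_is_hung):
    """Deploy everything, then cycle machines one at a time until DNS hangs.

    Returns the machine being cycled when the hang was detected, or None
    if a full pass completed without a hang.
    """
    for machine in machines:
        deploy(machine)       # initial deployment of all machines
    for machine in machines:
        release(machine)      # releasing a machine triggers a DNS reload
        wait()                # leave time for reloads and queries to overlap
        deploy(machine)       # the booting machine sends DNS queries to MAAS
        if dns_is_hung():
            return machine
    return None
```

Cycling one machine at a time is what makes reloads (from releases) overlap with queries (from machines that are still deploying).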

Sam Lee (sjwl) wrote :

I'm not sure why a "broken" Upstream DNS helps repro this bug, but I was not able to repro when the Upstream DNS was working.

Changed in bind9 (Ubuntu):
assignee: nobody → Blake Rouse (blake-rouse)
Dan Streetman (ddstreet) wrote :

This is a deadlock in bind9 code; thread A runs ns_client_endrequest->dns_view_detach->view_flushanddetach, which includes:

  LOCK(&view->lock);
  if (!RESSHUTDOWN(view))
    dns_resolver_shutdown(view->resolver);

inside dns_resolver_shutdown, for each resolver bucket, it does:

  for (i = 0; i < res->nbuckets; i++) {
    LOCK(&res->buckets[i].lock);

at this point, one of the bucket locks is held, and thread A is blocked holding view->lock, but waiting for the view->resolver->bucket[i].lock.

meanwhile, thread B runs dispatch->validated, and does:

  bucketnum = fctx->bucketnum;
  LOCK(&res->buckets[bucketnum].lock);

then while still holding that lock calls dns_validator_destroy->destroy->dns_view_weakdetach

which does:

  LOCK(&view->lock);

leaving thread A and thread B in a deadlock, with thread A waiting for the bucket.lock that thread B holds, and thread B waiting for the view->lock that thread A holds.
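
The lock-order inversion described above can be reproduced in miniature. The sketch below is not BIND code: the two locks merely stand in for view->lock and a bucket lock, and acquisition timeouts plus a barrier are used so the demonstration reports the blocked threads instead of hanging forever.

```python
import threading

def demonstrate_lock_inversion(timeout_s=0.5):
    """Return the names of the threads that could not get their second lock."""
    view_lock = threading.Lock()      # stands in for view->lock
    bucket_lock = threading.Lock()    # stands in for res->buckets[i].lock
    rendezvous = threading.Barrier(2)
    blocked = []

    def worker(name, first, second):
        with first:
            rendezvous.wait()         # both threads now hold their first lock
            if second.acquire(timeout=timeout_s):
                second.release()
            else:
                blocked.append(name)  # in named this wait never ends
            rendezvous.wait()         # keep the first lock until both have tried

    # Thread A takes view then bucket; thread B takes bucket then view.
    a = threading.Thread(target=worker, args=("A", view_lock, bucket_lock))
    b = threading.Thread(target=worker, args=("B", bucket_lock, view_lock))
    a.start()
    b.start()
    a.join()
    b.join()
    return sorted(blocked)
```

Calling `demonstrate_lock_inversion()` returns `['A', 'B']`: each thread holds the lock the other one needs. If both threads took the locks in the same order, neither would block.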

Dan Streetman (ddstreet) wrote :

This deadlock doesn't appear to be fixed in the latest upstream.

Dan Streetman (ddstreet) wrote :

> Test build in ppa:

eh, there is still a view->attributes field that needs lock protection, this isn't ready yet.

Dan Streetman (ddstreet) wrote :

Ok, updated the test build in ppa with locking for view->attributes as well, should fix this particular bind9 deadlock.

Changed in bind:
assignee: nobody → Dan Streetman (ddstreet)
assignee: Dan Streetman (ddstreet) → nobody
Dan Streetman (ddstreet) on 2019-07-18
Changed in bind9 (Ubuntu Eoan):
assignee: Blake Rouse (blake-rouse) → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Disco):
assignee: nobody → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in bind9 (Ubuntu Disco):
importance: Undecided → Medium
Changed in bind9 (Ubuntu Bionic):
importance: Undecided → Medium
Changed in bind9 (Ubuntu Eoan):
status: Triaged → In Progress
Changed in bind9 (Ubuntu Disco):
status: New → In Progress
Changed in bind9 (Ubuntu Bionic):
status: New → In Progress
Changed in bind9 (Ubuntu Eoan):
importance: High → Medium
Dan Streetman (ddstreet) wrote :

First deadlock from comment 27 fixed/worked around.

Next deadlock is:

thread A is the same as comment 27, holding view->lock and waiting for the bucket lock in dns_resolver_shutdown.

now, thread B calls dispatch->authvalidated->nsecvalidate->create_fetch->dns_resolver_createfetch->dns_resolver_createfetch3 which does:

  LOCK(&res->buckets[bucketnum].lock);

then calls fctx_create->fcount_incr which does:

  LOCK(&fctx->res->lock);

where fctx->res is the view. So again, deadlock.

Dan Streetman (ddstreet) wrote :

Another test build ready.
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1710278

Where will the next deadlock be? :)

Dan Streetman (ddstreet) wrote :

Looks like that was the last one for this particular bug.

Dan Streetman (ddstreet) on 2019-07-23
Changed in bind9 (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Dan Streetman (ddstreet) wrote :

> Looks like that was the last one for this particular bug.

spoke too soon. Next deadlock is same place for thread A, dns_resolver_shutdown, while thread B is similar, fctx_create->dns_view_findzonecut->dns_view_findzonecut2 where it hangs on the view->lock.

Really, dns_resolver_shutdown (which holds the view->lock and iterates through taking each of the view's bucket locks) and fctx_create (which requires the caller to hold the bucket lock, and calls lots of functions that take the view->lock) will continue to deadlock like this until upstream makes large changes to locking, which is what they're doing:
https://gitlab.isc.org/isc-projects/bind9/merge_requests/2132

However that's still WIP, and the changes so far are large (30 commits as of this comment):
https://gitlab.isc.org/isc-projects/bind9/merge_requests/2132/commits

So backporting all that may be outside the scope of normal SRUs.

Blake Rouse (blake-rouse) wrote :

Last resolution update:

The bind9 source package should be modified to generate 2 binary versions of bind9.

bind9 -> standard multi-threaded bind9 (main)
bind9-single -> single-threaded bind9 (universe)

This will allow security updates to still be handled with the source bind9 package generating both versions of the binary package.

Once bind9-single is in the archive, MAAS will update its dependencies to depend on either bind9 or bind9-single. This allows bind9-single to be installed in place of bind9, and MAAS will not try to pull in the default bind9 when upgraded.

Note: "bind9-single" is just a name I am using for this comment.

Eric Desrochers (slashd) on 2019-07-30
tags: added: sts
Changed in bind9 (Ubuntu Bionic):
assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Disco):
assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Eoan):
assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Changed in bind9 (Ubuntu Xenial):
assignee: Dan Streetman (ddstreet) → Eric Desrochers (slashd)
Eric Desrochers (slashd) wrote :

I started to look at the new single-thread binary approach. So far it seems like bind9 won't be able to use the pkcs11 library provider, as that seems to require pthreads, so I'm afraid there will be no DNSSEC capabilities in the single-thread binary package.

Eric Desrochers (slashd) wrote :

For now I have to use the following to make the build work:

# debian/rules:
 dh_auto_configure -B build-singlethread -- \
                ........
                --disable-threads \
                --disable-native-pkcs11 \
                --with-pkcs11=no \
                ........

Eric Desrochers (slashd) wrote :

I think an external provider can be mentioned via '-E engine-name' (see: NAMED(8))

       -E engine-name
           When applicable, specifies the hardware to use for cryptographic operations, such as a secure key store used for signing.

           When BIND is built with OpenSSL PKCS#11 support, this defaults to the string "pkcs11", which identifies an OpenSSL engine that can drive a cryptographic accelerator or hardware service
           module. When BIND is built with native PKCS#11 cryptography (--enable-native-pkcs11), it defaults to the path of the PKCS#11 provider library specified via "--with-pkcs11".

I'll have a look at our options once I have a binary pkg ready to be installed and tested.

Eric Desrochers (slashd) wrote :

After reading more, I think what is most important for DNSSEC is --with-openssl.
I don't quite get the goal of PKCS11 but it doesn't seem as important as I would first think, if I read this correctly.

I'll try to get an installable bind9-single-thread binary package and test DNSSEC after.

Reference:

https://github.com/isc-projects/bind9/blob/master/README#L185-L191

# ./configure --help
  --with-openssl=PATH Build with OpenSSL [yes|no|path]. (Crypto is
                          required for DNSSEC)

Seth Arnold (seth-arnold) wrote :

Am I reading this bug correctly, that MAAS currently asks BIND to reload its entire configuration file on every machine provision and removal?

This seems like a problem worth solving rather than trying to work around.

At least PowerDNS provides several mechanisms for dynamically adding and removing records from a zone:

- dnsupdate: https://doc.powerdns.com/authoritative/dnsupdate.html
- REST api: https://doc.powerdns.com/authoritative/http-api/index.html
- direct SQL to a backing database: https://doc.powerdns.com/authoritative/migration.html

Since dnsupdate is an RFC-standardized protocol there's a pretty good shot BIND supports it as well. Was this tried and found lacking? The API and SQL approaches are likely to not have equivalents in BIND.

I'm not sure what your DNSSEC goals are, but PowerDNS's documentation describes choices, including pkcs#11 in case that's important: https://doc.powerdns.com/authoritative/dnssec/index.html

Thanks

Eric Desrochers (slashd) wrote :

Ok I have something ready and installable to do further testing now.

This is introducing a new binary package called "bind9-single-thread"

If someone has something against the package name, please let me know, but "bind9-single-thread" is what I found the most obvious without having to look at the description or changelog. That is purely aesthetic, so it can be changed at any time before the SRU; I'm all ears if someone comes up with a better name.

* With the following, "bind9" and "bind9-single-thread" can't co-exist on the same machine (not co-installable). To achieve this I have put 2 things in place, as follows:

# d/control:
Package: bind9
Conflicts: bind9-single-thread
Replaces: bind9-single-thread

Package: bind9-single-thread
Conflicts: bind9
Replaces: bind9

References:
https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts #
when two packages provide the same file and will continue to do so,

https://www.debian.org/doc/debian-policy/ch-relationships.html#s-replaces
Second, Replaces allows the packaging system to resolve which package should be removed when there is a conflict (see Conflicting binary packages - Conflicts). This usage only takes effect when the two packages do conflict, so that the two usages of this field do not interfere with each other.

Next step is to test MAAS against "bind9-single-thread" with DNSSEC as it seems the way to trigger the deadlocks in bind9 and the reason why we are introducing a single-thread binary package.

I'll share the PPA later today.

- Eric

Eric Desrochers (slashd) wrote :

# Quick note before anyone tests this package:
* This is a test package for others to test, ONLY made to determine that it works as expected and fixes the current situation (pre-sru).
* This is NOT a long term nor final solution yet, please wait until the package is found in the official Ubuntu archive before considering this official (Post-SRU)
* DO NOT test in production.
* This package is subject to change as we progress.

# PPA instructions:
sudo add-apt-repository ppa:slashd/lp1710278
sudo apt-get update
sudo apt install bind9-single-thread

As I write this, the version is : 1:9.11.3+dfsg-1ubuntu1.8+hfv20190801lp1710278b2

# Testing:
* Test MAAS with DNSSEC on|off|automatic|.... with a significant amount of machines/VMs (at least 50 from what I heard/can tell)
* Validate that only bind9 or bind9-single-thread can be installed at the time on the same system (basically both bind packages can't co-exist and be co-installable at the same time/on the same machine)
- I already tested that part, and it did the trick for me but having more eyes on won't hurt.

Expected behaviour:
 - If bind9 is installed and one tries to install bind9-single-thread
"
The following packages will be REMOVED:
  bind9-single-thread
The following NEW packages will be installed:
  bind9
"
 - If bind9-single-thread is installed and one tries to install bind9:
"
The following packages will be REMOVED:
  bind9
The following NEW packages will be installed:
  bind9-single-thread
"
That way the packages conflict, but the "Replaces:" that was put in place won't block users from switching from one to the other. Users and package maintainers depending on one or the other will have to read carefully what apt reports and know what they are doing here.

If you think of anything else, please say so. The more testing/feedback the better.

- Eric

Eric Desrochers (slashd) wrote :

I accidentally inverted the output:

So to rectify:

* IF bind9 is installed the expected behaviour is the following:

# apt install bind9-single-thread
......
The following packages will be REMOVED:
  bind9
The following NEW packages will be installed:
  bind9-single-thread

* IF bind9-single-thread is installed the expected behaviour is the following:

# apt install bind9
......
The following packages will be REMOVED:
  bind9-single-thread
The following NEW packages will be installed:
  bind9

Eric Desrochers (slashd) wrote :

As mentioned in a previous comment, I had to turn off pkcs11:

# debian/rules:
 dh_auto_configure -B build-singlethread -- \
                ........
                --disable-threads \
                --disable-native-pkcs11 \
                --with-pkcs11=no \
                ........

but --with-openssl has been preserved. I don't know yet how much this can impact DNSSEC (but we would definitely need to pay attention to that).

# ./configure --help
......
--with-openssl=PATH Build with OpenSSL [yes|no|path]. (Crypto is
                          required for DNSSEC)
......

In Reply to Seth's suggestion:

> Am I reading this bug correctly, that MAAS currently asks BIND to reload its entire configure
> file on every machine provision and removal?
>
> This seems like a problem worth solving rather than trying to work around.
>
> At least PowerDNS provides several mechanisms for dynamically adding and removing records from
> a zone:
>
> - dnsupdate: https://doc.powerdns.com/authoritative/dnsupdate.html

[...]

> Since dnsupdate is an RFC-standardized protocol there's a pretty good shot BIND supports it as
> well. Was this tried and found lacking? The API and SQL approaches are likely to not have
> equivalents in BIND.
>
> I'm not sure what your DNSSEC goals are, but PowerDNS's documentation describes choices,
> including pkcs#11 in case that's important:
> https://doc.powerdns.com/authoritative/dnssec/index.html

Yes, bind even has a tool for RFC 2136 packaged [1]. A short howto that mentions DNSSEC in that regard can be found at [2]. It also mentions an AppArmor denial with that setup, but if that turns out to be the blocker I'm sure we can come up with a safe rule that can be added.
This might really be much closer to the design of the DNS server than high-frequency restart/reload, so giving this some thought and experimentation on the MAAS side might be great.

[1]: http://manpages.ubuntu.com/manpages/bionic/man1/nsupdate.1.html
[2]: https://dnns.no/dynamic-dns-with-bind-and-nsupdate.html
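
To make the RFC 2136 idea above more concrete, here is a minimal Python sketch that builds a batch script for nsupdate and optionally pipes it to the tool. The helper names, server address, zone and record names are hypothetical placeholders, not anything MAAS actually uses. It also assumes the zone has been configured to accept dynamic updates, which is not the case by default.

```python
import subprocess


def build_nsupdate_script(server, zone, add=(), delete=(), ttl=300):
    """Return nsupdate batch text that changes A records in one update request.

    add    -- iterable of (fqdn, ipv4_address) pairs to add
    delete -- iterable of fqdns whose A records should be removed
    """
    lines = [f"server {server}", f"zone {zone}"]
    for name in delete:
        lines.append(f"update delete {name} A")
    for name, address in add:
        lines.append(f"update add {name} {ttl} A {address}")
    # "send" submits everything queued so far as a single update message.
    lines.append("send")
    return "\n".join(lines) + "\n"


def apply_update(script, keyfile=None):
    """Feed the script to nsupdate on stdin; -k selects a TSIG key file."""
    cmd = ["nsupdate"]
    if keyfile:
        cmd += ["-k", keyfile]
    subprocess.run(cmd, input=script, text=True, check=True)
```

Calling build_nsupdate_script("127.0.0.1", "maas.example.", add=[("node1.maas.example.", "10.0.0.11")]) yields a script that can be reviewed before it is passed to apply_update(). The point of the design is that a record change becomes a small update request rather than a full reload of named.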

Eric Desrochers (slashd) wrote :

Please wait before testing my ppa [ppa:slashd/lp1710278]

I was doing some tests and noticed named is still multi-threaded although I pass the --disable-threads parameter.

$ ls /proc/1791/task/
1791 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802

$ ps -fL -C named
UID PID PPID LWP C NLWP STIME TTY TIME CMD
bind 1791 1 1791 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1793 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1794 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1795 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1796 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1797 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1798 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1799 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1800 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1801 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind
bind 1791 1 1802 0 11 Aug01 ? 00:00:00 /usr/sbin/named -f -u bind

I'll fix that and produce a new binary package to test.
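
The same check can be scripted. Below is a small Python sketch that counts the entries under /proc/<pid>/task, which is what the ls output above lists by hand. The function name and the proc_root parameter are illustrative assumptions (the latter only exists to make the function easy to test), and the approach is Linux-specific.

```python
import os


def count_threads(pid, proc_root="/proc"):
    """Count the threads (LWPs) of a process by listing <proc_root>/<pid>/task.

    Each numeric entry in that directory is one thread ID.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sum(1 for entry in os.listdir(task_dir) if entry.isdigit())
```

For the process shown above, count_threads(1791) would correspond to the NLWP column of ps (11 here). A value well above 1 suggests the threaded binary is the one running.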

- Eric

Eric Desrochers (slashd) wrote :

I have double-checked the buildlog and everything seems to indicate that multi-threading is turned off:

# buildlog
.....
 2689 dh_auto_configure -B build-singlethread -- \
 2690 --libdir=/usr/lib/x86_64-linux-gnu \
 2691 --sysconfdir=/etc/bind \
 2692 --with-python=python3 \
 2693 --localstatedir=/ \
=> 2694 --disable-threads \
 ...
 3908 Features disabled or unavailable on this platform:
=> 3909 Multiprocessing support (--enable-threads)
.....

Need to investigate more.

Eric Desrochers (slashd) wrote :

The build works as expected, I think "dh_install" is where the named binary got overwritten because "-pPACKAGE" is not mentioned.

Testing with "fakeroot debian/rules build" revealed:

$ md5sum build/bin/named/named
9f9fad4761dccb84801351a32c8c1a4f build/bin/named/named

$ md5sum build-singlethread/bin/named/named
bbdff642ecbf573521c6143a7d21db15 build-singlethread/bin/named/named

After installing bind9 or bind9-single-thread, the named binary has the same md5sum in both cases.
So the build part is definitely working as expected, but the installation/copy goes wrong.

Hopefully that would do the trick:

- dh_auto_install -B build --destdir=$(CURDIR)/debian/tmp
- dh_auto_install -B build-singlethread --destdir=$(CURDIR)/debian/tmp-singlethread
+ dh_auto_install -pbind9 -B build --destdir=$(CURDIR)/debian/tmp
+ dh_auto_install -pbind9-single-thread -B build-singlethread --destdir=$(CURDIR)/debian/tmp-singlethread
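
The md5sum comparison used above to spot the problem can be automated, for instance as a quick post-build sanity check. This is only a sketch: the helper names are made up, and which two paths to compare (the two build trees, or the binaries unpacked from the two .deb files) is left to the caller.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True if both files have identical content according to MD5."""
    return file_md5(path_a) == file_md5(path_b)
```

With the paths from this comment, same_binary("build/bin/named/named", "build-singlethread/bin/named/named") should be False, and the same should hold for the named binaries shipped in the two packages once the install step is fixed.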

Eric Desrochers (slashd) wrote :

In the multiple binary package case, the files are instead installed into debian/tmp/, and should be moved from there to the appropriate package build directory using dh_install(1).

From debhelper compatibility level 7 on, dh_install will fall back to looking in debian/tmp for files, if it doesn't find them in the current directory (or wherever you've told it to look using --sourcedir).

That is the problem:
dh_auto_install -B build --destdir=$(CURDIR)/debian/tmp
dh_auto_install -B build-singlethread --destdir=$(CURDIR)/debian/tmp-singlethread

I have to specify the source dir in dh_install, because the current setup only relies on debian/tmp, the default when multiple binary packages are in place.

Eric Desrochers (slashd) wrote :

Which explains why the multi-thread and single-thread binary packages have the same named binary.

Eric Desrochers (slashd) wrote :

Ok I made some progress

bind9-single-thread is now "threads support is disabled" according to "named -V"

and

bind9 is still "threads support is enabled" still according to "named -V"

There are still a few things to fix here and there due to the recent change, but at least I think I found what dh_install needed to separate the binaries into their respective binary packages.

It is a bit more difficult since it involves compiling the same software several times, with different configuration options each time. I still need some time to refine the configuration and test again.

Eric Desrochers (slashd) wrote :

*** Important note ***

For eventual upgrades to a newer version of bind in later Ubuntu releases:

https://ftp.isc.org/isc/bind9/9.14.0/RELEASE-NOTES-bind-9.14.0.html

....
Previously, it was possible to build BIND without thread support for old architectures and systems without threads support. BIND now requires threading support (either POSIX or Windows) from the operating system, and it cannot be built without threads.
....

It seems we can use single-thread only on versions before 9.14; from 9.14 on it is no longer offered as a configuration option. So, considering that --disable-threads is our current workaround/best option, we should not move to 9.14 or later until we have a better solution using multi-threading and/or upstream fixes the deadlock issues that MAAS suffers from.

Eric Desrochers (slashd) wrote :

I just confirmed the above ^ :

$ git checkout v9_13_0
$ ./configure --help | grep -i "enable-thread"
  --enable-threads enable multithreading

$ git checkout v9_14_0
$ ./configure --help | grep -i "enable-thread"
==> NO MORE OPTION.
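
The grep above could also be scripted, for example in a packaging check that flags when a new upstream version drops the option. The sketch below assumes the output of "./configure --help" has already been captured as text; the function name is hypothetical.

```python
import re


def advertises_option(help_text, option):
    """True if `option` appears as a whole flag in configure's help text.

    The lookarounds avoid matching a longer flag that merely starts or
    ends with the same characters.
    """
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None
```

Fed the help text of the v9_13_0 checkout, advertises_option(text, "--enable-threads") should return True; for v9_14_0, where the grep above finds nothing, it should return False.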

Eric Desrochers (slashd) wrote :

OK, next challenge: the single-threaded named dumps core as follows:

# named -f -u bind
named: ../nptl/pthread_mutex_lock.c:79: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

Although named is clearly single threaded

# named -V
BIND <VERSION>-Ubuntu (Extended Support Version) <id:a375815>
running on Linux x86_64 5.0.0-20-generic #21-Ubuntu SMP Mon Jun 24 09:32:09 UTC 2019
built by make with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--disable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libjson=/usr' '--without-lmdb' '--with-gnu-ld' '--with-geoip=/usr' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--with-pkcs11=no' '--with-randomdev=/dev/urandom' '--with-eddsa=no' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-MQj2Su/bind9-9.11.3+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
compiled by GCC 7.4.0
compiled with OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
linked to OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
compiled with libxml2 version: 2.9.4
linked to libxml2 version: 20904
compiled with libjson-c version: 0.12.1
linked to libjson-c version: 0.12.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is disabled

I'll build/publish the debug symbols and see what is missing (possibly the libs built from the bind9 source need to be single-threaded too).

Eric Desrochers (slashd) wrote :

$ ldd /usr/sbin/named | grep -i thread
 libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f6495778000)

Eric Desrochers (slashd) wrote :

Producing a single-thread package of bind9 is proving complex, and I keep facing new challenges along the way (fix one problem, find another one, fix it, find another one, and so on). On top of that comes the complexity of maintaining both multi-thread and single-thread package types until the deadlock situation is fixed upstream, plus the fact that version 9.14 no longer offers single-thread (and possibly lands in the next release, 19.10, and/or 20.04, which will also be an LTS).

If 19.10 and/or 20.04 has bind 9.14 or later and the deadlock situation is not fixed, we will have to find another approach anyway, as single-thread will no longer be an option.

Fixing the deadlocks will take time, as IIRC upstream has to entirely refactor the locking mechanism inside bind9 (which is not trivial), so we may end up having to maintain whatever solution we take for quite some time.

With these new parameters, maybe we should re-consider our approach.

Possibly server team/MAAS team should take over at this point and have a cross team discussion about this situation ?

- Eric

Eric Desrochers (slashd) on 2019-08-06
Changed in bind9 (Ubuntu Xenial):
assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Bionic):
assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Disco):
assignee: Eric Desrochers (slashd) → nobody
Changed in bind9 (Ubuntu Eoan):
assignee: Eric Desrochers (slashd) → nobody
no longer affects: maas (Ubuntu Xenial)
no longer affects: maas (Ubuntu Eoan)
no longer affects: maas (Ubuntu Disco)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu Bionic):
status: New → Confirmed
Changed in maas (Ubuntu):
status: New → Confirmed

Hi, I see that the backport fix is released and/or committed to MAAS 2.2, 2.6 and 2.7. Can we get it backported to 2.4 as well? It is currently affecting a customer in production. Thank you!

Blake Rouse (blake-rouse) wrote :

We are currently working on getting this backported to 2.4.

Eric Desrochers (slashd) wrote :

Additionally, any idea when the proposed 2.6.1 (including the DNS reload fix) will become available in MAAS/stable?
