Comment 9 for bug 1710278

Mike Pontillo (mpontillo) wrote:

I attempted to reproduce the bind9 issue by doing the following (in two separate sessions):

# Queue 10,000 concurrent reloads (also tried removing the & to make it less parallel)
for i in $(seq 1 10000); do (/usr/sbin/rndc reload &); done

# Hammer the DNS server with queries
while [ 1 ]; do dig @ <maas-hostname>; done

Everything works properly when I run only those two loops. But if I have parallel reload requests running *and* I make manual changes to the DNS zones in /etc/bind/maas, I have observed bind9 misbehaving, including (eventually) what seemed to be the deadlock. I couldn't confirm it, because my bind9 was older and my debug symbols didn't match.[1] I then observed a similar state in which updating the zone file seemed to have no effect: bind9 kept returning the old data until I ran "service bind9 restart".

My impression is that the problem is worse when reloads run in parallel. So this is more evidence that MAAS should never try to reload bind9 twice in parallel.
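One way to enforce that, sketched here as a shell wrapper and not as the change MAAS would actually make, is to put every reload behind an exclusive lock using util-linux flock(1), so a second reload waits until the first has finished. The function name serialized_reload, the LOCK_FILE path, and the RELOAD_CMD array are hypothetical; the sketch serializes reloads but does not coalesce them.

```shell
#!/bin/bash
# Hypothetical sketch: serialize bind9 reloads so that at most one
# "rndc reload" runs at a time. Assumes util-linux flock(1) is installed.
# LOCK_FILE, RELOAD_CMD and serialized_reload are illustrative names,
# not part of MAAS or bind9.
LOCK_FILE="${LOCK_FILE:-/run/lock/bind9-reload.lock}"
RELOAD_CMD=(/usr/sbin/rndc reload)

serialized_reload() {
    # flock creates LOCK_FILE if needed, blocks until it holds an exclusive
    # lock on it, runs the command, and releases the lock when it exits.
    flock "$LOCK_FILE" "${RELOAD_CMD[@]}"
}
```

In the reproduction loop above, calling (serialized_reload &) in place of (/usr/sbin/rndc reload &) would still queue many requests, but bind9 would only ever see one reload at a time. Blocking on the lock (rather than using flock -n, which gives up immediately if the lock is held) avoids silently dropping a reload that was requested after a zone edit.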

[1] I first observed extreme sluggishness in resolving queries, which cleared up after several seconds.
Then I observed a crash (which the system subsequently recovered from):
Then I observed a deadlock with the same symptoms.