Comment 9 for bug 1710278

Mike Pontillo (mpontillo) wrote :

I attempted to reproduce the bind9 issue by doing the following (in two separate sessions):

# Queue 10,000 concurrent reloads (also tried removing the & to make it less parallel)
i=0; while [ $i -lt 10000 ]; do (/usr/sbin/rndc reload&); let i=$i+1; done

# Hammer the DNS server with queries
while [ 1 ]; do dig @127.0.0.1 <maas-hostname>; done

Everything works properly when I run only these two loops. But if I have parallel reload requests running *and* I make manual changes to the DNS zones in /etc/bind/maas, I have observed bind9 behaving badly, including (eventually) what seemed to be the deadlock (but my bind9 was older, so my debug symbols didn't match).[1] I then observed a similar state where, after I updated the zone file, it was as if nothing had changed: bind9 kept returning old data, and this didn't resolve itself until I ran "service bind9 restart".

It's my impression that the problem is worse when I do reloads in parallel. So this is more evidence pointing to "we should ensure MAAS never tries to reload bind9 twice in parallel".
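
One way to picture "never reload twice in parallel" is to funnel every reload through an exclusive file lock. The following is only a minimal sketch, not how MAAS does (or would) implement it: the wrapper name `serialized` and the lock path are hypothetical, and it assumes flock(1) from util-linux is installed.

```shell
# Hypothetical lock file path (an assumption; not a MAAS or bind9 setting).
LOCK_FILE="${LOCK_FILE:-/tmp/rndc-reload.lock}"

# Hypothetical wrapper: run the given command while holding an exclusive
# lock on LOCK_FILE. flock(1) creates the file if needed, blocks until the
# lock is free, and releases it when the command exits, so callers that
# arrive during a reload wait their turn instead of overlapping.
serialized() {
    flock "$LOCK_FILE" "$@"
}

# Example call (commented out so that sourcing this sketch has no side effects):
# serialized /usr/sbin/rndc reload
```

Replacing the reload in the loop above with `(serialized /usr/sbin/rndc reload &)` would still queue all the requests, but only one rndc reload should be in flight at any moment.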

[1]:
First observed extreme sluggishness in resolving queries, which resolved itself after several seconds.
Then observed a crash (which the system subsequently recovered from): http://paste.ubuntu.com/25293751/
Then observed a deadlock with the same symptoms.