We are periodically seeing this occur in 1.8.0~beta8+bzr3951-0ubuntu1~trusty1. Restarting the clusterd service gets us back in operation for a number of days. It seems to repeat.
The impact we observe is the inability to release or acquire nodes ('Releasing Failed' at the moment).
2015-06-04 12:31:51+0000 [-] Region not available: Couldn't bind: 24: Too many open files. (While requesting RPC info at http://10.245.168.2/MAAS/rpc/).
...
exceptions.OSError: [Errno 24] Too many open files: '/tmp/tmpvQtPf2'
http://paste.ubuntu.com/11564921/
# Foo
sudo lsof > lsof.txt
cat lsof.txt | grep tgt | wc -l
72109
cat lsof.txt | grep tgt | awk '{ print $3 }' | uniq | wc -l
802
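
To see which process is holding the descriptors, it may help to break the saved listing down per PID instead of taking one total. The following is only a rough sketch: `count_open_by_pid` is a hypothetical helper name, and it assumes the usual lsof layout in which the first two columns are COMMAND and PID. Because `uniq` only collapses adjacent duplicate lines, the sketch aggregates in awk rather than piping unsorted output through `uniq`.

```shell
# Hypothetical helper: count lsof rows per PID for commands matching a pattern.
# Assumes column 1 is COMMAND and column 2 is PID (the usual lsof layout).
# Usage: count_open_by_pid FILE PATTERN
count_open_by_pid() {
    # Tally rows per PID in awk, then list the busiest PIDs first.
    awk -v pat="$2" '$1 ~ pat { n[$2]++ } END { for (p in n) print n[p], p }' "$1" \
        | sort -rn
}
```

For example, `count_open_by_pid lsof.txt tgt` prints one "count PID" line per matching process. lsof can list the same descriptor more than once (for instance, once per thread on some versions), so these counts are an upper bound; on Linux, `ls /proc/PID/fd | wc -l` gives the real number for one process, and the "Max open files" row in `/proc/PID/limits` shows the limit it is running under.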