Comment 9 for bug 1470013

Raphaël Badin (rvb) wrote :

After some debugging I found a couple of problems:

- MAAS uses various timeouts to limit how long it takes to check the power state of nodes. The SeaMicro chassis sometimes takes as long as 35 seconds to reply to a power query, while MAAS applies a 15-second timeout when the check is triggered from the UI (the "Check now" link). We should increase that timeout.

- The main problem when releasing a large number of nodes is that power.py:power_state_update ends up in a deadlock (thread starvation). After releasing 64 nodes, I could see that the 10 threads allowed per region were busy (stuck in epoll_wait) waiting for something, while the queue kept growing. Increasing the number of threads per region to 20 solved the problem instantly. Conclusion: there is a deadlock in the code.