Comment 4 for bug 1651526

Antonio Messina (arcimboldo) wrote:

Sorry, but disabling the compute node doesn't solve the issue. Although you might want to change that behavior, it is a solution to a different problem.

The problem here is twofold:
* since Mitaka, nova creates multiple threads to delete VMs (and create snapshots, etc.) in parallel, and as a consequence it is more connection-hungry
* nova does not handle file-descriptor (fd) starvation properly.

Since the number of connections created is a function of the size of the ceph cluster, I would expect either:
a) nova limits the number of parallel operations based on the maximum number of files it can open (see man getrlimit)
b) (ugly but easier) a configuration option is provided to limit the number of parallel connections nova will make
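A minimal sketch of how options a) and b) could combine, assuming each Ceph-backed operation (e.g. one VM delete) holds at most `fds_per_op` descriptors. That figure, `RESERVED_FDS`, `DEFAULT_PARALLEL_OPS`, `max_parallel_ops` and `run_limited` are all hypothetical, not nova code. The cap is derived from the `RLIMIT_NOFILE` soft limit (option a) and optionally lowered by an operator-set value (option b); stdlib threads stand in for nova's eventlet greenthreads.

```python
import resource
import threading
from typing import Callable, List, Optional

# Hypothetical headroom for fds the service needs anyway (logs, DB/AMQP sockets).
RESERVED_FDS = 256
# Hypothetical fallback used only when neither an fd limit nor a cap applies.
DEFAULT_PARALLEL_OPS = 20


def max_parallel_ops(fds_per_op: int,
                     configured_cap: Optional[int] = None,
                     soft_limit: Optional[int] = None) -> int:
    """Number of Ceph-backed operations allowed to run at once.

    fds_per_op: assumed worst-case fds held by one operation; per the comment
        it grows with the size of the Ceph cluster.
    configured_cap: option b), an operator-supplied limit (hypothetical).
    soft_limit: defaults to this process's RLIMIT_NOFILE soft limit (option a).
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    caps = []
    if soft_limit != resource.RLIM_INFINITY:
        caps.append(max(0, (soft_limit - RESERVED_FDS) // fds_per_op))
    if configured_cap is not None:
        caps.append(configured_cap)
    return min(caps) if caps else DEFAULT_PARALLEL_OPS


def run_limited(tasks: List[Callable[[], object]], limit: int) -> List[object]:
    """Run tasks in threads, letting at most `limit` execute at the same time."""
    if limit < 1:
        raise ValueError("fd budget allows no parallel operations at all")
    gate = threading.BoundedSemaphore(limit)
    results: List[object] = [None] * len(tasks)

    def worker(index: int, task: Callable[[], object]) -> None:
        with gate:  # blocks while `limit` tasks are already running
            results[index] = task()

    threads = [threading.Thread(target=worker, args=(i, t))
               for i, t in enumerate(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, with a soft limit of 1024 and an assumed 300 fds per operation, `max_parallel_ops(300, soft_limit=1024)` allows only 2 concurrent operations; raising the limit to 4096 allows 12.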

Option a) has the additional advantage that, when the limits are too low to do anything useful, the problem can be detected *before* operations actually fail, and a clear error can be printed in the log file.
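The early-detection idea could look roughly like the check below, run once at service start. It is a sketch under assumptions: `check_fd_budget`, the logger name and the 256-fd reserve are hypothetical, and `fds_per_op` is an assumed per-operation estimate rather than something nova computes today.

```python
import logging
import resource
from typing import Optional

LOG = logging.getLogger("fd_budget")  # hypothetical logger name


def check_fd_budget(fds_per_op: int, reserved_fds: int = 256,
                    soft_limit: Optional[int] = None) -> bool:
    """Startup check: does at least one Ceph-backed operation fit in the fd limit?

    Returns False and logs an explicit error, instead of letting the service
    start and fail later with EMFILE ("Too many open files").
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    needed = reserved_fds + fds_per_op
    if soft_limit < needed:
        LOG.error("RLIMIT_NOFILE soft limit is %d but at least %d file "
                  "descriptors are needed (%d reserved + %d per operation); "
                  "raise the 'nofile' limit for this service",
                  soft_limit, needed, reserved_fds, fds_per_op)
        return False
    return True
```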

So far we were able to *mitigate* the issue by:

1) setting EVENTLET_THREADPOOL_SIZE to a lower value in the upstart script
2) increasing the nofile limit (ulimit) in the upstart script
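Why the two mitigations interact can be estimated with a little arithmetic. The sketch assumes that nova's blocking librbd calls run in eventlet's native thread pool (whose size eventlet reads from `EVENTLET_THREADPOOL_SIZE`, default 20), so the pool size bounds how many Ceph operations hold fds at once. `fds_per_op`, the 256-fd reserve and the helper names are hypothetical. `raise_soft_nofile` is the in-process counterpart of `ulimit -n`; it only affects the calling process and its children, unlike the upstart setting.

```python
import os
import resource
from typing import Mapping, Optional

# eventlet's tpool reads EVENTLET_THREADPOOL_SIZE at import time; 20 is its default.
EVENTLET_DEFAULT_POOL = 20


def threadpool_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Pool size eventlet would use given this environment."""
    env = os.environ if env is None else env
    return int(env.get("EVENTLET_THREADPOOL_SIZE", EVENTLET_DEFAULT_POOL))


def worst_case_fds(pool_size: int, fds_per_op: int, reserved_fds: int = 256) -> int:
    """Assumes every pool thread may be inside a Ceph call holding fds_per_op fds."""
    return reserved_fds + pool_size * fds_per_op


def raise_soft_nofile(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target`, capped at the hard limit.

    Never lowers the limit; returns the resulting soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        return new_soft
    return soft
```

With an assumed 300 fds per operation, the default pool of 20 threads may need about 6256 descriptors, while a pool of 4 needs about 1456; lowering the pool size and raising nofile both attack the same inequality from opposite sides.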