Comment 5 for bug 1272840

Pavel Vaylov (pvaylov) wrote :

Folks, we really need a fix for both the deployed environment and the master node.

- This is a blocker issue that may lead to a 50-second delay when running nova commands.
- It may also lead to extremely high load on the controller node.
- The corosync instance affected by the issue becomes unusable.

lsof -n -p $(pidof crmd) | wc -l reports 1098

The default limit is 1024.
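
Note that the lsof count also includes entries that are not descriptors (cwd, txt, mem mappings, plus a header line), so it overstates actual usage. A rough sketch for comparing real descriptor usage against the limit that applies to a running process, assuming a Linux /proc filesystem; fd_usage is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: print the number of open descriptors of a process
# next to the soft "Max open files" limit the kernel applies to it.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # /proc/<pid>/limits columns: name (3 words), soft limit, hard limit, units.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid used=$used soft_limit=$soft"
}

# Demo on the current shell; for crmd it would be: fd_usage "$(pidof crmd)"
fd_usage "$$"
```

Reading another user's /proc/<pid>/fd normally requires root.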

Tried to raise the limits on the fly ("hot"):

su -m hacluster -c "ulimit -Sn 4096"
su -m hacluster -c "ulimit -Hn 10240"

That did not help.
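
A likely reason: ulimit only changes the limit of the shell that runs it and of processes started from that shell afterwards. The shell spawned by su -c exits right away, and the already running crmd never sees the new value. A small sketch that shows the effect using a throwaway child shell (the value 256 is arbitrary, chosen only for illustration):

```shell
# Record the soft nofile limit of this shell.
before=$(ulimit -Sn)

# Change the limit inside a child shell, the way "su -c" would.
sh -c 'ulimit -Sn 256; echo "inside child shell: $(ulimit -Sn)"'

# The parent shell (and any process that was already running) is unaffected.
after=$(ulimit -Sn)
echo "parent shell: before=$before after=$after"
```

If the node has a util-linux version that ships prlimit, something like prlimit --pid $(pidof crmd) --nofile=4096:10240 might change the limit of the live process instead, but we have not verified that on these nodes.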

Tried editing /etc/security/limits.conf, adding:

hacluster soft nofile 4096
hacluster hard nofile 10240

then rebooted the node.

That did not help either.

Workaround:

Insert "ulimit -n 1024000" into the start command of the init script, just before corosync starts.

Note that we have not tested this.
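
A minimal sketch of the idea, not the actual init script: raise the limit in a subshell and start the daemon from there, so the daemon and its children inherit it. run_with_nofile is a hypothetical helper name, and how corosync is really launched in the init script is an assumption here:

```shell
# Hypothetical helper: run a command with the given nofile limit.
# The subshell keeps the change away from the calling shell.
run_with_nofile() {
    limit="$1"
    shift
    ( ulimit -n "$limit" && exec "$@" )
}

# Demo with a harmless command that prints the limit it inherited.
run_with_nofile 512 sh -c 'echo "child sees nofile limit: $(ulimit -n)"'

# In the init script the call would look roughly like this (needs root
# to go above the current hard limit):
#   run_with_nofile 1024000 corosync
```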

One more addition: the bug description does not include the line from crmd.log where crmd complains about too many open files:

2014-02-05T18:22:08.928963+00:00 err: error: qb_ipcs_us_connection_acceptor: Could not accept client connection: Too many open files (24)
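
To check whether other controllers hit the same error, the log can be searched for that message. A small sketch; count_emfile is a hypothetical helper, and the log path is passed in because it may differ between installations:

```shell
# Hypothetical helper: count log lines that report EMFILE (errno 24).
count_emfile() {
    grep -c 'Too many open files (24)' "$1"
}

# Demo against a temporary file holding the line quoted above.
sample=$(mktemp)
echo '2014-02-05T18:22:08.928963+00:00 err: error: qb_ipcs_us_connection_acceptor: Could not accept client connection: Too many open files (24)' > "$sample"
count_emfile "$sample"
rm -f "$sample"
```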

Questions:

 - Why did the issue affect only one controller?
 - Is there a fix that does not require restarting any services?