Comment 0 for bug 821732

Wolfgang Scherer (wolfgang-scherer) wrote : socket leak in lrmd

ii cluster-glue 1.0.7-3ubuntu2 The reusable cluster components for Linux HA

The commands `crm ra classes` and `crm ra list` cause a socket leak in the lrmd daemon.

When approx. 1024 sockets are allocated, the lrmd becomes unresponsive and must be killed.
The syslog then shows repeated entries:

  Aug 3 10:25:08 server lrmd: [1941]: ERROR: socket_accept_connection: accept(sock=6): Too many open files
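
For anyone who wants to watch the leak grow before the limit is hit, here is a minimal sketch that counts the open descriptors of a process via /proc. It assumes a Linux system with /proc mounted and enough privileges to read the daemon's fd directory (usually root); the helper name `count_open_sockets` is made up for illustration and is not part of cluster-glue.

```python
import os


def count_open_sockets(pid):
    """Return (total_fds, socket_fds) for the process with the given PID.

    Hypothetical helper: reads the symlinks in /proc/<pid>/fd and counts
    those whose target looks like "socket:[inode]".
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets
```

Calling it with the lrmd PID (1941 in the log line above) before and after a run of `crm ra classes` should show whether the socket count keeps increasing.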

While I only use these commands during development, it is still a nuisance.

The leak does not appear for other commands, e.g. `crm resource
list`, but I have not tested exhaustively.

I originally reported this bug to http://developerbugs.linux-foundation.org/show_bug.cgi?id=2626.

There I was informed that the behavior most likely stems from an
unsupported patch (raexecupstart.patch) in the Ubuntu package.
When I remove that patch, the socket leak does indeed go away.

Although I did not run into any "deadlock" situations with the original
code, I replaced it with the attached patch, which should prevent any
possible recursive calls of the `on_remove_client` function.