Canonical System Image

Bug #1470750
Comment #9

Michi Henning (michihenning) wrote on 2015-07-02:

This is almost certainly caused by zmq. If the scope disappears unexpectedly, and there is a message from the registry to the scope pending, zmq by default tries to re-connect to the peer once every millisecond. If you look at the registry, you won't see anything unusual because all its threads will be exactly where you'd expect them to be. The re-connect spinning happens inside one of zmq's threads.

I've come across this once before. I'll try to dredge up the details. Basically, I added a reaper mechanism to the outgoing connection pool that trashes the socket if the request fails.
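To make the reaper idea concrete, here is a minimal sketch of the pattern, not the actual code from the scopes runtime. The names ConnectionPool, Connection, and invoke are hypothetical and chosen only for illustration. Connection stands in for whatever wraps the zmq socket; in a real implementation, destroying it would presumably close the socket so that zmq stops trying to re-connect.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a wrapper around an outgoing zmq socket.
// A real wrapper would close the underlying socket in its destructor.
struct Connection
{
    explicit Connection(std::string ep) : endpoint(std::move(ep)) {}
    std::string endpoint;
};

// Hypothetical outgoing connection pool, keyed by endpoint.
class ConnectionPool
{
public:
    using SendFn = std::function<bool(Connection&)>;

    // Runs `send` on the pooled connection for `endpoint`, creating the
    // connection on first use. If `send` reports failure, the connection
    // is removed from the pool and destroyed instead of being reused.
    bool invoke(std::string const& endpoint, SendFn const& send)
    {
        std::shared_ptr<Connection> conn;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto& slot = pool_[endpoint];
            if (!slot)
            {
                slot = std::make_shared<Connection>(endpoint);
            }
            conn = slot;
        }

        bool const ok = send(*conn);  // Called without holding the lock.

        if (!ok)
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = pool_.find(endpoint);
            // Only reap the connection we actually used; another thread
            // may already have replaced it with a fresh one.
            if (it != pool_.end() && it->second == conn)
            {
                pool_.erase(it);
            }
        }
        return ok;
    }

    // Number of connections currently held by the pool.
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return pool_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Connection>> pool_;
};
```

The point of the design is that a failed request never leaves a socket behind in the pool: the next request to the same endpoint starts with a fresh connection, and the old one is destroyed once the last reference to it goes away.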

I'm wondering whether, possibly, this is happening on a oneway request from the registry to the scope?

There is a zmq socket option (ZMQ_RECONNECT_IVL) that adjusts the retry interval to something less aggressive. But setting that wouldn't fix the problem: the registry would still be trying to re-connect indefinitely, just less often. zmq does not allow the number of retries to be restricted to some limit. As far as I know, the only way to stop the problem is to trash the offending socket.

It would be good to know what invocation is in flight when we enter that state.