System Instance of Memcached Causes Check Failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libmemcached | Fix Released | Medium | Brian Aker |
Bug Description
It appears that when Server::Cycle is called, it tries to kill stale instances of the memcached server. Somehow it's detecting my global instance of memcached and trying to kill it. Since the global instance belongs to a different user, the kill attempt fails, along with the unit test. I got a little confused tracing the pid logic around; otherwise I'd submit a patch. My suggestion would be to add a uid check of the memcached process before killing it. Or perhaps wrap the memcached launch logic in a script or executable, and then only kill memcached instances that are children of the magic wrapper?
I noticed kill_pid() does test for EPERM; is there a reason those can't silently return true and be ignored? In other words, is there a situation where EPERM could show up, but we'd want a kill attempt to fail?
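To make the EPERM question concrete, here is a minimal sketch of a kill helper that reports "not our process" separately instead of folding it into a generic failure. This is not the actual kill_pid() from libtest; the names sketch_kill_pid, sketch_is_fatal, and the enum are hypothetical, and the choice of SIGTERM is an assumption. It relies only on the POSIX behaviour that kill() sets errno to EPERM when the caller lacks permission to signal the target, and to ESRCH when no such process exists.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical result codes; not part of libtest or libmemcached. */
enum sketch_kill_result {
    SKETCH_KILLED,   /* signal was delivered */
    SKETCH_GONE,     /* ESRCH: no such process, nothing left to kill */
    SKETCH_NOT_OURS, /* EPERM: process exists but belongs to someone else */
    SKETCH_ERROR     /* invalid pid or unexpected errno */
};

/* Send SIGTERM to one specific process and classify the outcome. */
enum sketch_kill_result sketch_kill_pid(pid_t pid)
{
    /* pid <= 0 would address process groups; pid 1 is init. Refuse both. */
    if (pid <= 1) {
        return SKETCH_ERROR;
    }

    if (kill(pid, SIGTERM) == 0) {
        return SKETCH_KILLED;
    }

    switch (errno) {
    case ESRCH:
        return SKETCH_GONE;
    case EPERM:
        return SKETCH_NOT_OURS;
    default:
        return SKETCH_ERROR;
    }
}

/* Assumed policy: only SKETCH_ERROR should fail the test run. */
bool sketch_is_fatal(enum sketch_kill_result result)
{
    return result == SKETCH_ERROR;
}
```

Keeping SKETCH_NOT_OURS distinct from success leaves room for the caller to log it or to refuse to reuse the port, which is one case where silently returning true on EPERM might hide a real problem.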
I'm attaching logs showing builds of 0.51 and 0.52, so you can see that 0.51 worked just fine but 0.52 fails. A sample error is:
libtest/
libtest/
libtest/
libtest/
libtest/
For the record, the shared memcached pid is 11939:
[ladar@magma memcached]$ ps -ef | grep memcached | grep -v grep
memcache 11939 1 0 15:09 ? 00:00:00 /usr/local/
Changed in libmemcached:
importance: Undecided → Medium
assignee: nobody → Brian Aker (brianaker)
Changed in libmemcached:
status: Fix Committed → Fix Released
I spent some time studying the code, and realized that the problem is with how the memcached servers are added. When the unit tests create a memcached socket server, the function memcached_server_add is called with hostname = "" and port = 0. Because these values are invalid, memcached_server_add substitutes the default values: localhost:11211.
Because I have a global memcached server running at localhost:11211, a valid record is created in the server pool. The absence of a pid file doesn't cause problems because the PIDs come from the libmemcached_util_getpid function, not from the pid files.
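The substitution described above can be illustrated with a small stand-alone sketch. It does not call libmemcached and is not its implementation; sketch_resolve, sketch_hits_default, and the SKETCH_ constants are hypothetical names, and the default values are simply the localhost:11211 pair mentioned in this report.

```c
#include <stddef.h>
#include <string.h>

/* Defaults as described in this report; names are hypothetical. */
#define SKETCH_DEFAULT_HOST "localhost"
#define SKETCH_DEFAULT_PORT 11211

struct sketch_endpoint {
    const char *host;
    unsigned short port;
};

/* Replace an empty or missing hostname and a zero port with the defaults. */
struct sketch_endpoint sketch_resolve(const char *host, unsigned short port)
{
    struct sketch_endpoint ep;
    ep.host = (host == NULL || host[0] == '\0') ? SKETCH_DEFAULT_HOST : host;
    ep.port = (port == 0) ? SKETCH_DEFAULT_PORT : port;
    return ep;
}

/* Return 1 when the resolved endpoint is the default host and port,
 * i.e. the address a system-wide memcached would normally occupy. */
int sketch_hits_default(const char *host, unsigned short port)
{
    struct sketch_endpoint ep = sketch_resolve(host, port);
    return strcmp(ep.host, SKETCH_DEFAULT_HOST) == 0 &&
           ep.port == SKETCH_DEFAULT_PORT;
}
```

With these definitions, sketch_hits_default("", 0) returns 1, which mirrors how an empty socket-server entry ends up pointing at the system-wide instance.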
I haven't tried creating a patch because I'm still not sure where I can add a fix that doesn't break one of the alternative configs (sasl, gearmand, etc.). The fix should add logic to the server startup/shutdown code so that kill attempts are limited to servers spawned by the unit test logic.
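One possible shape for that logic, sketched under the assumption that the test harness can record each PID at the moment it spawns a server: keep a small registry and refuse to kill anything that is not in it. This is not the fix that was released; the struct, the function names, and the fixed capacity of 32 are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_MAX_SPAWNED 32 /* arbitrary capacity for the sketch */

/* PIDs of servers started by the test harness itself. */
struct sketch_spawned {
    pid_t pids[SKETCH_MAX_SPAWNED];
    size_t count;
};

/* Record a PID right after a successful spawn. Returns false when the
 * registry is missing or full, or when the pid is not a plausible child. */
bool sketch_record_spawn(struct sketch_spawned *reg, pid_t pid)
{
    if (reg == NULL || pid <= 1 || reg->count >= SKETCH_MAX_SPAWNED) {
        return false;
    }
    reg->pids[reg->count++] = pid;
    return true;
}

/* A kill attempt is allowed only for PIDs the harness recorded. */
bool sketch_may_kill(const struct sketch_spawned *reg, pid_t pid)
{
    if (reg == NULL) {
        return false;
    }
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->pids[i] == pid) {
            return true;
        }
    }
    return false;
}
```

In this scheme a PID discovered by asking the server at localhost:11211, such as 11939 in the ps output above, would never be in the registry, so the shutdown path would skip it rather than attempt the kill.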
The workaround for v0.52 is to kill any system-wide memcached server instances on the default port before running "make check", or to move the system-wide instance to a port that differs from libmemcached's value for MEMCACHED_DEFAULT_PORT (which is 11211 by default).