group membership not updated with redis driver and member death

Bug #1386684 reported by Chris Dent
This bug affects 1 person
Affects: tooz
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

When using the redis driver as the coordinator with ceilometer-compute-agent, group membership is not updated when an existing member departs the group (for example by being manually terminated).

  redis server 2.6.16
  python-redis 2.10.3
  tooz 0.8
  ceilometer-compute-agent from master, running under devstack master

The reproduction scenario goes something like this:

* Update pipeline.yaml so that readings happen frequently enough not to get bored (I chose a 30-second interval)
* Set these config options:

  [coordination]
  backend_url = redis://localhost:6379
  #backend_url = memcached://localhost

  [compute]
  workload_partitioning = True

* boot several (10-ish) instances with nova
* run one ceilometer-compute-agent
* tail -f ceilometer-agent-compute.log | grep coord  # watch group membership
* run another ceilometer-compute-agent
* see membership change and partitioning update in log
* wait a while
* kill one of the agents
* see membership and partitioning not update
* try again with memcached, see it work okay
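Workload partitioning divides the polled resources among the live group members, so a stale membership list means a dead agent keeps "owning" its share and those resources simply go unpolled. A simplified, hypothetical sketch of hash-based partitioning (illustrative only, not ceilometer's actual implementation):

```python
import hashlib


def my_share(resources, members, me):
    """Deterministically assign each resource to exactly one group member.

    Every agent runs the same computation over the same sorted member
    list, so the shares are disjoint and together cover all resources --
    but only while everyone agrees on who the members are.
    """
    ring = sorted(members)

    def owner(resource):
        digest = hashlib.sha256(resource.encode("utf-8")).hexdigest()
        return ring[int(digest, 16) % len(ring)]

    return [r for r in resources if owner(r) == me]
```

If an agent dies but never leaves the members list, its share is never reassigned, which is exactly the symptom above: kill an agent, and partitioning does not update.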

Is this a version specific problem or a mistake in the driver?

Revision history for this message
Chris Dent (cdent) wrote :

Confirmed on 2.8.17

Also: https://github.com/stackforge/tooz/blob/master/tooz/drivers/redis.py#L42 (there is no TimeoutError in redis.exceptions in python-redis 2.10.3)

(mentioning it here as I thought it might be a contributing factor, but apparently not)
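For reference, a version-tolerant import guard is the usual way to cope with exception classes that exist in some python-redis releases but not others (the names below are my own, not tooz's):

```python
# Guarded import: TimeoutError is not present in every python-redis
# release, so probe for it rather than assume it exists.
try:
    from redis.exceptions import TimeoutError as RedisTimeoutError
except ImportError:  # also raised when python-redis itself is absent
    RedisTimeoutError = None

# Build the tuple of "retryable" exceptions from whatever is available;
# ConnectionError here is the Python builtin, used as a stand-in.
RETRYABLE = tuple(exc for exc in (ConnectionError, RedisTimeoutError) if exc)
```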

Revision history for this message
Chris Dent (cdent) wrote :

that's "confirmed on redis-server 2.8.17"

Revision history for this message
Chris Dent (cdent) wrote :

The problem with TimeoutError was a red herring: I had confused python-redis versions.

Revision history for this message
Joshua Harlow (harlowja) wrote :

Not sure why this didn't post here:

https://review.openstack.org/#/c/131533/
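The gist of the fix is that each member's group entry carries a heartbeat-refreshed expiry, so a member that stops heartbeating drops out of the listing on its own. A redis-free toy model of that idea (class and method names are illustrative, not tooz's internals):

```python
import time


class ExpiringGroup:
    """Toy model of TTL-based membership: each heartbeat refreshes a
    per-member deadline, and members past their deadline are dropped."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._deadlines = {}

    def heartbeat(self, member, now=None):
        # Refresh this member's liveness deadline.
        now = time.time() if now is None else now
        self._deadlines[member] = now + self.ttl

    def members(self, now=None):
        # Only members whose deadline is still in the future are listed.
        now = time.time() if now is None else now
        return sorted(m for m, d in self._deadlines.items() if d > now)
```

A killed agent never heartbeats again, so the next listing after its TTL elapses no longer contains it, and partitioning can rebalance.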

Revision history for this message
Chris Dent (cdent) wrote :

I can confirm that the proposed fix seems to fix the problem. Adding and removing agents impacts the partitioning as expected.

A related observation: I'm not certain, but the catch-up (at least on a 30-second polling cycle) appears a bit slower than in similar tests I've previously done with memcached. I'm guessing the timeout handling differs? We don't need to expect consistency between drivers on this front; just noting it for reference.

Julien Danjou (jdanjou)
Changed in python-tooz:
status: New → Fix Committed
importance: Undecided → High
Julien Danjou (jdanjou)
Changed in python-tooz:
milestone: none → 0.10
status: Fix Committed → Fix Released