tooz

ceilometer group partitioning coordination with tooz+redis+sentinel fails to failover to new master

Bug #1434043 reported by Chris Dent on 2015-03-19

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Ceilometer	Fix Released	Undecided	Unassigned
tooz	Status tracked in Kilo
Kilo	Fix Released	Medium	Chris Dent	tooz 0.13.2
Liberty	Fix Released	Medium	Chris Dent	tooz 0.14.0

Bug Description

When tooz is configured with multiple sentinels to coordinate group membership for the central (and other) agents, the coordinator fails to switch to the new master Redis server after a failover.

This appears to happen because there is no retry logic when a ToozConnectionError is raised. A retry would (eventually) lead to tooz.drivers.redis:_make_client being called, which queries the sentinels for the new master.

There's a question about where the retry logic should go: in ceilometer.coordination, or in the tooz redis driver?

When the redis sentinel code was first written, there was a (now proven mistaken) belief that ceilometer already had retry logic. However, since sentinel handling is quite specific in how it works, and tooz is used for much more than ceilometer, the retry logic should probably go in tooz.
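To make the retry idea concrete, here is a minimal sketch of a wrapper that retries an operation after a connection error and rebuilds the client before each retry, which is the point at which a sentinel-aware factory like the driver's `_make_client` would be asked for the current master. This is not tooz code: `RetryingClient`, `make_client`, `heartbeat` and the local `ToozConnectionError` class are hypothetical stand-ins (the real exception lives in `tooz.coordination`), so the example runs without tooz or Redis. Note that the fix eventually merged for this bug took a different route, delegating failover to redis-py's sentinel connection pool.

```python
import time


class ToozConnectionError(Exception):
    """Local stand-in for tooz.coordination.ToozConnectionError (assumed
    here so the sketch runs without tooz installed)."""


class RetryingClient:
    """Hypothetical wrapper: retry an operation after a connection error,
    rebuilding the client first so it can point at a new master."""

    def __init__(self, make_client, attempts=3, delay=0.0):
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        # make_client plays the role of a sentinel-aware factory: each
        # call may return a client for a different (newly elected) master.
        self._make_client = make_client
        self._attempts = attempts
        self._delay = delay
        self._client = make_client()

    def call(self, operation):
        for attempt in range(1, self._attempts + 1):
            try:
                return operation(self._client)
            except ToozConnectionError:
                if attempt == self._attempts:
                    raise  # out of attempts: let the caller see the error
                time.sleep(self._delay)
                # Re-resolve the master before trying again.
                self._client = self._make_client()


# Simulated failover: the first client points at a dead master.
masters = iter(["old-master", "new-master"])


def make_client():
    return next(masters)


def heartbeat(client):
    if client == "old-master":
        raise ToozConnectionError("connection refused")
    return "ok from " + client


print(RetryingClient(make_client).call(heartbeat))  # ok from new-master
```

The dead master is never visible to the caller as long as a rebuilt client reaches a live one within the allowed attempts; a production version would also want a back-off delay and logging.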

There are some (now out of date) notes that led to this discovery at: https://tank.peermore.com/tanks/cdent-rhat/TestCeiloRedisPackstack#things-dont-work


Julien Danjou (jdanjou) on 2015-03-19

Changed in python-tooz:
status:	New → Triaged
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2015-03-19: Fix proposed to tooz (master)

Fix proposed to branch: master
Review: https://review.openstack.org/165890

Changed in python-tooz:
assignee:	nobody → Chris Dent (chdent)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-03-20:

Fix proposed to branch: master
Review: https://review.openstack.org/166291

OpenStack Infra (hudson-openstack) wrote on 2015-03-20: Change abandoned on tooz (master)

Change abandoned by Chris Dent (<email address hidden>) on branch: master
Review: https://review.openstack.org/165890
Reason: Abandoned in favor of: I8fd672e664d98097944a7c984cadab5fb08dd2d6

Thanks to harlowj for pointing me in the right direction.

OpenStack Infra (hudson-openstack) wrote on 2015-03-23: Fix merged to tooz (master)

Reviewed: https://review.openstack.org/166291
Committed: https://git.openstack.org/cgit/openstack/tooz/commit/?id=54d6bb1c94270d2794ecefbcaf3f8832010e3d58
Submitter: Jenkins
Branch: master

commit 54d6bb1c94270d2794ecefbcaf3f8832010e3d58
Author: Chris Dent <email address hidden>
Date: Fri Mar 20 15:50:56 2015 +0000

Use a sentinel connection pool to manage failover

    When configured to use sentinel with the redis driver, allow the
    redis-py client to manage the connection to the currently elected
    master.

    'master_for' will return a StrictRedis client which is bound to a
    connection pool that queries the Sentinel[s] when providing
    connections from the pool.

    This means that failover handling is automatic as long as the
    sentinels can be reached and they have elected a new master.

    Change-Id: I8fd672e664d98097944a7c984cadab5fb08dd2d6
    Closes-Bug: #1434043
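In redis-py, `redis.sentinel.Sentinel(sentinels).master_for(service_name)` returns a client whose connection pool obtains the master's address from the sentinels rather than using a fixed host. The sketch below is a simplified, pure-Python model of why that makes failover automatic; it is not redis-py's implementation. `FakeSentinel`, `SentinelBackedPool`, the service name `mymaster` and the dict standing in for a socket are all hypothetical, and the model asks the sentinels every time a connection is handed out, which is a simplification.

```python
class FakeSentinel:
    """Hypothetical stand-in for one Redis Sentinel. The method name mirrors
    redis-py's Sentinel.discover_master(service_name)."""

    def __init__(self, master_address):
        self.master_address = master_address

    def discover_master(self, service_name):
        return self.master_address


class SentinelBackedPool:
    """Simplified model of a sentinel-managed connection pool (not redis-py's
    code): connections are handed out for whichever master the sentinels
    currently report."""

    def __init__(self, sentinels, service_name):
        self._sentinels = sentinels
        self._service_name = service_name
        self._master = None
        self._idle = []  # idle connections to self._master

    def _current_master(self):
        # Ask each sentinel in turn; the first one that answers wins.
        for sentinel in self._sentinels:
            try:
                return sentinel.discover_master(self._service_name)
            except ConnectionError:
                continue
        raise ConnectionError(
            "no sentinel could name a master for %r" % self._service_name)

    def get_connection(self):
        master = self._current_master()
        if master != self._master:
            # A failover happened: forget connections to the old master.
            self._master = master
            self._idle = []
        if self._idle:
            return self._idle.pop()
        return {"address": master}  # placeholder for a real socket

    def release(self, connection):
        # Only keep connections that still point at the current master.
        if connection["address"] == self._master:
            self._idle.append(connection)


sentinel = FakeSentinel(("10.0.0.1", 6379))
pool = SentinelBackedPool([sentinel], "mymaster")
pool.release(pool.get_connection())
sentinel.master_address = ("10.0.0.2", 6379)  # sentinels elect a new master
print(pool.get_connection()["address"])  # ('10.0.0.2', 6379)
```

Because the pool, not the caller, resolves the master, code holding the client needs no retry-and-rebuild logic of its own, provided the sentinels are reachable and have elected a new master, which is the condition the commit message states.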

Changed in python-tooz:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2015-03-25: Fix proposed to tooz (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/167598

OpenStack Infra (hudson-openstack) wrote on 2015-04-10: Fix merged to tooz (stable/kilo)

Reviewed: https://review.openstack.org/167598
Committed: https://git.openstack.org/cgit/openstack/tooz/commit/?id=01859a03e84e7b5de8dc08bbdff0d458e4130c51
Submitter: Jenkins
Branch: stable/kilo

commit 01859a03e84e7b5de8dc08bbdff0d458e4130c51
Author: Chris Dent <email address hidden>
Date: Fri Mar 20 15:50:56 2015 +0000

Use a sentinel connection pool to manage failover

    When configured to use sentinel with the redis driver, allow the
    redis-py client to manage the connection to the currently elected
    master.

    'master_for' will return a StrictRedis client which is bound to a
    connection pool that queries the Sentinel[s] when providing
    connections from the pool.

    This means that failover handling is automatic as long as the
    sentinels can be reached and they have elected a new master.

    Change-Id: I8fd672e664d98097944a7c984cadab5fb08dd2d6
    Closes-Bug: #1434043
    (cherry picked from commit 54d6bb1c94270d2794ecefbcaf3f8832010e3d58)

tags:
added: in-stable-kilo

Doug Hellmann (doug-hellmann) on 2015-04-13

Changed in python-tooz:
milestone:	none → 0.13.2
status:	Fix Committed → Fix Released

Julien Danjou (jdanjou) on 2015-04-13

Changed in python-tooz:
milestone:	0.13.2 → 0.14.0

Doug Hellmann (doug-hellmann) on 2015-04-13

no longer affects: ceilometer/kilo

Julien Danjou (jdanjou) on 2015-04-13

Changed in python-tooz:
milestone:	0.14.0 → 0.13.2

Doug Hellmann (doug-hellmann) on 2015-04-13

no longer affects: ceilometer/liberty

Chris Dent (cdent) on 2015-09-10

Changed in ceilometer:
status:	New → Fix Released
