In the HA deployment, move redis out of pacemaker's grasp and on to systemd

Bug #1673715 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When deploying with -e environments/puppet-pacemaker.yaml we have redis running under pacemaker:
  OS::TripleO::Services::Redis: ../puppet/services/pacemaker/database/redis.yaml

The pike release timeframe might be a good time to remove redis from pacemaker's management and put it under systemd's control. The following work items are needed to get there:
1) Make a trivial change to environments/puppet-pacemaker.yaml so that it defaults to the non-pacemaker version of the redis service

2) Add a release note / documentation

3) Code up an upgrade path:
3.1) Stop the redis resource via pcs and delete it from the CIB
3.2) Make sure that redis is set to the non-pacemaker service in THT
3.3) Run puppet

4) Testing:
4.1) Make sure that redis is still working after the master node dies
4.2) Verify that the sentinel is working correctly
4.3) Test reboots/restarts of redis
4.4) ???
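
Step 3.1 above can be sketched as follows. This is only an illustration, not the upgrade code that was proposed for this bug: the function name is hypothetical, and the resource id "redis" is an assumption (check `pcs status` for the actual id in a given deployment). The function only builds the `pcs resource disable` and `pcs resource delete` command lines; it does not execute them.

```python
from typing import List


def redis_pacemaker_removal_commands(resource: str = "redis") -> List[List[str]]:
    """Return, in order, the pcs commands for step 3.1 without running them.

    `resource` is the pacemaker resource id; "redis" is an assumed default.
    """
    return [
        ["pcs", "resource", "disable", resource],  # stop the resource
        ["pcs", "resource", "delete", resource],   # remove it from the CIB
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anything is run.
    for cmd in redis_pacemaker_removal_commands():
        print(" ".join(cmd))
```

Returning the commands rather than running them keeps the sketch safe to try on any machine; an upgrade step would pass each list to something like `subprocess.run(cmd, check=True)` on one controller node.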

Tags: upgrade
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/446956

Changed in tripleo:
status: Triaged → In Progress
Michele Baldessari (michele) wrote :

Here are some sosreports from a recent pike HA environment with redis managed by systemd:
http://acksyn.org/files/tripleo/redis-nopcmk-pike/

Michele Baldessari (michele) wrote :

Chatted with sileht about this. There are a few more things to take care of for this to work properly. Namely, we should get rid of haproxy and change the ceilometer coordination_url from redis://vip-redis/ to something like:
redis://host1?sentinel=<clustername>&sentinel_fallback=host2&sentinel_fallback=host3

We will also need to be rather careful about the timeouts. The pacemaker RA disconnects the clients via the cli to speed up failover.
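
A minimal sketch of how a sentinel-style coordination URL of the shape quoted in this comment could be assembled from a list of hosts. The function name and the example cluster name "mymaster" are hypothetical; only the URL layout (`sentinel=` plus repeated `sentinel_fallback=` parameters) comes from the comment.

```python
from typing import Sequence
from urllib.parse import urlencode


def sentinel_coordination_url(hosts: Sequence[str], cluster_name: str) -> str:
    """Build redis://host1?sentinel=<name>&sentinel_fallback=host2&...

    The first host is used as the primary contact point; the remaining
    hosts become sentinel_fallback entries, in order.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    primary, *fallbacks = hosts
    query = [("sentinel", cluster_name)]
    query += [("sentinel_fallback", host) for host in fallbacks]
    # safe=":" keeps an optional "host:port" readable instead of "%3A".
    return "redis://{}?{}".format(primary, urlencode(query, safe=":"))


if __name__ == "__main__":
    print(sentinel_coordination_url(["host1", "host2", "host3"], "mymaster"))
```

With the three hosts above this prints `redis://host1?sentinel=mymaster&sentinel_fallback=host2&sentinel_fallback=host3`.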

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/446956
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b6a7ac432c59aedd6bf9795270bfe7a58cd90379
Submitter: Jenkins
Branch: master

commit b6a7ac432c59aedd6bf9795270bfe7a58cd90379
Author: Michele Baldessari <email address hidden>
Date: Fri Mar 17 12:01:15 2017 +0100

    Bind redis-sentinel to its network

    We currently do not bind redis-sentinel to any IP:
    redis 21144 0.0 0.0 142908 5908 ? Ssl 07:43 0:11 /usr/bin/redis-sentinel *:26379 [sentinel]

    Let's bind it to the same network as redis.

    Change-Id: I8a782ae1db84eb614aa3995a1638a2f370e70d06
    Partial-Bug: #1673715
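
The unbound sentinel described in the commit message can be spotted from a process listing. The sketch below assumes the `redis-sentinel <address>:<port>` form shown in the quoted `ps` line; the helper names are hypothetical, and the bound address used in the last example (172.16.2.5) is made up for illustration.

```python
import re
from typing import Optional

# Matches "redis-sentinel <address>:<port>" as it appears in the ps output.
_SENTINEL_RE = re.compile(r"redis-sentinel\s+(\S+):(\d+)")


def sentinel_bind_address(ps_line: str) -> Optional[str]:
    """Return the address redis-sentinel listens on, or None if not found."""
    match = _SENTINEL_RE.search(ps_line)
    return match.group(1) if match else None


def is_wildcard_bind(ps_line: str) -> bool:
    """True when the sentinel is listening on every IPv4 interface."""
    return sentinel_bind_address(ps_line) in ("*", "0.0.0.0")


if __name__ == "__main__":
    before = "/usr/bin/redis-sentinel *:26379 [sentinel]"
    after = "/usr/bin/redis-sentinel 172.16.2.5:26379 [sentinel]"  # hypothetical
    print(is_wildcard_bind(before), is_wildcard_bind(after))
```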

Changed in tripleo:
status: In Progress → Triaged
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Emilien Macchi (emilienm) wrote :

There are currently no open reviews on this bug, so the status is being changed back to its previous state and the bug is being unassigned. If there are active reviews related to this bug, please include links in comments.

Changed in tripleo:
assignee: Michele Baldessari (michele) → nobody
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Emilien Macchi (emilienm) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (FUTURE, PIKE, QUEENS, ROCKY, STEIN).
  Valid example: CONFIRMED FOR: FUTURE

Changed in tripleo:
importance: Medium → Undecided
status: Triaged → Expired