redis promotion is problematic

Bug #1742086 reported by Michele Baldessari
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   Medium       Michele Baldessari

Bug Description

Started from https://bugzilla.redhat.com/show_bug.cgi?id=1414967:
"""
The problem here is that on redis node demotion, another redis node promotion to master takes place and redis client kill followed by reconnection happens before haproxy gave up on "dead" (demoted) redis node. That means that redis clients keep connecting to slave node and not the new master since haproxy keeps redirecting it to the demoted node and eventually redis clients gave up before haproxy start to redirect redis clients to new promoted redis master node.
"""

To recap, the problem is as follows:
- A client connects to redis via haproxy, and haproxy directs the connection to one redis node.
- That redis node fails and a new redis node is promoted.
- HAProxy sends *new* connections to the newly promoted redis server, but a session that was already established (i.e. one that has not hit its timeout) stays attached to the old node, so haproxy keeps using that connection.
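
The recap above can be illustrated with a small toy model. This is only a sketch of the session-pinning behaviour, not haproxy's actual implementation: the `Backend` and `Proxy` classes, the node names `redis-0` and `redis-1`, and the `shutdown_sessions_on_down` flag are all hypothetical names made up for the illustration. It assumes a backend counts as "up" only while it is the redis master.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Backend:
    name: str
    up: bool  # True only while the health check sees this node as master
    shutdown_sessions_on_down: bool = False  # hypothetical toggle for the toy model


@dataclass
class Proxy:
    backends: List[Backend]
    sessions: Dict[str, str] = field(default_factory=dict)  # client -> backend name

    def connect(self, client: str) -> str:
        # New connections go to the first backend that is currently up.
        for backend in self.backends:
            if backend.up:
                self.sessions[client] = backend.name
                return backend.name
        raise ConnectionError("no backend is up")

    def set_health(self, name: str, up: bool) -> None:
        backend = next(b for b in self.backends if b.name == name)
        backend.up = up
        if not up and backend.shutdown_sessions_on_down:
            # Close every established session that still points at this backend.
            self.sessions = {c: n for c, n in self.sessions.items() if n != name}

    def session_backend(self, client: str) -> Optional[str]:
        # Backend of the client's established session, or None if it was closed.
        return self.sessions.get(client)


def backend_after_failover(shutdown_sessions: bool) -> str:
    proxy = Proxy([
        Backend("redis-0", True, shutdown_sessions),   # initial master
        Backend("redis-1", False, shutdown_sessions),  # replica, fails the check
    ])
    proxy.connect("client")              # session is pinned to redis-0
    proxy.set_health("redis-0", False)   # redis-0 is demoted
    proxy.set_health("redis-1", True)    # redis-1 is promoted
    current = proxy.session_backend("client")
    if current is None:
        # The session was closed, so the client reconnects as a new connection.
        current = proxy.connect("client")
    return current
```

In this model `backend_after_failover(False)` leaves the client on the demoted `redis-0`, while `backend_after_failover(True)` forces a reconnect that lands on the promoted `redis-1`.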

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
assignee: Michele Baldessari (michele) → Alex Schultz (alex-schultz)
Changed in tripleo:
assignee: Alex Schultz (alex-schultz) → Michele Baldessari (michele)
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/529107
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=dbfc8e1c1d5f39c7e64de7d7188cb2325492ee2a
Submitter: Zuul
Branch: master

commit dbfc8e1c1d5f39c7e64de7d7188cb2325492ee2a
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 19 17:27:49 2017 +0100

    Use on-marked-down shutdown-sessions for redis haproxy conf

    The problem we have with redis can be described as follows:
    - Connection is made to redis via haproxy, where haproxy directs to one redis node
    - That chosen redis node fails and a new redis node gets promoted
    - HAProxy will send *new* connections to redis to the newly promoted
      redis server, but you still have a session active (ie. that has not hit
      timeout), so haproxy will attempt to use that connection.

    By using 'on-marked-down shutdown-sessions' we make sure we close
    old existing sessions when the redis master node changes.

    Closes-Bug: #1742086
    Tested-By: Marian Krcmarik <email address hidden>

    Change-Id: Ia4d8c27057ee2de9e49e4358aa069571d1c952a9
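
For readers unfamiliar with the option named in the commit message above, the following is a rough sketch of how a redis section in haproxy.cfg might look with it enabled. It is not copied from the puppet-tripleo change: the bind address, server names, server addresses and check timings are placeholders, and the health check shown (accepting only the node that reports role:master) is an assumption about a typical redis setup.

```
listen redis
  # Placeholder address; not taken from the actual deployment.
  bind 192.0.2.10:6379
  balance first
  option tcp-check
  # Only a node that reports itself as master passes the check.
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  # on-marked-down shutdown-sessions closes established sessions to a
  # server as soon as its health check marks it down.
  server redis-0 192.0.2.11:6379 check fall 5 inter 2000 rise 2 on-marked-down shutdown-sessions
  server redis-1 192.0.2.12:6379 check fall 5 inter 2000 rise 2 on-marked-down shutdown-sessions
```

With a check like this, a demoted master starts failing the role:master test, gets marked down, and its existing client sessions are closed so that clients reconnect to the new master.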

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/537671

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/537675

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/537676

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.2.0

This issue was fixed in the openstack/puppet-tripleo 8.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/537671
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3eebb0b00b06dd8f57d8f17d5b822d7346b6d2f2
Submitter: Zuul
Branch: stable/pike

commit 3eebb0b00b06dd8f57d8f17d5b822d7346b6d2f2
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 19 17:27:49 2017 +0100

    Use on-marked-down shutdown-sessions for redis haproxy conf

    The problem we have with redis can be described as follows:
    - Connection is made to redis via haproxy, where haproxy directs to one redis node
    - That chosen redis node fails and a new redis node gets promoted
    - HAProxy will send *new* connections to redis to the newly promoted
      redis server, but you still have a session active (ie. that has not hit
      timeout), so haproxy will attempt to use that connection.

    By using 'on-marked-down shutdown-sessions' we make sure we close
    old existing sessions when the redis master node changes.

    NB: Cherry-pick not 100% clean due to some context differences

    Closes-Bug: #1742086
    Tested-By: Marian Krcmarik <email address hidden>

    Change-Id: Ia4d8c27057ee2de9e49e4358aa069571d1c952a9
    (cherry picked from commit dbfc8e1c1d5f39c7e64de7d7188cb2325492ee2a)

tags: added: in-stable-pike
tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/537676
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=db2d6f719865dac387c7922f3d8bccec81e12f8a
Submitter: Zuul
Branch: stable/newton

commit db2d6f719865dac387c7922f3d8bccec81e12f8a
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 19 17:27:49 2017 +0100

    Use on-marked-down shutdown-sessions for redis haproxy conf

    The problem we have with redis can be described as follows:
    - Connection is made to redis via haproxy, where haproxy directs to one redis node
    - That chosen redis node fails and a new redis node gets promoted
    - HAProxy will send *new* connections to redis to the newly promoted
      redis server, but you still have a session active (ie. that has not hit
      timeout), so haproxy will attempt to use that connection.

    By using 'on-marked-down shutdown-sessions' we make sure we close
    old existing sessions when the redis master node changes.

    NB: Cherry-pick not 100% clean due to some context differences

    Closes-Bug: #1742086
    Tested-By: Marian Krcmarik <email address hidden>

    Change-Id: Ia4d8c27057ee2de9e49e4358aa069571d1c952a9
    (cherry picked from commit dbfc8e1c1d5f39c7e64de7d7188cb2325492ee2a)
    (cherry picked from commit 3eebb0b00b06dd8f57d8f17d5b822d7346b6d2f2)
    (cherry picked from commit 507fd3a69fbe85ff3c1eac45ad1b88cc70769157)

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/ocata)

Reviewed: https://review.openstack.org/537675
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=f9e27f76f3fab2114f5ed4f2c526a3d7bd90f230
Submitter: Zuul
Branch: stable/ocata

commit f9e27f76f3fab2114f5ed4f2c526a3d7bd90f230
Author: Michele Baldessari <email address hidden>
Date: Tue Dec 19 17:27:49 2017 +0100

    Use on-marked-down shutdown-sessions for redis haproxy conf

    The problem we have with redis can be described as follows:
    - Connection is made to redis via haproxy, where haproxy directs to one redis node
    - That chosen redis node fails and a new redis node gets promoted
    - HAProxy will send *new* connections to redis to the newly promoted
      redis server, but you still have a session active (ie. that has not hit
      timeout), so haproxy will attempt to use that connection.

    By using 'on-marked-down shutdown-sessions' we make sure we close
    old existing sessions when the redis master node changes.

    NB: Cherry-pick not 100% clean due to some context differences

    Closes-Bug: #1742086
    Tested-By: Marian Krcmarik <email address hidden>

    Change-Id: Ia4d8c27057ee2de9e49e4358aa069571d1c952a9
    (cherry picked from commit dbfc8e1c1d5f39c7e64de7d7188cb2325492ee2a)
    (cherry picked from commit 3eebb0b00b06dd8f57d8f17d5b822d7346b6d2f2)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.9

This issue was fixed in the openstack/puppet-tripleo 7.4.9 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.5.10

This issue was fixed in the openstack/puppet-tripleo 6.5.10 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 5.6.8

This issue was fixed in the openstack/puppet-tripleo 5.6.8 release.
