Bump up default pacemaker monitor timeout value for OVN DBs

Bug #1853000 reported by Kamil Sambor
This bug affects 1 person
Affects    Status        Importance   Assigned to     Milestone
tripleo    Fix Released  Undecided    Kamil Sambor

Bug Description

Under load, the default monitor timeout of 20 seconds is not long enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.

When several VMs are spawned at the same time, the monitor can time out and the master DB can be moved unnecessarily. The ovn-controllers then have to reconnect (the slaves are read-only), which causes further load peaks on the DBs and can eventually lead to a snowball effect.
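
The failure mode can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Pacemaker actually decides on recovery: it assumes each monitor run has a known duration and that any run longer than the timeout counts as a monitor failure that may move the master role. The monitor durations are hypothetical numbers chosen for illustration.

```python
"""Toy model of the failure mode described in bug #1853000.

Simplifying assumptions (not from the report): every monitor run of the
ovn-dbs resource has a known duration, and any run longer than the
configured timeout counts as a monitor failure that can make Pacemaker
move the master role. Real recovery also depends on settings such as
on-fail and migration-threshold, which are ignored here.
"""
from typing import Iterable


def count_timed_out_monitors(durations_s: Iterable[float], timeout_s: float) -> int:
    """Count monitor runs that would exceed timeout_s."""
    return sum(1 for d in durations_s if d > timeout_s)


# Hypothetical monitor durations in seconds: quick when idle, slow while
# several VMs are being spawned at the same time.
durations = [2, 3, 25, 4, 31, 45, 3, 2]

print(count_timed_out_monitors(durations, 20))  # 3 -> three possible failovers
print(count_timed_out_monitors(durations, 60))  # 0
```

In a real cluster each of those timed-out runs could trigger the reconnect storm described above, which is why a larger timeout helps under bursty load.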

We should bump the default value in puppet to 60 seconds and make it configurable from tripleo-heat-templates (THT).
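
The proposed split gives two layers of defaults: a THT parameter that operators set, and a Puppet class parameter that falls back to its own default. The sketch below models that layering in Python. The hieradata key name is an assumption derived from the class name `tripleo::profile::pacemaker::ovn_dbs_bundle` and the parameter name `dbs_timeout` mentioned in the commits; check the templates of your release before relying on it.

```python
"""Two-layer default for the OVN DB monitor timeout (illustrative sketch).

Assumptions (not taken from the bug report): THT renders
OVNDBSPacemakerTimeout into the hieradata key below, and the Puppet
class uses its own 60-second default when the key is absent.
"""
from typing import Dict, Mapping

# Assumed hieradata key, built from the class and parameter names.
HIERA_KEY = "tripleo::profile::pacemaker::ovn_dbs_bundle::dbs_timeout"
PUPPET_DEFAULT_S = 60  # puppet-tripleo default after the fix
THT_DEFAULT_S = 60     # THT parameter default after the fix


def render_hieradata(parameter_defaults: Mapping[str, int]) -> Dict[str, int]:
    """THT layer: turn the operator-facing parameter into hieradata."""
    timeout = parameter_defaults.get("OVNDBSPacemakerTimeout", THT_DEFAULT_S)
    return {HIERA_KEY: timeout}


def resolve_dbs_timeout(hieradata: Mapping[str, int]) -> int:
    """Puppet layer: use the hieradata value, else the class default."""
    return int(hieradata.get(HIERA_KEY, PUPPET_DEFAULT_S))


print(resolve_dbs_timeout(render_hieradata({})))                               # 60
print(resolve_dbs_timeout(render_hieradata({"OVNDBSPacemakerTimeout": 120})))  # 120
```

Keeping a default in both layers means Puppet still gets 60 seconds when it is applied without THT, while THT users override a single parameter.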

Kamil Sambor (ksambor)
Changed in tripleo:
assignee: nobody → Kamil Sambor (ksambor)
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/stein)

Reviewed: https://review.opendev.org/692119
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=2631cb8c54443092733637933382af341fff42bb
Submitter: Zuul
Branch: stable/stein

commit 2631cb8c54443092733637933382af341fff42bb
Author: Kamil Sambor <email address hidden>
Date: Thu Oct 17 15:30:58 2019 +0200

    Add configurable monitor timeouts for ovn dbs

    Under pressure, the default monitor timeout value of 20 seconds is not
    enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
    While spawning a few VMs in the same time this could lead to unnecessary
    movements of master DB, then re-connections of ovn-controllers (slaves are
    read-only), further peaks of load on DBs, and at the end it could lead to
    snowball effect. Now this value can be configurable by dbs_timeout in
    tripleo::profile::pacemaker::ovn_dbs_bundle and by default is set to 60s.

    Change-Id: Ib95c6b7614631eed264d42e6cf61672b705e7893
    Signed-off-by: Kamil Sambor <email address hidden>
    Partial-Bug: #1853000
    (cherry picked from commit 15e21010a8a8594678afe385821ee804ec9e16c7)
    (cherry picked from commit 223e786c5716015c7ac1bdda94feabcd9c79716a)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/692114
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=95a7f217e2f29b77f598c0920533aff749273b8a
Submitter: Zuul
Branch: stable/train

commit 95a7f217e2f29b77f598c0920533aff749273b8a
Author: Kamil Sambor <email address hidden>
Date: Thu Oct 17 15:30:58 2019 +0200

    Add configurable monitor timeouts for ovn dbs

    Under pressure, the default monitor timeout value of 20 seconds is not
    enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
    While spawning a few VMs in the same time this could lead to unnecessary
    movements of master DB, then re-connections of ovn-controllers (slaves are
    read-only), further peaks of load on DBs, and at the end it could lead to
    snowball effect. Now this value can be configurable by dbs_timeout in
    tripleo::profile::pacemaker::ovn_dbs_bundle and by default is set to 60s.

    Change-Id: Ib95c6b7614631eed264d42e6cf61672b705e7893
    Signed-off-by: Kamil Sambor <email address hidden>
    Partial-Bug: #1853000
    (cherry picked from commit 15e21010a8a8594678afe385821ee804ec9e16c7)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/692127
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=fa5086f1fe1b86679892a6094c86cd1fe2c77a66
Submitter: Zuul
Branch: stable/train

commit fa5086f1fe1b86679892a6094c86cd1fe2c77a66
Author: Kamil Sambor <email address hidden>
Date: Thu Oct 17 16:06:26 2019 +0200

    Add configurable monitor timeouts for ovn dbs

    Under pressure, the default monitor timeout value of 20 seconds is not
    enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
    While spawning a few VMs in the same time this could lead to unnecessary
    movements of master DB, then re-connections of ovn-controllers (slaves are
    read-only), further peaks of load on DBs, and at the end it could lead to
    snowball effect. Now this value can be configurable by
    OVNDBSPacemakerTimeout which will configure
    tripleo::profile::pacemaker::ovn_dbs_bundle (default is set to 60s)

    Depends-On: https://review.opendev.org/#/c/692114/
    Signed-off-by: Kamil Sambor <email address hidden>
    Closes-Bug: #1853000
    Change-Id: I1afb5f2ef31ec61b3b224e5e1672fb9f12bcb110
    (cherry picked from commit ad1ef91aa4158c96356b13291605d8f2044f3069)
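
With the THT change in place, an operator would override the timeout through a Heat environment file under `parameter_defaults`. The helper below is a hypothetical sketch that renders such a file; it assumes `OVNDBSPacemakerTimeout` takes a number of seconds, as the commit's 60s default suggests, and the file name in the usage note is made up.

```python
"""Render a Heat environment file that overrides the OVN DB monitor timeout.

Hypothetical helper: assumes OVNDBSPacemakerTimeout is a number of seconds.
"""


def render_environment(timeout_s: int) -> str:
    """Return YAML text setting OVNDBSPacemakerTimeout in parameter_defaults."""
    if timeout_s <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return (
        "parameter_defaults:\n"
        f"  OVNDBSPacemakerTimeout: {timeout_s}\n"
    )


print(render_environment(90), end="")
```

The resulting file (for example, a hypothetical `ovn-dbs-timeout.yaml`) would be passed to `openstack overcloud deploy` with `-e` alongside the other environment files.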

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/692130
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=436b72550ec2b1940b642c4b5d7e208e69c2b783
Submitter: Zuul
Branch: stable/stein

commit 436b72550ec2b1940b642c4b5d7e208e69c2b783
Author: Kamil Sambor <email address hidden>
Date: Thu Oct 17 16:06:26 2019 +0200

    Add configurable monitor timeouts for ovn dbs

    Under pressure, the default monitor timeout value of 20 seconds is not
    enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
    While spawning a few VMs in the same time this could lead to unnecessary
    movements of master DB, then re-connections of ovn-controllers (slaves are
    read-only), further peaks of load on DBs, and at the end it could lead to
    snowball effect. Now this value can be configurable by
    OVNDBSPacemakerTimeout which will configure
    tripleo::profile::pacemaker::ovn_dbs_bundle (default is set to 60s)

    Depends-On: https://review.opendev.org/#/c/692119/
    Signed-off-by: Kamil Sambor <email address hidden>
    Closes-Bug: #1853000
    Change-Id: I1afb5f2ef31ec61b3b224e5e1672fb9f12bcb110
    (cherry picked from commit ad1ef91aa4158c96356b13291605d8f2044f3069)
    (cherry picked from commit ca335af7cba23a8159cf0d875fb87f37549ced53)

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.opendev.org/692133
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1bb1cb50344644de46998d3267b8dafa8d9e9813
Submitter: Zuul
Branch: stable/rocky

commit 1bb1cb50344644de46998d3267b8dafa8d9e9813
Author: Kamil Sambor <email address hidden>
Date: Thu Oct 17 16:06:26 2019 +0200

    Add configurable monitor timeouts for ovn dbs

    Under pressure, the default monitor timeout value of 20 seconds is not
    enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
    While spawning a few VMs in the same time this could lead to unnecessary
    movements of master DB, then re-connections of ovn-controllers (slaves are
    read-only), further peaks of load on DBs, and at the end it could lead to
    snowball effect. Now this value can be configurable by
    OVNDBSPacemakerTimeout which will configure
    tripleo::profile::pacemaker::ovn_dbs_bundle (default is set to 60s)

    Depends-On: https://review.opendev.org/#/c/692120/
    Signed-off-by: Kamil Sambor <email address hidden>
    Closes-Bug: #1853000
    Change-Id: I1afb5f2ef31ec61b3b224e5e1672fb9f12bcb110
    (cherry picked from commit ad1ef91aa4158c96356b13291605d8f2044f3069)
    (cherry picked from commit ca335af7cba23a8159cf0d875fb87f37549ced53)
    (cherry picked from commit a429b0cf0b5065b925db26ed6558ff2128d0cbe3)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.6.2

This issue was fixed in the openstack/tripleo-heat-templates 10.6.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.3.1

This issue was fixed in the openstack/tripleo-heat-templates 11.3.1 release.

wes hayutin (weshayutin)
Changed in tripleo:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates rocky-eol

This issue was fixed in the openstack/tripleo-heat-templates rocky-eol release.
