Pacemaker default configurations are unreasonable

Bug #1719540 reported by Dan Xu on 2017-09-26
This bug affects 1 person
Affects: tripleo
Importance: Medium
Assigned to: Unassigned

Bug Description

TripleO uses Pacemaker's default settings for the virtual IP and haproxy monitor operations, which are:
sudo pcs resource show ip-192.168.36.17
Resource: ip-192.168.36.17 (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=192.168.36.17 cidr_netmask=32
    Operations: start interval=0s timeout=20s (ip-192.168.36.17-start-interval-0s)
                stop interval=0s timeout=20s (ip-192.168.36.17-stop-interval-0s)
                monitor interval=10s timeout=20s (ip-192.168.36.17-monitor-interval-10s)

sudo pcs resource show haproxy
Resource: haproxy (class=systemd type=haproxy)
    Operations: start interval=0s timeout=200s (haproxy-start-interval-0s)
                stop interval=0s timeout=200s (haproxy-stop-interval-0s)
                monitor interval=60s (haproxy-monitor-interval-60s)

With the first configuration, the outage time of OpenStack services can exceed 20s.
With the second configuration, an outage of the haproxy processes can last up to 60s in the worst case.

Together, these make HA not very effective: the outage time is too long.

Changed in tripleo:
milestone: none → queens-1
importance: Undecided → Medium
status: New → Triaged
Michele Baldessari (michele) wrote :

Can you explain in detail which failure scenario is suboptimal?

When the VIP moves it does not really matter if haproxy still needs to be monitored or not, because connections will go to the moved VIP. Are you saying the VIP timeouts need to be lower?

Changed in tripleo:
milestone: queens-1 → queens-2
Tim Rozet (trozet) wrote :

Yes, I think the VIP monitor interval and timeout need to be lower. They should be as low as possible without adverse effects. Right now, with the interval at 10s and the timeout at 20s, it seems possible that a VIP failure could take 30 seconds to fail over. Right? I hope we can lower this to just a couple of seconds.
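
The 30-second figure above can be reproduced with a rough model. This is only a sketch under stated assumptions: the resource fails right after a successful monitor, the next monitor starts one full interval later, and that monitor hangs until its timeout expires. The function name and the 2s/5s values are hypothetical, chosen for illustration; the time needed to actually move the VIP is not included.

```python
def worst_case_detection_s(interval_s: float, timeout_s: float) -> float:
    """Upper bound (seconds) from failure to the monitor reporting it.

    Simplified model, not Pacemaker's actual scheduling logic:
    wait up to one full interval for the next monitor to start,
    then wait for that monitor to hit its timeout.
    """
    if interval_s < 0 or timeout_s < 0:
        raise ValueError("interval and timeout must be non-negative")
    return interval_s + timeout_s


# Values from the bug description: monitor interval=10s timeout=20s
current = worst_case_detection_s(10, 20)

# Hypothetical lower values, for comparison only
proposed = worst_case_detection_s(2, 5)

print(f"current worst case:  {current}s")
print(f"proposed worst case: {proposed}s")
```

Under this model, lowering both the interval and the timeout matters: reducing only one of them leaves the other as the floor on detection time.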

Dan Xu (xudan) wrote :

Agree with Tim Rozet.

Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1

Reviewed: https://review.openstack.org/554672
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=8975100f773a2dcfdc39eaf6e979a763dae82866
Submitter: Zuul
Branch: master

commit 8975100f773a2dcfdc39eaf6e979a763dae82866
Author: Michele Baldessari <email address hidden>
Date: Tue Mar 20 20:29:51 2018 +0100

    Allow VIP resource to have customized ops

    Introduce meta_params and op_params which will be
    passed to the pcs resource create command when creating
    the VIPs. This allows us to specify custom ops for VIPs
    in the following manner, by setting hiera. E.g.:
    tripleo::profile::pacemaker::haproxy_bundle::op_params: 'start timeout=200s stop timeout=200s monitor timeout=5s'

    Change-Id: I9a1c700051fc6dfc302e1d94347df2956442354e
    Depends-On: Iadf0cd3805f72141563707f43130945c9d362f5c
    Related-Bug: #1719540
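
To make the shape of the op_params string in the commit message easier to read, here is a small illustrative parser. It is not how puppet-tripleo or pcs handle the value; the function name parse_op_params is hypothetical, and the sketch assumes only that the string is a sequence of operation names, each followed by key=value options.

```python
def parse_op_params(op_params: str) -> dict:
    """Group 'name key=value ...' tokens by operation name.

    Illustrative only: assumes every token without '=' starts a new
    operation and every token with '=' is an option of the latest one.
    """
    ops = {}
    current = None
    for token in op_params.split():
        if "=" in token:
            if current is None:
                raise ValueError(f"option {token!r} appears before any operation name")
            key, value = token.split("=", 1)
            ops[current][key] = value
        else:
            current = token
            ops.setdefault(current, {})
    return ops


# The example string from the commit message
example = "start timeout=200s stop timeout=200s monitor timeout=5s"
parsed = parse_op_params(example)

for name, options in parsed.items():
    print(name, options)
```

Read this way, the example sets a 200s timeout on start and stop and a 5s timeout on monitor; it does not set a monitor interval.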

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Fix Released