Change haproxy health check from TCP to HTTP

Bug #1394195 reported by Ilya Shakhat
This bug affects 3 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Aleksandr Didenko
5.1.x               Fix Committed  High        Sergii Golovatiuk

Bug Description

Currently haproxy is configured to use TCP health checks for services. Recently we faced an issue where a Keystone endpoint hung and stopped processing HTTP connections. At the same time, it was not removed from the round-robin pool, because the Python process was still able to establish TCP connections. It is proposed to switch the health check to HTTP, using an HTTP HEAD request to the root URI.
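Why a TCP check misses this failure can be illustrated with a small Python sketch (not HAProxy's own code). It assumes a hung backend can be simulated by a socket that listens but never reads: the kernel still completes the TCP handshake, so a connect-only (L4) check passes, while an HTTP HEAD / request (L7) times out. The names 'tcp_check' and 'http_check' are hypothetical; treating 2xx and 3xx replies as healthy mirrors how HAProxy's 'option httpchk' judges responses.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """L4 check: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """L7 check: healthy only if HEAD <path> gets a 2xx/3xx reply within the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, read timeout or a malformed reply all count as failures.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # Simulate a hung backend: the socket listens, so the kernel completes TCP
    # handshakes, but nothing ever reads the request or sends a response.
    hung = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    hung.bind(("127.0.0.1", 0))
    hung.listen(16)
    port = hung.getsockname()[1]
    print("TCP check: ", tcp_check("127.0.0.1", port))   # True
    print("HTTP check:", http_check("127.0.0.1", port))  # False, the read times out
    hung.close()
```

The hung backend keeps passing the L4 check indefinitely, which matches the Keystone behavior described above; only the L7 check notices that requests are never answered.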

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1.1"
  api: "1.0"
  build_number: "35"
  build_id: "2014-11-12_15-25-10"
  astute_sha: "702af3db6f5bca92525bc8322d7d5d7675ec857e"
  fuellib_sha: "e5b3de834a400d98d8c6ba416249832a0c16076c"
  ostf_sha: "64cb59c681658a7a55cc2c09d079072a41beb346"
  nailgun_sha: "bbc9dfe78a0c33040dcd16de9a40a3491788719c"
  fuelmain_sha: "e5e534ade6f3765a87feee3d44d39df68ae28f80"

Haproxy config:
root@node-13:~# cat /etc/haproxy/conf.d/020-keystone-1.cfg

listen keystone-1
  bind 172.16.44.221:5000
  bind 192.168.0.2:5000
  balance roundrobin
  option httplog
  server node-8 192.168.0.10:5000 check
  server node-13 192.168.0.15:5000 check
  server node-15 192.168.0.17:5000 check
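Each 'server' line above carries only 'check', and the section has no 'option httpchk', so HAProxy runs a plain TCP connect (L4) check against these backends. A hypothetical helper like the sketch below could flag such sections when auditing /etc/haproxy/conf.d/. It is a simplified heuristic (an assumption, not HAProxy's parser): it only looks for 'option httpchk' and ignores other check types such as 'mysql-check' or 'tcp-check'.

```python
def check_type(section: str) -> str:
    """Classify an HAProxy 'listen' section by the health check it configures.

    Simplified heuristic: 'option httpchk' plus 'server ... check' -> HTTP (L7);
    'server ... check' without httpchk -> plain TCP connect (L4).
    """
    lines = [line.split() for line in section.splitlines()]
    has_httpchk = any(tokens[:2] == ["option", "httpchk"] for tokens in lines)
    has_check = any(tokens[:1] == ["server"] and "check" in tokens for tokens in lines)
    if has_check and has_httpchk:
        return "L7 (HTTP)"
    if has_check:
        return "L4 (TCP)"
    return "no health check"


keystone_1 = """\
listen keystone-1
  bind 172.16.44.221:5000
  bind 192.168.0.2:5000
  balance roundrobin
  option httplog
  server node-8 192.168.0.10:5000 check
  server node-13 192.168.0.15:5000 check
  server node-15 192.168.0.17:5000 check
"""

print(check_type(keystone_1))  # L4 (TCP)
```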

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 6.0
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
importance: Undecided → Medium
tags: added: low-hanging-fruit scale
Roman Prykhodchenko (romcheg) wrote :

This has hit the 6.0 code freeze. Moving to 6.1.

Changed in fuel:
milestone: 6.0 → 6.1
Ilya Shakhat (shakhat) wrote :

The issue is fairly critical and should be kept in 6.0. Having one of three servers in a hung state affects service availability, and this needs to be improved.

Changed in fuel:
status: New → Triaged
importance: Medium → Low
importance: Low → High
milestone: 6.1 → 6.0
Dmitry Borodaenko (angdraug) wrote :
Aleksandr Didenko (adidenko) wrote :

Currently only 'horizon' and 'mysql' backends use L7 checks.

For the following backends:

cinder-api
glance-api
heat-api
heat-api-cfn
heat-api-cloudwatch
keystone-1
keystone-2
neutron
nova-api-2
nova-metadata-api

L7 checks could be enabled for them right away with the following options (tested on a live Juno environment):

option => ['httpchk', 'httplog', 'httpclose'],

In addition, I suggest using these options to lower the HTTP check rate (the default check interval is 2s):

balancermember_options => 'inter 10s fastinter 2s downinter 3s rise 3 fall 3'
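To see what these values mean for failure detection, the worst-case delays can be estimated from HAProxy's documented rules: 'inter' applies while a server is fully UP, 'fastinter' while it is transitioning, 'downinter' while it is fully DOWN, and 'fall'/'rise' count consecutive failed/successful checks (defaults: inter 2s, rise 2, fall 3; fastinter and downinter fall back to inter). The Python sketch below is back-of-the-envelope arithmetic under the assumption that check timeouts and 'spread-checks' jitter are ignored; for a hung backend each failing check can additionally take up to the check timeout, so real numbers will be higher.

```python
from dataclasses import dataclass


@dataclass
class CheckTiming:
    inter: float      # seconds between checks while the server is fully UP
    fastinter: float  # seconds between checks while transitioning UP<->DOWN
    downinter: float  # seconds between checks while the server is fully DOWN
    rise: int         # consecutive successes needed to mark a server UP
    fall: int         # consecutive failures needed to mark a server DOWN


def worst_case_down(t: CheckTiming) -> float:
    # Backend hangs just after a successful check: the first failing check
    # comes one 'inter' later, then fall-1 more checks run at 'fastinter'.
    return t.inter + (t.fall - 1) * t.fastinter


def worst_case_up(t: CheckTiming) -> float:
    # Backend recovers just after a failed check: the first good check comes
    # one 'downinter' later, then rise-1 more checks run at 'fastinter'.
    return t.downinter + (t.rise - 1) * t.fastinter


# HAProxy defaults (fastinter/downinter fall back to inter) vs. the proposal above.
defaults = CheckTiming(inter=2, fastinter=2, downinter=2, rise=2, fall=3)
proposed = CheckTiming(inter=10, fastinter=2, downinter=3, rise=3, fall=3)

for name, t in (("defaults", defaults), ("proposed", proposed)):
    print(f"{name}: down after ~{worst_case_down(t)}s, up after ~{worst_case_up(t)}s")
```

Under these assumptions the proposal cuts the steady-state check rate fivefold (one check every 10s instead of every 2s) at the cost of slower detection of a hung backend, roughly 14s instead of 6s, while a recovered server returns to rotation in roughly 7s.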

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/136298

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/136298
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=539460f99cba534363650d6fa322cb0bb252d39e
Submitter: Jenkins
Branch: master

commit 539460f99cba534363650d6fa322cb0bb252d39e
Author: Aleksandr Didenko <email address hidden>
Date: Fri Nov 21 12:45:22 2014 +0200

    Change haproxy health check from TCP to HTTP

    Switch from L4 health checks to L7 where possible.

    Change-Id: Ie2523d05a6a52cdac08b0d8c70c68642f64bcb8f
    Closes-bug: #1394195

Changed in fuel:
status: In Progress → Fix Committed
Artem Panchenko (apanchenko-8) wrote :

This issue also affects 5.1.x. I reproduced it during performance testing: under high load, Keystone on one controller crashed and began returning 504 errors, but HAProxy kept sending requests to it:

root@node-2:~# echo "show stat" | nc -U /var/lib/haproxy/stats | awk -F',' '{if (NR == 1 || $1 ~/keystone/){printf "%14s %14s %10s %10s %10s %10s %10s %10s %10s %10s %10s\n", $1,$2,$8,$13,$14,$15,$18,$22,$23,$25,$28}}'
      # pxname   svname   stot ereq econ eresp status chkfail chkdown downtime iid
    keystone-1 FRONTEND  12898    0              OPEN                            3
    keystone-1   node-2   4308         0   769     UP       0       0        0   3
    keystone-1   node-3   4295         0     0     UP       0       1      809   3
    keystone-1   node-6   4295         0     0     UP       0       1      817   3
    keystone-1  BACKEND  12898         0   769     UP               0        0   3
    keystone-2 FRONTEND 135668    0              OPEN                            4
    keystone-2   node-2  44786         0  8305     UP     147      16      191   4
    keystone-2   node-3  46007         0     6     UP       0       1      807   4
    keystone-2   node-6  46011         0     8     UP       0       1      817   4
    keystone-2  BACKEND 135668         0  8319     UP               0        0   4

As a result, many attempts to create new instances failed:

2014-12-02 17:52:14.069 20474 INFO rally.benchmark.runners.base [-] Task b40d77e3-e267-4fa2-9bdd-9f10ccb9c68e | ITER: 846 END: Error GetResourceFailure: Failed to get the resource <Server: rally_novaserver_jpxlzicqprsefvsy>: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-6829db25-b851-44db-aa6b-2f37dbb890c3)
2014-12-02 17:52:14.069 20474 INFO rally.benchmark.runners.base [-] Task b40d77e3-e267-4fa2-9bdd-9f10ccb9c68e | ITER: 923 START
2014-12-02 17:52:14.247 20476 INFO rally.benchmark.runners.base [-] Task b40d77e3-e267-4fa2-9bdd-9f10ccb9c68e | ITER: 818 END: Error GatewayTimeout: Gateway Timeout (HTTP 504)
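The 'show stat' output used above is CSV, so the same inspection can be scripted. The Python sketch below assumes the stats socket path from the nc command above and flags servers that HAProxy still reports as UP while their 'eresp' (response errors) counter exceeds a threshold, which is how node-2 looked here. The function names and the threshold are illustrative assumptions, not part of Fuel or HAProxy.

```python
import csv
import io
import socket


def read_show_stat(path: str = "/var/lib/haproxy/stats") -> str:
    """Send 'show stat' to the HAProxy stats socket and return the raw CSV."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def suspicious_servers(stat_csv: str, eresp_threshold: int = 0):
    """Yield (proxy, server, eresp) for servers marked UP with too many response errors."""
    # The header line starts with "# pxname"; drop the leading "# " so the
    # first column is keyed as "pxname".
    reader = csv.DictReader(io.StringIO(stat_csv.lstrip("# ")))
    for row in reader:
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        eresp = int(row["eresp"] or 0)
        # Status may also read e.g. "UP 1/3" while a server is transitioning.
        if row["status"].startswith("UP") and eresp > eresp_threshold:
            yield row["pxname"], row["svname"], eresp


# Example (on a controller):
#   list(suspicious_servers(read_show_stat(), eresp_threshold=100))
```

With a threshold of 100, the statistics above would flag keystone-1/node-2 (769 errors) and keystone-2/node-2 (8305 errors) while leaving the other nodes alone.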

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/138713

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/138713
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bc49271672643d79bb61929897d936c8e98615dd
Submitter: Jenkins
Branch: stable/5.1

commit bc49271672643d79bb61929897d936c8e98615dd
Author: Aleksandr Didenko <email address hidden>
Date: Fri Nov 21 12:45:22 2014 +0200

    Change haproxy health check from TCP to HTTP

    Switch from L4 health checks to L7 where possible.

    Change-Id: Ie2523d05a6a52cdac08b0d8c70c68642f64bcb8f
    Closes-bug: #1394195
