keepalived haproxy check script too simple

Bug #1674742 reported by Chenjun Shen
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack-Ansible
Fix Released
Medium
Chenjun Shen

Bug Description

location: playbooks/vars/configs/keepalived_haproxy.yml

This part:

keepalived_scripts:
  haproxy_check_script:
    check_script: "killall -0 haproxy"

We found "killall -0 haproxy" is too simple, and might cause problem.

For example:
In our envs, we provide load balance as a service in neutron. In the neutron-agent container, there is also a haproxy process running there. Even we stop the haproxy on the control nodes, haproxy process in the container still lists in the ps output of control nodes.

Then "killall -0 haproxy" will still give exit code = 0, which makes keepalived not recognize that haproxy process is down.

We suggest to change the check script as below, which only checks the process running locally, not influenced by the one in the container:

keepalived_scripts:
  haproxy_check_script:
    check_script: "service haproxy status"

If this would be accepted, I could also provide a patch or whatever as wished.

Best regards,
Chenjun Shen

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

If collocating haproxies, that could indeed be a problem (which is gonna be happening with neutron metadata or LBaaS).

We could move keepalived and haproxy into containers, or change the check script to be namespaced. I haven't checked how "service haproxy status" works under systemd. Probably worth investigating too.

Revision history for this message
Chenjun Shen (cshen) wrote :

Hi Jean,

"service haproxy status" worked for my company under Ubuntu 16.04.

Alternative will be "kill -0 `cat /var/run/haproxy.pid`" under Ubuntu/Systemd, which also worked for us.

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
tags: added: low-hanging-fruit
Revision history for this message
Chenjun Shen (cshen) wrote :
Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

Thank you!

Changed in openstack-ansible:
assignee: Jean-Philippe Evrard (jean-philippe-evrard) → Chenjun Shen (cshen)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/451409
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=6d757cade5fe4fcefa83ff27f134787e6881e6d2
Submitter: Jenkins
Branch: master

commit 6d757cade5fe4fcefa83ff27f134787e6881e6d2
Author: Chenjun Shen <email address hidden>
Date: Wed Mar 29 15:37:09 2017 +0200

    Update haproxy check script in keepalived.

    Fixes-Bug: 1674742

    Change-Id: If4bdc9d8882ba902e14508e3c28a78b8c0f74b09

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/463869

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/ocata)

Reviewed: https://review.openstack.org/463869
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=3ac6ae0d4be619596a04c76d70bb7156df0beb95
Submitter: Jenkins
Branch: stable/ocata

commit 3ac6ae0d4be619596a04c76d70bb7156df0beb95
Author: Chenjun Shen <email address hidden>
Date: Wed Mar 29 15:37:09 2017 +0200

    Update haproxy check script in keepalived.

    Fixes-Bug: 1674742

    Change-Id: If4bdc9d8882ba902e14508e3c28a78b8c0f74b09
    (cherry picked from commit 6d757cade5fe4fcefa83ff27f134787e6881e6d2)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/513814

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/newton)

Reviewed: https://review.openstack.org/513814
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=0513d1cd6858d71fc50e65c14c987b0edfc128ae
Submitter: Zuul
Branch: stable/newton

commit 0513d1cd6858d71fc50e65c14c987b0edfc128ae
Author: Chenjun Shen <email address hidden>
Date: Wed Mar 29 15:37:09 2017 +0200

    Update haproxy check script in keepalived.

    Fixes-Bug: 1674742

    Change-Id: If4bdc9d8882ba902e14508e3c28a78b8c0f74b09
    (cherry picked from commit 6d757cade5fe4fcefa83ff27f134787e6881e6d2)

tags: added: in-stable-newton
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.