keepalived haproxy check script too simple
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | Medium | Chenjun Shen |
Bug Description
location: playbooks/
This part:
keepalived_scripts:
  haproxy_check_script:
    check_script: "killall -0 haproxy"
We found that "killall -0 haproxy" is too simple and might cause problems.
For example:
In our environments, we provide Load Balancer as a Service (LBaaS) in neutron. In the neutron-agent container, there is also a haproxy process running. Even when we stop haproxy on the control nodes, the haproxy process in the container still shows up in the ps output on the control nodes.
Then "killall -0 haproxy" will still give exit code = 0, which makes keepalived not recognize that haproxy process is down.
We suggest changing the check script as below; it only checks the service running locally and is not influenced by the process in the container:
keepalived_scripts:
  haproxy_check_script:
    check_script: "service haproxy status"
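For reference, keepalived runs a check like this as a vrrp_script; the rendered keepalived.conf entry would look roughly like the sketch below (the interval and fall values are illustrative, not taken from the role's defaults):

    vrrp_script haproxy_check_script {
        script "service haproxy status"   # exit code 0 = haproxy healthy
        interval 2                        # illustrative: run the check every 2 seconds
        fall 2                            # illustrative: failures before marking down
    }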
If this is accepted, I can also provide a patch or whatever else is needed.
Best regards,
Chenjun Shen
Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
tags: added: low-hanging-fruit
Changed in openstack-ansible:
assignee: Jean-Philippe Evrard (jean-philippe-evrard) → Chenjun Shen (cshen)
status: Confirmed → In Progress
If haproxy processes are collocated, that could indeed be a problem (which is going to happen with neutron metadata or LBaaS).
We could move keepalived and haproxy into containers, or change the check script to be namespace-aware. I haven't checked how "service haproxy status" behaves under systemd; that is probably worth investigating too.
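As a rough sketch of those alternatives (the pidfile path below is the Debian/Ubuntu default and is an assumption here, not something verified against the role):

    # systemd-native check: asks systemd whether the local unit is active
    systemctl is-active --quiet haproxy

    # pidfile-based check: only matches the haproxy started on this host,
    # never a same-named process inside a container
    kill -0 "$(cat /var/run/haproxy.pid)"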