When Zabbix plugin restarts HAproxy, the routes disappear from "haproxy" namespace

Bug #1794025 reported by Alexander Rubtsov
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Oleksiy Molchanov

Bug Description

--- Environment ---
MOS: 9.2
Zabbix plugin: 2.5.5

--- Description ---
From ./plugin_zabbix/manifests/ha/haproxy.pp:
...
class plugin_zabbix::ha::haproxy {

  include openstack::ha::haproxy_restart
...

From /etc/puppet/modules/openstack/manifests/ha/haproxy_restart.pp:
class openstack::ha::haproxy_restart {
  exec { 'haproxy-restart':
    command => '/usr/lib/ocf/resource.d/fuel/ns_haproxy reload',
    environment => ['OCF_ROOT=/usr/lib/ocf'],
    path => '/usr/bin:/usr/sbin:/bin:/sbin',
    logoutput => true,
    provider => 'shell',
    tries => 10,
    try_sleep => 10,
    returns => [0, ''],
    refreshonly => true,
  }
}

The only environment variable passed here is "OCF_ROOT"; the "other_networks" variable is not passed.
Restarting HAproxy this way flushes the routes (https://bugs.launchpad.net/fuel/+bug/1652765) but then re-creates only some of them.
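
For context: Pacemaker hands resource parameters to an OCF agent as environment variables named OCF_RESKEY_<parameter>, so an agent started by hand with only OCF_ROOT set sees an empty "other_networks". The sketch below is not the ns_haproxy code; networks_to_route is a hypothetical helper that only illustrates the effect of the missing variable.

```shell
#!/bin/sh
# Hypothetical helper (not taken from ns_haproxy): print one network per line
# from the OCF_RESKEY_other_networks variable that Pacemaker would normally set.
networks_to_route() {
  # The unquoted expansion splits the space-separated list into words.
  for net in ${OCF_RESKEY_other_networks:-}; do
    printf '%s\n' "$net"
  done
}

# Called the way the Puppet exec calls the agent: only OCF_ROOT is set,
# so the loop body never runs and nothing is printed.
OCF_ROOT=/usr/lib/ocf
unset OCF_RESKEY_other_networks
networks_to_route

# Called the way Pacemaker would call it: the parameter is present.
OCF_RESKEY_other_networks="172.16.34.0/24 10.20.17.0/24"
networks_to_route
```

Under this assumption, a manual reload would need OCF_RESKEY_other_networks in its environment to see the same parameter value that Pacemaker supplies.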

--- Steps to reproduce ---
1) Identify the value of the "other_networks" variable
# crm configure show | grep -A5 "primitive p_haproxy"
primitive p_haproxy ocf:fuel:ns_haproxy \
        params debug=false ns=haproxy other_networks="172.16.34.0/24 10.20.17.0/24 192.168.0.0/24 192.168.1.0/24" \
        meta failure-timeout=120 migration-threshold=3 \
        op monitor interval=30 timeout=60 \
        op start interval=0 timeout=60 \
        op stop interval=0 timeout=60

2) Identify the current routes
# ip netns exec haproxy ip route show
default via 172.16.34.1 dev b_public metric 10
default via 240.0.0.1 dev hapr-ns metric 10000
10.20.17.0/24 via 240.0.0.1 dev hapr-ns metric 10000
172.16.34.0/24 dev b_public proto kernel scope link src 172.16.34.3
172.16.34.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.0.0/24 dev b_management proto kernel scope link src 192.168.0.2
192.168.0.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.1.0/24 via 240.0.0.1 dev hapr-ns metric 10000
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2

3) Reload HAproxy with the OCF script
# export OCF_ROOT=/usr/lib/ocf
# /usr/lib/ocf/resource.d/fuel/ns_haproxy reload

4) Check the routes list again
# ip netns exec haproxy ip route show
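
Step 1 can be scripted if the value is needed for later checks. This is a minimal sketch, assuming the parameter always appears as other_networks="..." on a single line of the crm output; extract_other_networks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `crm configure show` output on stdin and print
# the quoted value of the other_networks parameter.
extract_other_networks() {
  sed -n 's/.*other_networks="\([^"]*\)".*/\1/p'
}

# Sample input copied from step 1 of this report.
sample='primitive p_haproxy ocf:fuel:ns_haproxy \
        params debug=false ns=haproxy other_networks="172.16.34.0/24 10.20.17.0/24 192.168.0.0/24 192.168.1.0/24" \
        meta failure-timeout=120 migration-threshold=3 \'
printf '%s\n' "$sample" | extract_other_networks
```

On a controller node the same helper would be fed from the live configuration: crm configure show | extract_other_networks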

--- Actual behavior ---
The routes for the networks listed in the "other_networks" variable are missing:
# ip netns exec haproxy ip route show
default via 240.0.0.1 dev hapr-ns metric 10000
172.16.34.0/24 dev b_public proto kernel scope link src 172.16.34.3
192.168.0.0/24 dev b_management proto kernel scope link src 192.168.0.2
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2

--- Expected behavior ---
The routes for the networks listed in the "other_networks" variable are re-created:
default via 172.16.34.1 dev b_public metric 10
default via 240.0.0.1 dev hapr-ns metric 10000
10.20.17.0/24 via 240.0.0.1 dev hapr-ns metric 10000
172.16.34.0/24 dev b_public proto kernel scope link src 172.16.34.3
172.16.34.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.0.0/24 dev b_management proto kernel scope link src 192.168.0.2
192.168.0.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.1.0/24 via 240.0.0.1 dev hapr-ns metric 10000
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2
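
The comparison between actual and expected behavior can be automated. This is a sketch only: missing_routes is a hypothetical helper that takes the "other_networks" list and a gateway, reads a route listing on stdin, and prints every network that has no "via <gateway>" route.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the space-separated other_networks list,
# $2 is the gateway address, stdin is `ip route show` output.
# Prints each network that lacks a "<network> via <gateway>" route.
missing_routes() {
  listing=$(cat)
  for net in $1; do
    printf '%s\n' "$listing" |
      awk -v net="$net" -v gw="$2" \
        '$1 == net && $2 == "via" && $3 == gw { found = 1 } END { exit !found }' ||
      printf '%s\n' "$net"
  done
}

# Route listing copied from the "Actual behavior" section of this report.
after_reload='default via 240.0.0.1 dev hapr-ns metric 10000
172.16.34.0/24 dev b_public proto kernel scope link src 172.16.34.3
192.168.0.0/24 dev b_management proto kernel scope link src 192.168.0.2
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2'

printf '%s\n' "$after_reload" |
  missing_routes "172.16.34.0/24 10.20.17.0/24 192.168.0.0/24 192.168.1.0/24" 240.0.0.1
```

On a live node the listing would come from ip netns exec haproxy ip route show; an empty result means every network in the list has its route.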

--- Notes ---
The expected behavior is achieved when HAproxy is restarted through the Pacemaker CLI:
# pcs resource disable p_haproxy
# pcs resource enable p_haproxy
# ip netns exec haproxy ip route show
default via 172.16.34.1 dev b_public metric 10
default via 240.0.0.1 dev hapr-ns metric 10000
10.20.17.0/24 via 240.0.0.1 dev hapr-ns metric 10000
172.16.34.0/24 dev b_public proto kernel scope link src 172.16.34.3
172.16.34.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.0.0/24 dev b_management proto kernel scope link src 192.168.0.2
192.168.0.0/24 via 240.0.0.1 dev hapr-ns metric 10000
192.168.1.0/24 via 240.0.0.1 dev hapr-ns metric 10000
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2

Alexander Rubtsov (arubtsov) wrote :

sla1 for 9.0-updates

Changed in fuel:
importance: Undecided → High
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 9.x-updates
tags: added: customer-found sla1
Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → Vladimir Khlyunev (vkhlyunev)
status: New → Confirmed
Changed in fuel:
milestone: 9.x-updates → 9.2-mu-9
Changed in fuel:
assignee: Vladimir Khlyunev (vkhlyunev) → Oleksiy Molchanov (omolchanov)
status: Confirmed → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/fuel-library (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Oleksiy Molchanov <email address hidden>
Review: https://review.fuel-infra.org/39338

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/fuel-library (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/39338
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: b27b4746892840f9cb03c4c0732fd29888a6b4cc
Author: Oleksiy Molchanov <email address hidden>
Date: Thu Sep 27 14:07:23 2018

Update haproxy restart approach

Closes-Bug: 1794025
Change-Id: Id2fabdc03c5cda7bfe8af950db3b56d245d14493

Changed in fuel:
status: In Progress → Fix Committed
Mikhail Samoylov (msamoylov) wrote :

Verified. However, after discussing with O. Molchanov, we changed reload to restart in the verification steps.
Steps to reproduce:
root@node-1:~# crm configure show | grep -A5 "primitive p_haproxy"
primitive p_haproxy ocf:fuel:ns_haproxy \
 params debug=false ns=haproxy other_networks="10.109.3.0/24 10.109.0.0/24 10.109.1.0/24 10.109.2.0/24" \
 meta failure-timeout=120 migration-threshold=3 \
 op monitor interval=30 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
root@node-1:~# ip netns exec haproxy ip route show
default via 240.0.0.1 dev hapr-ns metric 10000
10.109.0.0/24 via 240.0.0.1 dev hapr-ns metric 10000
10.109.1.0/24 dev b_management proto kernel scope link src 10.109.1.3
10.109.1.0/24 via 240.0.0.1 dev hapr-ns metric 10000
10.109.2.0/24 via 240.0.0.1 dev hapr-ns metric 10000
10.109.3.0/24 dev b_public proto kernel scope link src 10.109.3.3
10.109.3.0/24 via 240.0.0.1 dev hapr-ns metric 10000
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2
root@node-1:~# export OCF_ROOT=/usr/lib/ocf
root@node-1:~# /usr/lib/ocf/resource.d/fuel/ns_haproxy reload
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: DEBUG: default: check_ns(): recieved netns list: haproxy
ocf-ns_haproxy: DEBUG: default: get_variables(): set up variables and PIDFILE name
ocf-ns_haproxy: INFO: haproxy daemon running
ocf-ns_haproxy: INFO: Blocked all SYN for the Haproxy reload operation
ocf-ns_haproxy: INFO: Unblocked all SYN for the Haproxy reload operation
ocf-ns_haproxy: DEBUG: Bringing up host interface: hapr-host
ocf-ns_haproxy: DEBUG: Bringing up the namespace interface: hapr-ns
ocf-ns_haproxy: DEBUG: Flushing global scope routes
ocf-ns_haproxy: DEBUG: Creating default route inside the namespace to 240.0.0.1 with metric 10000
ocf-ns_haproxy: INFO: net.ipv4.conf.hapr-host.rp_filter = 2
ocf-ns_haproxy: INFO: net.ipv4.conf.all.rp_filter = 2
root@node-1:~# ip netns exec haproxy ip route show
default via 240.0.0.1 dev hapr-ns metric 10000
10.109.1.0/24 dev b_management proto kernel scope link src 10.109.1.3
10.109.3.0/24 dev b_public proto kernel scope link src 10.109.3.3
240.0.0.0/30 dev hapr-ns proto kernel scope link src 240.0.0.2
root@node-1:~# crm configure show | grep -A5 "primitive p_haproxy"
primitive p_haproxy ocf:fuel:ns_haproxy \
 params debug=false ns=haproxy other_networks="10.109.3.0/24 10.109.0.0/24 10.109.1.0/24 10.109.2.0/24" \
 meta failure-timeout=120 migration-threshold=3 \
 op monitor interval=30 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
root@node-1:~# dpkg -l | grep fuel
ii fuel-ha-utils 9.0.0-1~u14.04+mos8781 all Fuel Library HA utils
ii fuel-misc 9.0.0-1~u14.04+mos8781 all Misc Fuel library scripts
ii fuel-rabbit-fence 9.0.0-1~u14.04+mos8781 all Fuel RabbitMQ fencing utilitites
ii fuel-umm 9.0.0-1~u14.04+mos8781 ...

Changed in fuel:
status: Fix Committed → Fix Released