[zabbix plugin] VIP becomes unavailable after its Controller reboots if Zabbix and OVS bridges are used
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Olivier Bourdon |
Bug Description
=== Environment ===
MOS: 9.0
Mode: HA
Zabbix: 2.5.1
Network template: br-mgmt and br-ex use the OVS provider
=== Description ===
After the Controller running the VIP is rebooted and the VIP later migrates back to it, the VIP becomes unreachable.
=== Steps to reproduce ===
Create an HA environment with the Zabbix plugin enabled:
[root@fuel ~]# fuel plugins
id | name | version | package_version | releases
---+---
1 | zabbix_monitoring | 2.5.1 | 3.0.0 | ubuntu (2015.1.0-7.0, liberty-8.0, liberty-9.0, mitaka-9.0)
Prepare and upload a network template that makes the "br-mgmt" and "br-ex" bridges use the OVS provider, then deploy the environment:
...
- action: add-br
  name: br-mgmt
  provider: ovs
...
- action: add-br
  name: br-ex
  provider: ovs
...
Determine which Controller node is running the VIP:
# pcs status | grep "vip__management"
vip__management (ocf::fuel:
Reboot this Controller
root@cic-0-1:~# reboot
Broadcast message from <email address hidden>
Ensure that the VIP has migrated to another Controller:
# pcs status | grep "vip__management"
vip__management (ocf::fuel:
Wait until the initial Controller is back up after the reboot:
root@cic-0-1:~# uptime
11:46:26 up 3 min, 1 user, load average: 8.12, 2.66, 0.94
During boot, Open vSwitch re-created the bridges from the records in ovsdb.
ovsdb still contains a record for the "v_management" port/interface in the br-mgmt bridge.
However, the "v_management" interface no longer exists on this node, because it is present only on the node where the management VIP is running.
As a result, the br-mgmt bridge looks like this:
root@cic-0-1:~# ovs-vsctl show
.....
Bridge br-mgmt
.....
Port v_management
.....
root@cic-0-1:~/arp# ovs-vsctl list interface v_management
...
admin_state : []
...
error : "could not open network device v_management (No such device)"
...
name : v_management
...
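The stale port can also be spotted programmatically. The Python sketch below parses the text printed by `ovs-vsctl list interface` (records of `key : value` lines separated by blank lines, as in the output above) and returns the names of interfaces whose `error` column is not empty. The function names and the sample text are illustrative assumptions; the sample reproduces only the fields shown above.

```python
def parse_ovs_records(text):
    """Split `ovs-vsctl list` output into one dict per record.

    Records are separated by blank lines; each line is `key : value`.
    Values are kept as raw strings (quotes, brackets and all).
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only: values such as MACs contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not `key : value`, e.g. "..."
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def stale_interfaces(text):
    """Return names of interfaces that OVS could not open (non-empty `error`)."""
    stale = []
    for rec in parse_ovs_records(text):
        error = rec.get("error", "[]")
        if error not in ("[]", ""):
            stale.append(rec.get("name", "?").strip('"'))
    return stale


# Hypothetical sample mirroring the fields shown in the report.
SAMPLE = """\
admin_state         : []
error               : "could not open network device v_management (No such device)"
name                : v_management
"""

if __name__ == "__main__":
    print(stale_interfaces(SAMPLE))
```

On a live node the text could come from `subprocess.run(["ovs-vsctl", "list", "interface"], capture_output=True, text=True).stdout`.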
Wait for Pacemaker to migrate the VIP back to the initial Controller according to resource stickiness (or migrate it manually to speed up the process):
# pcs status | grep "vip__management"
vip__management (ocf::fuel:
=== Actual behavior ===
Although Pacemaker shows the VIP as "started", it is not actually reachable:
root@cic-0-3:~# ping -DOnv 192.168.2.25
PING 192.168.2.25 (192.168.2.25) 56(84) bytes of data.
[1480020008.953425] no answer yet for icmp_seq=1
[1480020009.953287] no answer yet for icmp_seq=2
[1480020010.952807] no answer yet for icmp_seq=3
[1480020011.959394] no answer yet for icmp_seq=4
=== Expected behavior ===
Normally, after the migration, Pacemaker starts monitoring the VIP resource and detects that it does not respond:
less /var/log/daemon.log
...
ocf-ns_IPaddr2: ERROR: ARPING 192.168.2.25 from 192.168.2.6 br-ex Sent 3 probes (3 broadcast(s)) Received 0 response(s)
...
Once monitoring has detected the failure, Pacemaker restarts the resource, which includes deleting the ports/interfaces from the bridge and adding them back.
This brings the OVS bridge back to the correct state:
root@cic-0-1:~# ovs-vsctl show
.....
Bridge br-mgmt
.....
Port v_management
.....
This also makes the VIP actually reachable:
root@cic-0-3:~# ping -DOnv 192.168.2.25
PING 192.168.2.25 (192.168.2.25) 56(84) bytes of data.
[1480020001.956581] 64 bytes from 192.168.2.25: icmp_seq=1 ttl=64 time=0.438 ms
[1480020002.956785] 64 bytes from 192.168.2.25: icmp_seq=2 ttl=64 time=0.332 ms
[1480020003.955785] 64 bytes from 192.168.2.25: icmp_seq=3 ttl=64 time=0.368 ms
[1480020004.954933] 64 bytes from 192.168.2.25: icmp_seq=4 ttl=64 time=0.490 ms
[1480020005.953754] 64 bytes from 192.168.2.25: icmp_seq=5 ttl=64 time=0.352 ms
[1480020006.953911] 64 bytes from 192.168.2.25: icmp_seq=6 ttl=64 time=0.332 ms
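The restart step described in this section (deleting the ports/interfaces from the bridge and adding them back) roughly corresponds to the pair of `ovs-vsctl` commands built below. This is a minimal Python sketch, not the code of the Fuel resource agent: the helper name `reattach_port_cmds` is hypothetical, and the commands are only built and printed, not executed. `--if-exists` and `--may-exist` are standard `ovs-vsctl` options that make `del-port` and `add-port` tolerate a missing or already present port. The sketch assumes the `v_management` device exists again by the time `add-port` runs.

```python
import shlex


def reattach_port_cmds(bridge, port):
    """Build commands that drop a port's ovsdb record and add it back.

    Deleting first matters: with `--may-exist`, add-port alone would leave
    an existing (possibly stale) record as it is.
    """
    return [
        ["ovs-vsctl", "--if-exists", "del-port", bridge, port],
        ["ovs-vsctl", "--may-exist", "add-port", bridge, port],
    ]


if __name__ == "__main__":
    for cmd in reattach_port_cmds("br-mgmt", "v_management"):
        # Print only; on a real node this could be subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```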
=== Possible reason ===
Probably, the Zabbix VIP responds to Pacemaker's ARPING requests instead of the management VIP, which causes false-positive monitoring results:
less /var/log/daemon.log
...
2016-11-
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.816ms \
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.570ms \
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.540ms \
Sent 3 probes (1 broadcast(s)) Received 3 response(s)
...
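In the log above, every reply comes from the same MAC address. A check that only looks at "Received N response(s)" cannot tell who answered; a stricter check could also compare the replying MAC with the one expected for the VIP. The Python sketch below is illustrative only: it is not the logic of the `ns_IPaddr2` agent, and `expected_mac` is a hypothetical input (for example, the MAC of the interface that should carry the VIP).

```python
import re

# Matches iputils-style lines: "Unicast reply from <ip> [<mac>] <time>ms"
REPLY_RE = re.compile(
    r"Unicast reply from (?P<ip>\d+\.\d+\.\d+\.\d+) \[(?P<mac>[0-9A-Fa-f:]{17})\]"
)


def replies_by_mac(output):
    """Count arping replies per (ip, MAC) pair found in the output."""
    counts = {}
    for m in REPLY_RE.finditer(output):
        key = (m.group("ip"), m.group("mac").upper())
        counts[key] = counts.get(key, 0) + 1
    return counts


def answered_by_expected(output, ip, expected_mac):
    """True only if replies for `ip` came exclusively from `expected_mac`."""
    macs = {mac for (rip, mac) in replies_by_mac(output) if rip == ip}
    return macs == {expected_mac.upper()}


# Sample copied from the reply lines in the report (trailing "\" dropped).
SAMPLE = """\
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.816ms
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.570ms
Unicast reply from 192.168.2.26 [9A:F3:C2:78:81:5B] 0.540ms
Sent 3 probes (1 broadcast(s)) Received 3 response(s)
"""

if __name__ == "__main__":
    print(replies_by_mac(SAMPLE))
```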
Changed in fuel:
  assignee: nobody → Fuel Plugin Zabbix (fuel-plugin-zabbix)
  status: New → Confirmed
Changed in fuel:
  assignee: Fuel Plugin Zabbix (fuel-plugin-zabbix) → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
  assignee: Fuel Sustaining (fuel-sustaining-team) → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
  assignee: Stanislaw Bogatkin (sbogatkin) → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
  assignee: Fuel Sustaining (fuel-sustaining-team) → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
  status: Confirmed → In Progress
Changed in fuel:
  status: In Progress → Fix Committed
Changed in fuel:
  status: Fix Committed → Fix Released
sla1 for 9.0-updates