After hours of trying to reproduce the situation described above, I cannot say this patch fixes it completely, and I could not reproduce the issue exactly as described.
My setup was 3 servers running under KVM, with eth0 on a bridged network and eth1 on a NATed network. The configuration was as follows:
global_defs {
    router_id UBUNTULVS1
}
vrrp_sync_group VG1 {
    group {
        VI_IP1
        VI_IP2
    }
}
vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 250
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}
vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 250
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}
====================================
global_defs {
    router_id UBUNTULVS2
}
vrrp_sync_group VG1 {
    group {
        VI_IP1
        VI_IP2
    }
}
vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 200
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}
vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 200
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}
=========================
global_defs {
    router_id UBUNTULVS3
}
vrrp_sync_group VG1 {
    group {
        VI_IP1
        VI_IP2
    }
}
vrrp_instance VI_IP1 {
    state MASTER
    interface eth0
    virtual_router_id 50
    priority 150
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}
vrrp_instance VI_IP2 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 150
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}
When I tried to reproduce the issue following the steps, I found that after starting Machine 3, Machine 2 does show the race condition in its logs, *but* only for a while; it then stabilizes and settles into the BACKUP state. So I could not fully reproduce the bug as described above.
After applying the proposed patch, I indeed no longer saw the race condition (which was temporary in my case). However, I decided to do further testing: I brought down eth1 on Machine 1. Machine 3 then tried to become MASTER, displaying the race condition mentioned above. Once I brought eth1 back up on Machine 1, the race condition on Machine 3 stopped.
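For reference, the failover test above can be sketched roughly as follows (the interface name, syslog path, and exact log messages are assumptions based on my KVM setup; adjust them for your environment):

```shell
# On Machine 1: take down the NATed interface to force a failover of VI_IP2
ip link set eth1 down

# On Machine 3: watch keepalived's state transitions while it tries to
# become MASTER (this is where the race condition showed up for me)
tail -f /var/log/syslog | grep -i keepalived

# On Machine 1: bring the interface back up; the race condition on
# Machine 3 should stop shortly afterwards
ip link set eth1 up
```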
Based on my testing, the proposed patch seems to fix the described race condition in some situations but not in others. This might, however, also be related to the use of KVM and the types of networks involved.
That said, I'd rather not apply this patch to get it into Maverick until upstream has been involved, we've heard their feedback, and some more testing has been done.
This has been reported upstream as well, but there hasn't been any response just yet. I'll follow the issue and see what upstream says before actually applying the patch.
Arjan, thank you for reporting the bug and providing a patch. If you could continue testing, try to reproduce what I've encountered, and provide some feedback, that would be helpful.