Comment 4 for bug 619712

Andres Rodriguez (andreserl) wrote:

After hours of trying to reproduce this situation as specified above, I cannot say the proposed patch fixes it completely, and I could not reproduce the issue exactly as described above.

My setup was 3 servers running as KVM guests, with eth0 on a bridged network and eth1 on a NATed network. The configuration on each machine was as follows.

Machine 1 (UBUNTULVS1):

global_defs {
   router_id UBUNTULVS1
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 250
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 250
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}

====================================

Machine 2 (UBUNTULVS2):
global_defs {
   router_id UBUNTULVS2
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 200
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 200
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}

====================================

Machine 3 (UBUNTULVS3):
global_defs {
   router_id UBUNTULVS3
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state MASTER
    interface eth0
    virtual_router_id 50
    priority 150
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 150
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}
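
For reference, a quick way to check which node currently holds the virtual addresses defined above is to look for them on each machine (assuming the usual iproute2 tools are available):

    ip addr show eth0 | grep 192.168.1.100       # eth0 VIP present on this node?
    ip addr show eth1 | grep 192.168.122.100     # eth1 VIP present on this node?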

After trying to reproduce the issue following the steps, I found that after starting Machine 3, Machine 2 does indeed show the race condition in its logs, *but* only for a while; it then stabilizes and settles into the BACKUP state. This means I could not fully reproduce the bug as described above.
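
For anyone retrying this, the sequence boils down to having Machines 1 and 2 up, then starting keepalived on Machine 3 while watching Machine 2's logs. Something along these lines should do (exact service and log names may differ from what I actually ran):

    # on Machine 3
    service keepalived start
    # on Machine 2, follow the VRRP state transitions
    tail -f /var/log/syslog | grep -i vrrp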

After applying the proposed patch, I indeed no longer saw the (in my case temporary) race condition. However, I decided to do further testing. For this, I brought down eth1 on Machine 1. Machine 3 then tried to become MASTER, displaying the race condition mentioned above. Once I brought eth1 back up on Machine 1, the race condition on Machine 3 stopped.
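
The eth1 failure above can be simulated with plain iproute2 on Machine 1 (just a sketch; any way of taking the link down should show the same behaviour):

    ip link set eth1 down    # take eth1 down on Machine 1, then watch Machine 3's logs
    ip link set eth1 up      # bring it back up; the flapping on Machine 3 stops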

Based on my testing, the proposed patch seems to fix the described race condition in some situations but not in others. However, this might also be related to the use of KVM and the types of networks involved.

However, I would not go ahead and apply this patch to get it into Maverick until upstream has actually been involved, we have heard their feedback, and some more testing has been done.

This has been reported upstream as well; however, there hasn't been any response just yet. I'll follow this issue and see what upstream says before actually applying the patch!

Arjan, thank you for reporting the bug and providing a patch. If you could continue testing, try to reproduce what I've encountered, and provide some feedback, that would be very helpful.