keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0 but perhaps all?)

Bug #619712 reported by Arjan on 2010-08-18
Affects / Status / Importance / Assigned to / Milestone:
- Release Notes for Ubuntu: Undecided, Unassigned
- keepalived (Ubuntu): High, Canonical Server Team
- keepalived (Ubuntu Maverick): High, Andres Rodriguez

Bug Description

Binary package hint: keepalived

Hello,

This report concerns a keepalived VRRP race condition and a fix (observed on
versions 1.1.17 and 1.2.0, and probably present in all other versions as well).

Impact: possible complete/partial datacenter disruption due to gratuitous
ARP/multicast floods.

The situation to reproduce the race condition

The race condition was triggered in our environment with 1 cluster of 4
machines, sharing 3 (VLAN) interfaces.

The reason for this setup is not really important: we use VRRP and a redundant
cluster (2x2) to switch the default gateway in case of a leased-line
failure.

The most basic setup, and the easiest way to trigger the bug, is a subset of the above:
-3 machines, with keepalived
-2 interfaces per machine, defined in keepalived as a vrrp_instance
-1 vrrp_sync_group, with those instances included

some key settings:

machine1:
 initial state: state BACKUP
 int 1, prio 250
 int 2, prio 250
machine2:
 initial state: state BACKUP
 int 1, prio 200
 int 2, prio 200
machine3:
 initial state: state MASTER
 int 1, prio 150
 int 2, prio 150
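For illustration, the machine1 settings above translate into a keepalived.conf along these lines (the interface names, virtual router IDs, and addresses here are placeholders, not taken from the original report):

```
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_2
    }
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 250
    virtual_ipaddress {
        192.168.1.100/24
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 250
    virtual_ipaddress {
        192.168.2.100/24
    }
}
```

Machines 2 and 3 differ only in `priority` (200 and 150) and, on machine 3, `state MASTER`.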

Machine3 is set to MASTER just to trigger the race condition; with initial
state BACKUP, a connection failure (without a link down) will do the same,
which is the situation we encountered.

WARNING: the steps below may/will cause network disruption, so be prepared.

To trigger the bug, start keepalived on machines 1 and 2; machine 1 gently
becomes master, as intended and configured.

Now start machine 3, and here it happens: as soon as machine 3 sends a single
VRRP message (since it wants to be master), machines 1 and 2 go haywire and
enter a race condition that will never stop until you stop machine 1 or
machine 2. Stopping machine 3 will definitely NOT stop the race.

A sample of the relevant logging of the race:

machine 1:
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received lower prio advert, forcing new election
Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received lower prio advert, forcing new election
[... the same two lines repeat many times within the same second ...]

machine 2:
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a new MASTER election
Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to MASTER state
[... the same pattern repeats many times within the same second ...]

After some review of the keepalived code, analysis of traces, and research
into the situation, I came to a fix for the issue. Be aware that I had never
looked at the keepalived code before, so there may well be a better fix.

The patch prevents vrrp_sync_master_election() in vrrp_sync.c from sending
another VRRP advertisement when isync->wantstate is already
VRRP_STATE_GOTO_MASTER.

old code in vrrp_sync.c:
void
vrrp_sync_master_election(vrrp_rt * vrrp)
{
        vrrp_rt *isync;
        vrrp_sgroup *vgroup = vrrp->sync;
        list l = vgroup->index_list;
        element e;

        if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
                return;
        if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
                return;

        log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
               GROUP_NAME(vgroup));

        /* Perform sync index */
        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
                isync = ELEMENT_DATA(e);
                if (isync != vrrp) {
                        /* Force a new protocol master election */
                        isync->wantstate = VRRP_STATE_GOTO_MASTER;
                        log_message(LOG_INFO,
                               "VRRP_Instance(%s) forcing a new MASTER election",
                               isync->iname);
                        vrrp_send_adv(isync, isync->effective_priority);
                }
        }
}

patched code in vrrp_sync.c:
void
vrrp_sync_master_election(vrrp_rt * vrrp)
{
        vrrp_rt *isync;
        vrrp_sgroup *vgroup = vrrp->sync;
        list l = vgroup->index_list;
        element e;

        if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
                return;
        if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
                return;

        log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
               GROUP_NAME(vgroup));

        /* Perform sync index */
        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
                isync = ELEMENT_DATA(e);
                if (isync != vrrp) {
                        /* Fix ArFi 2010/08/16: prevent VRRP loops with 3 or more
                         * members and 2 or more shared interfaces in 1 group */
                        if (isync->wantstate != VRRP_STATE_GOTO_MASTER) {
                                /* Force a new protocol master election */
                                isync->wantstate = VRRP_STATE_GOTO_MASTER;
                                log_message(LOG_INFO,
                                       "VRRP_Instance(%s) forcing a new MASTER election",
                                       isync->iname);
                                vrrp_send_adv(isync, isync->effective_priority);
                        }
                }
        }
}

diff keepalived-1.1.17:

# diff -u vrrp_sync.c.orig_ vrrp_sync.c
--- vrrp_sync.c.orig_ 2010-08-17 14:12:30.944634215 +0200
+++ vrrp_sync.c 2010-08-17 14:15:05.964634624 +0200
@@ -157,12 +157,14 @@
  for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
   isync = ELEMENT_DATA(e);
   if (isync != vrrp) {
+ if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) { // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or more shared interfaces in 1 group
    /* Force a new protocol master election */
    isync->wantstate = VRRP_STATE_GOTO_MASTER;
    log_message(LOG_INFO,
           "VRRP_Instance(%s) forcing a new MASTER election",
           isync->iname);
    vrrp_send_adv(isync, isync->effective_priority);
+ }
   }
  }
 }

diff keepalived-1.2.0:

# diff -u vrrp_sync.c.orig_ vrrp_sync.c
--- vrrp_sync.c.orig_ 2010-08-18 08:17:52.766910325 +0200
+++ vrrp_sync.c 2010-08-18 08:20:23.216968929 +0200
@@ -155,12 +155,14 @@
  for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
   isync = ELEMENT_DATA(e);
   if (isync != vrrp) {
+ if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) { // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or more shared interfaces in 1 group
    /* Force a new protocol master election */
    isync->wantstate = VRRP_STATE_GOTO_MASTER;
    log_message(LOG_INFO,
           "VRRP_Instance(%s) forcing a new MASTER election",
           isync->iname);
    vrrp_send_adv(isync, isync->effective_priority);
+ }
   }
  }
 }

This solves our issue, and we did some basic testing with good results, but it
would be wise for an experienced keepalived developer to have a look at it too.

PS: I marked this as a security vulnerability, as it _may_ bring a
system/datacenter down with just a single (VRRP) packet.
I posted this bug report on the <email address hidden> mailing list as well.

Regards,

Arjan Filius
<email address hidden>


Arjan (iafilius) on 2010-08-18
visibility: private → public
Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better. Thanks
also for getting the upstream project involved. I will mark this as triaged, with the expectation
that we will merge whatever fixes are needed from upstream.

Changed in keepalived (Ubuntu):
status: New → Triaged
importance: Undecided → High
Thierry Carrez (ttx) wrote :

Andres, could you look into this issue for Maverick and potentially apply the proposed patch? If not, please just unassign yourself :)

Changed in keepalived (Ubuntu):
assignee: nobody → Andres Rodriguez (andreserl)
Thierry Carrez (ttx) on 2010-09-02
tags: added: server-mrs
Andres Rodriguez (andreserl) wrote :

Hi Thierry,

I'll take care of it!

Andres Rodriguez (andreserl) wrote :

After hours of trying to reproduce this situation as specified above, I cannot say the patch actually fixes the situation completely, and I can say that I could not reproduce it exactly as described above.

My setup was 3 servers using KVM, with eth0 on a bridged network and eth1 on a NATed network. The configuration was done as follows:

global_defs {
   router_id UBUNTULVS1
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 250
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 250
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}

====================================

global_defs {
   router_id UBUNTULVS2
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 200
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 200
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}

=========================

global_defs {
   router_id UBUNTULVS3
}

vrrp_sync_group VG1 {
   group {
      VI_IP1
      VI_IP2
   }
}

vrrp_instance VI_IP1 {
    state MASTER
    interface eth0
    virtual_router_id 50
    priority 150
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    preempt_delay 300
}

vrrp_instance VI_IP2 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 150
    virtual_ipaddress {
        192.168.122.100/24 dev eth1
    }
    preempt_delay 300
}

After trying to reproduce the issue following the steps, I found that after starting Machine 3, Machine 2 does indeed display the race condition according to the logs, *but* only for a while; it then stabilizes and settles into the BACKUP state. This means I could not fully reproduce the bug described above.

After applying the proposed patch, I indeed did not experience the race condition (which was only temporary in my case). However, I decided to do further testing. For this, I brought down eth1 on Machine 1. Machine 3 then tried to become MASTER, displaying the race condition mentioned above. Once I brought eth1 back up on Machine 1, the race condition stopped on Machine 3.

Based on my testing, the proposed patch seems to fix the described race condition in some situations but not in others. However, this might also be related to the use of KVM and the types of networks used.

That said, I would not go ahead and apply this patch for Maverick before actually getting upstream involved, hearing feedback from them, and having some more testing done.

This has been reported upstream as well; however, there hasn't been any response just yet. I'll follow this issue and see what upstream says before actually applying the patch.

Arjan, thank you for reporting the bug and providing a patch. If you could please continue testing and try to reproduc...


Andres Rodriguez (andreserl) wrote :

Forgot to mention that I used keepalived 1.1.17 from Ubuntu repositories.

I'm marking this bug as Incomplete pending further confirmation of the situation.

Changed in keepalived (Ubuntu Maverick):
status: Triaged → Incomplete
Thierry Carrez (ttx) wrote :

Needs more upstream investigation; this will have to land in Natty.

Changed in keepalived (Ubuntu Maverick):
status: Incomplete → Won't Fix
Thierry Carrez (ttx) on 2010-09-10
tags: removed: server-mrs
Thierry Carrez (ttx) wrote :

Cannot accurately describe the issue... sounds like a corner case that shouldn't warrant a release note anyway. Andres, let me know if you think it should make it, and you can provide a description for the issue.

Changed in ubuntu-release-notes:
status: New → Invalid

Thierry,

Personally, I think that this should not make it into the release notes
either.

I believe this is a very rare issue that I couldn't even fully confirm.
Not even upstream has paid attention to it. So, since I wasn't able to
confirm this bug, I think it should not go into the release notes.


--
Andres Rodriguez (RoAkSoAx)
Ubuntu MOTU Developer
Systems Engineer

Mark Foster (fostermarkd) wrote :

Noting that the original configuration shown looks like a problem. The prio settings on the MASTER (machine3) are too low. In keepalived, higher priority wins, so machine1 and machine3 are probably trying to fight it out. I suggest the situation can be avoided by using state MASTER on all three machines, or by swapping the priority settings of machines 1 and 3 so that 3 has prio 250 and 1 has 150.

Arjan (iafilius) wrote :

Mark,

Thanks for responding, but I think you might be wrong. The MASTER/BACKUP
state is only the initial state, and the current BACKUP/BACKUP/MASTER
settings are there just to illustrate the issue.
You're right that machine 3 won't become master in a steady state; I
configured it as master just to make the problem clearer and more easily
reproducible, without having to cut cables etc.
The point is that when 2 VRRP machines (1 and 2) are running just fine, it
takes only 1 (yes, one) VRRP "I'm master" packet from machine 3.
As soon as you start keepalived on machine 3 you produce that packet, and
even if you stop machine 3 immediately, the harm is done: machines 1 and 2
go haywire and stay that way (a VRRP loop running like crazy, flooding
your network with multicast).
So basically, a blackhat who is able to drop a single matching (VRRP
password etc.) packet onto a local network with a regular dual
vrrp/keepalived config can take your network down. Our case was triggered
in a quad VRRP setup.
My patch, perhaps not the ultimate one (the upstream keepalived mailing
list seems to be dead), prevents that.

Regards,


Arjan (iafilius) wrote :

Hello all,

Just to let you know: upstream version 1.2.1 has been released with my patch/fix included.
http://www.keepalived.org/changelog.html

You may want to reconsider this bug report's status and apply the fix to current releases.
At least our datacenter is safe on this matter; is yours too?

Regards,

Arjan Filius

Andres Rodriguez (andreserl) wrote :

Hi Arjan,

I'll take a look at it and consider a backport of 1.2.1 to Maverick, rather than applying the patch. But first, we'll need to get 1.2.1 into Natty.

Thank you!

Changed in keepalived (Ubuntu):
status: Incomplete → In Progress
Changed in keepalived (Ubuntu):
status: In Progress → Fix Committed
assignee: Andres Rodriguez (andreserl) → nobody
Jamie Strandboge (jdstrand) wrote :

This is not fix committed in Ubuntu. A merge with 1:1.2.2-1 from unstable will fix this issue.

Changed in keepalived (Ubuntu):
status: Fix Committed → Triaged
assignee: nobody → Canonical Server Team (canonical-server)
security vulnerability: yes → no
Colin Watson (cjwatson) wrote :

I've uploaded 1:1.2.2-1ubuntu1 (mainly just merging 1:1.2.2-1 from Debian) to Ubuntu precise. I'm given to understand from the above comments that that should fix this bug.

Changed in keepalived (Ubuntu):
status: Triaged → Fix Released