HAHT - Missed action mitigation during PE failover

Bug #1600014 reported by Eric K
Affects: congress
Status: Triaged
Importance: Wishlist
Assigned to: Eric K
Milestone: (none)

Bug Description

Implement changes to mitigate missed actions during PE failover

Please see discussion, spec and blueprint for more information.
https://github.com/openstack/congress-specs/blob/master/specs/newton/high-availability-design.rst
https://blueprints.launchpad.net/congress/+spec/high-availability-design
https://review.openstack.org/#/c/318383/
Relevant discussion:
Let's say there are two PE instances P1 and P2.
A sequence of data updates takes place which is expected to trigger a sequence of actions a1, a2. Both PE instances are expected to see the same data updates and request the same sequence of actions. Assume we have a single data source driver D.
Let's consider the following sequence of events:
1. P1 requests a1.
2. D receives P1:a1, selects P1 as primary, executes a1.
3. P2 requests a1.
4. D receives P2:a1, logs request but does not execute because P2 is not primary.
5. P1 crashes.
6. P2 requests a2.
7. D receives P2:a2, logs request but does not execute because P2 is not primary.
8. D detects that P1 has crashed.
At this point, D selects P2 as the new primary and will execute future requests from P2. But the action a2 was never executed. To make sure the action is not missed, D can look at the recent requests it received from P2 (a1, a2), compare them with the recently executed actions (a1), and determine that it should execute a2. In this case everything works out perfectly.
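For concreteness, a rough sketch of the driver-side matching logic described above could look like the following (hypothetical names and structures, not the actual Congress DSD code):

class ActionRequest(object):
    def __init__(self, pe_id, action, args):
        self.pe_id = pe_id      # PE instance that sent the request, e.g. 'P1'
        self.action = action    # action name, e.g. 'a1'
        self.args = args        # action arguments


class DataSourceDriver(object):
    def __init__(self):
        self.primary_pe = None
        self.recent_requests = []   # requests logged from every PE
        self.executed = []          # actions this driver actually executed

    def on_request(self, req):
        if self.primary_pe is None:
            self.primary_pe = req.pe_id   # first requester becomes primary
        self.recent_requests.append(req)
        if req.pe_id == self.primary_pe:
            self._execute(req)
        # requests from non-primary PEs are only logged, not executed

    def on_primary_crash(self, new_primary):
        # Promote the standby PE, then replay any of its recent requests
        # that do not match an already-executed action (e.g. a2 above).
        self.primary_pe = new_primary
        for req in self.recent_requests:
            if req.pe_id == new_primary and not self._matches_executed(req):
                self._execute(req)

    def _matches_executed(self, req):
        # Naive matching on (action, args); the meta-info discussed below
        # is needed to make this matching more reliable.
        return any(done.action == req.action and done.args == req.args
                   for done in self.executed)

    def _execute(self, req):
        self.executed.append(req)
        # ... perform the real side effect here ...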
From the information given so far, D cannot distinguish the above sequence of events from the following sequence of events, where the correct sequence of actions is actually a1, a1, a2:
1. P1 requests a1.
2. D receives P1:a1, selects P1 as primary, executes a1.
3. P2 crashes.
4. [P2 should request a1, but doesn't because it is down.]
5. P2 recovers.
6. P1 crashes.
7. [P1 should request a1 again, but doesn't because it is down.]
8. P2 requests a1.
9. D receives P2:a1, logs request but does not execute because P2 is not primary.
10. P2 requests a2.
11. D receives P2:a2, logs request but does not execute because P2 is not primary.
12. D detects that P1 has crashed.
To determine whether a1 should be executed once or twice, D must decide whether the a1 request from P2 "matches" the a1 request from P1. To do this matching well, we can keep track of the following meta-info with each action request (see the illustrative example below):
time-stamp
the latest data update that triggered the action request on the PE, expressed logically as <datasource, table, seqnum>
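For illustration only (the actual Congress message format may differ), an action request annotated with this meta-info might look like:

action_request = {
    'pe_id': 'P2',
    'action': 'a1',
    'args': [],
    'timestamp': 1467331200.0,      # when the PE issued the request
    'trigger': {                    # latest update that triggered the request
        'datasource': 'ds1',
        'table': 'p',
        'seqnum': 42,
    },
}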
But this meta-info still isn't enough to guarantee perfect matching. The time-stamp can be off by an arbitrarily large amount because the two PE instances may receive the updates at different times and therefore trigger the action request at different times. The seqnum of the data update that triggered the action request is much better, but it's not infallible, because logically the same action event may be triggered by different data updates on different PE instances.
Say we have two rules:
execute[a1] :- ds1:p
execute[a1] :- ds2:q
Two data updates, ds1:p and ds2:q, go out at "the same time", but PE1 receives p first and PE2 receives q first. Now they both send the execute[a1] request, but with different data-update triggers. The receiving datasource driver does not know that the action should be executed only once, and in order to guarantee at-least-once execution, it must execute the action twice.
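Concretely, under the matching scheme sketched above, the two requests carry different triggers and are therefore treated as distinct (illustrative values only):

# Same action and args, but different triggering updates, so the matching
# described above treats these as distinct requests and, to guarantee
# at-least-once execution, runs a1 twice.
req_from_pe1 = {'action': 'a1', 'args': [],
                'trigger': {'datasource': 'ds1', 'table': 'p', 'seqnum': 7}}
req_from_pe2 = {'action': 'a1', 'args': [],
                'trigger': {'datasource': 'ds2', 'table': 'q', 'seqnum': 3}}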
There is an even tougher deduplication scenario:
execute[a1] :- ds1:p, NOT ds2:q
execute[a2] :- ds2:q, NOT ds1:p
If PE1 receives p before q, PE1 requests execute[a1]
If PE2 receives q before p, PE2 requests execute[a2]
The receiving datasource driver has no choice but to execute both a1 and a2, but this is technically incorrect because neither ordering of the two data updates should trigger both a1 and a2.
All that being said, duplicate executions should be quite rare, because unless the primary PE goes down, the receiving DSD simply continues executing requests from the primary PE without having to do any matching.
All other approaches suffer from a similar problem because there is simply no way to guarantee exactly-once execution without having Congress and all its effectors inside a transactional system.

Tags: haht
Eric K (ekcs)
description: updated
Eric K (ekcs)
Changed in congress:
assignee: nobody → Eric K (ekcs)
Revision history for this message
Eric K (ekcs) wrote :

During the discussion around deduplication in mitigating missed actions during PE failover, we've been assuming that data comes from multiple sources (say DSDs spread over several DseNodes). But since we have adopted a design that puts all the DSDs on a single DseNode, we can sidestep a lot of the deduplication issues having to do with different PEs getting updates in different orders.
Basically, we can add a global sequencer at the DseNode which sequences all the data published from that node (currently we have an independent sequencer per DSD).

The big advantage is that it gives us a simple way to execute all action requests correctly, without any hiccup during a PE failover (see leaderless deduplication in the spec).
With global update sequencing, two action requests are duplicates if and only if the action, the args, and the global sequence number of the update that triggered the action all match.
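A minimal sketch of what that could look like, assuming hypothetical names (the real DseNode/oslo.messaging plumbing is omitted):

import itertools


class DseNode(object):
    # Single node hosting all DSDs, with one global sequencer for every
    # update published from the node (instead of a sequencer per DSD).
    def __init__(self):
        self._seq = itertools.count(1)

    def publish(self, datasource, table, data):
        update = {'datasource': datasource, 'table': table, 'data': data,
                  'global_seqnum': next(self._seq)}
        # ... broadcast the update to all PE instances ...
        return update


class DedupExecutor(object):
    # Executes each (action, args, triggering global seqnum) at most once,
    # no matter how many PE instances request it.
    def __init__(self):
        self._seen = set()

    def on_request(self, action, args, trigger_global_seqnum):
        key = (action, tuple(args), trigger_global_seqnum)
        if key in self._seen:
            return          # duplicate request from another PE; skip
        self._seen.add(key)
        # ... perform the real side effect here ...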

The main disadvantage is that this approach is not easy to scale out to having DSDs distributed across multiple PE nodes.

Any thoughts on whether to implement this leaderless deduplication via global sequencing in the DseNode?

Eric K (ekcs)
Changed in congress:
importance: Low → Wishlist
status: New → Triaged