Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Bug #1261688
Comment #6

Krunal Bauskar (krunal-bauskar) wrote on 2015-11-17:

commit 794f3cddb0c194767a760dc51b30b00ab94c55ac
Merge: 304970e be2dd53
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 19:49:55 2015 +0530

Merge pull request #33 from kbauskar/3.x-pxc-456

- PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BAC…

commit be2dd5305479d621c94ba26992610efd84ca9752
Author: Krunal Bauskar <email address hidden>
Date: Mon Nov 16 10:58:42 2015 +0530

- PXC#456: WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK with
LOAD DATA INFILE

Issue:
-----

      LDI (LOAD DATA INFILE), or for that matter any DML statement, can fail
      for multiple reasons.
      Some probable reasons are:
      - Creating a table w/o a PK and setting wsrep_certify_nonPK = off
      - An existing bug that causes LDI on a partitioned table to fail.
      ....etc.

      A statement failure skips append_key, which besides appending the key
      also sets a valid trx_id.
      Such failed statements are rolled back with trx_id = default.
      The Galera plugin checks whether there is an existing Trx object with
      the given trx_id before creating a new one.

      If there are 2 independent connections (connected to the same cluster node)
      and both of these connections execute a failing statement, then
      both of them will try to roll back with trx_id = default.

      The logic that caches trx_id -> Trx object never considered this situation,
      so one of these connections gets a reference to an object that belongs
      to the other connection, which is logically wrong as the two connections
      are unrelated.
      This also causes operational inconsistency, as the latter connection
      accesses state already modified by the former connection
      (causing the famous ROLLED_BACK -> ROLLED_BACK assert).
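
The collision described above can be sketched as follows. This is a minimal illustration, not the Galera code: `TrxMap`, `TrxObject`, `TrxState`, `kDefaultTrxId`, `get_or_create` and `rollback` are hypothetical names, and the value chosen for the default trx_id is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-ins for illustration; not the real Galera types.
using TrxId = std::uint64_t;
constexpr TrxId kDefaultTrxId = static_cast<TrxId>(-1);  // assumed "unset" value

enum class TrxState { Executing, RolledBack };

struct TrxObject {
    long conn_id = -1;                     // connection that created the object
    TrxState state = TrxState::Executing;
};

// Cache keyed by trx_id only, as in the problematic logic.
class TrxMap {
public:
    // Return the cached object for trx_id, creating one if none exists.
    std::shared_ptr<TrxObject> get_or_create(TrxId trx_id, long conn_id) {
        auto it = map_.find(trx_id);
        if (it != map_.end()) {
            assert(it->second != nullptr);  // the map never stores null
            return it->second;              // conn_id is ignored on a hit
        }
        auto trx = std::make_shared<TrxObject>();
        trx->conn_id = conn_id;
        map_.emplace(trx_id, trx);
        return trx;
    }

private:
    std::map<TrxId, std::shared_ptr<TrxObject>> map_;
};

// Returns false on an illegal ROLLED_BACK -> ROLLED_BACK transition
// (the sketch reports it instead of asserting).
bool rollback(TrxObject& trx) {
    if (trx.state == TrxState::RolledBack) return false;
    trx.state = TrxState::RolledBack;
    return true;
}
```

With this sketch, two connections that both call `get_or_create(kDefaultTrxId, ...)` receive the same object, so the second `rollback` sees a state already changed by the first.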

      Solution(s):
      -----------
      (I am listing all possible solutions, including the one we have selected)

      * trx-map should use a pair of <trx_id, conn_id> as the map key.

      * trx-map should use a multi-map of trx_id -> TrxObject.
        TrxObject can use a valid conn_id (vs -1 for now).
        For a valid trx_id there is only 1 trx_id -> TrxObject pair;
        for default there could be multiple trx_id -> TrxObject pairs,
        so the proper pair is selected based on conn_id.

      [Both of the above approaches need an interface change, so they are ruled out for now]
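
For illustration only, the first of these ruled-out options (a composite key) might look like the sketch below. `PairKeyedTrxMap`, `ConnId`, `kDefaultTrxId` and the method names are hypothetical, and the sketch does not show the interface change that passing `conn_id` to the lookup would require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration; not the real Galera types.
using TrxId = std::uint64_t;
using ConnId = std::int64_t;
constexpr TrxId kDefaultTrxId = static_cast<TrxId>(-1);  // assumed "unset" value

struct TrxObject {
    ConnId conn_id;
};

// Cache keyed by <trx_id, conn_id>, so two connections that both use the
// default trx_id no longer map to the same entry.
class PairKeyedTrxMap {
public:
    using Key = std::pair<TrxId, ConnId>;

    // Return the cached object for the pair, creating one if none exists.
    std::shared_ptr<TrxObject> get_or_create(TrxId trx_id, ConnId conn_id) {
        const Key key{trx_id, conn_id};
        auto it = map_.find(key);
        if (it != map_.end()) return it->second;
        auto trx = std::make_shared<TrxObject>(TrxObject{conn_id});
        map_.emplace(key, trx);
        return trx;
    }

    std::size_t size() const { return map_.size(); }

private:
    std::map<Key, std::shared_ptr<TrxObject>> map_;
};
```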

      * Re-arrange the logic to discard the trx object while holding the lock
        on trx, so that the latter connection will get a reference to the
        object but will not be able to operate on it till the former one is done.
        (Logically the 2 connections are sharing the object, which itself is
         wrong, but if this can be made possible with some tweak in the code it
         will introduce flow control as it involves exception handling).

      * Introduce a separate map that caches pthread_id -> TrxObject if
        trx_id = default.
      (Given the limited changes involved, we opted for this solution, though
       we would love to sort this out with upstream using the interface-change
       solutions mentioned above).
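
A rough sketch of the selected approach follows, under stated assumptions: it uses `std::thread::id` in place of the pthread id, and `TrxCache`, `TrxObject`, `kDefaultTrxId`, `get_or_create` and `discard` are hypothetical names rather than the identifiers used in the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for illustration; not the real Galera types.
using TrxId = std::uint64_t;
constexpr TrxId kDefaultTrxId = static_cast<TrxId>(-1);  // assumed "unset" value

struct TrxObject {
    TrxId trx_id;
    std::thread::id owner;  // thread that created the object
};

class TrxCache {
public:
    // Valid trx_ids use the regular map; the default trx_id is cached
    // per calling thread in a separate map.
    std::shared_ptr<TrxObject> get_or_create(TrxId trx_id) {
        std::lock_guard<std::mutex> guard(mutex_);
        const std::thread::id self = std::this_thread::get_id();
        if (trx_id == kDefaultTrxId) {
            return lookup(by_thread_, self, trx_id, self);
        }
        return lookup(by_trx_id_, trx_id, trx_id, self);
    }

    // Drop the cached entry that the calling thread would look up.
    void discard(TrxId trx_id) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (trx_id == kDefaultTrxId) {
            by_thread_.erase(std::this_thread::get_id());
        } else {
            by_trx_id_.erase(trx_id);
        }
    }

private:
    // Find-or-insert helper shared by both maps.
    template <typename Map, typename Key>
    static std::shared_ptr<TrxObject> lookup(Map& map, const Key& key,
                                             TrxId trx_id,
                                             std::thread::id owner) {
        auto it = map.find(key);
        if (it != map.end()) return it->second;
        auto trx = std::make_shared<TrxObject>(TrxObject{trx_id, owner});
        map.emplace(key, trx);
        return trx;
    }

    std::mutex mutex_;
    std::map<TrxId, std::shared_ptr<TrxObject>> by_trx_id_;
    std::map<std::thread::id, std::shared_ptr<TrxObject>> by_thread_;
};
```

In this sketch, two threads that each roll back a failed statement with the default trx_id get distinct objects, while lookups by a valid trx_id behave as before.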
