Lowest group communication layer (evs) fails to handle the situation properly when a large number of nodes suddenly start to see each other again

Bug #1271918 reported by Miguel Angel Nieto on 2014-01-23
This bug affects 1 person
Affects / series    Status        Importance  Assigned to
Galera (status tracked in 3.x)
  2.x                             Undecided   Unassigned
  3.x                             Undecided   Unassigned
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5               Fix Released  Medium      Unassigned
  5.6               Fix Released  Medium      Unassigned

Bug Description

We have a 9-node cluster. Suddenly the nodes stop seeing each other:

140122 9:57:38 [Note] WSREP: view(view_id(NON_PRIM,378576e2-82be-11e3-b36b-96b118ad9ea1,10428) memb {
        5c773ef3-82be-11e3-ab13-4ec5e0489f56,
} joined {
} left {
} partitioned {
        378576e2-82be-11e3-b36b-96b118ad9ea1,
        49ce39e7-82be-11e3-a6da-e3fdac1aff99,
        4d4cbe47-5379-11e3-9597-437084d45b0f,
        79ae5df7-82be-11e3-af7a-6fad1d747d02,
        9dbbcf2a-82be-11e3-9c79-367e9eb841fb,
        b7a4648a-82bd-11e3-9b24-33ceb08ce291,
        d682b20c-82bd-11e3-9955-477180b12d21,
        fa3ce7e9-82bd-11e3-92ee-969e5429ffde,
})
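The view entry above can be read mechanically: the local node 5c773ef3-82be-11e3-ab13-4ec5e0489f56 is the only member, and the other eight nodes of the 9-node cluster are listed as partitioned. A minimal sketch of such a parser follows, assuming the view format exactly as it appears in this excerpt; `parse_view` and `SAMPLE_VIEW` are hypothetical names, not part of Galera or any Percona tool.

```python
import re

# Hypothetical helper (not part of Galera/PXC): parses a WSREP "view(...)"
# log entry in the layout shown in this bug report.
VIEW_ID_RE = re.compile(r"view_id\((\w+),([0-9a-f-]+),(\d+)\)")
SECTION_RE = re.compile(r"(memb|joined|left|partitioned)\s*\{([^}]*)\}")
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

# The view entry from this report, copied verbatim.
SAMPLE_VIEW = """140122 9:57:38 [Note] WSREP: view(view_id(NON_PRIM,378576e2-82be-11e3-b36b-96b118ad9ea1,10428) memb {
        5c773ef3-82be-11e3-ab13-4ec5e0489f56,
} joined {
} left {
} partitioned {
        378576e2-82be-11e3-b36b-96b118ad9ea1,
        49ce39e7-82be-11e3-a6da-e3fdac1aff99,
        4d4cbe47-5379-11e3-9597-437084d45b0f,
        79ae5df7-82be-11e3-af7a-6fad1d747d02,
        9dbbcf2a-82be-11e3-9c79-367e9eb841fb,
        b7a4648a-82bd-11e3-9b24-33ceb08ce291,
        d682b20c-82bd-11e3-9955-477180b12d21,
        fa3ce7e9-82bd-11e3-92ee-969e5429ffde,
})"""


def parse_view(text: str) -> dict:
    """Return view type, view_id UUID, sequence number, and the UUID list
    of every memb/joined/left/partitioned section found in the entry."""
    m = VIEW_ID_RE.search(text)
    if m is None:
        raise ValueError("no view_id(...) found")
    view = {"type": m.group(1), "view_uuid": m.group(2), "seq": int(m.group(3))}
    for name, body in SECTION_RE.findall(text):
        view[name] = UUID_RE.findall(body)
    return view


if __name__ == "__main__":
    v = parse_view(SAMPLE_VIEW)
    # Prints: NON_PRIM 1 member(s), 8 partitioned
    print(v["type"], len(v["memb"]), "member(s),", len(v["partitioned"]), "partitioned")
```

One member plus eight partitioned nodes accounts for all nine nodes, so every node is isolated in its own non-primary component.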

Later the problem is resolved, but the nodes cannot reconnect:

140122 9:58:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140122 9:58:38 [Note] WSREP: Flow-control interval: [16, 16]
140122 9:58:38 [Note] WSREP: Received NON-PRIMARY.
140122 9:58:38 [Note] WSREP: New cluster view: global state: 840ae537-bb36-11e2-0800-55dad0151e6b:47649869, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
140122 9:58:38 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:39 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:40 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:41 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative
140122 9:58:42 [Warning] WSREP: evs::proto(5c773ef3-82be-11e3-ab13-4ec5e0489f56, GATHER, view_id(REG,5c773ef3-82be-11e3-ab13-4ec5e0489f56,10430)) source 49ce39e7-82be-11e3-a6da-e3fdac1aff99 is not supposed to be representative

Similar messages appear on all nodes.

Jervin R (revin) wrote :

Miguel, what Galera version is this? It looks similar, at least in behavior, to https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1269236

Teemu Ollakka (teemu-ollakka) wrote :

This is a bit different from lp:1269236. The message "... is not supposed to be representative" indicates that there were problems forming a new group after the nodes reconnected. In lp:1269236, the nodes ended up non-primary because one of them crashed while the cluster was fully partitioned.
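The symptom Teemu describes can be tallied from the logs by counting the warning per local node, EVS state, and rejected source. In the excerpt above, node 5c773ef3-82be-11e3-ab13-4ec5e0489f56 stays in GATHER and rejects the same source, 49ce39e7-82be-11e3-a6da-e3fdac1aff99, once per second. The sketch below is a hypothetical helper (`rejected_representatives` is not a Galera or PXC function), and its regex assumes exactly the message layout shown in this report.

```python
import re
import sys
from collections import Counter

# Hypothetical log scanner: matches the evs::proto warning as printed in
# this report and captures (local node UUID, EVS state, rejected source).
WARN_RE = re.compile(
    r"evs::proto\(([0-9a-f-]+), (\w+), .*?\) source ([0-9a-f-]+) "
    r"is not supposed to be representative"
)


def rejected_representatives(lines) -> Counter:
    """Count how often each (local node, state, source) triple was logged."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2), m.group(3))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan.py /path/to/mysqld.err
    with open(sys.argv[1]) as f:
        for (node, state, source), n in rejected_representatives(f).most_common():
            print(f"{n:6d}  {node}  {state}  rejects {source}")
```

A count that keeps growing for the same triple suggests the node never leaves GATHER, which matches the "problems forming a new group" reading rather than the crash scenario of lp:1269236.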

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1096
