Support multi-network clusters

Bug #1204500 reported by Bernhard Schmidt
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Wishlist
Assigned to: Unassigned

Bug Description

As far as I can tell, Percona/Galera only supports a single network to be used for replication traffic. If that network happens to be down the cluster will split.

Most HA systems (heartbeat, corosync) support and even recommend more than one path between the nodes. They are used simultaneously, and the cluster will split only when all paths are down.

Revision history for this message
Jay Janssen (jay-janssen) wrote :

This is an interesting idea. On the one hand, you can handle this at a lower level with interface bonding, but at a higher level this should in theory make for a more robust cluster.

Such a feature would allow internal clusters with "bridge" nodes that would act as relays to external networks (say a DR site). Galera can already do this kind of relaying, but I don't know what complexity multiple gcomm networks would introduce (potentially non-trivial).

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

It would be nice to see a concrete example of a problem, and of the solution that "multi-network" would bring. This is to see
- how important this feature is
- how exactly it is supposed to work

By all means, we too recommend multiple paths between the nodes. In fact it is an implicit requirement for any HA system, so it goes without saying. But that is a matter of infrastructure; it is not clear what is required of Galera here.

Revision history for this message
Bernhard Schmidt (berni) wrote :

Okay, one example.

We run a cluster of three nodes. They have normal network connectivity to a switch where user data is exchanged and the application is running. They also have direct connections to two separate switches with a dedicated subnet each, say 192.168.1.0/24 and 192.168.2.0/24.

Corosync/Heartbeat/Pacemaker support cluster heartbeat over both dedicated switches as well as the normal network connectivity. Any of these can fail or reboot whenever needed, because a single working connection is sufficient for the cluster. As long as the nodes still see each other over one path they remain in sync and the normal quorum rules do not apply. This also allows the cluster to detect weird errors (e.g. A can talk to B and C, but B and C cannot talk to each other). If a node loses its uplink connectivity (thus affecting the application), corosync can detect that and react, i.e. _gracefully_ migrate the application away.
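For comparison, corosync's redundant-ring setup is configured roughly like this (a sketch only; the interface numbers, ports, and subnets are illustrative, matching the two dedicated switches described above):

```
# /etc/corosync/corosync.conf (fragment, illustrative)
totem {
    version: 2
    rrp_mode: passive          # use ring 1 only when ring 0 fails; "active" uses both
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0   # first dedicated heartbeat subnet
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.2.0   # second dedicated heartbeat subnet
        mcastport: 5407
    }
}
```

With this in place, losing either dedicated switch does not partition the corosync membership.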

Percona, however, is different here. I can only connect to another Percona node via one IP address. If that connection is down, the cluster splits and the side without quorum goes down. It doesn't matter whether there are two other working connections to that node; if the primary IP that Percona is using becomes unreachable, the cluster splits hard.
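The single-path limitation shows up directly in the configuration: each peer is listed by exactly one address in the gcomm:// cluster address (the addresses below are illustrative):

```
# /etc/my.cnf (fragment, illustrative)
# Each node appears exactly once, with a single IP; there is no syntax
# for listing a fallback address per node on another subnet.
wsrep_cluster_address = gcomm://192.168.1.1,192.168.1.2,192.168.1.3
```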

I think this needs to be a Galera feature (supporting multiple IP connections to a single neighbour node with graceful failover). But maybe I'm getting the architecture wrong.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Ok, I can't verify it right away, but I believe that if you use the same subnet on all three interfaces, kernel routing will do the trick for you, because Linux responds to ARP requests on all interfaces regardless of the interface address. E.g. in case of

node1                     node2
192.168.0.1 <--- X --->   192.168.0.2
192.168.0.3 <--------->   192.168.0.4

you should still be able to reach 192.168.0.2 from 192.168.0.1.

Notice that you can assign multiple IP addresses to a single interface, so you can still use those dedicated subnets for applications that need them.
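Concretely, putting both dedicated interfaces on one shared subnet while keeping the per-switch subnets as secondary addresses could look like this on node1 (a host-network sketch; interface names and addresses are illustrative):

```
# node1: both dedicated interfaces in the same subnet, so Linux will
# answer ARP for either local address on whichever interface still works
ip addr add 192.168.0.1/24 dev eth1
ip addr add 192.168.0.3/24 dev eth2

# the original per-switch subnets remain available as extra addresses
# for applications that expect them
ip addr add 192.168.1.1/24 dev eth1
ip addr add 192.168.2.1/24 dev eth2
```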

Other options to consider are network interface bonding or bridging.
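A bonding setup in active-backup mode would hide the two switches behind one logical interface with a single IP, which is the usual way to give Galera a redundant path below the application layer (a Debian-style sketch; interface names and the address are illustrative and it assumes the ifenslave package):

```
# /etc/network/interfaces (fragment, illustrative)
auto bond0
iface bond0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
    bond-slaves eth1 eth2
    bond-mode active-backup   # fail over between the two switches
    bond-miimon 100           # link-check interval in ms
```

Galera then only ever sees the bond0 address, and switch failover happens transparently underneath it.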

Changed in percona-xtradb-cluster:
importance: Undecided → Wishlist
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1167
