Query for available cluster nodes

Bug #1003908 reported by Jay Janssen
This bug affects 1 person
Affects: Galera
Status: Fix Released
Importance: High
Assigned to: Alex Yurchenko

Bug Description

I think it's important that we can query any cluster node to discover the state of all known cluster nodes. Ideally, I'd like to see:

- all known cluster nodes
- what state they are in (active, offline, donor, etc.)
- their various addresses:
  - wsrep communication address
  - sst address
  - wsrep node incoming address for client connections (maybe this is the most important).
-???

Certainly this amount of data makes the most sense in an I_S table, but a SHOW GLOBAL STATUS variable might be acceptable too (though with probably just a subset of info above).

Alex Yurchenko (ayurchen) wrote :

Hi Jay!

A few setbacks here:

1 - you can easily know only the nodes which are in the current component - i.e. nodes that see each other.
2 - only static node information (addresses and ports) can be passed via view callback, whereas node states can change any time and will require adding a number of new callbacks (which I frankly would prefer to avoid). So I'm wondering if querying node state directly would suffice.
3 - at the moment only incoming address information is globally available.

So, I can give you a list of incoming addresses by Monday, but the rest is quite a lot of work, involving protocol updates and, maybe, wsrep API changes.

How would you like to have your incoming list separated?

Jay Janssen (jay-janssen) wrote :

Hey Alex,
  #1 I expect -- part of my health check for cluster nodes should also be that the cluster I'm connected to is 'Primary'.
  #2 Static node info only for now is fine -- my thought there was if the cluster knew the state of a given node (especially offline), then it would be useful to be able to know that. However for a STATUS variable, I'd expect to only get the list of available nodes.
  #3 acknowledged

  Incoming list separation: I guess just CSV?

Alex Yurchenko (ayurchen) wrote :

CSV be it.

About "offline" nodes: the thing is that the node is either in the cluster - and then it has a state like PRIM, JOINED, SYNCED or the like (it can be OFFLINE if need be, but we have not implemented that yet for lack of purpose), or it is out of the cluster, and then the cluster does not know anything about it. In other words, if the node does not communicate with the cluster (I believe this is the meaning of "offline" here), how can cluster tell it exists at all? The cluster could of course remember all the nodes it ever saw, but then how would you remove the nodes from the cluster?

Jay Janssen (jay-janssen) wrote : Re: [Bug 1003908] Re: Query for available cluster nodes

On May 30, 2012, at 9:31 AM, Alex Yurchenko wrote:

> About "offline" nodes: the thing is that the node is either in the
> cluster - and then it has a state like PRIM, JOINED, SYNCED or the like
> (it can be OFFLINE if need be, but we have not implemented that yet for
> lack of purpose), or it is out of the cluster, and then the cluster does
> not know anything about it. In other words, if the node does not
> communicate with the cluster (I believe this is the meaning of "offline"
> here), how can cluster tell it exists at all?

I was assuming that there was some state where a node had joined but was currently disconnected (and the cluster was still aware of it), but maybe that's not the case. I wouldn't expect to know about unjoined nodes.

> The cluster could of
> course remember all the nodes it ever saw, but then how would you remove
> the nodes from the cluster?

When the cluster experienced a node disconnect/timeout event, I would presume it could update this state table, no?

Maybe a better implementation of this is simply a cluster events table in the I_S -- a record of all node joins/disconnects, at least as far as the queried node is aware of them. The cluster has a config version #, and I presume all nodes must agree on a new config, so simply a delta of the previous and new config versions might be a better thing to make available; not sure.
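
To illustrate the "delta of the previous and new config versions" idea, here is a minimal sketch. It assumes each configuration is available as a plain list of node identifiers; the function name view_delta and the identifiers are hypothetical and are not part of Galera or the wsrep API.

```python
def view_delta(previous_members, new_members):
    """Compare two cluster configurations and report membership changes.

    Both arguments are assumed to be iterables of node identifiers
    (hypothetical representation, not an actual Galera structure).
    """
    previous, new = set(previous_members), set(new_members)
    return {
        "joined": sorted(new - previous),  # present only in the new config
        "left": sorted(previous - new),    # present only in the previous config
    }
```

Each row of an events table could then be one such delta, keyed by the config version it leads to.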

Jay Janssen, MySQL Consultant, Percona Inc.
Telephone: +1 888 401 3401 ext 563 (US/Eastern)
Emergency: +1 888 401 3401 ext 911
Skype: percona.jayj
GTalk/MSN: <email address hidden>
Calendar: http://tungle.me/jayjanssen

Percona Live in NYC Oct 1-2nd: http://www.percona.com/live/nyc-2012/

Changed in galera:
importance: Undecided → High
status: New → Confirmed
assignee: nobody → Alex Yurchenko (ayurchen)
milestone: none → 23.2.2
Alex Yurchenko (ayurchen) wrote :

Hi Jay,

I'm about to commit the implementation of node list feature, so before I do it, here's "request for comments".

This is how it will look:
mysql> SHOW STATUS LIKE 'wsrep_incoming_addresses';
+--------------------------+-------------------------------------------------+
| Variable_name            | Value                                           |
+--------------------------+-------------------------------------------------+
| wsrep_incoming_addresses | 192.168.0.1:3305, 192.168.0.1:3304, unspecified |
+--------------------------+-------------------------------------------------+

It displays the listening addresses of all nodes in the component. Note that the component may be non-primary (network partition), so an address in the list does not mean that the node will serve clients. You also need to check wsrep_cluster_status.
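
A client-side check along these lines might look like the sketch below. It assumes the status variables have already been fetched into a dict (for example from the rows of SHOW STATUS), and that a primary component reports wsrep_cluster_status as 'Primary'. The function names are hypothetical.

```python
def parse_incoming_addresses(value):
    """Split the CSV value of wsrep_incoming_addresses into a list of entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def client_candidates(status):
    """Return the addresses a client could try, or [] if the component is not primary.

    `status` is assumed to be a dict mapping status variable names to values.
    """
    if status.get("wsrep_cluster_status") != "Primary":
        return []  # non-primary component: listed nodes will not serve clients
    addresses = parse_incoming_addresses(status.get("wsrep_incoming_addresses", ""))
    # Nodes whose address could not be determined are reported as 'unspecified'.
    return [address for address in addresses if address != "unspecified"]
```

With the sample value shown above and a primary component, this would keep the two host:port entries and drop 'unspecified'.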

Node incoming address is determined in this order of preference:
1. wsrep_node_incoming_address explicitly set.
2. wsrep_node_address IP part + mysqld_port value.
3. IP of the first ethernet interface as returned by ifconfig + mysqld_port value
4. 'unspecified'
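
The order of preference above can be sketched as a simple fallback chain. This is only an illustration, not the actual implementation: the function and parameter names are hypothetical, the interface IP is assumed to have been looked up by the caller, and addresses are assumed to be IPv4 in host[:port] form.

```python
def resolve_incoming_address(incoming_address=None, node_address=None,
                             mysqld_port=3306, first_iface_ip=None):
    """Pick the incoming address following the four-step order of preference."""
    # 1. wsrep_node_incoming_address explicitly set.
    if incoming_address:
        return incoming_address
    # 2. IP part of wsrep_node_address + mysqld port.
    if node_address:
        ip = node_address.rsplit(":", 1)[0] if ":" in node_address else node_address
        return f"{ip}:{mysqld_port}"
    # 3. IP of the first ethernet interface + mysqld port.
    if first_iface_ip:
        return f"{first_iface_ip}:{mysqld_port}"
    # 4. Nothing usable was found.
    return "unspecified"
```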

Please comment if you'd like something changed here.

Jay Janssen (jay-janssen) wrote :

Nothing else comes to mind immediately.

Alex Yurchenko (ayurchen) wrote :

Added a status variable to display the incoming addresses of the nodes. I guess any further work in this direction can be done in another ticket.

Changed in galera:
status: Confirmed → Fix Committed
Changed in galera:
status: Fix Committed → Fix Released