Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Bug #1396601
Comment #13


Jay Janssen (jay-janssen) wrote on 2014-12-03: Re: [Bug 1396601] Freshly started joiner gets stuck in joining state

That is the entire my.cnf, yes.

> On Dec 2, 2014, at 8:25 PM, Alex Yurchenko <email address hidden> wrote:
>
> *** This bug is a duplicate of bug 1373796 ***
> https://bugs.launchpad.net/bugs/1373796
>
> Huh, is that all my.cnf? Only one slave thread? Any errors in the error log between JOINED and SYNCED?
> Suspicion: two subsequent monitor drains with the same seqno.
>
> https://bugs.launchpad.net/bugs/1396601
>
> Title:
> Freshly started joiner gets stuck in joining state
>
> Status in Percona XtraDB Cluster - HA scalable solution for MySQL:
> New
>
> Bug description:
> I have an intermittent problem with freshly installed nodes (nodes
> with a clean datadir and fresh package install via vagrant) that get
> wedged in a 'Joining' state.
>
>
> [root@pxc3 mysql]# mysql -e "show global status like 'wsrep%'"
> +------------------------------+-------------------------------------------------------+
> | Variable_name                | Value                                                 |
> +------------------------------+-------------------------------------------------------+
> | wsrep_local_state_uuid       | cabef144-756b-11e4-ae70-326c53cc04d5                  |
> | wsrep_protocol_version       | 6                                                     |
> | wsrep_last_committed         | 12                                                    |
> | wsrep_replicated             | 0                                                     |
> | wsrep_replicated_bytes       | 0                                                     |
> | wsrep_repl_keys              | 0                                                     |
> | wsrep_repl_keys_bytes        | 0                                                     |
> | wsrep_repl_data_bytes        | 0                                                     |
> | wsrep_repl_other_bytes       | 0                                                     |
> | wsrep_received               | 2                                                     |
> | wsrep_received_bytes         | 276                                                   |
> | wsrep_local_commits          | 0                                                     |
> | wsrep_local_cert_failures    | 0                                                     |
> | wsrep_local_replays          | 0                                                     |
> | wsrep_local_send_queue       | 0                                                     |
> | wsrep_local_send_queue_max   | 1                                                     |
> | wsrep_local_send_queue_min   | 0                                                     |
> | wsrep_local_send_queue_avg   | 0.000000                                              |
> | wsrep_local_recv_queue       | 392                                                   |
> | wsrep_local_recv_queue_max   | 392                                                   |
> | wsrep_local_recv_queue_min   | 0                                                     |
> | wsrep_local_recv_queue_avg   | 194.507614                                            |
> | wsrep_local_cached_downto    | 18446744073709551615                                  |
> | wsrep_flow_control_paused_ns | 0                                                     |
> | wsrep_flow_control_paused    | 0.000000                                              |
> | wsrep_flow_control_sent      | 0                                                     |
> | wsrep_flow_control_recv      | 0                                                     |
> | wsrep_cert_deps_distance     | 0.000000                                              |
> | wsrep_apply_oooe             | 0.000000                                              |
> | wsrep_apply_oool             | 0.000000                                              |
> | wsrep_apply_window           | 0.000000                                              |
> | wsrep_commit_oooe            | 0.000000                                              |
> | wsrep_commit_oool            | 0.000000                                              |
> | wsrep_commit_window          | 0.000000                                              |
> | wsrep_local_state            | 1                                                     |
> | wsrep_local_state_comment    | Joining                                               |
> | wsrep_cert_index_size        | 0                                                     |
> | wsrep_causal_reads           | 0                                                     |
> | wsrep_cert_interval          | 0.000000                                              |
> | wsrep_incoming_addresses     | 172.28.128.4:3306,172.28.128.7:3306,172.28.128.3:3306 |
> | wsrep_evs_delayed            |                                                       |
> | wsrep_evs_evict_list         |                                                       |
> | wsrep_evs_repl_latency       | 0/0/0/0/0                                             |
> | wsrep_evs_state              | OPERATIONAL                                           |
> | wsrep_gcomm_uuid             | 67850d43-756d-11e4-93cc-82163e0cb5e1                  |
> | wsrep_cluster_conf_id        | 7                                                     |
> | wsrep_cluster_size           | 3                                                     |
> | wsrep_cluster_state_uuid     | cabef144-756b-11e4-ae70-326c53cc04d5                  |
> | wsrep_cluster_status         | Primary                                               |
> | wsrep_connected              | ON                                                    |
> | wsrep_local_bf_aborts        | 0                                                     |
> | wsrep_local_index            | 1                                                     |
> | wsrep_provider_name          | Galera                                                |
> | wsrep_provider_vendor        | Codership Oy <info@codership.com>                     |
> | wsrep_provider_version       | 3.8(rf6147dd)                                         |
> | wsrep_ready                  | OFF                                                   |
> +------------------------------+-------------------------------------------------------+
>
>
> The node is stuck and I have to kill it. After restarting mysqld, it joins the cluster normally. I cannot reproduce the 'Joining' wedge after the first time unless I rebuild the entire node from scratch.
>
> Oddly, the other nodes see it as a member of the cluster and it
> receives replication (wsrep_local_recv_queue grows).
>
>
> [root@pxc3 mysql]# rpm -qa | grep -i percona
> percona-toolkit-2.2.11-1.noarch
> percona-xtrabackup-2.2.6-5042.el7.x86_64
> Percona-XtraDB-Cluster-devel-56-5.6.21-25.8.938.el7.x86_64
> Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el7.x86_64
> Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el7.x86_64
> Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel7.x86_64
> Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el7.x86_64
>
>
> I will attach the error log.
>

Jay Janssen, Managing Consultant, Percona
http://about.me/jay.janssen
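For anyone triaging a similar node, the status snapshot in the quoted report can be checked mechanically. The sketch below is a hypothetical helper (`classify_joiner` is not part of PXC or Galera) that reads the tab-separated output of `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'"` and prints `WEDGED` when the node matches the pattern reported here: `wsrep_local_state` 1 (Joining), `wsrep_ready` OFF, `wsrep_cluster_status` Primary, and a non-empty `wsrep_local_recv_queue`. These criteria are assumptions drawn from this single report, not a documented Galera rule. Separately, the `wsrep_local_cached_downto` value of 18446744073709551615 is 2^64 - 1, the bit pattern of -1 in an unsigned 64-bit field; most likely it only means the joiner's gcache holds no write-sets yet.

```shell
# Hypothetical helper (not part of PXC/Galera): classify a wsrep status snapshot.
# Input on stdin: the tab-separated output of
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'"
# Prints WEDGED if the node matches the stuck-joiner pattern from bug 1396601,
# otherwise OK. The criteria are assumptions based on that single report.
classify_joiner() {
  awk -F'\t' '
    { v[$1] = $2 }    # remember each Variable_name -> Value pair
    END {
      # wsrep_local_state: 1 = Joining, 2 = Donor/Desynced, 3 = Joined, 4 = Synced
      joining   = (v["wsrep_local_state"] == 1)
      primary   = (v["wsrep_cluster_status"] == "Primary")
      not_ready = (v["wsrep_ready"] == "OFF")
      backlog   = (v["wsrep_local_recv_queue"] + 0 > 0)
      if (joining && primary && not_ready && backlog) print "WEDGED"
      else print "OK"
    }'
}
```

Usage on the affected node: `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'" | classify_joiner`. The `-N` (skip column names) and `-B` (batch, tab-separated) options keep the output easy to parse with `awk`.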

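The report's key observation is that the joiner stays in Joining while `wsrep_local_recv_queue` keeps growing. One hypothetical way to confirm that over time is to sample both variables a few times and check that the queue never shrinks while the state does not change. The function names, the sample count of 5, and the 10-second interval are illustrative assumptions, and the wrapper assumes the `mysql` client can log in without extra options (for example via `~/.my.cnf`).

```shell
# Hypothetical helper: succeed (exit 0) if the recv-queue samples given as
# arguments never shrink and the last one is larger than the first.
queue_growing() {
  local first=$1 prev=$1 cur
  shift
  for cur in "$@"; do
    if [ "$cur" -lt "$prev" ]; then
      return 1
    fi
    prev=$cur
  done
  [ "$prev" -gt "$first" ]
}

# Hypothetical wrapper (bash): take 5 samples, 10 s apart, on the local node.
# Returns 1 if the node is still Joining and its recv queue only grew.
watch_joiner() {
  local -a samples=()
  local state i
  for i in 1 2 3 4 5; do
    state=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" | cut -f2)
    if [ "$state" != "Joining" ]; then
      echo "node left Joining (now: $state)"
      return 0
    fi
    samples+=("$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue'" | cut -f2)")
    sleep 10
  done
  if queue_growing "${samples[@]}"; then
    echo "still Joining, recv queue growing: ${samples[*]}"
    return 1
  fi
  echo "still Joining, but recv queue is not only growing: ${samples[*]}"
}
```

A single reading cannot distinguish a node that is still working through a backlog from one that has stopped applying; repeated samples at least show whether the queue is ever drained.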