"State request preparation failed, aborting: unknown exception" when manipulating wsrep_provider and wsrep_node_address is not set

Bug #1379276 reported by Philip Stoev on 2014-10-09
This bug affects 1 person
Affects: MySQL patches by Codership (status tracked in 5.6)
  5.6: Importance Undecided, Unassigned
Affects: Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC (status tracked in 5.6)
  5.5: Status Confirmed, Importance Undecided, Unassigned
  5.6: Status Confirmed, Importance Undecided, Unassigned

Bug Description

When the galera.galera_wsrep_provider_unset_set test, which temporarily unsets wsrep_provider before setting it again, is run under Valgrind, it fails reliably: the node on which the provider is reset aborts with:

State request preparation failed, aborting: unknown exception

Test case:

--source include/big_test.inc
--source include/galera_cluster.inc
--source include/have_innodb.inc

--connection node_1
CREATE TABLE t1 (f1 INTEGER) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);

--connection node_2
--let $wsrep_provider_orig = `SELECT @@wsrep_provider`
--let $wsrep_cluster_address_orig = `SELECT @@wsrep_cluster_address`

SET GLOBAL wsrep_provider='none';
INSERT INTO t1 VALUES (2);

--connection node_1
INSERT INTO t1 VALUES (3);

--connection node_2
--disable_query_log
--eval SET GLOBAL wsrep_provider = '$wsrep_provider_orig';
--eval SET GLOBAL wsrep_cluster_address = '$wsrep_cluster_address_orig';
--enable_query_log

--sleep 10

--source include/galera_wait_ready.inc

INSERT INTO t1 VALUES (4);

# Node #2 has all the inserts
SELECT COUNT(*) = 4 FROM t1;

--connection node_1
# Node #1 is missing the insert made while Node #2 was not replicated
SELECT COUNT(*) = 3 FROM t1;

DROP TABLE t1;

Philip Stoev (philip-stoev-f) wrote :
Changed in codership-mysql:
status: New → Confirmed
summary: - [ERROR] WSREP: State request preparation failed, aborting: unknown
- exception
+ "State request preparation failed, aborting: unknown exception" when
+ manipulating wsrep_provider

This also happened without Valgrind, when the machine was under load from other running processes.

Philip Stoev (philip-stoev-f) wrote :

Please enable the galera.galera_sst_mysqldump test once this is fixed.

Alex Yurchenko (ayurchen) wrote :

Philip, irrelevant to the bug itself:

2014-10-09 13:21:01 10325 [Note] WSREP: You have configured 'rsync' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.

Why is the test called galera.galera_sst_mysqldump? Is there a mistake?

Philip Stoev (philip-stoev-f) wrote :

In fact, two tests are affected by this problem: galera_sst_mysqldump and galera_wsrep_provider_unset_set. The output I have attached is from the galera_wsrep_provider_unset_set test, which uses rsync. The mysqldump test uses mysqldump as the SST method.

The reason both tests are affected is that they both manipulate wsrep_provider.

Philip Stoev (philip-stoev-f) wrote :

This issue also happens with a simple MTR test that shuts down the slave cleanly and then restarts it, provided updates have happened on the master in the meantime.

The unknown exception in question is gu::NotSet

The key that was not set is ist.recv_addr, which means that wsrep_node_address is not being set. It was not set by default because autodetection via ifconfig failed: the ifconfig output could not be parsed by the default regex.
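
A possible way to sidestep the failed autodetection is to set the node address explicitly in the configuration of the affected node. The fragment below is only a sketch: the address 192.168.1.61 is an assumption borrowed from the cluster address quoted in the PXC reproduction in this report, and the [mysqld] section placement assumes a standard my.cnf layout.

```ini
[mysqld]
# Assumed address of this node; replace with the node's real IP.
wsrep_node_address = 192.168.1.61

# Alternative (assumption, untested here): set the IST receive address
# directly through the provider options instead.
# wsrep_provider_options = "ist.recv_addr=192.168.1.61"
```

With wsrep_node_address set, the node no longer depends on parsing ifconfig output to determine its own address.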

summary: "State request preparation failed, aborting: unknown exception" when
- manipulating wsrep_provider
+ manipulating wsrep_provider and wsrep_node_address is not set

Able to reproduce with PXC 5.5.

--connection node_1
CREATE TABLE t1 (f1 INTEGER) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);

--connection node_2

mysql> SELECT @@wsrep_provider;
+-----------------------------+
| @@wsrep_provider |
+-----------------------------+
| /usr/lib64/libgalera_smm.so |
+-----------------------------+
1 row in set (0.00 sec)

mysql> SELECT @@wsrep_cluster_address;
+------------------------------------------------+
| @@wsrep_cluster_address |
+------------------------------------------------+
| gcomm://192.168.1.60,192.168.1.61,192.168.1.62 |
+------------------------------------------------+
1 row in set (0.00 sec)

SET GLOBAL wsrep_provider='none';
INSERT INTO t1 VALUES (2);

--connection node_1
INSERT INTO t1 VALUES (3);

--connection node_2

mysql> SET GLOBAL wsrep_provider='/usr/lib64/libgalera_smm.so';
Query OK, 0 rows affected (2.02 sec)

mysql> SET GLOBAL wsrep_cluster_address = 'gcomm://192.168.1.60,192.168.1.61,192.168.1.62';
Query OK, 0 rows affected (3.00 sec)

INSERT INTO t1 VALUES (4);

# Node #2 has all the inserts

mysql> select * from t1;
+------+
| f1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
+------+

--connection node_1
# Node #1 is missing the insert made while Node #2 was not replicated

mysql> select * from t1;
+------+
| f1 |
+------+
| 1 |
| 3 |
| 4 |
+------+
3 rows in set (0.00 sec)

Able to reproduce with the same steps on PXC 5.6.

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1751
