Teemu,

Hi. I do need to add a routine to check the wsrep_ready variable. For most of these tests, though, I have a 5-second delay between node initialization and the first queries, which had been sufficient on my test machine. However, I will update the code to make sure we wait properly.

Thank you for the insight and the additional information on wsrep_causal_reads. This definitely helps explain the issues I've been seeing. Will update and rerun.

On 02/07/2012 06:15 AM, Teemu Ollakka wrote:
> Patrick,
>
> I was able to get similar results from some of the cases, and usually
> it seemed to be that although the slave nodes were started, they hadn't
> received a state snapshot yet. Looking inside galera.py I see that the
> is_started() method just checks whether the server pid file has been
> created. This is not enough to make sure that a wsrep-enabled server is
> actually synchronized with the other nodes. This method should also
> check the value of the 'wsrep_ready' status variable. If it is 'ON',
> the node is synchronized with the group.
>
> There were also other kinds of test failures, which were related to
> query causality. Although galera ensures that all changes are received
> on all nodes before control is returned to the client, it does not
> guarantee by default that all changes are applied. For this reason
> there is the 'wsrep_causal_reads' session variable, which, if set to
> '1', guarantees that all previously replicated changes are also applied
> before the query is actually executed. While this should be enough to
> guarantee strict consistency for autocommit DML, unfortunately it seems
> that even this is not enough for DDLs (for reasons I'm not completely
> sure about yet), but with the following hack to kewpie I was able to
> get rid of the causality-related failures even with DDLs.
>
> The following patch enforces one causal read on each slave in
> check_slaves_by_query() and check_slaves_by_checksum() before running
> the actual check query.
>
> === modified file 'lib/util/mysqlBaseTestCase.py'
> --- lib/util/mysqlBaseTestCase.py	2012-02-04 23:03:30 +0000
> +++ lib/util/mysqlBaseTestCase.py	2012-02-07 10:48:40 +0000
> @@ -87,6 +87,16 @@
>                  results.append(table_name)
>          return results
>
> +    def causal_read(self, server):
> +        """ Execute causal read on server to make sure that all
> +            changes from master have been propagated and applied
> +            (galera specific)
> +
> +        """
> +        queries = ["SET wsrep_causal_reads=1", "SELECT 0"]
> +        self.execute_queries(queries, server)
> +        return None
> +
>      def check_slaves_by_query( self
>                               , master_server
>                               , other_servers
> @@ -111,6 +121,7 @@
>          # run against master for 'good' value
>          retcode, expected_result = self.execute_query(query, master_server)
>          for server in other_servers:
> +            self.causal_read(server)
>              retcode, slave_result = self.execute_query(query, server)
>              #print "%s: expected_result= %s | slave_result= %s" % ( server.name
>              #                                                      , expected_result
> @@ -149,6 +160,7 @@
>          comp_results = {}
>          logging = master_server.logging
>          for server in other_servers:
> +            self.causal_read(server)
>              for schema in schemas:
>                  for table in self.get_tables(master_server, schema):
>                      query = "CHECKSUM TABLE %s.%s" %(schema, table)
>
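
For the wsrep_ready check discussed above, a rough sketch of what a wait routine might look like follows. This is not the actual galera.py code: the function names and the run_query callable are hypothetical stand-ins for whatever kewpie uses to run a query against a server, and the rows are assumed to come back as (Variable_name, Value) pairs from SHOW STATUS LIKE 'wsrep_ready'. The idea is to poll until the value is 'ON' or a timeout expires, rather than rely on a fixed 5-second delay.

```python
import time


def wsrep_is_ready(rows):
    """Return True if the SHOW STATUS rows report wsrep_ready = ON.

    rows is assumed to be an iterable of (Variable_name, Value) pairs.
    """
    for name, value in rows:
        if str(name).lower() == "wsrep_ready":
            return str(value).upper() == "ON"
    return False


def wait_for_wsrep_ready(run_query, timeout=30.0, interval=0.5,
                         sleep=time.sleep, clock=time.time):
    """Poll wsrep_ready until it is ON or the timeout expires.

    run_query is a hypothetical callable that takes a SQL string and
    returns the result rows; it stands in for the test harness's own
    query helper. Returns True if the node became ready, else False.
    """
    deadline = clock() + timeout
    while True:
        try:
            rows = run_query("SHOW STATUS LIKE 'wsrep_ready'")
        except Exception:
            # The server may not be accepting connections yet;
            # treat that the same as "not ready" and retry.
            rows = []
        if wsrep_is_ready(rows):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Something like is_started() could then require both the pid file and a True result from wait_for_wsrep_ready() before the test starts issuing queries. Taking the query function as a parameter keeps the sketch independent of any particular connection library.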