Percona Server moved to https://jira.percona.com/projects/PS

Bug #1475107
Comment #2

Comment 2 for bug 1475107

Revision history for this message

Yura Sorokin (yura-sorokin) wrote on 2015-08-19:

Consider 2 test cases included in
https://github.com/percona-ysorokin/percona-server/commit/5749da6de8e4d1ecd4970ad32886379b3bb80c57

1. We are using fully-qualified table names (rpl_bug1475107working.test).
  DROP TABLE IF EXISTS db1.table1;
2. We are using "USE db1;" statement (rpl_bug1475107working.test).
  USE db1;
  ...
  DROP TABLE IF EXISTS table1;

The difference in behavior turned out to be with mysql_execute_command() at sql_parse.cc:2609
The following code fragment behaves differently in case (1) and (2)

sql_parse.cc:2694
#ifdef HAVE_REPLICATION
  if (unlikely(thd->slave_thread))
  {
    // Database filters.
    if (lex->sql_command != SQLCOM_BEGIN &&
        lex->sql_command != SQLCOM_COMMIT &&
        lex->sql_command != SQLCOM_SAVEPOINT &&
        lex->sql_command != SQLCOM_ROLLBACK &&
        lex->sql_command != SQLCOM_ROLLBACK_TO_SAVEPOINT &&
        !rpl_filter->db_ok(thd->db))
      DBUG_RETURN(0);

The problem is that in case (1) "thd->db" is set to a database name which is not even mentioned in the original SQL statement "DROP TABLE IF EXISTS db1.table1;". In case of running via MTR, it was set to "test". So, rpl_filter->db_ok(thd->db) checking returned "true" as database "test" is not a name in the replication ignore list. Therefore, "DBUG_RETURN(0)" was skipped.
Later, at sql_parse.cc:2785
    if (!(lex->sql_command == SQLCOM_UPDATE_MULTI) &&
!(lex->sql_command == SQLCOM_SET_OPTION) &&
!(lex->sql_command == SQLCOM_DROP_TABLE &&
          lex->drop_temporary && lex->drop_if_exists) &&
        all_tables_not_ok(thd, all_tables))
    {
      /* we warn the slave SQL thread */
      my_message(ER_SLAVE_IGNORED_TABLE, ER(ER_SLAVE_IGNORED_TABLE), MYF(0));

this function set error in thd->m_stmt_da to ER_SLAVE_IGNORED_TABLE and immediately returned "0".

Case (2) is completely different. thd->db was set to "db1" and therefore function execution terminated at
if(... && !rpl_filter->db_ok(thd->db))
DBUG_RETURN(0);
However, it did not change anything in "thd->m_stmt_da". "sql_errno" there was still 0.
After that, in Query_log_event::do_apply_event() we compare master error with slave error
log_event.cc:4968

    if ((expected_error && expected_error != actual_error &&
         !concurrency_error_code(expected_error)) &&
        !ignored_error_code(actual_error) &&
        !ignored_error_code(expected_error))
    {
      rli->report(ERROR_LEVEL, 0,
                      "\
Query caused different errors on master and slave. \
Error on master: message (format)='%s' error code=%d ; \
Error on slave: actual message='%s', error code=%d. \
Default database: '%s'. Query: '%s'",
                      ER_SAFE(expected_error),
                      expected_error,
                      actual_error ? thd->get_stmt_da()->message() : "no error",
                      actual_error,
                      print_slave_db_safe(db), query_arg);

To sum up, for the "non-failing" case (1) it works only because we pass invalid database name in rpl_filter->db_ok(). At the same time, "failing" case (2) fails only because sql_errno wasn't changed (was still zero) when the statement was ignored.

Consider 2 test cases included in 
https://github.com/percona-ysorokin/percona-server/commit/5749da6de8e4d1ecd4970ad32886379b3bb80c57

The difference in behavior turned out to be with mysql_execute_command() at sql_parse.cc:2609
The following code fragment behaves differently in case (1) and (2)

The problem is that in case (1) "thd->db" is set to a database name which is not even mentioned in the original SQL statement "DROP TABLE IF EXISTS db1.table1;". In case of running via MTR, it was set to "test". So,  rpl_filter->db_ok(thd->db) checking returned "true" as database "test" is not a name in the replication ignore list. Therefore, "DBUG_RETURN(0)" was skipped.
Later, at sql_parse.cc:2785
    if (!(lex->sql_command == SQLCOM_UPDATE_MULTI) &&
	!(lex->sql_command == SQLCOM_SET_OPTION) &&
	!(lex->sql_command == SQLCOM_DROP_TABLE &&
          lex->drop_temporary && lex->drop_if_exists) &&
        all_tables_not_ok(thd, all_tables))
    {
      /* we warn the slave SQL thread */
      my_message(ER_SLAVE_IGNORED_TABLE, ER(ER_SLAVE_IGNORED_TABLE), MYF(0));

this function set error in thd->m_stmt_da to ER_SLAVE_IGNORED_TABLE and immediately returned "0".

Case (2) is completely different. thd->db was set to "db1" and therefore function execution terminated at 
if(... && !rpl_filter->db_ok(thd->db))
  DBUG_RETURN(0);
However, it did not change anything in "thd->m_stmt_da". "sql_errno" there was still 0.
After that, in Query_log_event::do_apply_event() we compare master error with slave error
log_event.cc:4968

if ((expected_error && expected_error != actual_error &&
         !concurrency_error_code(expected_error)) &&
        !ignored_error_code(actual_error) &&
        !ignored_error_code(expected_error))
    {
      rli->report(ERROR_LEVEL, 0,
                      "\
Query caused different errors on master and slave.     \
Error on master: message (format)='%s' error code=%d ; \
Error on slave: actual message='%s', error code=%d. \
Default database: '%s'. Query: '%s'",
                      ER_SAFE(expected_error),
                      expected_error,
                      actual_error ? thd->get_stmt_da()->message() : "no error",
                      actual_error,
                      print_slave_db_safe(db), query_arg);