Percona Server with XtraDB

Unstable changed page tracking tests due to trying to delete a non-existent file | Valgrind error at log_online_should_overwrite()

Reported by Laurynas Biveinis on 2013-05-22
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server
Status tracked in 5.6
5.1
Medium
Unassigned
5.5
Medium
Unassigned
5.6
Critical
Laurynas Biveinis

Bug Description

A number of bitmap tests fail on certain Jenkins slaves as follows:

http://jenkins.percona.com/job/percona-server-5.6-param/109/BUILD_TYPE=release,Host=centos5-64/consoleFull

innodb.percona_changed_page_bmp w3 [ fail ]
innodb.percona_changed_page_bmp_1105726 w3 [ fail ]
innodb.percona_changed_page_bmp_requests w4 [ fail ]
innodb.percona_changed_page_bmp_flush w4 [ fail ]
innodb.percona_changed_page_bmp_log_block_size w4 [ fail ]

        Test ended at 2013-05-21 10:50:42

Server log is:
2013-05-21 17:50:41 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2013-05-21 17:50:42 25398 [Note] Plugin 'FEDERATED' is disabled.
2013-05-21 17:50:42 2aeb6bfcb170 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2013-05-21 17:50:42 25398 [Note] InnoDB: The InnoDB memory heap is disabled
2013-05-21 17:50:42 25398 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2013-05-21 17:50:42 25398 [Note] InnoDB: Compressed tables use zlib 1.2.3
2013-05-21 17:50:42 25398 [Note] InnoDB: CPU does not support crc32 instructions
2013-05-21 17:50:42 25398 [Note] InnoDB: Using Linux native AIO
2013-05-21 17:50:42 25398 [Note] InnoDB: Initializing buffer pool, size = 8.0M
2013-05-21 17:50:42 25398 [Note] InnoDB: Completed initialization of buffer pool
2013-05-21 17:50:42 25398 [Note] InnoDB: Highest supported file format is Barracuda.
2013-05-21 17:50:42 2aeb6bfcb170 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2013-05-21 17:50:42 2aeb6bfcb170 InnoDB: File name ./ib_modified_log_1_0.xdb
2013-05-21 17:50:42 2aeb6bfcb170 InnoDB: File operation call: 'delete' returned OS error 71.
2013-05-21 17:50:42 2aeb6bfcb170 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
2013-05-21 17:50:42 25398 [ERROR] InnoDB: cannot create './ib_modified_log_1_0.xdb'

Related branches

The code path in question must be this

- a call to log_online_should_overwrite(bitmap file) finding a 0-sized file :

 /* Currently, it's OK to overwrite 0-sized files only */
 success = os_file_get_status(path, &file_info, false);
 return success && file_info.size == 0LL;

- then log_online_start_bitmap_file() trying to delete it and failing because the file is not found, and then aborting the file create.

 /* Check for an old file that should be deleted first */
 if (log_online_should_overwrite(log_bmp_sys->out.name)) {

  success = static_cast<ibool>(
   os_file_delete(innodb_file_bmp_key,
           log_bmp_sys->out.name));
 }

...

 if (UNIV_UNLIKELY(!success)) {

  /* The following call prints an error message */
  os_file_get_last_error(TRUE);
  ib_logf(IB_LOG_LEVEL_ERROR,
   "cannot create \'%s\'\n", log_bmp_sys->out.name);
  log_bmp_sys->out.file = -1;
  return FALSE;
 }

The immediate fix is to ignore file-not-found for the delete attempt. This would be correct, but the fact that the file disappears between checking and deleting is still very suspicious and needs investigating.

tags: added: bitmap ci

The reason is that os_file_get_status() is returning dberr_t instead of ibool in 5.6. Which casts to ibool without a warning, but "return success" becomes a wrong check.

Also causes Valgrind errors as in

Conditional jump or move depends on uninitialised value(s)
   at 0xB9CBEB: log_online_should_overwrite(char const*)
          (log0online.cc:511)
   by 0xB9CC41: log_online_start_bitmap_file() (log0online.cc:526)
   by 0xB9CDCB: log_online_rotate_bitmap_file(unsigned long)
          (log0online.cc:573)
   by 0xB9D324: log_online_read_init() (log0online.cc:725)
   by 0xC86881: init_log_online() (srv0start.cc:1456)

Thus the fix is to
- fix os_file_get_status() return code check; and
- the previous partial fix of ignoring file not founds for os_file_delete().

Split off race between file status check and delete to bug 1184517, leaving this bug to deal with the incorrect os_file_get_status() return value handling.

summary: Unstable changed page tracking tests due to trying to delete a non-
- existent file
+ existent file | Valgrind error at log_online_should_overwrite()
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers