Comment 1 for bug 1075129

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Consider the following fragment from log_write_up_to:

 group = UT_LIST_GET_FIRST(log_sys->log_groups);

 /* Do the write to the log files */

 while (group) {
  log_group_write_buf(
   group, log_sys->buf + area_start,
   area_end - area_start,
   ut_uint64_align_down(log_sys->written_to_all_lsn,
          OS_FILE_LOG_BLOCK_SIZE),
   start_offset - area_start);

  log_group_set_fields(group, log_sys->write_lsn);

  group = UT_LIST_GET_NEXT(log_groups, group);
 }

 mutex_exit(&(log_sys->mutex));

 if (srv_unix_file_flush_method == SRV_UNIX_O_DSYNC
     || srv_unix_file_flush_method == SRV_UNIX_ALL_O_DIRECT) {
  /* O_DSYNC means the OS did not buffer the log file at all:
  so we have also flushed to disk what we have written */

  log_sys->flushed_to_disk_lsn = log_sys->write_lsn;

 } else if (flush_to_disk) {

  group = UT_LIST_GET_FIRST(log_sys->log_groups);

  fil_flush(group->space_id, FALSE);
  log_sys->flushed_to_disk_lsn = log_sys->write_lsn;
 }

There is already a log_do_write check in log_group_write_buf:

 if (log_do_write) {
  log_sys->n_log_ios++;

  srv_os_log_pending_writes++;

  fil_io(OS_FILE_WRITE | OS_FILE_LOG, TRUE, group->space_id, 0,
         next_offset / UNIV_PAGE_SIZE,
         next_offset % UNIV_PAGE_SIZE, write_len, buf, group);

  srv_os_log_pending_writes--;

  srv_os_log_written+= write_len;
  srv_log_writes++;
 }

However, it is unconditionally TRUE in non-UNIV_DEBUG builds (and
is never set to FALSE in UNIV_DEBUG builds either).

Still, the same variable cannot simply be reused, since incrementing
log_sys->n_log_ios (among other counters) requires the log_sys mutex.

So one may want to replace the fil_io call there with in-memory
buffering, so that the counters are still updated under the mutex
(the worst that can happen on a crash is that the counters are
incorrect), and then do the I/O after mutex_exit in
log_write_up_to but before the if condition on
SRV_UNIX_O_DSYNC.

Although this should benefit O_DSYNC / ALL_O_DIRECT the most, it
will also benefit the normal case, since it avoids the overhead of
_fil_aio while under the mutex.
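
To make the idea concrete, here is a rough standalone sketch of the
"update counters under the mutex, do the I/O after releasing it"
pattern. It is not InnoDB code: struct log_state, log_queue_write,
do_io, log_write and the fake_file buffer are all hypothetical
stand-ins (for log_sys, the buffering inside log_group_write_buf,
fil_io and log_write_up_to respectively), and a pthread mutex stands
in for the log_sys mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING   8     /* hypothetical queue depth */
#define MAX_WRITE_LEN 512   /* hypothetical per-write limit */

/* One buffered write: what would otherwise be passed to the I/O layer. */
struct pending_write {
    size_t        offset;
    size_t        len;
    unsigned char data[MAX_WRITE_LEN];
};

/* Hypothetical stand-in for log_sys and its counters. */
struct log_state {
    pthread_mutex_t      mutex;
    unsigned long        n_log_ios;      /* plays the role of log_sys->n_log_ios */
    unsigned long        bytes_written;  /* plays the role of srv_os_log_written */
    struct pending_write pending[MAX_PENDING];
    size_t               n_pending;
};

static struct log_state demo_log = { .mutex = PTHREAD_MUTEX_INITIALIZER };

/* Stand-in for the log file, so the sketch needs no real file I/O. */
static unsigned char fake_file[4096];

/* Caller must hold log->mutex. Updates the counters and queues the
   write in memory instead of issuing it. */
static int log_queue_write(struct log_state *log, size_t offset,
                           const void *buf, size_t len)
{
    struct pending_write *w;

    if (log->n_pending == MAX_PENDING || len > MAX_WRITE_LEN
        || offset > sizeof(fake_file) - len) {
        return -1;
    }

    w = &log->pending[log->n_pending++];
    w->offset = offset;
    w->len = len;
    memcpy(w->data, buf, len);

    log->n_log_ios++;
    log->bytes_written += len;
    return 0;
}

/* Stand-in for the real I/O call; runs without the mutex held. */
static void do_io(const struct pending_write *w)
{
    memcpy(fake_file + w->offset, w->data, w->len);
}

/* Queue under the mutex, then drain the queue after releasing it. */
static int log_write(struct log_state *log, size_t offset,
                     const void *buf, size_t len)
{
    struct pending_write local[MAX_PENDING];
    size_t n, i;
    int rc;

    pthread_mutex_lock(&log->mutex);
    rc = log_queue_write(log, offset, buf, len);
    n = log->n_pending;
    memcpy(local, log->pending, n * sizeof(local[0]));
    log->n_pending = 0;
    pthread_mutex_unlock(&log->mutex);

    /* The slow part happens here, outside the critical section. */
    for (i = 0; i < n; i++) {
        do_io(&local[i]);
    }
    return rc;
}
```

The sketch pays for the shorter critical section with an extra
memcpy, and it ignores ordering between concurrent writers that drain
their queues at the same time; a real patch would have to handle
both.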