Random Segmentation faults on fileio test

Bug #1187040 reported by Joe on 2013-06-03
This bug affects 1 person
Affects: sysbench | Importance: Undecided | Assigned to: Alexey Kopytov

Bug Description

I am getting random segmentation faults when running fileio tests.

gdb output:
(gdb) bt
#0 0x0000003755289c1f in memcpy () from /lib64/libc.so.6
#1 0x000000000040997d in sb_percentile_calculate (percentile=0x8905a0, percent=95) at sb_percentile.c:95
#2 0x000000000040b6df in file_print_stats (type=<value optimized out>) at sb_fileio.c:850
#3 0x0000000000404b4a in report_thread_proc (arg=<value optimized out>) at sysbench.c:661
#4 0x0000003755607851 in start_thread () from /lib64/libpthread.so.0
#5 0x00000037552e890d in clone () from /lib64/libc.so.6
(gdb) frame 1
#1 0x000000000040997d in sb_percentile_calculate (percentile=0x8905a0, percent=95) at sb_percentile.c:95
95 sb_percentile.c: No such file or directory.
        in sb_percentile.c
(gdb) quit

Relevant excerpt from sb_percentile.c:
<snip>
double sb_percentile_calculate(sb_percentile_t *percentile, double percent)
{
  unsigned long long ncur, nmax;
  unsigned int i;

  pthread_mutex_lock(&percentile->mutex);

  if (percentile->total == 0)
  {
    pthread_mutex_unlock(&percentile->mutex);
    return 0.0;
  }

  /* sb_percentile.c:95, the memcpy() that faults in frame #1 of the backtrace. */
  memcpy(percentile->tmp, percentile->values,
         percentile->size * sizeof(unsigned long long));
  nmax = floor(percentile->total * percent / 100 + 0.5);

  pthread_mutex_unlock(&percentile->mutex);

  /* tmp is read after the mutex is released, so nothing in this function
     stops another thread from freeing it in the meantime. */
  ncur = percentile->tmp[0];
  for (i = 1; i < percentile->size; i++)
  {
    ncur += percentile->tmp[i];
    if (ncur >= nmax)
      break;
  }
<snip>
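To make the cumulative scan in sb_percentile_calculate() easier to follow, here is a self-contained sketch of the same idea. It assumes a fixed array of histogram buckets where values[i] counts the samples that fell into bucket i; the type pct_t, NBUCKETS and pct_bucket() are hypothetical names for illustration, not sysbench's API, and the sketch returns a bucket index rather than a latency. The snapshot-then-scan pattern is only safe because tmp outlives the call, which is the assumption the use-after-free analysis in this report says is violated.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NBUCKETS 8  /* hypothetical, tiny bucket count for the sketch */

/* Hypothetical stand-in for sb_percentile_t: values[i] counts the samples
   that fell into bucket i, and total is the sum of all counters. */
typedef struct {
  pthread_mutex_t mutex;
  unsigned long long values[NBUCKETS];
  unsigned long long tmp[NBUCKETS];  /* assumed to be used by one reporting thread */
  unsigned long long total;
} pct_t;

/* Returns the index of the bucket that contains the given percentile,
   or -1 if no samples have been recorded yet. */
static int pct_bucket(pct_t *p, double percent)
{
  unsigned long long ncur = 0, nmax;
  int i;

  assert(p != NULL && percent >= 0.0 && percent <= 100.0);

  pthread_mutex_lock(&p->mutex);
  if (p->total == 0) {
    pthread_mutex_unlock(&p->mutex);
    return -1;
  }
  /* Snapshot the counters while holding the lock... */
  memcpy(p->tmp, p->values, sizeof(p->values));
  /* ...and compute the rank of the target sample, rounded to nearest.
     The cast truncates, which equals floor() for non-negative values. */
  nmax = (unsigned long long) (p->total * percent / 100 + 0.5);
  pthread_mutex_unlock(&p->mutex);

  /* Walk the private copy, accumulating counts until the rank is reached. */
  for (i = 0; i < NBUCKETS; i++) {
    ncur += p->tmp[i];
    if (ncur >= nmax)
      break;
  }
  /* Guard against counters that do not sum to total. */
  return i < NBUCKETS ? i : NBUCKETS - 1;
}
```

With counts {10, 40, 30, 15, 5, 0, 0, 0} and total 100, the 95th percentile lands in bucket 3 (cumulative 10, 50, 80, 95) and the 50th in bucket 1.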

Joe (joegrasse) wrote :

sysbench 0.5 rev: 116

Joe (joegrasse) wrote :

sysbench --test=fileio --file-test-mode=rndwr --file-total-size=100M --file-num=1 --num-threads=16 --file-io-mode=sync --max-time=10 --max-requests=0 --report-interval=10 --rand-init=on --file-fsync-freq=1

I believe the important piece here is the --report-interval parameter: the percentile structure seems to be destroyed before the memcpy() runs.

It also looks like the return value of pthread_mutex_lock() isn't being checked, so a failed lock attempt would go unnoticed.
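Checking the return value of pthread_mutex_lock() cannot fully close this race: POSIX leaves locking or destroying a mutex that is in use, or already destroyed, undefined. A more robust ordering, sketched below under assumed names (stats_t, reporter, stats_start and stats_shutdown are hypothetical, not sysbench functions, and this is not necessarily what the rev. 117 fix does), is to ask the reporting thread to stop, pthread_join() it, and only then free the buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical shared state; names are illustrative, not sysbench's. */
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool stop;                   /* set by the owner to ask the reporter to exit */
  unsigned long long *values;  /* heap buffer, like percentile->values */
  size_t size;
  unsigned long reports;       /* number of reports produced */
} stats_t;

/* Reporting thread: wakes roughly every 10 ms until asked to stop. */
static void *reporter(void *arg)
{
  stats_t *s = arg;

  pthread_mutex_lock(&s->mutex);
  while (!s->stop) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 10L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
      deadline.tv_sec += 1;
      deadline.tv_nsec -= 1000000000L;
    }
    /* Releases the mutex while waiting; returns early on a stop signal. */
    pthread_cond_timedwait(&s->cond, &s->mutex, &deadline);
    if (s->stop)
      break;

    /* Safe to read: the owner joins this thread before freeing values. */
    unsigned long long sum = 0;
    for (size_t i = 0; i < s->size; i++)
      sum += s->values[i];
    (void) sum;
    s->reports++;
  }
  pthread_mutex_unlock(&s->mutex);
  return NULL;
}

/* Allocates the buffer and starts the reporter; returns 0 on success. */
static int stats_start(stats_t *s, size_t size, pthread_t *tid)
{
  int err;

  memset(s, 0, sizeof(*s));
  s->values = calloc(size, sizeof(*s->values));
  if (s->values == NULL)
    return -1;
  s->size = size;
  pthread_mutex_init(&s->mutex, NULL);
  pthread_cond_init(&s->cond, NULL);

  err = pthread_create(tid, NULL, reporter, s);
  if (err != 0) {
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
    free(s->values);
    s->values = NULL;
  }
  return err;
}

/* Stops the reporter, waits for it to exit, and only then frees memory. */
static void stats_shutdown(stats_t *s, pthread_t tid)
{
  assert(s != NULL);

  pthread_mutex_lock(&s->mutex);
  s->stop = true;
  pthread_cond_signal(&s->cond);
  pthread_mutex_unlock(&s->mutex);

  pthread_join(tid, NULL);  /* after this, no thread can touch s->values */

  free(s->values);
  s->values = NULL;
  pthread_cond_destroy(&s->cond);
  pthread_mutex_destroy(&s->mutex);
}
```

The key design choice is that free() and pthread_mutex_destroy() run only after pthread_join() returns, so no reader can still be inside the critical section or holding a pointer to the buffer.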

Joe (joegrasse) wrote :

After more digging: when the report interval lines up with the total benchmark time, there is a race condition in which percentile->values and percentile->tmp are freed and then used. To mitigate most of the core dumps I encountered, I changed sb_percentile_reset and sb_percentile_done in sb_percentile.c to the following. This isn't a complete or proper fix, though.

void sb_percentile_reset(sb_percentile_t *percentile)
{
  int err;

  /* Only clear the counters if the mutex was actually acquired. */
  err = pthread_mutex_lock(&percentile->mutex);
  if (err == 0) {
    percentile->total = 0;
    memset(percentile->values, 0, percentile->size * sizeof(unsigned long long));
    pthread_mutex_unlock(&percentile->mutex);
  }
}

void sb_percentile_done(sb_percentile_t *percentile)
{
  int err;

  /* pthread_mutex_destroy() may fail with EBUSY while another thread holds
     the mutex; in that case the buffers are deliberately not freed (they leak). */
  err = pthread_mutex_destroy(&percentile->mutex);
  if (err == 0) {
    free(percentile->values);
    free(percentile->tmp);
  }
}

Another workaround is to use a report interval that does not line up with the total test time, for example --report-interval=3 with --max-time=10.
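Assuming "lines up" means that --max-time is an exact multiple of --report-interval, so that a report is due at the same moment the run ends, the condition can be checked as below. Both helpers are hypothetical and not part of sysbench, and real timing jitter means this is only a heuristic for choosing a workaround interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when a periodic report (at t = k * interval)
   would fall exactly at the end of a run limited to max_time seconds. */
static bool report_hits_end_of_run(unsigned max_time, unsigned report_interval)
{
  assert(report_interval > 0);
  return max_time % report_interval == 0;
}

/* Hypothetical helper: the largest interval <= want that does not line up
   with max_time, or 0 if none exists (an interval of 1 always lines up). */
static unsigned safe_report_interval(unsigned max_time, unsigned want)
{
  for (unsigned i = want; i > 1; i--)
    if (!report_hits_end_of_run(max_time, i))
      return i;
  return 0;  /* caller should disable periodic reports instead */
}
```

For the command in this report (--max-time=10 --report-interval=10), report_hits_end_of_run(10, 10) is true, and safe_report_interval(10, 10) suggests 9.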

Alexey Kopytov (akopytov) wrote :

Thanks for the report. Fixed in the LP repository (rev. 117).

Changed in sysbench:
status: New → Fix Committed
assignee: nobody → Alexey Kopytov (akopytov)