stress-ng based disk tests failing

Bug #1640547 reported by Mike Rushton
This bug affects 2 people
Affects                                Status         Importance   Assigned to      Milestone
Stress-ng                              Fix Released   High         Colin Ian King
plainbox-provider-checkbox (Ubuntu)    Fix Released   High         Unassigned
  Xenial                               Fix Released   High         Unassigned

Bug Description

When running the stress-ng disk test, we see the following failures, as well as an I/O lockup:

/dev/sda is a block device
Found largest partition: "/dev/sda2"
Test will use /dev/sda2, mounted at "/", using "ext4"
test_dir is /tmp/disk_stress_ng
Estimated total run time is 4560 seconds

Running stress-ng aio stressor for 240 seconds....
stress-ng: info: [4305] dispatching hogs: 160 aio
stress-ng: info: [4305] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [4305] cache allocate: default cache size: 2048K
stress-ng: info: [4305] successful run completed in 240.27s (4 mins, 0.27 secs)
return_code is 0
Running stress-ng aiol stressor for 240 seconds....
stress-ng: info: [4917] dispatching hogs: 160 aio-linux
stress-ng: info: [4917] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [4917] cache allocate: default cache size: 2048K
stress-ng: fail: [5002] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5012] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5020] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5018] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5019] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5013] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [4959] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5032] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5041] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5017] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5022] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5016] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5038] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5040] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5026] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5027] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5031] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5049] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5021] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5045] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5047] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5044] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5052] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5029] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5015] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5053] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5051] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5063] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5058] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5054] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5061] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5059] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5055] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5023] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5036] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5030] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5060] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5046] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5066] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5050] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5048] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5073] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5039] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5077] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5076] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5068] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5074] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5062] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5075] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5064] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5056] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5070] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5069] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5071] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5065] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5072] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5067] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: fail: [5057] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)
stress-ng: error: [4917] process 4959 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5002 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5012 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5013 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5015 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5016 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5017 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5018 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5019 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5020 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5021 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5022 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5023 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5026 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5027 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5029 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5030 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5031 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5032 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5036 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5038 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5039 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5040 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5041 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5044 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5045 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5046 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5047 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5048 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5049 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5050 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5051 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5052 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5053 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5054 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5055 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5056 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5057 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5058 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5059 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5060 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5061 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5062 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5063 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5064 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5065 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5066 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5067 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5068 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5069 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5070 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5071 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5072 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5073 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5074 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5075 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5076 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: error: [4917] process 5077 (stress-ng-aio_linux) terminated with an error, exit status=1
stress-ng: info: [4917] unsuccessful run completed in 241.90s (4 mins, 1.90 secs)
return_code is 2
*****************************************************************
** Error 2 reported on stressor aiol!)
*****************************************************************
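In the aiol run above, 58 of the 160 dispatched stress-ng-aio-linux workers log "io_setup failed, errno=1 (Operation not permitted)", and the same 58 PIDs then appear in the parent's "terminated with an error, exit status=1" lines before the run ends with return_code 2. The following is a minimal Python sketch for cross-checking that pairing in a saved copy of the console output. It assumes only the two message formats quoted in this report; the helper name tally_io_setup_failures is hypothetical and not part of stress-ng or checkbox.

```python
import errno
import re

# Regexes for the two message formats that appear in the aiol run above.
FAIL_RE = re.compile(
    r"stress-ng: fail: \[(\d+)\] stress-ng-aio-linux: io_setup failed, errno=(\d+)")
EXIT_RE = re.compile(
    r"stress-ng: error: \[\d+\] process (\d+) \(stress-ng-aio_linux\) "
    r"terminated with an error, exit status=(\d+)")


def tally_io_setup_failures(log_text: str) -> dict:
    """Hypothetical helper: cross-check io_setup failures against worker exits."""
    failed = {}  # worker PID -> errno value reported for io_setup
    exited = {}  # worker PID -> exit status reported by the parent process
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            failed[int(m.group(1))] = int(m.group(2))
            continue
        m = EXIT_RE.search(line)
        if m:
            exited[int(m.group(1))] = int(m.group(2))
    return {
        "io_setup_failures": len(failed),
        # Map numeric errno values to symbolic names, e.g. 1 -> "EPERM".
        "errnos": sorted({errno.errorcode.get(e, str(e)) for e in failed.values()}),
        "exit_statuses": sorted(set(exited.values())),
        "failed_without_exit_line": sorted(set(failed) - set(exited)),
        "exited_without_fail_line": sorted(set(exited) - set(failed)),
    }
```

For example, tally_io_setup_failures(open("disk_stress_ng.log").read()) would summarize a saved log (the file name is illustrative). Empty "without" lists mean every worker that hit the io_setup failure is also one that the parent reported as exiting with an error.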
Running stress-ng chdir stressor for 240 seconds....
stress-ng: info: [5089] dispatching hogs: 160 chdir
stress-ng: info: [5089] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5089] cache allocate: default cache size: 2048K
stress-ng: info: [5089] successful run completed in 679.46s (11 mins, 19.46 secs)
return_code is 0
Running stress-ng chmod stressor for 240 seconds....
stress-ng: info: [5299] dispatching hogs: 160 chmod
stress-ng: info: [5299] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5299] cache allocate: default cache size: 2048K
stress-ng: info: [5299] successful run completed in 240.02s (4 mins, 0.02 secs)
return_code is 0
Running stress-ng dentry stressor for 240 seconds....
stress-ng: info: [5462] dispatching hogs: 160 dentry
stress-ng: info: [5462] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5462] cache allocate: default cache size: 2048K
stress-ng: info: [5462] successful run completed in 263.77s (4 mins, 23.77 secs)
return_code is 0
Running stress-ng dir stressor for 240 seconds....
stress-ng: info: [5633] dispatching hogs: 160 dir
stress-ng: info: [5633] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5633] cache allocate: default cache size: 2048K
stress-ng: info: [5633] successful run completed in 615.29s (10 mins, 15.29 secs)
return_code is 0
Running stress-ng fallocate stressor for 240 seconds....
stress-ng: info: [5800] dispatching hogs: 160 fallocate
stress-ng: info: [5800] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5800] cache allocate: default cache size: 2048K
stress-ng: info: [5800] successful run completed in 242.09s (4 mins, 2.09 secs)
return_code is 0
Running stress-ng fiemap stressor for 240 seconds....
stress-ng: info: [5962] dispatching hogs: 160 fiemap
stress-ng: info: [5962] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [5962] cache allocate: default cache size: 2048K
stress-ng: info: [5962] successful run completed in 277.54s (4 mins, 37.54 secs)
return_code is 0
Running stress-ng filename stressor for 240 seconds....
stress-ng: info: [6769] dispatching hogs: 160 filename
stress-ng: info: [6769] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [6769] cache allocate: default cache size: 2048K
stress-ng: info: [6769] successful run completed in 259.07s (4 mins, 19.07 secs)
return_code is 0
Running stress-ng flock stressor for 240 seconds....
stress-ng: info: [6935] dispatching hogs: 160 flock
stress-ng: info: [6935] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [6935] cache allocate: default cache size: 2048K
stress-ng: info: [6935] successful run completed in 240.02s (4 mins, 0.02 secs)
return_code is 0
Running stress-ng fstat stressor for 240 seconds....
stress-ng: info: [7101] dispatching hogs: 160 fstat
stress-ng: info: [7101] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [7101] cache allocate: default cache size: 2048K
stress-ng: info: [7101] successful run completed in 241.25s (4 mins, 1.25 secs)
return_code is 0
Running stress-ng hdd stressor for 240 seconds....
stress-ng: info: [7335] dispatching hogs: 160 hdd
stress-ng: info: [7335] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [7335] cache allocate: default cache size: 2048K
/usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng: line 162: 7334 Killed timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --temp-path $test_dir --$1 0
return_code is 137
*****************************************************************
** stress-ng disk test timed out and was forcefully terminated!
*****************************************************************
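The hdd stressor's return_code of 137 follows the POSIX shell convention that a process terminated by a signal is reported as 128 plus the signal number. Here 137 - 128 = 9, which is SIGKILL. This matches the `timeout -s 9` in the command line and the shell's "Killed" message shown above. The Python sketch below decodes such values. The helper name describe_return_code is hypothetical. The 128+N rule comes from the shell, not from stress-ng.

```python
import signal


def describe_return_code(code: int) -> str:
    """Hypothetical helper: interpret a shell-style exit status."""
    if code == 0:
        return "success"
    if code > 128:
        # Shells report a signal-terminated child as 128 + signal number.
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


# Return codes seen in this report: aio (0), aiol (2) and hdd (137).
for rc in (0, 2, 137):
    print(rc, describe_return_code(rc))
```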
Running stress-ng lease stressor for 240 seconds....
stress-ng: info: [7514] dispatching hogs: 160 lease
stress-ng: info: [7514] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [7514] cache allocate: default cache size: 2048K
stress-ng: info: [7514] successful run completed in 240.03s (4 mins, 0.03 secs)
return_code is 0
Running stress-ng lockf stressor for 240 seconds....
stress-ng: info: [7839] dispatching hogs: 160 lockf
stress-ng: info: [7839] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [7839] cache allocate: default cache size: 2048K
stress-ng: info: [7839] successful run completed in 240.06s (4 mins, 0.06 secs)
return_code is 0
Running stress-ng mknod stressor for 240 seconds....
stress-ng: info: [8167] dispatching hogs: 160 mknod
stress-ng: info: [8167] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [8167] cache allocate: default cache size: 2048K
stress-ng: info: [8167] successful run completed in 929.23s (15 mins, 29.23 secs)
return_code is 0
Running stress-ng readahead stressor for 240 seconds....
stress-ng: info: [8342] dispatching hogs: 160 readahead
stress-ng: info: [8342] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [8342] cache allocate: default cache size: 2048K
stress-ng: info: [8345] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8369] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8382] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8394] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8398] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8395] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8386] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8385] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8350] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8378] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8401] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8406] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8393] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8403] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8412] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8428] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8418] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8432] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8435] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8439] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8405] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8423] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8438] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8440] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8453] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8455] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8441] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8413] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8375] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8414] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8381] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8454] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8448] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8417] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8422] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8425] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8397] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8358] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8459] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8355] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8467] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8388] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8361] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8443] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8362] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8368] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8347] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8373] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8400] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8365] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8356] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8376] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8464] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8351] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8384] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8436] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8346] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8466] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8444] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8391] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8463] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8462] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8472] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8461] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8372] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8348] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8389] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8409] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8364] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8396] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8451] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8357] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8431] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8392] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8474] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8469] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8352] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8383] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8437] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8465] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8415] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8430] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8374] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8480] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8476] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8477] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8416] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8387] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8390] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8478] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8483] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8371] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8471] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8484] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8479] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8377] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8370] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8446] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8433] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8486] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8481] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8470] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8450] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8487] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8482] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8488] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8493] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8494] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8490] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8420] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8442] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8489] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8399] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8457] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8496] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8379] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8498] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8349] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8460] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8497] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8424] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8492] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8499] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8500] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8343] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8473] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8354] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8411] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8501] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8468] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8353] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8502] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8402] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8380] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8475] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8495] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8491] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8360] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8447] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8434] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8421] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8485] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8404] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8408] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8344] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8449] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8419] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8367] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8426] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8445] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8366] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8427] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8410] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8452] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8363] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8429] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8359] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8407] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8456] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8458] stress-ng-readahead: test expired during test setup (writing of data file)
stress-ng: info: [8342] successful run completed in 961.42s (16 mins, 1.42 secs)
return_code is 0
Running stress-ng seek stressor for 240 seconds....
stress-ng: info: [8514] dispatching hogs: 160 seek
stress-ng: info: [8514] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [8514] cache allocate: default cache size: 2048K
stress-ng: info: [8514] successful run completed in 280.36s (4 mins, 40.36 secs)
return_code is 0
Running stress-ng sync-file stressor for 240 seconds....
stress-ng: unrecognized option '--sync-file'
Try 'stress-ng --help' for more information.
return_code is 1
*****************************************************************
** Error 1 reported on stressor sync-file!)
*****************************************************************
Running stress-ng xattr stressor for 240 seconds....
stress-ng: info: [8680] dispatching hogs: 160 xattr
stress-ng: info: [8680] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [8680] cache allocate: default cache size: 2048K
stress-ng: info: [8680] successful run completed in 240.15s (4 mins, 0.15 secs)
return_code is 0
*******************************************************************
** stress-ng disk test failed; most recent error was 1
*******************************************************************

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-generic 4.4.0.45.48
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic ppc64le
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 9 16:39 seq
 crw-rw---- 1 root audio 116, 33 Nov 9 16:39 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Nov 9 16:53:09 2016
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=46c91b9d-cdb7-4faa-ad9f-ecd2aa40647c ro
ProcLoadAvg: 0.22 0.33 0.18 1/1215 5452
ProcLocks:
 1: POSIX ADVISORY WRITE 2660 00:14:759 0 EOF
 2: POSIX ADVISORY WRITE 1635 00:14:481 0 EOF
 3: FLOCK ADVISORY WRITE 2656 00:14:753 0 EOF
 4: POSIX ADVISORY WRITE 2662 00:14:754 0 EOF
 5: POSIX ADVISORY WRITE 3727 00:14:783 0 EOF
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -1
ProcVersion: Linux version 4.4.0-45-generic (buildd@bos01-ppc64el-030) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #66-Ubuntu SMP Wed Oct 19 14:13:11 UTC 2016
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-45-generic N/A
 linux-backports-modules-4.4.0-45-generic N/A
 linux-firmware 1.157.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_dscr: DSCR is 0
cpu_freq:
 min: 3.959 GHz (cpu 72)
 max: 3.990 GHz (cpu 81)
 avg: 3.975 GHz
cpu_runmode:
 Could not retrieve current diagnostics mode,
 No kernel interface to firmware
cpu_smt: SMT=8

Revision history for this message
Mike Rushton (leftyfb) wrote :
tags: added: blocks-hwcert-server
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: Disk test fails on IBM Power S822LC (8335-GTB)

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc4

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Colin Ian King (colin-king) wrote :

stress-ng: fail: [5002] stress-ng-aio-linux: io_setup failed, errno=1 (Operation not permitted)

This indicates that the Linux AIO io_setup() call failed with EPERM (errno 1, "Operation not permitted"), which is a bit bizarre as the man page does not list that error for that system call.

        if (io_setup(opt_aio_linux_requests, &ctx) < 0) {
                pr_fail_err(name, "io_setup");
                return EXIT_FAILURE;
        }

Can you run:

strace -f stress-ng --aiol 1 --aiol-ops 2 >& trace.log

and attach the trace.log to the bug report. Thanks!

Changed in linux (Ubuntu Xenial):
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Mike Rushton (leftyfb) wrote :

Also seeing this problem on a Dell PowerEdge with a Xeon Phi processor and the 4.8 kernel on Ubuntu 16.10.

See attached.

Mike Rushton (leftyfb)
summary: - Disk test fails on IBM Power S822LC (8335-GTB)
+ stress-ng tests failing
Revision history for this message
Colin Ian King (colin-king) wrote : Re: stress-ng tests failing

OK, I can see one big problem: the timeout command is being used to SIGKILL stress-ng. That means all the child processes get SIGKILL'd and abort without cleaning up, so one ends up with a file system full of garbage files, which will cause subsequent tests to break.

I suggest:

1. Sending SIGALRM to the child processes - this will tell them to stop and clean up
2. Remove any temp files after each test in case they are killed or die without cleaning up

Mike Rushton (leftyfb)
no longer affects: linux (Ubuntu)
Changed in plainbox-provider-checkbox (Ubuntu):
status: New → Confirmed
Revision history for this message
Colin Ian King (colin-king) wrote :

Ah, looks like for large multiprocessor machines, /proc/sys/fs/aio-max-nr is too low and causing the setup to fail because of the low default for so many concurrent stressors.

I recommend writing 1000 * number of CPUs into /proc/sys/fs/aio-max-nr before running the aiol test. That should give it enough async i/o events in the kernel to run to completion.
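A minimal bash sketch of that recommendation, assuming the script runs as root and that `nproc` (coreutils) reports the CPU count to scale by. The function names are hypothetical and not taken from the checkbox script; the target file is a parameter only so the logic can be exercised against a scratch file.

```shell
#!/bin/bash
# Hypothetical helpers (not from disk_stress_ng): size the kernel's
# async I/O event limit at 1000 events per CPU, as suggested above.

# Print the suggested aio-max-nr value for a given CPU count.
aio_target_for_cpus() {
    echo $(( $1 * 1000 ))
}

# Raise the limit only if it is currently lower than the target, so a
# larger value set by an administrator is never reduced.
# $1 defaults to the real sysctl file; writing that file requires root.
raise_aio_max_nr() {
    local file=${1:-/proc/sys/fs/aio-max-nr}
    local target current
    target=$(aio_target_for_cpus "$(nproc)")
    current=$(cat "$file") || return 1
    if [ "$current" -lt "$target" ]; then
        echo "$target" > "$file" || return 1
    fi
}
```

On the POWER8 machine in this report (20 cores, SMT=8, 160 stressor instances), this would request 160000 events, assuming all 160 hardware threads are online. The same value could equally be set with `sysctl -w fs.aio-max-nr=...`.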

Revision history for this message
Colin Ian King (colin-king) wrote :

NOTE: I'll fix up stress-ng to detect this and report it is a problem, but I do think the failure is correct - stress-ng hit an upper resource bound set by the system and failed. However, it should be reporting that the errno meant that there were no resources left.

Revision history for this message
Colin Ian King (colin-king) wrote :

Added some more smarts to make this a non-failure issue, just a resource limiting exit on each stressor:

http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=d3db4119cdd3bee439a4f2a6cc36d27ff3ce6c4b

Changed in stress-ng:
importance: Undecided → High
status: New → Fix Committed
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu Xenial):
assignee: Colin Ian King (colin-king) → nobody
Jeff Lane  (bladernr)
summary: - stress-ng tests failing
+ stress-ng based disk tests failing
no longer affects: linux (Ubuntu Xenial)
Changed in plainbox-provider-checkbox (Ubuntu Xenial):
status: New → Triaged
Changed in plainbox-provider-checkbox (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
Changed in plainbox-provider-checkbox (Ubuntu Xenial):
importance: Undecided → High
Revision history for this message
Colin Ian King (colin-king) wrote :

FYI, the fix landed in stress-ng 0.07.04 over the weekend.

Revision history for this message
Mike Rushton (leftyfb) wrote :

Still getting the same issues with stress-ng 0.07.04

ubuntu@fesenkov:~$ apt-cache policy stress-ng
stress-ng:
  Installed: 0.07.04-1
  Candidate: 0.07.04-1
  Version table:
 *** 0.07.04-1 500
        500 http://ports.ubuntu.com/ubuntu-ports zesty/universe ppc64el Packages
        100 /var/lib/dpkg/status
     0.05.23-1ubuntu2 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial-updates/universe ppc64el Packages
     0.05.23-1 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial/universe ppc64el Packages

Below is the bottom of the console messages from running the test suite. Currently it is frozen in this state and I cannot CTRL+C out of it. I can ssh in to other sessions.

stress-ng: info: [18442] stress-ng-sync-file: this stressor is not implemented on this system: ppc64le Linux 4.4.0-47-generic
stress-ng: info: [18441] stress-ng-sync-file: this stressor is not implemented on this system: ppc64le Linux 4.4.0-47-generic
stress-ng: info: [18443] stress-ng-sync-file: this stressor is not implemented on this system: ppc64le Linux 4.4.0-47-generic
stress-ng: info: [18444] stress-ng-sync-file: this stressor is not implemented on this system: ppc64le Linux 4.4.0-47-generic
stress-ng: info: [18284] successful run completed in 240.03s (4 mins, 0.03 secs)
return_code is 0
Running stress-ng xattr stressor for 240 seconds....
stress-ng: info: [18450] dispatching hogs: 160 xattr
stress-ng: info: [18450] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [18450] cache allocate: default cache size: 2048K
stress-ng: info: [18450] successful run completed in 240.15s (4 mins, 0.15 secs)
return_code is 0
*******************************************************************
** stress-ng disk test failed; most recent error was 137
*******************************************************************

Revision history for this message
Colin Ian King (colin-king) wrote :

It would be useful to find out what is still running, so the output from ps -ax would help show what's hanging.

Revision history for this message
Colin Ian King (colin-king) wrote :

And output from dmesg too please.

Mike Rushton (leftyfb)
Changed in stress-ng:
status: Fix Committed → In Progress
Revision history for this message
Colin Ian King (colin-king) wrote :

Any chance of getting some of that ps / dmesg output?

Revision history for this message
Mike Rushton (leftyfb) wrote :

I had to run the test again. It's been running for about an hour now. I will post the logs and full screen output when it is finished.

Revision history for this message
Mike Rushton (leftyfb) wrote :
Revision history for this message
Mike Rushton (leftyfb) wrote :
Revision history for this message
Mike Rushton (leftyfb) wrote :

It seems to be stuck on:
root 10664 1 0 18:28 pts/3 00:00:00 stress-ng --aggressive --verify --timeout 240 --temp-path /tmp/disk_stress_ng --fstat 0

Though /tmp/disk_stress_ng doesn't exist

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: [Bug 1640547] Re: stress-ng based disk tests failing

I wonder if it's left over and the script isn't cleaning up
/tmp/disk_stress_ng before that process has a chance to cleanly die.

Revision history for this message
Colin Ian King (colin-king) wrote :

It could be that. I've also added some forceful yield points to the fstat stressor on a SIGALRM to try even harder to abort this stressor.

For the aiol test, it seems I was using the system call interface rather than the aiol library wrapper. I've fixed this up. I will upload a fixed version ASAP, but it will take a day or so for it to land in Ubuntu.

Revision history for this message
Colin Ian King (colin-king) wrote :

It may be worthwhile running each test with a random tmp directory, e.g.

TMPDIR=/tmp/disk_stress_ng_$(uuidgen) or
TMPDIR=/tmp/disk_stress_ng_$(cat /proc/sys/kernel/random/uuid)

and that way we don't trash previous test instance temp dir data
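A sketch of that idea in bash, assuming each stressor run gets its own directory that is removed only after the stress-ng process using it has exited. `make_test_dir` and `remove_test_dir` are hypothetical names; the UUID is read from `/proc/sys/kernel/random/uuid`, which avoids depending on `uuidgen` being installed.

```shell
#!/bin/bash
# Hypothetical helpers: give every stressor run a fresh temp directory
# so a killed run cannot leave files behind for the next one.

# Create <base>/disk_stress_ng_<uuid> (base defaults to /tmp) and
# print its path.
make_test_dir() {
    local base=${1:-/tmp}
    local uuid dir
    uuid=$(cat /proc/sys/kernel/random/uuid) || return 1
    dir="$base/disk_stress_ng_$uuid"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# Remove a directory created by make_test_dir. Only call this after
# the stress-ng run using it has exited. Paths that do not look like
# one of our test directories are refused.
remove_test_dir() {
    case $1 in
        */disk_stress_ng_*) rm -rf -- "$1" ;;
        *) echo "refusing to remove unexpected path: $1" >&2; return 1 ;;
    esac
}
```

In the script, `test_dir=$(make_test_dir)` would come before each `stress-ng ... --temp-path "$test_dir"` invocation, and `remove_test_dir "$test_dir"` after it returns.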

Revision history for this message
Mike Rushton (leftyfb) wrote :

Attached is another run of the test with test_dir="/tmp/disk_stress_ng_$(uuidgen)"

The test was hung with the following on the screen:

Running stress-ng sync-file stressor for 240 seconds....
stress-ng: unrecognized option '--sync-file'
Try 'stress-ng --help' for more information.
return_code is 1
*****************************************************************
** Error 1 reported on stressor sync-file!)
*****************************************************************
Running stress-ng xattr stressor for 240 seconds....
stress-ng: info: [16158] dispatching hogs: 160 xattr
stress-ng: info: [16158] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [16158] cache allocate: default cache size: 2048K
stress-ng: info: [16158] successful run completed in 240.17s (4 mins, 0.17 secs)
return_code is 0
*******************************************************************
** stress-ng disk test failed; most recent error was 1
*******************************************************************

This was still running after I hit CTRL+C on the test to grab the logs:

root 12102 1 0 Dec01 pts/1 00:00:00 stress-ng --aggressive --verify --timeout 240 --temp-path /tmp/disk_stress_ng_4788

Revision history for this message
Colin Ian King (colin-king) wrote :

Which version of stress-ng are you using? With the latest V0.07.08 I get:

./stress-ng --sync-file 1
stress-ng: info: [25692] defaulting to a 86400 second run per stressor
stress-ng: info: [25692] dispatching hogs: 1 sync-file
stress-ng: info: [25692] cache allocate: default cache size: 3072K
^C (SIGINT)
stress-ng: info: [25692] successful run completed in 47.32s

Revision history for this message
Mike Rushton (leftyfb) wrote :

stress-ng was at 0.05.23-1ubuntu2 for the above tests.

I have now upgraded stress-ng from the package in Xenial to that from Zesty:

$ sudo apt-cache policy stress-ng
stress-ng:
  Installed: 0.07.08-1
  Candidate: 0.07.08-1
  Version table:
 *** 0.07.08-1 500
        500 http://ports.ubuntu.com/ubuntu-ports zesty/universe ppc64el Packages
        100 /var/lib/dpkg/status
     0.05.23-1ubuntu2 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial-updates/universe ppc64el Packages
     0.05.23-1 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial/universe ppc64el Packages

$ sudo stress-ng --sync-file 1
stress-ng: info: [2626] defaulting to a 86400 second run per stressor
stress-ng: info: [2626] dispatching hogs: 1 sync-file
stress-ng: info: [2626] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [2626] cache allocate: default cache size: 2048K
stress-ng: info: [2627] stress-ng-sync-file: this stressor is not implemented on this system: ppc64le Linux 4.4.0-51-generic
stress-ng: info: [2626] successful run completed in 0.00s

Mind you, this sync-file stressor is more of a red herring and isn't the cause of the system lockups.

I have also found that this bug shows up on all 3 OpenPOWER servers running petitboot that I have access to at the moment. If I had access to a Tuleta, I'd guess this would affect that as well. I have a good feeling this is just like the memory test[0] (still not resolved). As of now, the stress_ng memory and disk tests both fail with very similar results (system/session lockup).

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1573062

Revision history for this message
Mark W Wenning (mwenning) wrote :

Mike, will this fix the stress-related errors I'm seeing in Xeon Phi on
Dell C6320p?
https://certification.canonical.com/hardware/201611-25216/

Thanks,

Mark Wenning
Technical Partner Manager, Cloud Alliances
Canonical, Ltd
<email address hidden>
-----
"We will encourage you to develop the three great virtues of a programmer:
laziness, impatience, and hubris." -- Larry Wall, Programming Perl (1st
edition), Oreilly And Associates

Revision history for this message
Jeff Lane  (bladernr) wrote :

On Mon, Dec 5, 2016 at 11:09 PM, Mark W Wenning
<email address hidden> wrote:
> Mike, will this fix the stress-related errors I'm seeing in Xeon Phi on
> Dell C6320p?
> https://certification.canonical.com/hardware/201611-25216/

It's possible, but there's no way to know until you try it. We don't have any Phi
hardware in house. I know there's a system sitting at Dell, but I
still don't have any way to get to it since Windows is required (still
no MSDN account).

--
"Entropy isn't what it used to be."

Jeff Lane -
Server Certification Lead, Warrior Poet, Biker, Lover of Pie
Phone: 919-442-8649
Ubuntu Ham: W4KDH Freenode IRC: bladernr or bladernr_
gpg: 1024D/3A14B2DD 8C88 B076 0DD7 B404 1417 C466 4ABD 3635 3A14 B2DD

Revision history for this message
Mike Rushton (leftyfb) wrote :

@mark when/if it eventually gets fixed, it might be the same issue. We won't know until we get it resolved.

@colin confirmed stress-ng 0.07.08-1 on a different power 8 server still has the issue.

Revision history for this message
Colin Ian King (colin-king) wrote :

sync_file_range() is not implemented on this architecture; it uses a different system call, sync_file_range2(), which I need to get around to plugging into this stressor instead. So ignore that failure for now. I'll fix that up before the next release of stress-ng.

Revision history for this message
Colin Ian King (colin-king) wrote :

OK, so checkbox is also nuking the files under the --fstat stressor and causing it to get stuck in a loop; I've pushed two fixes to make it more robust in checking for "files that have disappeared" while doing fstats on them. This is only a minor issue, though, as checkbox kills stress-ng anyhow.

Revision history for this message
Colin Ian King (colin-king) wrote :

When heavily loaded I'm seeing a bunch of these USB related kworker thread delay warnings:

[ 5282.423179] INFO: task kworker/136:1:1165 blocked for more than 120 seconds.
[ 5282.423457] Not tainted 4.4.0-47-generic #68-Ubuntu
[ 5282.423498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5282.423556] kworker/136:1 D 0000000000000000 0 1165 2 0x00000800
[ 5282.423563] Workqueue: events __usb_queue_reset_device
[ 5282.423566] Call Trace:
[ 5282.423570] [c00000793e6c7770] [c000000000015d38] __switch_to+0x1f8/0x350
[ 5282.423574] [c00000793e6c77c0] [c000000000af2ddc] __schedule+0x30c/0x990
[ 5282.423577] [c00000793e6c7890] [c000000000af34a8] schedule+0x48/0xc0
[ 5282.423580] [c00000793e6c78c0] [c000000000823be4] usb_kill_urb+0xc4/0x130
[ 5282.423584] [c00000793e6c7940] [c000000000821988] usb_hcd_flush_endpoint+0x1a8/0x310
[ 5282.423587] [c00000793e6c7a10] [c000000000826310] usb_disable_endpoint+0x80/0x110
[ 5282.423591] [c00000793e6c7a50] [c000000000826410] usb_disable_interface+0x70/0xa0
[ 5282.423594] [c00000793e6c7a90] [c000000000816490] usb_reset_and_verify_device+0x230/0x540
[ 5282.423597] [c00000793e6c7b70] [c000000000816964] usb_reset_device+0x1c4/0x3b0
[ 5282.423601] [c00000793e6c7c10] [c000000000825ef8] __usb_queue_reset_device+0x58/0x90
[ 5282.423604] [c00000793e6c7c50] [c0000000000dd930] process_one_work+0x1e0/0x5a0
[ 5282.423607] [c00000793e6c7ce0] [c0000000000dde84] worker_thread+0x194/0x680
[ 5282.423611] [c00000793e6c7d80] [c0000000000e6980] kthread+0x110/0x130
[ 5282.423615] [c00000793e6c7e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4

Revision history for this message
Colin Ian King (colin-king) wrote :

Looks like the fstat stressor sticks on the open of /dev/urandom, even with O_NONBLOCK, when running as root. I'm going to skip that in the stressor for now when running with an euid of zero.

Revision history for this message
Colin Ian King (colin-king) wrote :

I've fixed up the sync_file_range() call for this architecture.

Revision history for this message
Colin Ian King (colin-king) wrote :

After a couple of runs, the readahead stressor had multiple processes stuck in system call #6, close(), and it took several kill -9 attempts to kill them. This is unexpected behaviour.

Revision history for this message
Colin Ian King (colin-king) wrote :

I'm still not happy about the /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng using timeout with a -9 (SIGKILL) to terminate stress-ng stressors. Stress-ng stressors can be *cleanly* terminated with a SIGALRM signal, this triggers all the processes to terminate once they have freed resources. Sending a SIGKILL will leave cruft everywhere and won't clean up shared memory segments, which isn't pleasant. It may even lead to deadlocking in some of the stressors that are waiting for an unlock because the unlocking parent gets nuked with a SIGKILL.

Note that some I/O related stressors generate a lot of I/O writes that need to be flushed out before a write and/or a close completes, so sending a SIGALRM or SIGKILL may not do anything immediately, as the blocked system call is waiting for I/O to fully flush out.

I'm starting to think that SIGKILL'ing and reaping temp files while stressors are still running could be perilous; for example locking files and killing off processes while locks are open and then reaping files is not a good idea.

So:

1. The checkbox script should run a test for a specified amount of time AND, I suggest, a maximum number of bogo ops, whichever comes first.

2. Stop stress-ng with SIGALRM and not SIGKILL

3. Don't reap files while a stressor is running. That's a really ugly thing to do.

Once we have this fixed I am then happy to checkout any kernel related hangs. Meanwhile I'll push some trivial stress-ng changes out tomorrow in another bug fix release to address some of the corner cases I've spotted today.
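A small bash sketch of point 1, assuming stress-ng's per-stressor `--<name>-ops N` option (as used with `--aiol-ops` earlier in this thread) exists for every stressor the script runs; check `stress-ng --help` on the installed version. The function name, the global `ARGS` array and any op count passed in are hypothetical, not from the checkbox script. With both limits set, each run should stop at whichever limit is reached first.

```shell
#!/bin/bash
# Hypothetical helper: build stress-ng arguments that bound a run by
# both wall-clock time and bogo ops, whichever limit is reached first.
# $1 = stressor name, $2 = timeout (s), $3 = max bogo ops, $4 = temp dir.
# The argument list is stored in the global array ARGS.
build_stress_args() {
    local stressor=$1 timeout=$2 max_ops=$3 dir=$4
    # An instance count of 0 starts one stressor per online CPU.
    ARGS=(--aggressive --verify --timeout "$timeout" --temp-path "$dir"
          "--$stressor" 0 "--$stressor-ops" "$max_ops")
}
```

Usage would look like `build_stress_args aiol 240 100000 "$test_dir" && stress-ng "${ARGS[@]}"`. Keeping the arguments in an array preserves paths containing spaces without extra quoting.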

Revision history for this message
Rod Smith (rodsmith) wrote :

Thanks for the work on this so far, Colin. One point:

> I'm still not happy about the /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng using timeout with a -9 (SIGKILL) to terminate stress-ng stressors. Stress-ng stressors can be *cleanly* terminated with a SIGALRM signal, this triggers all the processes to terminate once they have freed resources.

We tried this, and unfortunately, SIGALRM didn't work; it doesn't terminate the test in the case of some failures (the one described in this bug report, for instance). For our purposes, if one test fails, the test as a whole has failed, so we aren't too concerned with any subsequent failures that might be a result of using SIGKILL to terminate the first failure. We ARE concerned, though, with the test suite hanging, which could take hours to discover. If the disk corruption affects subsequent tests, then that is of course a concern, but so far we haven't encountered a problem with that.

Revision history for this message
Colin Ian King (colin-king) wrote :

OK, I understand what your requirements are now. With that in mind, I guess the best thing to do is:

1. Run stress-ng with the -k flag; this keeps all the process names as "stress-ng" rather than "stress-ng-${stressor-name}", so we can nuke them later on using killall -9 stress-ng.
2. Initially send a SIGALRM to the stress-ng parent.
3. Give it, say, 60s or so to try to terminate cleanly; this still may not happen if we have lots of pending I/O still waiting on a write()/read() or close().
4. Kill all stress-ng stressors thereafter using killall -9 stress-ng.
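A minimal bash sketch of the SIGALRM, grace period and `killall -9` steps. It assumes the script started stress-ng in the background with `-k` and recorded the parent PID, and that `killall` (psmisc) is installed. `stop_stress_ng` is a hypothetical name; it returns non-zero when SIGKILL was needed, so the caller can report an unkillable stressor.

```shell
#!/bin/bash
# Hypothetical helper: ask stress-ng to stop cleanly, then force it.
# $1 = PID of the stress-ng parent, $2 = grace period in seconds.
# Returns 0 if it exited after SIGALRM, 1 if SIGKILL was needed.
stop_stress_ng() {
    local pid=$1 grace=${2:-60} waited=0
    # SIGALRM tells stress-ng to stop its stressors and clean up.
    # If the process is already gone, there is nothing to do.
    kill -ALRM "$pid" 2>/dev/null || return 0
    # Poll once per second; pending I/O can delay the exit.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    echo "stress-ng ($pid) still running after ${grace}s; sending SIGKILL" >&2
    # With -k every stressor is named "stress-ng", so killall finds them.
    killall -9 stress-ng 2>/dev/null
    kill -9 "$pid" 2>/dev/null
    return 1
}
```

The script would then call something like `stop_stress_ng "$stress_pid" 60 || echo "unkillable stressor"` once the run time has elapsed, where `stress_pid` is whatever variable holds `$!` from the background launch.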

Revision history for this message
Colin Ian King (colin-king) wrote :

I've made some modifications to the script (see attached), the changes include:

1. kill with ALRM first, then kill with KILL if this does not work after a small grace period. Also report on unkillable stressors
2. bump up async I/O threshold for machines with lots of CPUs
3. force hdd to do sync writes, that way we don't backlog with gazillions of pending I/Os on machines with a lot of memory and many CPUs
4. Limit the readahead file size so that this stressor does not spend most of its time generating a test file before it can start testing readaheads

I've run this through several times with the latest stress-ng and it runs through to completion.

So I think we were suffering from issues where loads of pending I/Os from stressors plus bad cleanup on nuked stressors were causing massive I/O backlogs which caused the system to clag up.

Colin Ian King (colin-king) wrote :

Attached: updated script

Colin Ian King (colin-king) wrote :

And progress on this for checkbox?

Colin Ian King (colin-king) wrote :

Oops, I meant "Any progress..."

Changed in stress-ng:
status: In Progress → Fix Committed
assignee: Colin Ian King (colin-king) → nobody
assignee: nobody → Colin Ian King (colin-king)
status: Fix Committed → Fix Released
Mark W Wenning (mwenning) wrote :

Got some different errors this time; attaching the test output.

ubuntu@ubuntu-DSS1500:~/good-hawk$ cat stress-ng.txt
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-=============-============-==================================
ii stress-ng 0.07.16-1ppa1 amd64 tool to load and stress a computer
ubuntu@ubuntu-DSS1500:~/good-hawk$

Mark W Wenning (mwenning) wrote :
Colin Ian King (colin-king) wrote :

@Mark, can you open a new bug report, as this is a different bug? Also, can you give a summary of the failure? From the data you provided, I'm not sure what failure you are reporting.

Jeff Lane  (bladernr) wrote :

@Colin, one final question... I ran an updated stress-ng (0.07.16-1ppa1) on a Xeon Phi system with a TON of cores that was able to reproduce the bugs we found on POWER. Note that this is the package we made by copying stress-ng 0.07.16 to the cert PPA and building it for Xenial, since that version doesn't currently exist in Xenial but is needed to resolve these issues for Xenial certs.

Anyway, using a combination of that and your modifications to disk_stress_ng, I was able to run the tests on the Phi system with no failures except for one that LOOKS like a failure, but appears to succeed anyway:

http://pastebin.ubuntu.com/23996480/

The disk in question is:
Disk /dev/sdb: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb732291a

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 234440703 234438656 111.8G 83 Linux

It's formatted w/ ext3.

This is the sync-file stressor run manually with -v enabled:

sudo stress-ng --aggressive --verify --timeout 240 --temp-path /mnt --sync-file 0 --hdd-opts dsync --readahead-bytes 16M -k -v --log-file stress-test

http://paste.ubuntu.com/23997177/

So my concern is whether all those error messages indicate a failure that stress-ng isn't treating as such, or whether they're benign and expected (and if so, they're confusing, because to me they look like something bad happened).
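Whatever the answer for this particular case, a test harness can make the "is this a failure?" decision mechanical instead of relying on the exit code alone. A minimal Python sketch, assuming the `stress-ng: fail: [PID] stressor: message` line shape seen in the log excerpts in this bug (the exact spacing in --log-file output may differ, so the pattern is deliberately loose). The function names are hypothetical, not part of checkbox, and the policy of failing on any "fail:" line follows the "if one test fails, the test as a whole has failed" stance stated earlier in this thread.

```python
import re

# Assumed line shape, taken from the log excerpts in this bug:
#   stress-ng: fail: [5002] stress-ng-aio-linux: io_setup failed, errno=1 (...)
FAIL_RE = re.compile(r"stress-ng:\s*fail:\s*\[(\d+)\]\s*([^:]+):\s*(.*)")


def summarize_failures(log_text):
    """Group 'fail:' lines by stressor name -> list of messages."""
    failures = {}
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            _pid, stressor, message = m.groups()
            failures.setdefault(stressor.strip(), []).append(message.strip())
    return failures


def run_failed(return_code, log_text):
    """Treat a run as failed on a non-zero exit OR on any 'fail:' line."""
    return return_code != 0 or bool(summarize_failures(log_text))
```

Grouping by stressor keeps a report readable when, as in the original log, dozens of instances emit the same io_setup error.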

Colin Ian King (colin-king) wrote :

Hi Jeff,

Yep, this issue was spotted on another system in the last week. It's due to a fallocate() option not being supported by some kernels on some file systems. To work around this, I added two layers of emulation to the fallocate() abstraction (shim) layer:

commit 74c7f40b63ec0c0e9321480d372b76b437188a23
Author: Colin Ian King <email address hidden>
Date: Thu Feb 9 10:17:06 2017 +0000

    shim: add emulation for failed fallocate(2)

    Add two layers of workarounds for fallocate() failures: firstly
    try zero mode flags if mode causes EOPNOTSUPP, secondly, use
    slow emulation mode if fallocate with zero flag fails with EOPNOTSUPP.
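The fallback chain in that commit message can be sketched in Python as below. This is not stress-ng's C shim; it assumes 64-bit Linux (so off_t matches ctypes.c_int64) and calls fallocate(2) through ctypes. The helper names are hypothetical, FALLOC_FL_KEEP_SIZE is the value from <linux/falloc.h>, and the slow path is a simplification chosen here: it only appends zeros past the current end of file, so existing data is never overwritten (holes inside the file are left unallocated). Like the commit's "zero mode" retry, both fallbacks drop the semantics of the original mode flags.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01  # from <linux/falloc.h>


def raw_fallocate(fd, mode, offset, length):
    """Call Linux fallocate(2) via libc; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.fallocate
    fn.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fn.restype = ctypes.c_int
    if fn(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def emulate_fallocate(fd, offset, length, chunk=1 << 16):
    """Slow path: extend the file with zeros without touching existing data."""
    end = offset + length
    pos = max(offset, os.fstat(fd).st_size)
    zeros = bytes(chunk)
    while pos < end:
        pos += os.pwrite(fd, zeros[: min(chunk, end - pos)], pos)


def shim_fallocate(fd, mode, offset, length, fallocate=raw_fallocate):
    """Return the path that succeeded: 'native', 'zero-mode' or 'emulated'."""
    try:
        fallocate(fd, mode, offset, length)
        return "native"
    except OSError as e:
        if e.errno != errno.EOPNOTSUPP:
            raise
    if mode != 0:
        try:
            fallocate(fd, 0, offset, length)  # layer 1: retry with mode 0
            return "zero-mode"
        except OSError as e:
            if e.errno != errno.EOPNOTSUPP:
                raise
    emulate_fallocate(fd, offset, length)     # layer 2: slow emulation
    return "emulated"
```

Passing the underlying call in as a parameter makes the two fallback layers easy to exercise on a file system that does support every mode.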

Can you try the latest release, V0.07.20, which has that fix?

Colin

Mike Rushton (leftyfb) wrote :

Looks like this one is finally resolved. These are the results from my recent testing:

Result,Timestamp,Test Name,Duration,kernel,Package version,CPU's,Memory,Package,Note
PASS,2017-02-21-19-39-20,disk_stress_ng,01:48:24,4.4.0-63-generic-84-Ubuntu,0.07.16-1ppa1,128,31G,Stock stress-ng, stock kernel, colins changes to disk_stress_ng script
PASS,2017-02-21-21-45-09,disk_stress_ng,01:43:45,4.4.0-63-generic-84-Ubuntu,0.07.16-1ppa1,128,31G,Stock stress-ng, stock kernel, colins changes to disk_stress_ng script
PASS,2017-02-22-15-22-25,disk_stress_ng,01:44:35,4.4.0-64-generic-85-Ubuntu,0.07.16-1ppa1,128,31G,Stock stress-ng, stock kernel, colins changes to disk_stress_ng script
PASS,2017-02-27-17-51-38,disk_stress_ng,01:51:26,4.4.0-64-generic-85-Ubuntu,0.07.21-1~ppa,128,31G,Fresh Deployment
PASS,2017-02-27-20-04-49,disk_stress_ng,01:51:38,4.4.0-64-generic-85-Ubuntu,0.07.21-1~ppa,128,31G,fresh deployment
PASS,2017-02-27-22-44-36,disk_stress_ng,01:54:38,4.4.0-64-generic-85-Ubuntu,0.07.21-1~ppa,128,31G,fresh deployment

Mike Rushton (leftyfb)
tags: removed: blocks-hwcert-server
Changed in plainbox-provider-checkbox (Ubuntu):
status: Triaged → Fix Released
Changed in plainbox-provider-checkbox (Ubuntu Xenial):
status: Triaged → Fix Released