It might not be good to stir up such an old bug, but it gets regularly updated with new complaints, so maybe a new approach might help.

Let us make one thing clear first: IMHO, if something overloads your machine with disk I/O it has to stall it. So the possible solution paths are:
a) beat it with more Processing / IO HW
b) mitigate the effect as far as possible
c) avoid the overload before it starts

The issue is a common one - so I'll keep my explanations general and not specific to trackerd or any other case that was mentioned before.

### a) beat it with more Processing / IO HW ###
There are far more expensive machines out there which can handle far more I/O without being slowed down. The reason is that they have more I/O cards, virtual functions to spread the handling across CPUs, and, at the high end, servers with completely different I/O IRQ designs. We should agree that on cheap/slow or even medium machines I/O overload just *IS* an issue for responsiveness. But that isn't the point - the question is what a normal user can do about it, and spending x000000 $ on a machine isn't the solution.

### b) mitigate the effect as far as possible ###
Regarding mitigation there were already some approaches in this bug discussion, like using ionice and several dirty ratio tunings, but none of these prevent the I/O overload. E.g. if you run the overloading I/O only in the "Best Effort" I/O class, the only difference it makes is that "other I/O" might pass faster, but your system is still fairly busy => unresponsive.
Dirty ratio tuning comes down to a process spending its remaining time slice cleaning up dirty memory as soon as a certain level is reached. While you can configure higher ratios (at the price of endangering integrity), that won't stop the burst of I/O either - instead it allows even more data to be submitted to dirty the page cache, and thereby indirectly causes more I/O overloading the system again.
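For reference, this is roughly what those mitigation attempts look like on the command line; the values are only illustrative examples, and neither knob reduces the amount of I/O that actually gets submitted:

# run the heavy task in the lowest "Best Effort" priority, or the idle class;
# this only changes the ordering relative to other I/O, not the total amount
ionice -c 2 -n 7 <command>
ionice -c 3 -p <pid>    # retrofit an already running process

# raise the dirty page thresholds (illustrative values); this only delays and
# enlarges the writeback burst, it does not prevent it
sysctl -w vm.dirty_background_ratio=10
sysctl -w vm.dirty_ratio=40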
### c) avoid the overload before it starts ###
It has to be said that, since this bug dates back to 2007 and a lot of the reports are related to I/O + *sync, various filesystem and general kernel improvements have been made just for sync & journaling. Several posts in this bug already confirm that.
What I didn't see people trying is throttling the processes that overload the system.
Throttling is documented at => https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
Like any approach this one has certain limitations, but it is a new way to tackle the overall issue. It also needs certain cgroup and filesystem features (like accounting for writeback through the page cache) which might only be available in modern Ubuntu releases.

### Experiment ###
As an experiment to prove the solution I use fio and latencytop to compare:
1. no background load, checking latencytop
2. running a random read/write multithreaded fio in the background, checking latencytop
3. running a throttled random read/write multithreaded fio in the background, checking latencytop

# Background Load #
A fio job file like this:

[global]
ioengine=libaio
rw=randrw
bssplit=1k/25:4k/50:64k/25
size=512m
directory=/home/paelzer/latencytest
iodepth=8

[dio]
direct=1
numjobs=8

[pgc]
direct=0
numjobs=8

# Case 1 - No background load => almost no latency
Cause                                      Maximum       Percentage
Waiting for event (select)                   5,0 msec      39,7 %
Waiting for event (poll)                     5,0 msec      33,9 %
Userspace lock contention                    4,8 msec      25,7 %
[do_wait]                                    2,7 msec       0,4 %
[ep_poll]                                    2,4 msec       0,2 %
Reading from file                            0,9 msec       0,0 %
Reading EXT3 directory htree                 0,2 msec       0,0 %
[hrtimer_nanosleep]                          0,1 msec       0,0 %

# Case 2 - Unrestricted background load
Overloading the I/O subsystem shows massive impact on:
- ext4 data/log writes
- memory management due to thrashing the page cache
- ...

=> fio itself runs fast:
Jobs: 16 (f=16): [m(16)] [6.7% done] [92482KB/99.50MB/0KB /s] [6302/6483/0 iops] [eta 01m:51s]

Cause                                      Maximum       Percentage
[ext4_file_write_iter]                      91,8 msec       0,3 %
[wait_transaction_locked]                   63,4 msec       0,1 %
Marking inode dirty                         61,2 msec       0,9 %
[SyS_io_destroy]                            46,3 msec       0,3 %
[lru_add_drain_all]                         18,0 msec       0,1 %
[__block_write_begin]                       16,8 msec      38,5 %
[__lock_page_killable]                      16,2 msec      34,7 %
[read_events]                                5,0 msec      21,2 %
Waiting for event (poll)                     5,0 msec       1,9 %

# Case 3 - The same workload, but contained in a blkio-throttled cgroup
mkdir /sys/fs/cgroup/blkio/limitbgload

lsblk
NAME     MAJ:MIN RM  SIZE  RO TYPE MOUNTPOINT
sda        8:0    0  29,3G  0 disk
├─sda1     8:1    0  28,3G  0 part /
├─sda2     8:2    0     1K  0 part
└─sda5     8:5    0  1021M  0 part

# Limit to 4 MB/s write and 8 MB/s read speed
echo 8:0 $((1024*1024*4)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.write_bps_device
echo 8:0 $((1024*1024*8)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.read_bps_device

cgexec -g blkio:limitbgload fio causelatency.fiojob

The fio output shows that throttling is working:
Jobs: 16 (f=16): [m(16)] [22.0% done] [6724KB/8915KB/0KB /s] [577/598/0 iops] [eta 09m:25s]

But we can also see its desired effect of not overloading the system with I/O:
Cause                                      Maximum       Percentage
[__lock_page_killable]                     132,2 msec     46,5 %
[__block_write_begin]                      131,4 msec     47,9 %
fsync() on a file (type 'F' for details)    30,7 msec      0,0 %
Marking inode dirty                         21,5 msec      0,1 %
[ext4_file_write_iter]                       5,2 msec      0,0 %
Waiting for event (select)                   5,0 msec      1,4 %
Userspace lock contention                    5,0 msec      1,0 %
Waiting for event (poll)                     5,0 msec      1,7 %
[read_events]                                4,9 msec      1,3 %

=> this shows almost only the stalls caused by the throttling itself, which are wanted
=> the dirtying and filesystem latencies are much smaller now
=> the system "feels" right regarding responsiveness

### TL;DR ###
- huge machines just beat I/O overload with more HW or a better I/O architecture
- the code improves to mitigate the effects, but it can never be perfect for *ALL* users at once (especially in the default config)
- try throttling the processes that overload your I/O if you don't need their results asap

=> Let us discuss whether that would be an option, and if so, let us close this bug and open a separate one requesting configurable throttling for each applicable component, like trackerd and the many other I/O-heavy background tasks.
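For anyone who wants to experiment with c) on a real background task right away, here is a minimal sketch along the lines of the test above. It assumes the same cgroup-v1 blkio controller and sda (8:0) as in the experiment, and uses "tracker-miner-fs" purely as an example name for whatever process is hammering your disk - adjust device, limits and process name to your setup:

# same throttled group and limits as in the experiment (4 MB/s write, 8 MB/s read on 8:0)
mkdir /sys/fs/cgroup/blkio/limitbgload
echo 8:0 $((1024*1024*4)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.write_bps_device
echo 8:0 $((1024*1024*8)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.read_bps_device

# move the already running process(es) into the group instead of starting them via cgexec
for p in $(pidof tracker-miner-fs); do echo $p > /sys/fs/cgroup/blkio/limitbgload/cgroup.procs; done
# (cgclassify -g blkio:limitbgload $(pidof tracker-miner-fs) from the same libcgroup tools as cgexec does the same)

Keep in mind the limitation mentioned under c): direct and sync I/O is throttled either way, while buffered writeback through the page cache is only attributed and throttled properly where the newer cgroup and filesystem writeback accounting is available.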