Comment 11 for bug 1709889

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-17 08:42 EDT-------
Those three patches, at least in the kernel I am running, actually make things worse. The characteristics have changed: there appears to be a general slow-down of disk I/O (it took over 12 hours to hit the first set of severe stalls), but the delays, when they do occur, are much worse. I saw I/Os getting delayed for over 40 minutes.

I have double-checked that the patches are installed. But in spite of having the patch for the delay length (5be6b75610cefd1e21b98a218211922c2feb6e08), the behavior is back to what I was seeing before I applied that patch on its own.

I'm attaching the combined diff of the changes I made to the kernel. Note, the only difference between the "worse" run and the previous "better" one was the addition of these two patches:

4d608baac5f4e72b033a122b2d6d9499532c3afc "block: Initialize cfqq->ioprio_class in cfq_get_queue()"
142bbdfccc8b3e9f7342f2ce8422e76a3b45beae "cfq: Disable writeback throttling by default"

I can't explain this, as I don't see how either of those should have made things worse.

Maybe I need the actual source for your test kernel so I can add my debug-monitoring code and run. With 40-minute delays the debug-monitoring code is technically not needed, as HTX will complain. But if, as I was seeing on the previous kernel, the delays are below 10 minutes then HTX will never notice and there will be no obvious indication of the more subtle issue.
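
For illustration only, a user-space probe along the following lines could flag delays that stay well below the point where HTX complains. This is a minimal sketch and not the debug-monitoring code mentioned above: the function names, the 4 KiB block size and the 1-second threshold are all assumptions chosen for the example, and it only times synchronous writes rather than looking inside the block layer.

```python
import os
import tempfile
import time


def find_stalls(latencies, threshold_s):
    """Return (index, latency) pairs for operations slower than threshold_s."""
    return [(i, lat) for i, lat in enumerate(latencies) if lat > threshold_s]


def timed_sync_writes(path, count, block=b"\0" * 4096):
    """Write `count` blocks to `path`, fsync after each one, and return
    the per-write latencies in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, block)
            os.fsync(fd)  # force the data to the device so the delay is visible
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Hypothetical usage: probe a temporary file and report slow writes.
    with tempfile.TemporaryDirectory() as tmp:
        lats = timed_sync_writes(os.path.join(tmp, "probe.dat"), count=16)
        for i, lat in find_stalls(lats, threshold_s=1.0):
            print(f"write {i} stalled for {lat:.3f}s")
```

To be useful for this bug the probe file would have to live on the disk under test and the loop would have to run for hours, since the stalls took over 12 hours to appear in the last run.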