Comment 20 for bug 1666260

Iain Buclaw (iainb) wrote :

As the original poster: if I stopped posting OOM dumps, it is perhaps because things started to peter out on 4.8.0-39 or later.

What was particular about the load that triggered this bug was heavy IO putting cache pressure on ext4, on a system with zero locality of reference in anything read from or written to disk (SSD-backed storage).

In any case, by May the data storage servers that had been triggering this issue had been decommissioned and the IO strategy had changed. Writes now go to a raw block device first and are then flushed to the filesystem periodically using O_DSYNC, taking the ext4 disk cache out of the equation.
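
For anyone unfamiliar with O_DSYNC, here is a rough sketch of what a flush step like that can look like. This is only an illustration, not the code that ran on those servers: the function name `flush_chunk_dsync` and the idea of appending a chunk to a regular file are assumptions made for the example, and the `getattr` fallback is there only so the sketch still runs on platforms that lack O_DSYNC.

```python
import os


def flush_chunk_dsync(path: str, data: bytes) -> int:
    """Append data to path; hypothetical helper, for illustration only.

    With O_DSYNC each write() returns only after the data (and the
    metadata needed to read it back) has reached stable storage, so
    dirty pages for this file do not accumulate waiting for writeback.
    """
    # Fall back to O_SYNC, then to no flag, where O_DSYNC is unavailable.
    dsync = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | dsync, 0o644)
    try:
        view = memoryview(data)
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, view[written:])
        return written
    finally:
        os.close(fd)
```

Calling it with a path and a bytes object returns the number of bytes written once they are on disk. Note that O_DSYNC makes the writes synchronous; it does not bypass the page cache the way O_DIRECT does.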

The HWE kernel is now 4.10 and, judging by the edge packages, soon to be 4.13, so maybe it's been fixed in that time. However, I'm no longer able to confirm or deny that, as I have no way to reproduce it anymore. As per Rasmus' comment, it's something that only happened on production workloads.