Comment 0 for bug 2002039

M. Vefa Bicakci (vbicakci) wrote:

Brief Description
-----------------
Applications carrying out file I/O on nohz_full CPUs may cause the vm.dirty_bytes sysctl threshold to be reached.

We have received a report that running a test application on nohz_full CPUs causes the Dirty field in /proc/meminfo to grow until it reaches the threshold set by the vm.dirty_bytes sysctl. Once that happens, all applications carrying out disk I/O eventually block.

The issue was found to be in the handling of the vm_node_stat array, which is updated from multiple contexts:

* hard IRQ context (for example, quiet_vmstat, which is called from hard IRQ context)
* other contexts (for example, __mod_node_page_state, which is called from numerous other parts of the kernel)

We found that __mod_node_page_state and its sibling functions update vm_node_stat (and other arrays) in a manner that is not IRQ-safe. Because quiet_vmstat is called from hard IRQ context, it can run in the middle of such an update, and this appears to cause vm_node_stat and other statistics arrays to be updated incorrectly.

This bug has been opened as a placeholder so that a fix can be merged.

Severity
--------
Major: Certain workloads eventually result in system hangs

Steps to Reproduce
------------------
Applications that log to files from nohz_full CPUs and rotate those log files appear to trigger this issue. This description is admittedly vague; if there is interest, I can publish a cleaned-up test application.

Expected Behavior
------------------
The Dirty field in /proc/meminfo should not increase without bound.

Actual Behavior
----------------
The Dirty field increases gradually and eventually reaches the threshold set by the vm.dirty_bytes sysctl. The value does not decrease even after the problematic (triggering) user-space application is killed.

Reproducibility
---------------
Reliably reproducible

System Configuration
--------------------
Reproduced on all-in-one simplex and duplex configurations with the low-latency (preempt-rt) kernel.

Branch/Pull Time/Commit
-----------------------
Not applicable.

Last Pass
---------
StarlingX versions with 3.10-based kernels are not affected, because the issue was "introduced" by the following commit, which was merged during the v4.15-rc1 development time frame:

commit 62cb1188ed86a9cf082fd2f757d4dd9b54741f24
Author: Peter Zijlstra <email address hidden>
Date: Tue Aug 29 15:07:54 2017 +0200

    sched/idle: Move quiet_vmstate() into the NOHZ code

    quiet_vmstat() is an expensive function that only makes sense when we
    go into NOHZ.

Timestamp/Logs
--------------
Not applicable.

Test Activity
-------------
Normal use.

Workaround
----------
None.