Comment 1 for bug 2002039

OpenStack Infra (hudson-openstack) wrote: Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/869382
Committed: https://opendev.org/starlingx/kernel/commit/436c7067d0e022d2053272e8b9c3d9c18473de5e
Submitter: "Zuul (22348)"
Branch: master

commit 436c7067d0e022d2053272e8b9c3d9c18473de5e
Author: M. Vefa Bicakci <email address hidden>
Date: Thu Jan 5 15:47:33 2023 +0000

    kernel: Do not call quiet_vmstat from IRQ context

    We received a bug report indicating that the "Dirty" field in
    /proc/meminfo was increasing without bound, until the number of dirty
    file pages eventually reached the limit enforced by the vm.dirty_bytes
    sysctl (set to 800_000_000 bytes in StarlingX), at which point any task
    attempting to carry out disk I/O became blocked.
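    The condition described above can be watched from user space by
    comparing the Dirty field of /proc/meminfo (labelled "kB" but counted
    in units of 1024 bytes) against the vm.dirty_background_bytes and
    vm.dirty_bytes sysctls. The following is a minimal monitoring sketch,
    not part of this commit; the helper names are hypothetical, and the
    StarlingX values quoted here are assumed only as fallbacks when the live
    sysctls are unreadable or set to 0 (ratio-based configuration).

```python
import re
from typing import Optional

# StarlingX values quoted in this commit message, in bytes. They are used
# only as fallbacks (an assumption) when the live sysctls cannot be read.
DIRTY_BACKGROUND_BYTES = 600_000_000
DIRTY_BYTES = 800_000_000


def dirty_bytes_from_meminfo(text: str) -> int:
    """Return the Dirty field of /proc/meminfo contents, in bytes.

    /proc/meminfo labels the value "kB", but it is in units of 1024 bytes.
    """
    match = re.search(r"^Dirty:\s+(\d+)\s+kB$", text, re.MULTILINE)
    if match is None:
        raise ValueError("no Dirty field in meminfo text")
    return int(match.group(1)) * 1024


def classify(dirty: int, background: int, limit: int) -> str:
    """Compare the dirty total against the two writeback thresholds."""
    if dirty >= limit:
        return "at-limit"    # tasks doing disk I/O block (the reported symptom)
    if dirty >= background:
        return "background"  # background writeback should be active
    return "ok"


def read_vm_sysctl(name: str) -> Optional[int]:
    """Read /proc/sys/vm/<name>; return None if unreadable or 0.

    A value of 0 means the matching *_ratio sysctl is in effect instead.
    """
    try:
        with open(f"/proc/sys/vm/{name}") as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return value or None


def check_system() -> str:
    """Classify the live system's Dirty total (Linux only)."""
    with open("/proc/meminfo") as f:
        dirty = dirty_bytes_from_meminfo(f.read())
    background = read_vm_sysctl("dirty_background_bytes") or DIRTY_BACKGROUND_BYTES
    limit = read_vm_sysctl("dirty_bytes") or DIRTY_BYTES
    return classify(dirty, background, limit)
```

    Calling check_system() periodically during a test run would show
    whether the Dirty total keeps climbing. For instance, the 180_000 KiB
    value reported in the verification notes corresponds to 184_320_000
    bytes, well under both thresholds.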

    Upon further debugging, we noticed that this issue occurred on nohz_full
    CPUs where a user application was carrying out disk I/O by writing to
    and rotating log files. The issue was very reliably reproducible with
    the preempt-rt patch set.

    This commit addresses the issue by reverting commit 62cb1188ed86
    ("sched/idle: Move quiet_vmstate() into the NOHZ code"), which was
    merged in the v4.15-rc1 time frame. In effect, the revert moves the
    quiet_vmstat() call from hard IRQ context back to the start of the idle
    loop. Please see the patch description for a more detailed overview.

    Note that this commit does not introduce a "novel" change: the
    4.14.298-rt140 kernel, released on 2022-11-04, does not have the
    reverted commit either, which should preclude the need for functional
    and performance regression testing.

    I would like to acknowledge the extensive help and guidance provided by
    Jim Somerville <email address hidden> during the debugging and
    investigation of this issue.

    Verification

    - The issue was reproduced on an older CentOS-based StarlingX system
      running a StarlingX/linux-yocto preempt-rt kernel based on
      v5.10.112-rt61, by running a test application for about 4 to 5 hours.
      In this configuration, the issue becomes apparent within about 1 hour,
      when the Dirty field in /proc/meminfo reaches the
      vm.dirty_background_bytes threshold (set to 600_000_000 bytes in
      StarlingX). By the end of the test, the Dirty field was very close to
      the vm.dirty_bytes threshold (800_000_000 bytes).

      Afterwards, a kernel patched with this commit no longer reproduced the
      issue when the same test application was run for about 12.5 hours.
      (Note that the second test had Meltdown/Spectre mitigations enabled by
      accident, but we are confident that this does not affect the test
      results.) The Dirty value in /proc/meminfo stayed around 180_000 KiB
      for the duration of the test. A 1.75-hour re-run with the
      Meltdown/Spectre mitigations disabled produced similar results.

      The test application that reproduces this issue rapidly writes to and
      rotates log files, with a usleep(0) call between consecutive log file
      rotations. The issue is reproduced on nohz_full CPUs, and more
      reliably so with the preempt-rt kernel.

    - A Debian-based StarlingX ISO image was successfully built with this
      commit.

    - The ISO image was successfully installed into a qemu/KVM-based virtual
      machine using the All-in-One Simplex, low-latency profile, and the
      Ansible bootstrap procedure was successful.

    - The issue was confirmed to be fixed by this commit by running multiple
      concurrent instances of a simplified test application for about 30
      minutes on the installation resulting from the Debian-based StarlingX
      ISO image built with this commit. Without a patched kernel, the issue
      becomes apparent within 10 minutes of test runtime in this
      configuration.
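    The test application used above is not included in this message. As a
    rough illustration only, the workload it is described as generating
    (rapid writes to a log file, rotation by renaming, and a usleep(0)
    between rotations) can be approximated with the Python sketch below.
    The function names, file naming scheme, line counts, rotation depth and
    the optional pinning via os.sched_setaffinity() are all assumptions, and
    time.sleep(0) merely stands in for usleep(0).

```python
import os
import time
from typing import Optional


def rotate(path: str, keep: int) -> None:
    """Rename path.(keep-1) -> path.keep, ..., path -> path.1 (oldest dropped)."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")


def run(path: str, rotations: int, lines_per_file: int = 1000,
        keep: int = 5, cpu: Optional[int] = None) -> None:
    """Fill a log file, rotate it, and repeat `rotations` times."""
    if cpu is not None:
        # Pin this process to one CPU (e.g. a nohz_full CPU); Linux only.
        os.sched_setaffinity(0, {cpu})
    for n in range(rotations):
        # Buffered writes land in the page cache as dirty file pages.
        with open(path, "a") as log:
            for i in range(lines_per_file):
                log.write(f"rotation {n} line {i}\n")
        rotate(path, keep)
        # Closest Python counterpart to usleep(0): a zero-length sleep.
        time.sleep(0)
```

    For example, run("/tmp/hypothetical-app.log", rotations=100_000, cpu=3)
    would approximate one instance on CPU 3; the path and CPU number are
    placeholders, and several instances can be started concurrently, as in
    the Debian-based verification above.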

    Closes-Bug: 2002039
    Change-Id: I818d8bd751f4b1941a26530a99a4a635e98d5c54
    Signed-off-by: M. Vefa Bicakci <email address hidden>