Comment 3 for bug 1924686

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/786599
Committed: https://opendev.org/starlingx/integ/commit/ccfeeef59d39e42b2775bb5a216732c4999f6e42
Submitter: "Zuul (22348)"
Branch: master

commit ccfeeef59d39e42b2775bb5a216732c4999f6e42
Author: Li Zhou <email address hidden>
Date: Mon Apr 12 02:15:25 2021 -0400

    systemd: Prevent excessive /proc/1/mountinfo reparsing

    Backport the patches for this issue:
    https://bugzilla.redhat.com/show_bug.cgi?id=1819868

    We encountered the following issue:
    When testing with a large number of pods (> 230), we occasionally
    observed a number of problems related to the systemd process:
        systemd ran continually at 90-100% CPU usage
        systemd memory usage started increasing rapidly (20GB/hour)
        systemctl commands would always time out (Failed to get properties:
            Connection timed out)
        sm services failed and could not recover: open-ldap,
            registry-token-server, docker-distribution, etcd
        new pods could not start and got stuck in the ContainerCreating
            state

    These patches prevent excessive /proc/1/mountinfo reparsing.
    It has been verified that they greatly improve systemd performance
    in this scenario.

    The 16 commits for this issue are listed in sequence (from [1] to
    [16]) at the link below:
    https://github.com/systemd-rhel/rhel-8/pull/154/commits

    [16](10)core: prevent excessive /proc/self/mountinfo parsing
    [15][Dropped-6]test: add ratelimiting test
    [14](9)sd-event: add ability to ratelimit event sources
    [13](8)sd-event: increase n_enabled_child_sources just once
    [12](7)sd-event: update state at the end in event_source_enable
    [11](6)sd-event: remove earliest_index/latest_index into common part of
    event source objects
    [10][Dropped-5]sd-event: follow coding style with naming return
    parameter
    [9] [Dropped-4]sd-event: ref event loop while in sd_event_prepare() ot
    sd_event_run()
    [8] (5)sd-event: refuse running default event loops in any other thread
    than the one they are default for
    [7] [Dropped-3]sd-event: let's suffix last_run/last_log with "_usec"
    [6] [Dropped-2]sd-event: fix delays assert brain-o (#17790)
    [5] (4)sd-event: split out code to add/remove timer event sources to
    earliest/latest prioq
    [4] (3)sd-event: split clock data allocation out of sd_event_add_time()
    [3] [Dropped-1]sd-event: mention that two debug logged events are
    ignored
    [2] (2)sd-event: split out enable and disable codepaths from
    sd_event_source_set_enabled()
    [1] (1)sd-event: split out helper functions for reshuffling prioqs

    I backported 10 of them (from (1) to (10)) to fix this issue
    and dropped the other 6 (from [Dropped-1] to [Dropped-6]) for the
    following reasons:
    [Dropped-1]Only changes the error log.
    [Dropped-2]Fixes a bug introduced in a commit which doesn't exist in
    this version.
    [Dropped-3]Only renames variables; there is no functional change.
    [Dropped-4]More commits are needed to merge it, and I don't see that
    it helps with adding the rate-limiting ability.
    [Dropped-5]Changes the coding style of a function which isn't really
    used by anyone.
    [Dropped-6]Only adds test cases.

    Closes-Bug: #1924686
    Signed-off-by: Li Zhou <email address hidden>
    Change-Id: Ia4c8f162cb1a47b40d1b26cf4d604976b97e92d6
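
The rate-limiting idea behind patches (9) and (10) can be sketched as a
simple interval/burst limiter. This is only a conceptual model in Python,
not the systemd implementation: the class name RateLimit, the helper
count_reparses, and the 1000 ms / 5 events values are assumptions chosen
for illustration. The sketch also simply skips events over the limit and
does not model how sd-event later resumes a rate-limited event source.

```python
class RateLimit:
    """Hypothetical interval/burst limiter (illustrative model only)."""

    def __init__(self, interval_ms, burst):
        self.interval_ms = interval_ms  # length of one accounting window
        self.burst = burst              # events allowed per window
        self.begin = None               # start time of the current window
        self.num = 0                    # events allowed so far in the window

    def below(self, now_ms):
        """Return True if an event at now_ms is still within the limit."""
        # Open a new window on first use or once the old one has expired.
        if self.begin is None or now_ms - self.begin > self.interval_ms:
            self.begin = now_ms
            self.num = 0
        if self.num < self.burst:
            self.num += 1
            return True
        return False


def count_reparses(event_times_ms, limiter):
    """Count how many mount-change events would trigger a reparse."""
    return sum(1 for t in event_times_ms if limiter.below(t))


# One mountinfo change notification per millisecond for one second.
events = range(1000)
reparses = count_reparses(events, RateLimit(interval_ms=1000, burst=5))
print(f"{len(events)} events -> {reparses} reparses")
```

With many pods starting and stopping, every mount change wakes systemd up;
capping the number of reparses per window bounds that work no matter how
many notifications arrive.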