Comment 38 for bug 1872979

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/797509
Committed: https://opendev.org/starlingx/stx-puppet/commit/ebcbf953bf82a49f545fc8de01b68ce547e78d6d
Submitter: "Zuul (22348)"
Branch: master

commit ebcbf953bf82a49f545fc8de01b68ce547e78d6d
Author: Eric MacDonald <email address hidden>
Date: Tue Jun 22 12:32:44 2021 -0400

    Reduce collectd write_threads from 5 to 1

    StarlingX currently uses collectd version 5.8.1
    with 5 write threads. This version of collectd
    has been observed to coredump in its network
    plugin 1-2 times out of 100 process restarts.
    This means that every time a node is rebooted
    there is a 1-2% chance that collectd will
    coredump.

    The open source collectd version 5.12.0 includes
    the following change, which addresses the race
    condition behind the coredump by taking a pthread
    mutex lock around the sendto network call.

    https://github.com/collectd/collectd/commit/c44c159065daf8bc7ab6c03287f281d317b1d5fd
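
The locking pattern described above can be sketched roughly as follows. This is an illustration only, written in Python rather than the C used by collectd, and it is not the upstream patch: the names `_send_lock`, `locked_sendto` and `send_from_threads` are hypothetical. It assumes several writer threads share one socket and shows how a single lock keeps their `sendto` calls from overlapping.

```python
import threading

# Hypothetical module-level lock shared by every writer thread.
_send_lock = threading.Lock()


def locked_sendto(sock, payload, addr):
    """Call sock.sendto() while holding the shared lock, so that only
    one thread is inside the send at any moment."""
    with _send_lock:
        return sock.sendto(payload, addr)


def send_from_threads(sock, addr, num_threads=5, sends_per_thread=10):
    """Start num_threads writers that all send through the same socket
    and return the total number of sendto() calls requested."""
    def writer(thread_id):
        for seq in range(sends_per_thread):
            locked_sendto(sock, f"{thread_id}:{seq}".encode(), addr)

    threads = [threading.Thread(target=writer, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return num_threads * sends_per_thread
```

With a real UDP socket created by `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and a placeholder destination address, `send_from_threads(sock, addr)` would issue 50 serialized sends. Running a single writer thread, as this commit does, avoids the overlap without needing the lock.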

    StarlingX is not yet prepared to migrate to this
    new version. In the meantime, this update reduces
    the number of write_threads to 1, as recommended
    by the collectd update author, until StarlingX
    successfully integrates collectd version 5.12.0
    or later.

    Test Plan:

    PASS: Verify no collectd coredumps in over 5000
          process restarts across multiple servers
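
As a rough check on how strong this result is, the sketch below estimates the chance of seeing zero coredumps in 5000 restarts if the original 1-2% failure rate still applied. It assumes each restart is an independent trial, which the commit does not state, and the function names are hypothetical.

```python
def prob_no_coredump(per_restart_rate, restarts):
    """Probability of zero coredumps across `restarts` restarts,
    assuming each restart fails independently at `per_restart_rate`."""
    return (1.0 - per_restart_rate) ** restarts


def expected_coredumps(per_restart_rate, restarts):
    """Expected number of coredumps under the same assumption."""
    return per_restart_rate * restarts


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        print(rate,
              expected_coredumps(rate, 5000),
              prob_no_coredump(rate, 5000))
```

Under that assumption, the old configuration would be expected to produce roughly 50 to 100 coredumps over 5000 restarts, and the probability of observing none is vanishingly small, so a clean run is strong evidence that the change is effective.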

    Regression:

    PASS: Verify collectd logging
    PASS: Verify collectd sampling
    PASS: Verify alarming and degrade handling

    Closes-Bug: 1872979
    Signed-off-by: Eric MacDonald <email address hidden>
    Change-Id: Ie9297f596d30c2754142a5237608ebb227898ecb