Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/797509
Committed: https://opendev.org/starlingx/stx-puppet/commit/ebcbf953bf82a49f545fc8de01b68ce547e78d6d
Submitter: "Zuul (22348)"
Branch: master
commit ebcbf953bf82a49f545fc8de01b68ce547e78d6d
Author: Eric MacDonald <email address hidden>
Date: Tue Jun 22 12:32:44 2021 -0400
Reduce collectd write_threads from 5 to 1
StarlingX currently uses collectd version 5.8.1
with 5 write threads. This version of collectd
coredumps in its network plugin in 1-2 out of
every 100 process restarts, so each node reboot
carries a 1-2% chance of a collectd coredump.
Open-source collectd version 5.12.0 includes the
following change, which fixes the underlying race
condition by taking a pthread mutex around the
sendto network call, avoiding the coredump.
https://github.com/collectd/collectd/commit/daf8bc7ab6c03287f281d317b1d5fdc44c159065
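The idea behind the upstream fix, serializing the send call across writer threads with a mutex, can be sketched as follows. This is a Python analogue for illustration only, not collectd's C code; the names SerializedSender and run_writers are hypothetical.

```python
import threading


class SerializedSender:
    """Wrap a send callable so that only one thread runs it at a time.

    Hypothetical illustration: collectd itself is written in C and,
    per the commit message, uses a pthread mutex around sendto.
    """

    def __init__(self, send):
        self._send = send              # e.g. a UDP socket's sendto method
        self._lock = threading.Lock()  # plays the role of the pthread mutex

    def sendto(self, payload, addr):
        # Hold the lock for the whole call so writers cannot interleave.
        with self._lock:
            return self._send(payload, addr)


def run_writers(sender, addr, n_threads=5, n_msgs=200):
    """Call sender.sendto from n_threads concurrent writer threads."""

    def writer(tid):
        for i in range(n_msgs):
            sender.sendto(f"{tid}:{i}".encode(), addr)

    threads = [threading.Thread(target=writer, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a real UDP socket the wrapper would be built as SerializedSender(sock.sendto). Running a single write thread removes the concurrency altogether, which is why it works as a stopgap without the lock.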
StarlingX is not yet prepared to migrate to this
new version. In the meantime, this update reduces
the number of write_threads to 1, as recommended
by the author of the collectd change, until
StarlingX integrates collectd 5.12.0 or later.
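The version condition above could be expressed as a small helper. This is a hedged sketch, not part of this change, which simply sets the value to 1. The function names and the default of 5 threads are assumptions taken from the text.

```python
# First collectd release carrying the sendto lock, per the commit message.
FIXED_IN = (5, 12, 0)


def parse_version(text):
    """Turn '5.8.1' into (5, 8, 1) so comparison is numeric, not textual."""
    return tuple(int(part) for part in text.split("."))


def write_threads_for(version, default_threads=5):
    """Hypothetical policy: 1 write thread until the fix is present."""
    if parse_version(version) >= FIXED_IN:
        return default_threads
    return 1
```

Comparing tuples of integers matters here: as plain strings, "5.8.1" sorts after "5.12.0", which would wrongly treat 5.8.1 as the newer release.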
Test Plan:
PASS: Verify no collectd coredumps in over 5000
process restarts across multiple servers
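To put the 5000-restart result in context, the arithmetic below estimates what the old 1-2% rate would have produced over the same number of restarts. It assumes each restart fails independently at a fixed rate, which is a simplification not stated in the commit; the function names are hypothetical.

```python
def expected_coredumps(restarts, rate):
    """Mean number of coredumps at a fixed per-restart failure rate."""
    return restarts * rate


def prob_no_coredump(restarts, rate):
    """Chance of seeing zero coredumps, assuming independent restarts."""
    return (1.0 - rate) ** restarts


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        print(rate,
              expected_coredumps(5000, rate),
              prob_no_coredump(5000, rate))
```

At the old rate, 5000 restarts would be expected to yield roughly 50 to 100 coredumps, and the chance of seeing none by luck is far below 1e-20 under this assumption, so a clean run is strong evidence that the workaround is effective.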
Regression:
PASS: Verify collectd logging
PASS: Verify collectd sampling
PASS: Verify alarming and degrade handling
Closes-Bug: 1872979
Signed-off-by: Eric MacDonald <email address hidden>
Change-Id: Ie9297f596d30c2754142a5237608ebb227898ecb