tripleo-common healthchecks constantly spike CPU by running lsof
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | In Progress | High | Cédric Jeanneret |
Bug Description
On completely idle controllers (no user traffic, no load), the biggest CPU consumer is the containers' healthchecks, which constantly spike CPU by executing lsof to find open ports.
Worse still, lsof is executed twice:
1. https:/
2. https:/
This is an example of how it looks:
top - 11:47:35 up 3 days, 19:26, 2 users, load average: 4.96, 4.74, 5.08
Tasks: 3207 total, 7 running, 3200 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.4 us, 5.2 sy, 0.0 ni, 88.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 19651235+total, 57689756 free, 78429920 used, 60392676 buff/cache
KiB Swap: 16777212 total, 16777212 free, 0 used. 11715618+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
909344 42403 20 0 13360 696 556 R 45.3 0.0 0:01.43 lsof
909211 42402 20 0 16768 932 784 S 41.8 0.0 0:01.32 lsof
909309 42403 20 0 16768 932 784 S 41.1 0.0 0:01.30 lsof
909307 root 20 0 16768 936 784 S 40.8 0.0 0:01.29 lsof
909424 42402 20 0 16768 172 0 R 26.9 0.0 0:00.85 lsof
909438 root 20 0 16768 180 0 R 17.4 0.0 0:00.55 lsof
909468 42403 20 0 16768 176 0 R 12.0 0.0 0:00.38 lsof
1 root 20 0 209632 20952 4276 S 10.8 0.0 418:43.47 systemd
We have 7 lsof commands running at the same time; in total, lsof alone consumes over 220% CPU (the %CPU values above sum to about 225%).
This seems rather inefficient, and it will not scale on busy controllers with a large overcloud deployed.
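The total can be checked directly from the snapshot by summing the %CPU column (the ninth field) of the lsof rows. A minimal Python sketch, assuming the rows are pasted verbatim from the top output:

```python
# The seven lsof rows from the top snapshot, copied verbatim.
TOP_ROWS = """\
909344 42403 20 0 13360 696 556 R 45.3 0.0 0:01.43 lsof
909211 42402 20 0 16768 932 784 S 41.8 0.0 0:01.32 lsof
909309 42403 20 0 16768 932 784 S 41.1 0.0 0:01.30 lsof
909307 root 20 0 16768 936 784 S 40.8 0.0 0:01.29 lsof
909424 42402 20 0 16768 172 0 R 26.9 0.0 0:00.85 lsof
909438 root 20 0 16768 180 0 R 17.4 0.0 0:00.55 lsof
909468 42403 20 0 16768 176 0 R 12.0 0.0 0:00.38 lsof
"""


def lsof_cpu_total(rows: str) -> tuple[int, float]:
    """Return (number of lsof processes, summed %CPU) from top(1) rows."""
    count = 0
    total = 0.0
    for line in rows.splitlines():
        fields = line.split()
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) == 12 and fields[11] == "lsof":
            count += 1
            total += float(fields[8])
    # Round to one decimal, matching top's own precision.
    return count, round(total, 1)


if __name__ == "__main__":
    count, total = lsof_cpu_total(TOP_ROWS)
    print(f"{count} lsof processes, {total}% CPU")  # prints "7 lsof processes, 225.3% CPU"
```

Since top reports %CPU per core, a total above 100% means lsof is occupying more than two full cores at that instant.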
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → wallaby-rc1
tags: added: train-backport-potential
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
I think a single cached view of the lsof output should be enough to evaluate all healthchecks in a given round of execution.
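A minimal Python sketch of the single-cached-view idea, under stated assumptions: the first healthcheck in a round runs lsof once and writes the output to a shared file; later checks within a short freshness window reuse that file instead of spawning their own lsof. The cache path, the 10-second window and the helper names (`lsof_snapshot`, `has_port`) are hypothetical and not existing tripleo-common code. The lsof flags used are standard: `-i` selects network files, `-n` skips DNS lookups and `-P` keeps ports numeric. The runner is injectable so the logic can be exercised without lsof.

```python
import os
import subprocess
import tempfile
import time
from typing import Callable

# Hypothetical location and freshness window; not part of tripleo-common.
CACHE_PATH = "/run/healthcheck/lsof.cache"
MAX_AGE_SECONDS = 10.0


def run_lsof() -> str:
    """Run lsof once for all network files (-n: no DNS, -P: numeric ports)."""
    result = subprocess.run(
        ["lsof", "-n", "-P", "-i"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def lsof_snapshot(cache_path: str = CACHE_PATH,
                  max_age: float = MAX_AGE_SECONDS,
                  runner: Callable[[], str] = run_lsof) -> str:
    """Return cached lsof output if it is fresh enough, otherwise refresh it."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path, encoding="utf-8") as fh:
                return fh.read()
    except FileNotFoundError:
        pass  # no snapshot yet for this round
    output = runner()
    # Write to a temp file and rename so readers never see a partial snapshot.
    directory = os.path.dirname(cache_path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(output)
    os.replace(tmp_path, cache_path)
    return output


def has_port(snapshot: str, command: str, port: int) -> bool:
    """True if a process whose name starts with `command` uses `port`."""
    for line in snapshot.splitlines()[1:]:  # skip the lsof header row
        fields = line.split()
        # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME [(STATE)]
        if len(fields) < 9 or not fields[0].startswith(command):
            continue
        # NAME looks like "*:5672" or "10.0.0.1:5672->10.0.0.2:40000".
        endpoints = fields[8].split("->")
        if any(ep.rsplit(":", 1)[-1] == str(port) for ep in endpoints):
            return True
    return False
```

Every check in the round then calls `has_port(lsof_snapshot(), "<process>", <port>)`, so N checks cost one lsof run instead of N (or 2N). One caveat: an unprivileged lsof only sees the caller's own processes, so the shared snapshot would have to be produced with enough privileges to see every container's sockets.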
We could look into the option at https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_container_manage/templates/systemd-service.j2#L22 to install cache create/purge hooks for the systemd units that call the healthchecks.
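One way such hooks could be wired up, sketched in Python under stated assumptions: render a systemd drop-in for each healthcheck service that runs a cache-create command before the check (`ExecStartPre=`) and a purge command after it (`ExecStopPost=`). Both directives and the `<unit>.d/*.conf` drop-in layout are standard systemd; the hook script paths, the drop-in file name and the `tripleo_<container>_healthcheck.service` unit naming are assumptions for illustration, and the linked template may expose this differently.

```python
from pathlib import Path

# Hypothetical helper scripts; nothing with these names ships in TripleO.
CREATE_HOOK = "/usr/local/bin/healthcheck-cache-create"
PURGE_HOOK = "/usr/local/bin/healthcheck-cache-purge"


def render_dropin(create_hook: str = CREATE_HOOK,
                  purge_hook: str = PURGE_HOOK) -> str:
    """Return the text of a systemd drop-in adding cache hooks to a unit."""
    return (
        "[Service]\n"
        # Runs before the unit's ExecStart= (the healthcheck itself).
        f"ExecStartPre={create_hook}\n"
        # Runs after the service has stopped, also when the check failed.
        f"ExecStopPost={purge_hook}\n"
    )


def dropin_path(container: str, root: str = "/etc/systemd/system") -> Path:
    """Drop-in location for an assumed tripleo_<container>_healthcheck unit."""
    unit = f"tripleo_{container}_healthcheck.service"
    return Path(root) / f"{unit}.d" / "lsof-cache.conf"
```

After writing the drop-ins, `systemctl daemon-reload` makes systemd pick them up. Because many healthcheck units fire at about the same time, a per-unit purge could delete a snapshot that another unit is still reading, so the purge step would need some coordination (for example, only removing snapshots older than the current round).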