skydive_agent container not reaping zombies
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Fix Released | Medium | Michal Nasiadka |
Bug Description
The skydive_agent container creates zombie processes that are never reaped. It can create so many zombies that it exhausts the PID space, after which no new processes can be started.
Trying to run a command at the bash prompt in this state will just give you the (inaccurate) error message "fork: Cannot allocate memory".
The zombies are all "ovs-ofctl".
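To confirm the symptom before the PID space runs out, you can count zombie processes by command name. The sketch below is a minimal, hypothetical helper (the names `parse_stat` and `count_zombies` are not part of kolla or skydive) and assumes a Linux host with `/proc` mounted. It reads field 2 (comm) and field 3 (state) of each `/proc/<pid>/stat` and tallies entries in state `Z`.

```python
import os
from collections import Counter


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first '(' and the *last* ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # one-letter state follows ") "
    return pid, comm, state


def count_zombies(proc_root="/proc"):
    """Count zombie (state 'Z') processes under proc_root, keyed by comm."""
    zombies = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'sys'
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                _pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process went away while we were scanning
        if state == "Z":
            zombies[comm] += 1
    return zombies


if __name__ == "__main__":
    # Most frequent zombie names first; per this report, ovs-ofctl should dominate.
    for comm, n in count_zombies().most_common():
        print(f"{n:6d} {comm}")
```

Zombies keep their `/proc` entry until they are reaped, which is why they still show up in this scan.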
"docker stop skydive_agent" will result in all the zombie processes being reaped, fixing up the machine (assuming you got there before the PID space is all used up!).
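Stopping the container helps because a zombie is only an exit status waiting for its parent to collect it with `wait()`; it goes away once the parent reaps it or the parent itself goes away. The sketch below illustrates that generic Linux mechanism, not skydive's actual code: a parent forks a child that exits immediately, the child sits in state `Z` because nobody waits for it, and a non-blocking `os.waitpid(-1, os.WNOHANG)` loop (the kind of loop a reaping parent or init runs) clears it. `make_zombie`, `reap_all` and `proc_state` are hypothetical names, and `/proc` is assumed to be available.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of pid from /proc, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    return data[data.rindex(")") + 2]


def make_zombie():
    """Fork a child that exits at once; the parent deliberately does not wait."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, skipping Python cleanup handlers
    # Parent: poll until the kernel has marked the exited child as a zombie.
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z":
        if time.monotonic() > deadline:
            raise TimeoutError(f"child {pid} never became a zombie")
        time.sleep(0.01)
    return pid


def reap_all():
    """Collect every already-exited child without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    zombie = make_zombie()
    print("before reaping:", proc_state(zombie))  # expected: Z
    print("reaped:", reap_all())
    print("after reaping:", proc_state(zombie))   # expected: None
```

A parent that starts `ovs-ofctl` repeatedly but never collects its exit status accumulates exactly this kind of `Z` entry, one per invocation.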
It sounds a lot like this problem: https:/
kolla-ansible 7.0.0
CentOS 7.5.1804 GenericCloud for the hardware
CentOS 7.5.1804 for the containers
kolla-build run locally, for centos binary -> 7.0.0 (I have made no changes to the skydive or openvswitch related containers)
Changed in kolla-ansible:
milestone: none → rocky-3
assignee: nobody → Michal Nasiadka (mnasiadka)
importance: Undecided → Medium
milestone: rocky-3 → none
status: New → Confirmed
affects: kolla-ansible → kolla
Changed in kolla:
milestone: none → 8.0.0
Do NOT use skydive at this time unless you rebuild the skydive image from source to pick up the fix from the upstream skydive project [1].
We hit this problem when deploying kolla-ansible with master-branch images in a testbed environment: all compute nodes went down after about 1 to 2 hours, with more than 4k zombie processes.
[1] https://github.com/skydive-project/skydive/issues/1541