Many '/usr/bin/openstack' processes stuck on controllers during deployment, leading to OOM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Andriy Kurilin |
7.0.x | Invalid | High | Dennis Dmitriev |
Bug Description
During cluster deployment, puppet manifests use the shell command '/usr/bin/python /usr/bin/openstack ...' to communicate with OpenStack components.
These processes never terminate, consuming a lot of memory and, as a result, invoking the oom-killer.
For example:
- controller with 3 GB of memory and 3 GB of swap;
- after the deployment finished (with a failure), all memory and swap were filled; 3.5 GB was taken by /usr/bin/openstack processes.
Reproduced on CI: https:/
Scenario:
1. Create cluster
2. Add 1 controller node
3. Deploy the cluster
4. Add 2 controller nodes
5. Deploy changes
Result: re-deployment of the primary controller at step 5 failed.
Here is the memory consumption on the primary controller right after step 5: http://
Here is the memory consumption after killing the /usr/bin/openstack processes: http://
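The recovery step above (killing the stuck clients to reclaim memory) can be sketched with a stand-in process. This is a minimal illustration, not the exact command used on the node: the `sleep` below is a hypothetical substitute for a hung `/usr/bin/openstack` invocation, and it is killed by PID rather than by name to avoid matching unrelated processes.

```shell
# Stand-in for a hung '/usr/bin/openstack' process: a long-running
# sleep launched in the background.
sleep 60 &
pid=$!

# Send SIGTERM and reap the process; a signal-terminated job
# reports exit status 128 + signal number (SIGTERM = 15 -> 143).
kill "$pid"
wait "$pid"
status=$?
echo "exit status: $status"
```

On the affected controllers the same pattern would be applied per PID of each stuck client, after which memory usage drops back to normal (as the second graph above shows).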
-------
Such an issue can lead to failures like the one described in https:/
Here is an error.log from apache on the primary controller when there was no free memory: http://
[Sun Oct 04 22:29:31.686995 2015] [core:notice] [pid 13201:tid 140384935454592] AH00052: child pid 27972 exit signal Segmentation fault (11)
[Sun Oct 04 22:29:31.687026 2015] [core:error] [pid 13201:tid 140384935454592] AH00546: no record of generation 0 of exiting child 27972
...
See 'atop' logs for node-5 in the diagnostic snapshot attached to the bug.
Changed in fuel:
status: New → Confirmed
tags: added: swarm-blocker

Changed in fuel:
assignee: MOS Packaging Team (mos-packaging) → Artem Silenkov (asilenkov)
tags: added: area-build
This really needs to be addressed in openstackclient itself: the client never dies if it hangs, and it does not support a timeout.
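Until openstackclient gains a client-side timeout, one deployment-side workaround is to wrap each CLI call in GNU coreutils `timeout`, so a hung client is killed instead of lingering until the oom-killer fires. This is a sketch of the idea, not the fix that shipped; the `sleep` is a hypothetical stand-in for a hung `/usr/bin/openstack` invocation.

```shell
# Enforce a hard time limit on the wrapped command; 'timeout' sends
# SIGTERM when the limit expires and itself exits with status 124.
timeout 2 sleep 10
status=$?
echo "exit status: $status"
if [ "$status" -eq 124 ]; then
    echo "command was killed on timeout"
fi
```

In a puppet manifest the same wrapper would surround the real call, e.g. `timeout 300 /usr/bin/openstack ...`, with the limit chosen to comfortably exceed a healthy call's duration.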