Deployment fails because controllers are overloaded (CPU): Failed to execute hook 'ceilometer-radosgw-user'. puppet timeout error: execution expired
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | Critical | Oleksiy Molchanov |
8.0.x | Fix Committed | Critical | Oleksiy Molchanov |
Mitaka | Invalid | Critical | Oleksiy Molchanov |
Bug Description
System test 'huge_ha_
2016-02-09 00:03:11 ERROR [803] Error running RPC method granular_deploy: Failed to execute hook 'ceilometer-
---
uids:
- '1'
- '3'
- '2'
parameters:
puppet_modules: /etc/puppet/modules
puppet_manifest: /etc/puppet/
timeout: 300
cwd: /
priority: 1900
fail_on_error: true
type: puppet
id: ceilometer-
, trace:
["/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
"/usr/
2016-02-09 00:03:11 ERROR [803] b97d40c3-
Here is part of the atop logs from one of the controllers:
http://
http://
http://
As you can see, the controller was overloaded: puppet, ceilometer and rabbitmq used all of the CPU resources. A lot of swap (~50%) was also in use, and RAM was mostly consumed by OpenStack services (nova, neutron, heat).
Here are the hardware characteristics of the VMs used for controller nodes:
root@node-1:~# grep -P 'processor|model name|^\s*$' /proc/cpuinfo
processor : 0
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
processor : 1
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
root@node-1:~# free -m
total used free shared buffers cached
Mem: 3009 2903 106 24 5 106
-/+ buffers/cache: 2791 218
Swap: 3071 1306 1765
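To put the `free -m` numbers above in proportion, here is a minimal sketch that parses that output and computes how much RAM (excluding buffers/cache) and swap were in use. The helper name `parse_free_m` is hypothetical, and the sketch assumes the older `free` layout shown above, which includes the `-/+ buffers/cache` row.

```python
def parse_free_m(text):
    """Parse `free -m` output into {row label: [integer columns]}.

    Assumes the older procps layout with a '-/+ buffers/cache' row,
    as pasted in this report. Hypothetical helper, for illustration only.
    """
    stats = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row has no colon, so skip it
        stats[label.strip()] = [int(v) for v in rest.split()]
    return stats


# Output copied from the controller node in this report.
FREE_OUTPUT = """\
total used free shared buffers cached
Mem: 3009 2903 106 24 5 106
-/+ buffers/cache: 2791 218
Swap: 3071 1306 1765
"""

stats = parse_free_m(FREE_OUTPUT)
mem_total = stats["Mem"][0]
app_used = stats["-/+ buffers/cache"][0]   # used RAM minus buffers/cache
swap_total, swap_used = stats["Swap"][0], stats["Swap"][1]

mem_pct = 100.0 * app_used / mem_total
swap_pct = 100.0 * swap_used / swap_total

print(f"RAM used by applications: {mem_pct:.1f}%")
print(f"Swap used: {swap_pct:.1f}%")
```

For this snapshot the result is roughly 93% of RAM held by applications and about 43% of swap in use. This reflects only the moment `free -m` was run, so the atop logs may show different peaks.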
In that test, controller nodes have an additional 'ceph-osd' role and ceilometer is enabled. I think we have to increase the RAM/CPU values for VMs in such tests, but we need confirmation from deployment engineers that lack of resources is the root cause of the deployment failure.
Changed in fuel:
assignee: Fuel Library (Deprecated) (fuel-library) → Fuel Sustaining (fuel-sustaining-team)
milestone: 9.0 → 10.0
status: New → Confirmed
Changed in fuel:
assignee: Andrii Petrenko (aplsms) → Oleksiy Molchanov (omolchanov)
The diagnostic snapshot doesn't contain remote logs (see bug #1541390), so I'm attaching an archive with the full /var/log folder: https://drive.google.com/file/d/0BzaZINLQ8-xkb1NyQ2FRTFZKajA/view?usp=sharing