CI: not enough memory to run HA jobs with current services configuration

Bug #1626483 reported by Gabriele Cerami
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The latest HA master periodic jobs showed errors related to memory shortage, here:

http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-ha/9d0c76f/logs/postci.txt.gz#_2016-09-22_07_53_34_000

The previous job also hung unexpectedly.
A local test on a 32G, 8-core machine with the same configuration gave similar results, with the OOM killer removing heat-engine.
An 8G undercloud is no longer enough to get a successful deployment.

We could try (as shardy suggested):
- changing the HA job to a minimal topology (1 controller, 1 compute) with pacemaker enabled
- analyzing the memory usage of the activated services and removing services from the default set, possibly splitting the job into several jobs that each activate a different set of services, so that together they still give full coverage.
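
For the second point, a minimal sketch of how per-service memory could be totalled on a node, assuming the output of `ps -e -o rss= -o comm=` is available as text. The function names and the sample process names are illustrative only and are not taken from the CI scripts.

```python
import subprocess
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum resident memory (KiB) per command name.

    Expects one process per line: RSS in KiB, whitespace, command name,
    which is what `ps -e -o rss= -o comm=` prints.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines (e.g. a header row)
        rss_kib, command = parts
        totals[command.strip()] += int(rss_kib)
    # Largest consumers first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def collect_ps_output():
    """Hypothetical helper: run ps locally and return its raw text."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `rss_by_command(collect_ps_output())` on the undercloud or on a controller gives a rough ranking of which services to look at first. RSS counts shared pages once per process, so the totals overestimate services that fork many workers.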

Tags: ci
Gabriele Cerami (gcerami) wrote :

The last memory increase for the undercloud was done less than a month ago, and bumped the overcloud to 8G.

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

I counted about 25 jobs failing because of memory yesterday, and about 10 already today.

Changed in tripleo:
importance: Undecided → Critical
status: New → Confirmed
Emilien Macchi (emilienm) wrote :

Steven reported a bug in Heat, https://bugs.launchpad.net/heat/+bug/1626675
which is probably highly related.

Gabriele Cerami (gcerami) wrote :

Yes, this bug probably caused Steven to start the analysis, and for the last two days the builds no longer appear to have memory problems. Closing for now.

Changed in tripleo:
status: Confirmed → Fix Released