DB errors for large/nested stacks with convergence enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Heat | In Progress | High | Thomas Herve |
Bug Description
I re-ran this reproducer https:/
$ heat event-list stress
| resource_name | id | resource_
+---------------+
| stress | f6cb0c4f-
| NovaComputes | 8828f0a4-
| NovaComputes | feaa933f-
| stress | 61a39dc3-
Without convergence, it works OK and completes in a little over 2 minutes.
With convergence enabled (convergence_
$ heat event-list stress_conv
| resource_name | id | resource_
+---------------+
| stress_conv | dd04a4fa-
| NovaComputes | 4ff9141d-
| NovaComputes | 2d37a864-
| stress_conv | 8a4d70de-
The failure also appears to leak memory. Below is the top output for one of the workers (4 workers on devstack): resident memory (RES, in KiB) grows from about 76 MB to about 285 MB, and it isn't released after the stack goes CREATE_FAILED.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29890 stack 20 0 306036 78056 4244 S 1.7 0.5 0:00.49 heat-engine
29890 stack 20 0 306036 78056 4244 S 2.0 0.5 0:00.55 heat-engine
29890 stack 20 0 306036 78056 4244 S 2.0 0.5 0:00.61 heat-engine
29890 stack 20 0 306036 78056 4244 S 2.3 0.5 0:00.68 heat-engine
29890 stack 20 0 306036 78056 4244 S 1.7 0.5 0:00.73 heat-engine
29890 stack 20 0 327148 99660 4628 R 53.8 0.6 0:02.35 heat-engine
29890 stack 20 0 349072 121408 4628 R 87.7 0.8 0:04.99 heat-engine
29890 stack 20 0 377520 150060 4628 R 79.1 0.9 0:07.38 heat-engine
29890 stack 20 0 420844 192204 8436 R 82.4 1.2 0:09.86 heat-engine
29890 stack 20 0 448996 220336 8436 R 87.1 1.4 0:12.49 heat-engine
29890 stack 20 0 472304 243884 8436 R 83.7 1.5 0:15.01 heat-engine
29890 stack 20 0 490224 261632 8436 R 78.5 1.6 0:17.38 heat-engine
29890 stack 20 0 502576 273912 8436 R 59.9 1.7 0:19.19 heat-engine
29890 stack 20 0 506192 277552 8436 S 54.6 1.7 0:20.84 heat-engine
29890 stack 20 0 519808 291472 8436 S 49.2 1.8 0:22.32 heat-engine
29890 stack 20 0 519808 291472 8436 S 39.9 1.8 0:23.52 heat-engine
29890 stack 20 0 519808 291472 8436 R 59.3 1.8 0:25.31 heat-engine
29890 stack 20 0 519808 291472 8436 R 54.6 1.8 0:26.96 heat-engine
29890 stack 20 0 519808 291472 8436 R 75.8 1.8 0:29.25 heat-engine
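To put numbers on the growth, the samples above can be summarised with a short script. This is a minimal sketch, assuming top's default column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) with RES reported in KiB, which matches the output shown; the function names are hypothetical helpers, not part of Heat.

```python
from typing import List, Tuple


def res_kib(lines: List[str]) -> List[int]:
    """Return the RES column (in KiB) from each data line of top output."""
    values = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row; data rows start with a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        values.append(int(fields[5]))  # RES is the 6th column in top's default layout
    return values


def growth_mib(lines: List[str]) -> Tuple[float, float, float]:
    """Return (first, peak, growth) RES in MiB, rounded to one decimal place."""
    res = res_kib(lines)
    first = res[0] / 1024
    peak = max(res) / 1024
    return round(first, 1), round(peak, 1), round(peak - first, 1)


# First, a middle and the last sample from the heat-engine worker above.
samples = [
    "29890 stack 20 0 306036 78056 4244 S 1.7 0.5 0:00.49 heat-engine",
    "29890 stack 20 0 420844 192204 8436 R 82.4 1.2 0:09.86 heat-engine",
    "29890 stack 20 0 519808 291472 8436 R 75.8 1.8 0:29.25 heat-engine",
]
print(growth_mib(samples))  # (76.2, 284.6, 208.4)
```

So this one worker gained roughly 208 MiB of resident memory in under 30 seconds of CPU time.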
There are a number of errors like this in the heat logs:
File "/usr/lib/
So I guess we have to raise the database connection limit, but that doesn't resolve the memory usage/leak.
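When deciding how far to raise the limit, it can help to estimate the worst case. This is a rough sketch, assuming each heat-engine worker process owns one SQLAlchemy QueuePool whose ceiling is pool_size + max_overflow (oslo.db exposes these as the [database] max_pool_size and max_overflow options), compared against the database server's own limit such as MySQL's max_connections. The helper names and all numbers are hypothetical, not devstack defaults.

```python
import math


def worst_case_connections(workers: int, pool_size: int, max_overflow: int) -> float:
    """Upper bound on simultaneous DB connections from all engine workers.

    A SQLAlchemy QueuePool can hold at most pool_size + max_overflow
    connections; max_overflow < 0 means no overflow limit at all.
    """
    if max_overflow < 0:
        return math.inf
    return workers * (pool_size + max_overflow)


def fits(workers: int, pool_size: int, max_overflow: int,
         server_max_connections: int, reserved: int = 0) -> bool:
    """True if the engines' worst case, plus connections `reserved` for other
    services sharing the same database server, stays within the server limit."""
    needed = worst_case_connections(workers, pool_size, max_overflow)
    return needed + reserved <= server_max_connections


# Hypothetical numbers, for illustration only.
print(worst_case_connections(4, 5, 50))                           # 220
print(fits(4, 5, 50, server_max_connections=151))                 # False
print(fits(4, 5, 10, server_max_connections=300, reserved=100))   # True
```

This only bounds connection usage; as noted above, it says nothing about the memory growth.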
By contrast, in the non-convergence case memory usage stays roughly static at about 100MB per engine process (which still seems quite high).
Changed in heat:
  milestone: none → ocata-2
Changed in heat:
  assignee: nobody → Crag Wolfe (cwolfe)
  status: New → In Progress
Changed in heat:
  assignee: Crag Wolfe (cwolfe) → Thomas Herve (therve)
Changed in heat:
  importance: Critical → High
  milestone: ocata-2 → ocata-3
Changed in heat:
  milestone: ocata-3 → ocata-rc1
Changed in heat:
  milestone: ocata-rc1 → pike-1
Changed in heat:
  milestone: pike-1 → pike-2
Changed in heat:
  milestone: pike-2 → pike-3
Changed in heat:
  milestone: pike-3 → pike-rc1
Changed in heat:
  milestone: pike-rc1 → pike-rc2
After moving to Ubuntu Xenial for the gate jobs, we've started seeing this error in the integration tests.
http://logs.openstack.org/36/394736/3/check/gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial/90f5b5b/console.html#_2016-11-18_09_51_11_502246
http://logs.openstack.org/36/394736/3/check/gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial/90f5b5b/logs/screen-h-eng.txt.gz?level=ERROR