Config-download deployment consumes a lot of memory during the update /etc/hosts task

Bug #1860146 reported by Sai Sindhur Malleni on 2020-01-17
This bug affects 1 person
Affects: tripleo
Importance: Medium
Assigned to: Luke Short

Bug Description

Profiling Ansible memory consumption during config-download, as documented in https://thesaitech.wordpress.com/2019/11/17/profiling-ansible-memory-usage-per-task/, shows that the update /etc/hosts task consumes a lot of memory. We should look at optimizing this.

https://gist.githubusercontent.com/smalleni/b14cac7674784fd3b84b42c99a7d5ee5/raw/3c785813e7dca988ac5c978fe3449dfb9c1ed32f/gistfile1.txt

Emilien Macchi (emilienm) wrote :

Sai, I'm wondering if this will help:

https://review.opendev.org/#/c/700451/
https://review.opendev.org/#/c/700453/

Luke did the backports; please synchronize with each other on that effort.

Changed in tripleo:
status: New → Triaged
milestone: none → ussuri-3
tags: added: train-backport-potential

Bogdan Dobrelya (bogdando) wrote :

Thank you, that profiling method is cool and provides great insight.

Some other places to look into:

tripleo-hieradata : Render hieradata from template (0cc47afa-1a12-f1af-b1bb-000000022aaf): 2627.27MB
Run NetworkConfig script (0cc47afa-1a12-f1af-b1bb-000000000096): 1564.31MB
tripleo-hosts-entries : Update /etc/hosts (0cc47afa-1a12-f1af-b1bb-00000002356e): 4058.31MB (this)
tripleo-kernel : Set extra sysctl options (0cc47afa-1a12-f1af-b1bb-0000000252b3): 1425.06MB
make sure libvirt services are disabled and masked (0cc47afa-1a12-f1af-b1bb-0000000001e5): 1177.00MB
tuned : Enable tuned profile (0cc47afa-1a12-f1af-b1bb-00000002ecad): 1021.14MB
Write container config scripts (0cc47afa-1a12-f1af-b1bb-00000003a917): 1553.98MB
Write per-step container startup configs (0cc47afa-1a12-f1af-b1bb-00000003a91b): 1720.14MB
Write kolla config json files (0cc47afa-1a12-f1af-b1bb-00000003a91f): 2039.88MB
Wait for container-puppet tasks (generate config) to finish (0cc47afa-1a12-f1af-b1bb-00000003c3fc): 2073.91MB
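
To compare the tasks listed above, the per-task figures can be parsed and ranked. The following is a minimal sketch, assuming each line has the form "<task name> (<task uuid>): <value>MB" as in this comment; the names `PROFILE`, `LINE_RE`, and `parse_profile` are hypothetical and not part of any TripleO or Ansible tooling.

```python
import re

# Lines copied from the comment above: "<task name> (<task uuid>): <value>MB".
PROFILE = """\
tripleo-hieradata : Render hieradata from template (0cc47afa-1a12-f1af-b1bb-000000022aaf): 2627.27MB
Run NetworkConfig script (0cc47afa-1a12-f1af-b1bb-000000000096): 1564.31MB
tripleo-hosts-entries : Update /etc/hosts (0cc47afa-1a12-f1af-b1bb-00000002356e): 4058.31MB (this)
tripleo-kernel : Set extra sysctl options (0cc47afa-1a12-f1af-b1bb-0000000252b3): 1425.06MB
make sure libvirt services are disabled and masked (0cc47afa-1a12-f1af-b1bb-0000000001e5): 1177.00MB
tuned : Enable tuned profile (0cc47afa-1a12-f1af-b1bb-00000002ecad): 1021.14MB
Write container config scripts (0cc47afa-1a12-f1af-b1bb-00000003a917): 1553.98MB
Write per-step container startup configs (0cc47afa-1a12-f1af-b1bb-00000003a91b): 1720.14MB
Write kolla config json files (0cc47afa-1a12-f1af-b1bb-00000003a91f): 2039.88MB
Wait for container-puppet tasks (generate config) to finish (0cc47afa-1a12-f1af-b1bb-00000003c3fc): 2073.91MB
"""

# The task name may itself contain parentheses, so the UUID is matched
# explicitly as 36 hex-or-dash characters rather than "anything in brackets".
LINE_RE = re.compile(
    r"^(?P<task>.+) \((?P<uuid>[0-9a-f-]{36})\): (?P<mb>\d+(?:\.\d+)?)MB"
)


def parse_profile(text):
    """Return (task name, memory in MB) pairs, largest consumer first."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match.group("task"), float(match.group("mb"))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for task, mb in parse_profile(PROFILE):
        print(f"{mb:10.2f} MB  {task}")
```

Ranking the list this way puts the Update /etc/hosts task first, followed by the hieradata rendering task.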

Changed in tripleo:
importance: Undecided → Medium
tags: added: tech-debt

Bogdan Dobrelya (bogdando) wrote :

Patches from the https://review.opendev.org/#/q/topic:scale-and-performance topic may also have addressed some of those.

Sai Sindhur Malleni (smalleni) wrote :

I can confirm that this was observed in spite of having those changes locally.

Sai Sindhur Malleni (smalleni) wrote :

Another update: when scaling out from 200 nodes to roughly 250 nodes, we see Ansible consume 50G+ of RSS memory.
https://snapshot.raintank.io/dashboard/snapshot/Xujs6L7FAsCM8Kpzc63khcOUi8RBRkbj?orgId=2

Fix proposed to branch: master
Review: https://review.opendev.org/704070

Changed in tripleo:
assignee: nobody → Luke Short (ekultails)
status: Triaged → In Progress

Reviewed: https://review.opendev.org/704070
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=dac34d302c336192f9f5fa13d2723b991d78ec12
Submitter: Zuul
Branch: master

commit dac34d302c336192f9f5fa13d2723b991d78ec12
Author: Luke Short <email address hidden>
Date: Thu Jan 23 17:22:22 2020 -0500

    Generate the /etc/hosts content once.

    This resolves a performance penalty of rendering many
    hosts files for large deployments. Since they should
    all match, we now generate the hosts block once and then
    push it out to all of the nodes.

    Change-Id: Iff6db6a520b9ff7fbb737a1ddb69c66cc2008ea7
    Closes-Bug: #1860146
    Signed-off-by: Luke Short <email address hidden>
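
The idea in the commit message above can be illustrated outside of Ansible. This is only a sketch in Python, not the actual change to the tripleo-ansible role: the inventory shape (`{hostname: ip}`) and the function names `render_hosts_block`, `push_per_host`, and `push_render_once` are assumptions made for illustration. It contrasts rendering the same block once per node with rendering it a single time and reusing the result.

```python
def render_hosts_block(inventory):
    """Build the shared hosts block from a {hostname: ip} mapping (assumed shape)."""
    lines = [f"{ip} {name}" for name, ip in sorted(inventory.items())]
    return "\n".join(lines) + "\n"


def push_per_host(inventory):
    """Old behaviour, sketched: render the identical block again for every node."""
    pushed = {}
    renders = 0
    for node in inventory:
        pushed[node] = render_hosts_block(inventory)  # N renders for N nodes
        renders += 1
    return pushed, renders


def push_render_once(inventory):
    """New behaviour, sketched: render once, then hand the same text to every node."""
    block = render_hosts_block(inventory)  # single render
    return {node: block for node in inventory}, 1


if __name__ == "__main__":
    # 250 hypothetical nodes, using the 192.0.2.0/24 documentation range.
    inventory = {f"node-{i}": f"192.0.2.{i}" for i in range(1, 251)}
    before, renders_before = push_per_host(inventory)
    after, renders_after = push_render_once(inventory)
    print("identical output:", before == after)
    print("renders before:", renders_before, "after:", renders_after)
```

Because every node receives the same content, the two approaches produce identical files; only the number of renders changes, which is why the cost of the per-node approach grows with the size of the deployment.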

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/704152
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=213072ca638d166afcf7c674e5f484480b8b0e67
Submitter: Zuul
Branch: stable/train

commit 213072ca638d166afcf7c674e5f484480b8b0e67
Author: Luke Short <email address hidden>
Date: Thu Jan 23 17:22:22 2020 -0500

    Generate the /etc/hosts content once.

    This resolves a performance penalty of rendering many
    hosts files for large deployments. Since they should
    all match, we now generate the hosts block once and then
    push it out to all of the nodes.

    Change-Id: Iff6db6a520b9ff7fbb737a1ddb69c66cc2008ea7
    Closes-Bug: #1860146
    Signed-off-by: Luke Short <email address hidden>
    (cherry picked from commit dac34d302c336192f9f5fa13d2723b991d78ec12)

tags: added: in-stable-train

This issue was fixed in the openstack/tripleo-ansible 1.1.0 release.

This issue was fixed in the openstack/tripleo-ansible 0.5.0 release.
