Config-download deployment consumes a lot of memory during the update /etc/hosts task

Bug #1860146 reported by Sai Sindhur Malleni
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Luke Short

Bug Description

Profiling Ansible memory consumption during config-download, as documented in https://thesaitech.wordpress.com/2019/11/17/profiling-ansible-memory-usage-per-task/, shows that the "Update /etc/hosts" task consumes a lot of memory. We should look at optimizing this.

https://gist.githubusercontent.com/smalleni/b14cac7674784fd3b84b42c99a7d5ee5/raw/3c785813e7dca988ac5c978fe3449dfb9c1ed32f/gistfile1.txt

Emilien Macchi (emilienm) wrote :

Sai, I'm wondering if these will help:

https://review.opendev.org/#/c/700451/
https://review.opendev.org/#/c/700453/

Luke did the backports, so please sync with each other on that effort.

Changed in tripleo:
status: New → Triaged
milestone: none → ussuri-3
tags: added: train-backport-potential
Bogdan Dobrelya (bogdando) wrote :

Thank you, that profiling method is cool and provides great insight.

Some other places to look into:

tripleo-hieradata : Render hieradata from template (0cc47afa-1a12-f1af-b1bb-000000022aaf): 2627.27MB
Run NetworkConfig script (0cc47afa-1a12-f1af-b1bb-000000000096): 1564.31MB
tripleo-hosts-entries : Update /etc/hosts (0cc47afa-1a12-f1af-b1bb-00000002356e): 4058.31MB (this)
tripleo-kernel : Set extra sysctl options (0cc47afa-1a12-f1af-b1bb-0000000252b3): 1425.06MB
make sure libvirt services are disabled and masked (0cc47afa-1a12-f1af-b1bb-0000000001e5): 1177.00MB
tuned : Enable tuned profile (0cc47afa-1a12-f1af-b1bb-00000002ecad): 1021.14MB
Write container config scripts (0cc47afa-1a12-f1af-b1bb-00000003a917): 1553.98MB
Write per-step container startup configs (0cc47afa-1a12-f1af-b1bb-00000003a91b): 1720.14MB
Write kolla config json files (0cc47afa-1a12-f1af-b1bb-00000003a91f): 2039.88MB
Wait for container-puppet tasks (generate config) to finish (0cc47afa-1a12-f1af-b1bb-00000003c3fc): 2073.91MB
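
For anyone comparing profiling runs, the per-task figures above can be ranked with a short script. This is only a minimal sketch: it assumes each line has the shape "task name (task uuid): N MB" as in the list above, and the function name `parse_task_memory` is made up for illustration, not part of any profiling tool.

```python
import re

# Assumed line shape: "<task name> (<task uuid>): <number>MB", as in the list above.
LINE_RE = re.compile(
    r"^(?P<task>.+) \((?P<uuid>[0-9a-f-]+)\): (?P<mb>\d+(?:\.\d+)?)MB"
)


def parse_task_memory(text):
    """Return (task, megabytes) pairs, largest consumer first."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group("task"), float(match.group("mb"))))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Three lines copied from the list above.
SAMPLE = """\
tripleo-hieradata : Render hieradata from template (0cc47afa-1a12-f1af-b1bb-000000022aaf): 2627.27MB
tripleo-hosts-entries : Update /etc/hosts (0cc47afa-1a12-f1af-b1bb-00000002356e): 4058.31MB (this)
Write kolla config json files (0cc47afa-1a12-f1af-b1bb-00000003a91f): 2039.88MB
"""

if __name__ == "__main__":
    for task, mb in parse_task_memory(SAMPLE):
        print(f"{mb:10.2f} MB  {task}")
```

Lines that do not match the assumed shape are skipped rather than treated as errors, so the script can be fed a whole profiling log.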

Changed in tripleo:
importance: Undecided → Medium
tags: added: tech-debt
Bogdan Dobrelya (bogdando) wrote :

Patches from the https://review.opendev.org/#/q/topic:scale-and-performance topic may also have addressed some of those.

Sai Sindhur Malleni (smalleni) wrote :

I can confirm that this was observed in spite of having those changes locally.

Sai Sindhur Malleni (smalleni) wrote :

Another update: when scaling out from 200 nodes to roughly 250 nodes, we see Ansible consume 50G+ of RSS memory.
https://snapshot.raintank.io/dashboard/snapshot/Xujs6L7FAsCM8Kpzc63khcOUi8RBRkbj?orgId=2

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704070

Changed in tripleo:
assignee: nobody → Luke Short (ekultails)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/704070
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=dac34d302c336192f9f5fa13d2723b991d78ec12
Submitter: Zuul
Branch: master

commit dac34d302c336192f9f5fa13d2723b991d78ec12
Author: Luke Short <email address hidden>
Date: Thu Jan 23 17:22:22 2020 -0500

    Generate the /etc/hosts content once.

    This resolves a performance penalty of rendering many
    hosts files for large deployments. Since they should
    all match, we now generate the hosts block once and then
    push it out to all of the nodes.

    Change-Id: Iff6db6a520b9ff7fbb737a1ddb69c66cc2008ea7
    Closes-Bug: #1860146
    Signed-off-by: Luke Short <email address hidden>
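
The idea in the commit message above (render the hosts block once, then push the same content to every node) can be illustrated with a small Python sketch. This is not the tripleo-ansible code: the function names, the inventory shape, and the addresses are assumptions made up for illustration, and the sketch only counts renders rather than measuring memory.

```python
def render_hosts_block(inventory):
    """Build the shared hosts block from a {hostname: ip} mapping."""
    lines = [f"{ip} {name}" for name, ip in sorted(inventory.items())]
    return "\n".join(lines) + "\n"


def render_per_host(inventory):
    """Old shape: one render per target node, so N renders of N entries."""
    files = {}
    renders = 0
    for target in inventory:
        files[target] = render_hosts_block(inventory)
        renders += 1
    return files, renders


def render_once(inventory):
    """New shape: render a single block and reuse it for every node."""
    block = render_hosts_block(inventory)
    files = {target: block for target in inventory}
    return files, 1


if __name__ == "__main__":
    # Hypothetical 250-node inventory; addresses come from the
    # 192.0.2.0/24 documentation range.
    inventory = {f"node-{i}": f"192.0.2.{i}" for i in range(1, 251)}
    old_files, old_renders = render_per_host(inventory)
    new_files, new_renders = render_once(inventory)
    print("identical output:", old_files == new_files)
    print("renders per-host:", old_renders, "renders once:", new_renders)
```

Because every node receives the same block, the per-host approach repeats identical work N times, and each render itself walks all N entries; rendering once removes that repetition while producing the same files.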

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/704152

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/704152
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=213072ca638d166afcf7c674e5f484480b8b0e67
Submitter: Zuul
Branch: stable/train

commit 213072ca638d166afcf7c674e5f484480b8b0e67
Author: Luke Short <email address hidden>
Date: Thu Jan 23 17:22:22 2020 -0500

    Generate the /etc/hosts content once.

    This resolves a performance penalty of rendering many
    hosts files for large deployments. Since they should
    all match, we now generate the hosts block once and then
    push it out to all of the nodes.

    Change-Id: Iff6db6a520b9ff7fbb737a1ddb69c66cc2008ea7
    Closes-Bug: #1860146
    Signed-off-by: Luke Short <email address hidden>
    (cherry picked from commit dac34d302c336192f9f5fa13d2723b991d78ec12)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 1.1.0

This issue was fixed in the openstack/tripleo-ansible 1.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.5.0

This issue was fixed in the openstack/tripleo-ansible 0.5.0 release.
