Large memory utilization slows down ansible execution during the deployment

Bug #1915761 reported by Alex Schultz on 2021-02-15
This bug affects 2 people

Affects: tripleo
Importance: High
Assigned to: Alex Schultz

Bug Description

When running ansible against a large number of hosts and gathering all of the facts, task execution during the deployment slows down. The more memory is used, the slower the deployment tasks run. This can be seen even with the most basic of playbooks:

- hosts: all:!compute-d-057:!compute-d-028
  gather_facts: false
  name: clear cached facts
  tasks:
    - meta: clear_facts

- hosts: all:!compute-d-057:!compute-d-028
  name: gather facts
  gather_facts: false
  any_errors_fatal: true
  become: false
  tasks:
    - setup:
        gather_subset:
           - '!all'
           - 'min'

- hosts: all:!compute-d-057:!compute-d-028
  gather_facts: false
  tasks:
    - name: Sleep task 1
      shell: sleep 1

In order to reduce this impact in TripleO, we should not gather more information than is required to perform the actions. Additionally, we should not use any of the package or service fact methods. A compute node that has a running workload can have network facts that exceed 1 MB for a single host. Multiplied across 200 compute nodes, ansible's memory usage for facts alone would be about 200 MB. The same set of hosts collecting only !all,min results in ~2 MB of facts for all hosts instead.
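One way to check these numbers on a real deployment (an assumption, not a step taken in this report) is to enable ansible's jsonfile fact cache, which stores one JSON file per host, and total the file sizes. The bash sketch below does that; fact_cache_report is a hypothetical helper name and /tmp/ansible_facts a placeholder path. The on-disk JSON size is only a rough proxy for what the controller actually holds in memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the per-host files in a jsonfile fact cache
# directory and report their combined size in bytes. Assumes the cache was
# enabled with ansible.cfg settings along these lines:
#   [defaults]
#   fact_caching = jsonfile
#   fact_caching_connection = /tmp/ansible_facts
fact_cache_report() {
  local dir=$1 total=0 count=0 size f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
    size=$(wc -c < "$f")         # size of this host's cached facts in bytes
    total=$((total + size))
    count=$((count + 1))
  done
  echo "hosts=$count total_bytes=$total"
}
```

Running fact_cache_report /tmp/ansible_facts once after a full setup run and once after a run with gather_subset ['!all', 'min'] (emptying the cache directory in between) would show the difference between the two collection modes directly.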

Changed in tripleo:
status: Triaged → In Progress
Rabi Mishra (rabi) wrote :

> Additionally we should not use any of the package or service fact methods.

Are we saying that any kind of fact gathering should be avoided (or just service/package facts), as it's multiplied by the number of nodes and adds to ansible controller memory usage (~32GB RAM is not enough)? I can't see anything in the service_facts[1] module that could lead to high memory usage, though.

[1] https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/service_facts.py

Alex Schultz (alex-schultz) wrote :

Yes. But the problem is not the total amount of memory; rather, once you exceed roughly 1 GB, ansible slows to a crawl. I'm still looking into the details, but I believe it's related to fork() and python's GIL, given the volume of large in-memory objects that must be copied when a task is spawned. The reality is that our code doesn't use a fraction of the facts that we gather, and things like package facts and service facts pull in a lot of information about unrelated items, which causes more problems than simply doing a targeted check.

Alex Schultz (alex-schultz) wrote :

How to find vars:
grep -r ansible_ * | grep -Ev "(facts|become|user|host|limit|check_mode|python|ssh|connection|min_|managed|ceph_ansible|verbos|hieradata|job|_home|_log|loop|async|filter|_version|calling_|_inventory|\.py|group_vars|diff_mode)" | grep ansible_
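The pipeline above prints every matching line. If a per-name tally is more useful when deciding which facts to convert, a bash variation such as the following could work. list_ansible_vars is a hypothetical helper, and unlike the original pipeline it applies no exclusion list beyond ansible_facts itself, so connection and play variables (ansible_host, ansible_user and similar) will show up too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count each distinct ansible_* identifier used under a
# directory, most frequent first, skipping the ansible_facts dict itself.
list_ansible_vars() {
  grep -rhoE 'ansible_[A-Za-z0-9_]+' "$1" \
    | grep -vx 'ansible_facts' \
    | sort | uniq -c | sort -rn
}
```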

bulk fix vars:
# tripleo-ansible: use this loop header instead of the THT one below
#for DIR in tripleo_ansible/roles; do
# THT
for DIR in common deployment; do
  # Rewrite bare fact variables that are followed by a space, dot or pipe
  # into ansible_facts['...'] lookups, editing each file in place.
  find "$DIR" -type f -exec sed -i \
    -e "s/ansible_hostname\([ .|]\)/ansible_facts['hostname']\1/g" \
    -e "s/ansible_fqdn\([ .|]\)/ansible_facts['fqdn']\1/g" \
    -e "s/ansible_nodename\([ .|]\)/ansible_facts['nodename']\1/g" \
    -e "s/ansible_distribution\([ .|]\)/ansible_facts['distribution']\1/g" \
    -e "s/ansible_distribution_version\([ .|]\)/ansible_facts['distribution_version']\1/g" \
    -e "s/ansible_distribution_major_version\([ .|]\)/ansible_facts['distribution_major_version']\1/g" \
    -e "s/ansible_os_family\([ .|]\)/ansible_facts['os_family']\1/g" \
    -e "s/ansible_devices\([ .|]\)/ansible_facts['devices']\1/g" \
    {} +
done
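Because sed -i rewrites files in place, it may help to preview the substitutions on sample text first. The bash sketch below wraps essentially the same expressions in a hypothetical preview_fact_rewrite function that reads standard input and writes standard output instead of editing files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the bulk-fix substitutions to stdin and print
# the result, so the rewrite can be inspected before running sed -i.
# A name is only rewritten when it is followed by a space, dot or pipe.
preview_fact_rewrite() {
  sed \
    -e "s/ansible_hostname\([ .|]\)/ansible_facts['hostname']\1/g" \
    -e "s/ansible_fqdn\([ .|]\)/ansible_facts['fqdn']\1/g" \
    -e "s/ansible_nodename\([ .|]\)/ansible_facts['nodename']\1/g" \
    -e "s/ansible_distribution\([ .|]\)/ansible_facts['distribution']\1/g" \
    -e "s/ansible_distribution_version\([ .|]\)/ansible_facts['distribution_version']\1/g" \
    -e "s/ansible_distribution_major_version\([ .|]\)/ansible_facts['distribution_major_version']\1/g" \
    -e "s/ansible_os_family\([ .|]\)/ansible_facts['os_family']\1/g" \
    -e "s/ansible_devices\([ .|]\)/ansible_facts['devices']\1/g"
}
```

For example, {{ ansible_devices.sda }} becomes {{ ansible_facts['devices'].sda }}, while {{ ansible_fqdn}} is left untouched because no space, dot or pipe follows the name, so re-running the search for leftover variables after the bulk fix is still worthwhile.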

Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/786159

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786219

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786220

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785918
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/e1998a8e58b68c67979d30e4213c772169624484
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit e1998a8e58b68c67979d30e4213c772169624484
Author: Dave Wilde (d34dh0r53) <email address hidden>
Date: Mon Apr 12 10:38:17 2021 -0500

    Ensure ansible_fqdn is set

    The ipaclient ansible role requires that ansible_fqdn is defined but
    due to [1] we don't have ansible_fqdn inside of ansible_facts. This
    uses the 'fqdn' ansible fact for ansible_fqdn which is equivalent.

    [1]: https://opendev.org/openstack/tripleo-heat-templates/commit/4e79336d69e6b7fa4b026922bac7953bafeee96d

    Related-Bug: 1915761
    Closes-Bug: 1923248
    Change-Id: I0a740e86588c96fff24fa09698c35e492d1c64db

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/786159
Committed: https://opendev.org/openstack/tripleo-ansible/commit/30df6f54375bb3ffa0cdd40ca219905b2fa92cf5
Submitter: "Zuul (22348)"
Branch: master

commit 30df6f54375bb3ffa0cdd40ca219905b2fa92cf5
Author: yatinkarel <email address hidden>
Date: Wed Apr 14 10:29:42 2021 +0530

    Use ansible_facts in tripleo_ssh_known_hosts role

    It was missed in original patch to switch to
    ansible_facts[1].

    For providing test value for 'ssh_host_key_rsa_public' overriding
    it in 'ansible_facts' with set_fact.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/776666

    Closes-Bug: #1923403
    Related-Bug: #1915761
    Change-Id: I9d5ae576c57eefe5496f9dda71e8eac23e45e89f

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786321

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/786233

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/786234

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/786355

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/786233
Committed: https://opendev.org/openstack/tripleo-ansible/commit/c582b6584e78922403e296701c0ade252af7a8fe
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit c582b6584e78922403e296701c0ade252af7a8fe
Author: yatinkarel <email address hidden>
Date: Wed Apr 14 10:29:42 2021 +0530

    Use ansible_facts in tripleo_ssh_known_hosts role

    It was missed in original patch to switch to
    ansible_facts[1].

    For providing test value for 'ssh_host_key_rsa_public' overriding
    it in 'ansible_facts' with set_fact.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/776666

    Closes-Bug: #1923403
    Related-Bug: #1915761
    Change-Id: I9d5ae576c57eefe5496f9dda71e8eac23e45e89f
    (cherry picked from commit 30df6f54375bb3ffa0cdd40ca219905b2fa92cf5)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/786234
Committed: https://opendev.org/openstack/tripleo-ansible/commit/b52dc1e7fd69fdc360f1281c53ac92abb7dddfee
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit b52dc1e7fd69fdc360f1281c53ac92abb7dddfee
Author: yatinkarel <email address hidden>
Date: Wed Apr 14 10:29:42 2021 +0530

    Use ansible_facts in tripleo_ssh_known_hosts role

    It was missed in original patch to switch to
    ansible_facts[1].

    For providing test value for 'ssh_host_key_rsa_public' overriding
    it in 'ansible_facts' with set_fact.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/776666

    Closes-Bug: #1923403
    Related-Bug: #1915761
    Change-Id: I9d5ae576c57eefe5496f9dda71e8eac23e45e89f
    (cherry picked from commit 30df6f54375bb3ffa0cdd40ca219905b2fa92cf5)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/786355
Committed: https://opendev.org/openstack/tripleo-ansible/commit/a29918a4aff5802f8c86d8f79b3846da755c4aa2
Submitter: "Zuul (22348)"
Branch: stable/train

commit a29918a4aff5802f8c86d8f79b3846da755c4aa2
Author: yatinkarel <email address hidden>
Date: Wed Apr 14 10:29:42 2021 +0530

    Use ansible_facts in tripleo_ssh_known_hosts role

    It was missed in original patch to switch to
    ansible_facts[1].

    For providing test value for 'ssh_host_key_rsa_public' overriding
    it in 'ansible_facts' with set_fact.

    [1] https://review.opendev.org/c/openstack/tripleo-ansible/+/776666

    Closes-Bug: #1923403
    Related-Bug: #1915761
    Change-Id: I9d5ae576c57eefe5496f9dda71e8eac23e45e89f
    (cherry picked from commit 30df6f54375bb3ffa0cdd40ca219905b2fa92cf5)
    (cherry picked from commit b52dc1e7fd69fdc360f1281c53ac92abb7dddfee)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786219
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/1325566bad778b0d2d174833698ff41f46524948
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 1325566bad778b0d2d174833698ff41f46524948
Author: Dave Wilde (d34dh0r53) <email address hidden>
Date: Mon Apr 12 10:38:17 2021 -0500

    Ensure ansible_fqdn is set

    The ipaclient ansible role requires that ansible_fqdn is defined but
    due to [1] we don't have ansible_fqdn inside of ansible_facts. This
    uses the 'fqdn' ansible fact for ansible_fqdn which is equivalent.

    [1]: https://opendev.org/openstack/tripleo-heat-templates/commit/4e79336d69e6b7fa4b026922bac7953bafeee96d

    Related-Bug: 1915761
    Closes-Bug: 1923248
    Change-Id: I0a740e86588c96fff24fa09698c35e492d1c64db

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786220
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/f24840a56133f8862a521391c1cf14b1f29a0211
Submitter: "Zuul (22348)"
Branch: stable/train

commit f24840a56133f8862a521391c1cf14b1f29a0211
Author: Dave Wilde (d34dh0r53) <email address hidden>
Date: Mon Apr 12 10:38:17 2021 -0500

    Ensure ansible_fqdn is set

    The ipaclient ansible role requires that ansible_fqdn is defined but
    due to [1] we don't have ansible_fqdn inside of ansible_facts. This
    uses the 'fqdn' ansible fact for ansible_fqdn which is equivalent.

    [1]: https://opendev.org/openstack/tripleo-heat-templates/commit/4e79336d69e6b7fa4b026922bac7953bafeee96d

    Related-Bug: 1915761
    Closes-Bug: 1923248
    Change-Id: I0a740e86588c96fff24fa09698c35e492d1c64db

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/786321
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/0de9ea84ff7a144c89c5a6cee0c68dabb571583a
Submitter: "Zuul (22348)"
Branch: stable/train

commit 0de9ea84ff7a144c89c5a6cee0c68dabb571583a
Author: Dave Wilde (d34dh0r53) <email address hidden>
Date: Wed Apr 14 15:21:21 2021 -0500

    [Train Only] Ensure novajoin code is setting ansible_fqdn

    Novajoin tls-e code is still available in train and is affected by
    the same removal of the fact passing as [1]. This uses
    ansible_facts['fqdn'] as the ansible_fqdn for the tripleo-ipa
    registration play.

    [1]: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785909

    Related-Bug: 1915761
    Related-Bug: 1923248
    Change-Id: I5b1514b4ba9bb22bcef63e74b0400cd9332516ca

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2