overcloud deploy can add undercloud hosts entries that keep growing

Bug #1887165 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari

Bug Description

First seen via https://bugzilla.redhat.com/show_bug.cgi?id=1842919

1) Undercloud is deployed successfully and has the following in /etc/hosts:
[stack@undercloud-0 ~]$ grep undercloud-0 /etc/hosts
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
127.0.0.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

Let's set aside for a moment that there are two identical lines.

2) User calls overcloud deploy
3) The parameter 'UndercloudHostsEntries' gets populated via https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_deploy.py#L100
4) UndercloudHostsEntries gets populated with the following pseudo-code:
getent hosts "$(hostname -s).ctlplane"
Now the output of that command is actually the following:
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane

Notice how getent hosts adds another undercloud-0.ctlplane towards the end of each line.
5) The overcloud deploy passes UndercloudHostsEntries -> undercloud_hosts_entries, which ends up being used to populate /etc/hosts, so the lines keep growing.
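
The lookup in step 4 can be sketched in Python roughly as follows. This is only an illustration of the pseudo-code, not the code behind the link in step 3; the function names ctlplane_name and lookup_undercloud_entry are made up for this example.

```python
import socket
import subprocess


def ctlplane_name(hostname):
    """Return '<short hostname>.ctlplane', like "$(hostname -s).ctlplane"."""
    # hostname -s keeps everything before the first dot
    return hostname.split(".")[0] + ".ctlplane"


def lookup_undercloud_entry():
    """Return the raw `getent hosts` output for the undercloud ctlplane name."""
    name = ctlplane_name(socket.gethostname())
    result = subprocess.run(
        ["getent", "hosts", name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,  # getent exits non-zero when the name is not found
    )
    # Whatever getent prints (possibly several lines) is returned untouched,
    # which is exactly where duplicated aliases can leak into the parameter.
    return result.stdout
```

Calling ctlplane_name("undercloud-0.redhat.local") gives "undercloud-0.ctlplane", the name used in the transcripts here.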

As an additional test I manually added another undercloud-0.ctlplane entry to all relevant lines in /etc/hosts, like the following (so each line went from one to two):
[stack@undercloud-0 ~]$ grep undercloud-0.ctlplane /etc/hosts
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane

Now look at getent hosts:
[stack@undercloud-0 ~]$ getent hosts "$(hostname -s).ctlplane"
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane undercloud-0.ctlplane undercloud-0.ctlplane
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane undercloud-0.ctlplane undercloud-0.ctlplane undercloud-0.ctlplane

So two undercloud-0.ctlplane entries per line in /etc/hosts end up being 4 in the output of getent hosts. The alias list therefore doubles on every deploy, which explains why we quickly end up hitting the JSON limit in heat.
The reason getent hosts keeps adding host entries is the duplicate line in /etc/hosts.
Here is the proof:
[root@undercloud ~]# echo '1.2.3.4 pippo.localdomain pippo.ctlplane' >> /etc/hosts
[root@undercloud ~]# getent hosts pippo.ctlplane
# Normal: one line in /etc/hosts, one alias in the output
1.2.3.4 pippo.localdomain pippo.ctlplane
[root@undercloud ~]# echo '1.2.3.4 pippo.localdomain pippo.ctlplane' >> /etc/hosts
# With two identical lines, getent repeats the alias on each output line
[root@undercloud ~]# getent hosts pippo.ctlplane
1.2.3.4 pippo.localdomain pippo.ctlplane pippo.ctlplane
1.2.3.4 pippo.localdomain pippo.ctlplane pippo.ctlplane
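
The behaviour in the proof above can be mimicked with a toy model. This is only a sketch that reproduces the outputs shown in this report; it is not how glibc implements the lookup, and the names toy_getent and aliases_after are invented here. It also assumes that each deploy writes the getent output back over the matching /etc/hosts lines.

```python
def toy_getent(hosts_lines, name):
    """Toy model: print every matching line with the aliases of ALL
    matching lines appended (assumption based on the outputs above)."""
    matches = [
        line.split() for line in hosts_lines
        if name in line.split()[1:]
    ]
    # Aliases are every field after the address and the canonical name.
    merged_aliases = [alias for fields in matches for alias in fields[2:]]
    return [" ".join(fields[:2] + merged_aliases) for fields in matches]


def aliases_after(deploys):
    """Count aliases per line after a number of simulated redeploys."""
    hosts = [
        "192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane"
    ] * 2  # the duplicated line from step 1
    for _ in range(deploys):
        # Assumed: the deploy replaces the lines with the getent output.
        hosts = toy_getent(hosts, "undercloud-0.ctlplane")
    return len(hosts[0].split()) - 2
```

With the two identical starting lines, aliases_after(0), aliases_after(1), aliases_after(2) and aliases_after(3) give 1, 2, 4 and 8, i.e. the alias count doubles at each simulated deploy.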

I think a few things need to happen to fix this properly. Namely:
A) in step 3) we should prune identical duplicate lines
B) in step 3) we should also prune duplicate entries within a line
C) We should ideally also figure out who/what on earth adds 192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane twice to /etc/hosts

Even though the duplication in C) is not ideal, we should still be more robust in the face of an /etc/hosts file that has duplicate entries.
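
A minimal sketch of what A) and B) could look like in Python follows. The helper name prune_host_entries is hypothetical and this is not the patch that was eventually merged; it only illustrates the two pruning steps on the getent output.

```python
def prune_host_entries(getent_output):
    """Drop repeated lines (A) and repeated names within a line (B),
    keeping the original order in both cases."""
    unique_lines = []
    for line in getent_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # dict.fromkeys keeps the first occurrence of each field, in order
        cleaned = " ".join(dict.fromkeys(fields))
        if cleaned not in unique_lines:
            unique_lines.append(cleaned)
    return "\n".join(unique_lines)
```

Fed the two four-alias lines from the getent output above, this returns the single line "192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane", so the value no longer grows even if /etc/hosts still contains the duplicate line.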

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/740457

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/740543

Changed in tripleo:
milestone: victoria-1 → victoria-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/740457
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=2873dd4df62f104bef6a9ce36bdf5385b75b96b5
Submitter: Zuul
Branch: master

commit 2873dd4df62f104bef6a9ce36bdf5385b75b96b5
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 10 15:19:45 2020 +0200

    Cleanup UndercloudHostsEntries

    When we update the UndercloudHostsEntries we basically
    take the output of 'getent hosts "$(hostname -s).ctlplane"'
    and push it into a parameter, so tripleo-ansible can
    make sure it adds the hostentry for the undercloud on
    the whole overcloud.

    The problem is that getent hosts can return multiple
    entries, which will then be injected into the parameter
    and then written into /etc/hosts. This constantly adds the
    undercloud.ctlplane string and we end up adding it at
    every deploy making it grow quadratically.

    This eventually leads to a too large json file and the deploys
    start failing with:
        heat.common.exception.RequestLimitExceeded:
          Request limit exceeded: JSON body size (4396634 bytes)
            exceeds maximum allowed size (4194304 bytes).

    Tested this by deploying an environment and then running
    a few redeploys to make sure that the undercloud entries
    in /etc/hosts on the undercloud itself do not grow at
    each redeploy.

    Change-Id: I37d75600825f48be9e15470cacba7b3a0371a3e2
    Closes-Bug: #1887165

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/745888

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/ussuri)

Reviewed: https://review.opendev.org/745888
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=a06c7194c0027a144fce0551870d85f27c6848bd
Submitter: Zuul
Branch: stable/ussuri

commit a06c7194c0027a144fce0551870d85f27c6848bd
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 10 15:19:45 2020 +0200

    Cleanup UndercloudHostsEntries

    When we update the UndercloudHostsEntries we basically
    take the output of 'getent hosts "$(hostname -s).ctlplane"'
    and push it into a parameter, so tripleo-ansible can
    make sure it adds the hostentry for the undercloud on
    the whole overcloud.

    The problem is that getent hosts can return multiple
    entries, which will then be injected into the parameter
    and then written into /etc/hosts. This constantly adds the
    undercloud.ctlplane string and we end up adding it at
    every deploy making it grow quadratically.

    This eventually leads to a too large json file and the deploys
    start failing with:
        heat.common.exception.RequestLimitExceeded:
          Request limit exceeded: JSON body size (4396634 bytes)
            exceeds maximum allowed size (4194304 bytes).

    Tested this by deploying an environment and then running
    a few redeploys to make sure that the undercloud entries
    in /etc/hosts on the undercloud itself do not grow at
    each redeploy.

    Change-Id: I37d75600825f48be9e15470cacba7b3a0371a3e2
    Closes-Bug: #1887165
    (cherry picked from commit 2873dd4df62f104bef6a9ce36bdf5385b75b96b5)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/740543
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=29686a6564b333c8814e081233418228eb234a86
Submitter: Zuul
Branch: stable/train

commit 29686a6564b333c8814e081233418228eb234a86
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 10 15:19:45 2020 +0200

    Cleanup UndercloudHostsEntries

    When we update the UndercloudHostsEntries we basically
    take the output of 'getent hosts "$(hostname -s).ctlplane"'
    and push it into a parameter, so tripleo-ansible can
    make sure it adds the hostentry for the undercloud on
    the whole overcloud.

    The problem is that getent hosts can return multiple
    entries, which will then be injected into the parameter
    and then written into /etc/hosts. This constantly adds the
    undercloud.ctlplane string and we end up adding it at
    every deploy making it grow quadratically.

    This eventually leads to a too large json file and the deploys
    start failing with:
        heat.common.exception.RequestLimitExceeded:
          Request limit exceeded: JSON body size (4396634 bytes)
            exceeds maximum allowed size (4194304 bytes).

    Tested this by deploying an environment and then running
    a few redeploys to make sure that the undercloud entries
    in /etc/hosts on the undercloud itself do not grow at
    each redeploy.

    NB: Not a 100% clean backport due to different context around imports

    Change-Id: I37d75600825f48be9e15470cacba7b3a0371a3e2
    Closes-Bug: #1887165
    (cherry picked from commit 2873dd4df62f104bef6a9ce36bdf5385b75b96b5)
    (cherry picked from commit a06c7194c0027a144fce0551870d85f27c6848bd)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.
