Inode of /etc/hosts changes when tripleo_hosts_entries updates the file

Bug #1882290 reported by Damien Ciabrini
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   Critical     Damien Ciabrini

Bug Description

Before we started using Ansible to manage the contents of /etc/hosts, we had an implicit guarantee that the inode of /etc/hosts never changed when the file was updated.

This is particularly important since we containerized our deployment: every container is configured to bind-mount /etc/hosts directly, so as soon as the inode of /etc/hosts changes on the host, the container gets out of sync and cannot see further changes to the file unless it is restarted.

In Ansible, however, actions on files are all designed to expose changes atomically, and they all ultimately use the core module_utils method atomic_move [1]. That method implements the atomic change with the equivalent of the rename() syscall: the new content is written to a separate file that is then renamed over the target, so the inode behind the file name changes.
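The effect of a rename-based update can be reproduced outside Ansible. The following is a minimal sketch, not Ansible's actual atomic_move code: the helper atomic_update, the temporary "hosts" file, and the addresses are all hypothetical, and the open file handle is only an analogy for a container's bind mount (both keep referring to the original inode).

```python
import os
import tempfile


def atomic_update(path, data):
    """Replace `path` atomically: write a temp file, then rename it over `path`."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)  # rename() semantics: `path` now names a new inode


workdir = tempfile.mkdtemp()
hosts = os.path.join(workdir, "hosts")  # stand-in for /etc/hosts
with open(hosts, "w") as f:
    f.write("192.0.2.10 controller-0\n")

inode_before = os.stat(hosts).st_ino
pinned = open(hosts)  # keeps the original inode alive, like an existing bind mount

atomic_update(hosts, "192.0.2.10 controller-0\n192.0.2.11 controller-1\n")

inode_after = os.stat(hosts).st_ino
stale_view = pinned.read()  # read through the handle opened before the update
pinned_inode = os.fstat(pinned.fileno()).st_ino
pinned.close()

print("inode changed:", inode_before != inode_after)
print("pinned handle sees new entry:", "controller-1" in stale_view)
```

On a POSIX filesystem, the inode number should differ after the update, and the handle opened beforehand should still read the old content, which mirrors what running containers observe.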

The changing inode breaks the container assumption and impacts various use cases, for example:

  . When replacing a controller node in the control plane, the new controller's IP is never seen by running containers. This breaks, for example, galera, which cannot connect to the newly added node.

  . If for any reason an existing FQDN needs to change its IP, no running container will pick it up: for example, changing the address of a VIP cannot work without restarting all containers.

[1] https://docs.ansible.com/ansible/latest/reference_appendices/module_utils.html#ansible.module_utils.basic.AnsibleModule.atomic_move

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733921

Bogdan Dobrelya (bogdando) wrote :

Please do not reduce the priority of this issue; it critically impacts any scale-up/down day-2 actions.

Changed in tripleo:
importance: Undecided → Critical
tags: added: containers queens-backport-potential train-backport-potential ussuri-backport-potential
tags: removed: queens-backport-potential

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/733921
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=7761249774b8a6e9bebc60db69e0bc56ad9f7622
Submitter: Zuul
Branch: master

commit 7761249774b8a6e9bebc60db69e0bc56ad9f7622
Author: Damien Ciabrini <email address hidden>
Date: Mon Jun 8 21:30:43 2020 +0200

    Preserve inode when updating /etc/hosts

    Containers configured by tripleo bind-mount /etc/hosts directly,
    which means any change to that file has to preserve the original
    inode, otherwise the containers will get out of sync with the
    host and will not see updates.

    Change tripleo_host_entries to not depend on atomic semantics of
    ansible, which changes inodes on update. Instead, perform a
    non-atomic update to preserve inodes, and rely on the retry
    logics of openstack containers to recover from unexpected
    behaviour in case /etc/hosts is consumed while it is being updated.

    Closes-Bug: #1882290
    Change-Id: I34dd9121bbd650b79cb523e4dbed5949a0e7d52d
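The non-atomic, inode-preserving update described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the review: the helper update_in_place, the temporary "hosts" file, and the addresses are hypothetical. The idea is to write through the existing file instead of renaming a new file over it.

```python
import os
import tempfile


def update_in_place(path, data):
    """Overwrite `path` through its existing inode (not atomic)."""
    with open(path, "r+b") as f:  # "r+b" opens the existing file without replacing it
        f.write(data.encode())
        f.truncate()  # drop leftover bytes when the new content is shorter


workdir = tempfile.mkdtemp()
hosts = os.path.join(workdir, "hosts")  # stand-in for /etc/hosts
with open(hosts, "w") as f:
    f.write("192.0.2.10 controller-0\n192.0.2.99 retired-node-with-long-name\n")

inode_before = os.stat(hosts).st_ino
pinned = open(hosts)  # opened before the update, like an existing bind mount

new_content = "192.0.2.10 controller-0\n192.0.2.11 controller-1\n"
update_in_place(hosts, new_content)

inode_after = os.stat(hosts).st_ino
live_view = pinned.read()  # same inode, so the new content is visible
pinned.close()

print("inode preserved:", inode_before == inode_after)
print("pinned handle sees new entry:", "controller-1" in live_view)
```

The trade-off is the one the commit message names: a reader that opens the file between the write and the truncate may see a partially updated file, so consumers need to tolerate or retry on inconsistent content.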

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/734651

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/734843

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/734651
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=c6b06129f81bd3fd61edc261953693da6daf2ca3
Submitter: Zuul
Branch: stable/ussuri

commit c6b06129f81bd3fd61edc261953693da6daf2ca3
Author: Damien Ciabrini <email address hidden>
Date: Mon Jun 8 21:30:43 2020 +0200

    Preserve inode when updating /etc/hosts

    Containers configured by tripleo bind-mount /etc/hosts directly,
    which means any change to that file has to preserve the original
    inode, otherwise the containers will get out of sync with the
    host and will not see updates.

    Change tripleo_host_entries to not depend on atomic semantics of
    ansible, which changes inodes on update. Instead, perform a
    non-atomic update to preserve inodes, and rely on the retry
    logics of openstack containers to recover from unexpected
    behaviour in case /etc/hosts is consumed while it is being updated.

    Closes-Bug: #1882290
    Change-Id: I34dd9121bbd650b79cb523e4dbed5949a0e7d52d
    (cherry picked from commit 7761249774b8a6e9bebc60db69e0bc56ad9f7622)

tags: added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/734843
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=34696209fda1a51e6d2f18b6439fd5451d9e59fb
Submitter: Zuul
Branch: stable/train

commit 34696209fda1a51e6d2f18b6439fd5451d9e59fb
Author: Damien Ciabrini <email address hidden>
Date: Mon Jun 8 21:30:43 2020 +0200

    Preserve inode when updating /etc/hosts

    Containers configured by tripleo bind-mount /etc/hosts directly,
    which means any change to that file has to preserve the original
    inode, otherwise the containers will get out of sync with the
    host and will not see updates.

    Change tripleo_host_entries to not depend on atomic semantics of
    ansible, which changes inodes on update. Instead, perform a
    non-atomic update to preserve inodes, and rely on the retry
    logics of openstack containers to recover from unexpected
    behaviour in case /etc/hosts is consumed while it is being updated.

    Closes-Bug: #1882290
    Change-Id: I34dd9121bbd650b79cb523e4dbed5949a0e7d52d
    (cherry picked from commit 7761249774b8a6e9bebc60db69e0bc56ad9f7622)

tags: added: in-stable-train

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.6.0

This issue was fixed in the openstack/tripleo-ansible 0.6.0 release.
