upgrade with TLS-e fails due to container creation conflict with redis_tls_proxy

Bug #1931145 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Michele Baldessari

Bug Description

Seen via https://bugzilla.redhat.com/show_bug.cgi?id=1965124

The likely culprit is that we moved redis_tls_proxy from step2 (order 3) to step1 via https://review.opendev.org/c/openstack/tripleo-heat-templates/+/777549:
"""
We also move the redis_tls_proxy from step_2/start_order: 3 to step_1
since it actually makes sense to have it run before we start the
redis pcmk bundle at step 2 (i.e. so the slave replica can work right
away from the start).
"""

Now the change to move it to step1 is actually correct and we want to keep it.
The problem likely stems from the fact that paunch cannot cope with a container moving from step2 to step1: during step1 it tries to create the container without first removing the existing one, since the existing container still carries the step2 label ("config_id=step2") and so is not treated as a step1 container that has to be replaced.
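
To make the suspected failure mode concrete, here is a minimal Python sketch. It is a simplified model and not paunch's actual code: the function names, the NameInUse exception, and the dict standing in for the container runtime are all hypothetical, and the label values are assumed to follow the config_id=tripleo_stepN pattern. It also models the general idea of a workaround, which is to remove the container only if it is still labelled for the old step.

```python
# Simplified model of the suspected behaviour, NOT paunch's real implementation.
# All names below are hypothetical and exist only for illustration.


class NameInUse(Exception):
    """Stands in for podman's "container name is already in use" error."""


def apply_step(runtime, config_id, wanted):
    """Apply one deployment step.

    runtime: dict mapping container name -> config_id label of the existing container
    wanted:  container names defined for this step
    """
    # Only containers labelled with this step's config_id count as managed
    # by the step, so only those are removed before being recreated.
    for name in [n for n, label in runtime.items() if label == config_id]:
        del runtime[name]
    for name in wanted:
        if name in runtime:
            # A same-named container owned by another step is still present.
            raise NameInUse(name)
        runtime[name] = config_id


def remove_if_at_step(runtime, name, old_config_id):
    """Drop the container only if it still carries the old step's label."""
    if runtime.get(name) == old_config_id:
        del runtime[name]


# Before the update the container was created at step 2.
runtime = {"redis_tls_proxy": "tripleo_step2"}
try:
    apply_step(runtime, "tripleo_step1", ["redis_tls_proxy"])
except NameInUse as exc:
    print("create failed, name already in use:", exc)

# With the stale step2 container removed first, step 1 succeeds.
remove_if_at_step(runtime, "redis_tls_proxy", "tripleo_step2")
apply_step(runtime, "tripleo_step1", ["redis_tls_proxy"])
print(runtime)
```

In this model a fresh deployment never hits the conflict, because no container with the old label exists when step 1 runs.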

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795132
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/d580a0880809419e90654dd461a4c66e0640f737
Submitter: "Zuul (22348)"
Branch: stable/train

commit d580a0880809419e90654dd461a4c66e0640f737
Author: Michele Baldessari <email address hidden>
Date: Mon Jun 7 18:14:17 2021 +0200

    [Train-only] Remove redis_tls_proxy if at step2

    When we move redis_tls_proxy from step2 to step1 (which is the correct
    thing to do) via https://review.opendev.org/c/openstack/tripleo-heat-templates/+/777549:
    """
    We also move the redis_tls_proxy from step_2/start_order: 3 to step_1
    since it actually makes sense to have it run before we start the
    redis pcmk bundle at step 2 (i.e. so the slave replica can work right
    away from the start).
    """

    The problem is that paunch by design does not manage containers moving from stepX to
    stepX-1 and so during a minor update, it will create redis_tls_proxy
    at step1 but since the previous container has tripleo_config=step2 label
    it won't detect that it has to remove it and recreate and it will fail
    with:
    podman create --name redis_tls_proxy --label config_id=tripleo_step1 --label container_name=redis_tls_proxy

    ....

    Error: error creating container storage: the container name \"redis_tls_proxy\" is already in use by \"556f965b47b044fa48ef8194f2cf0adaad46e8145f1e4afdb0af4d70977ad561\". You have to remove that container to be able to reuse that name.: that name is already in use\

    Tested by running at 16.1 -> 16.2 update and we correctly get past this
    error and redis is fully up and running.

    We need this only in Train as a fresh deployment or ffu is not affected.

    Closes-Bug: #1931145
    Change-Id: I9bea643511f90167ca7b9f0285c1eaf8211d6e7d

tags: added: in-stable-train
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates train-eol

This issue was fixed in the openstack/tripleo-heat-templates train-eol release.
