Image updated with "tripleo container image prepare" fails to be installed during update.

Bug #1861498 reported by Sofer Athlan-Guyot
This bug affects 1 person
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   Critical     Michele Baldessari

Bug Description

Hi,

Trying an update of Train with modified container images systematically ends in failure.

The environment is 3ctl/3db/2messaging/2cpt/2net (3 controller, 3 database, 2 messaging, 2 compute and 2 networker nodes), but the topology is irrelevant here.

The container images were updated using:

sudo openstack tripleo container image prepare -e /home/stack/pcmk-workarounds.yaml --output-env-file /home/stack/pcmk-workaround-params.yaml

with pcmk-workarounds.yaml being:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-4.0-rhel8
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_tag: latest
      name_prefix: rhosp16-openstack-
      name_suffix: ''
      namespace: rhos-qe-mirror-tlv.usersys.redhat.com:5002/rh-osbs
      neutron_driver: ovn
      tag: 20200124.1
    includes:
      - rabbitmq
      - haproxy
      - maria
      - redis
      - cinder-volume
      - ovn-northd
      - cinder-backup
    modify_role: tripleo-modify-image
    modify_append_tag: "-hotfix-4"
    modify_vars:
      tasks_from: yum_install.yml
      yum_packages:
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-2.0.3-4.el8.1.x86_64.rpm
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-cli-2.0.3-4.el8.1.x86_64.rpm
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-cluster-libs-2.0.3-4.el8.1.x86_64.rpm
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-libs-2.0.3-4.el8.1.x86_64.rpm
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-remote-2.0.3-4.el8.1.x86_64.rpm
        - http://file.rdu.redhat.com/~mbaldess/bz1776608/pacemaker-schemas-2.0.3-4.el8.1.noarch.rpm
      yum_repos_dir_path: /etc/yum.repos.d
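As a rough illustration of how the `set:` values above combine, the sketch below builds image references in the `<namespace>/<name_prefix><name><name_suffix>:<tag>` form and appends `modify_append_tag` to the tag of the images selected by `includes`. It is a simplified model, not tripleo-common code: the helper names `image_ref` and `is_included`, the example short names `mariadb` and `nova-api`, and the assumption that each `includes` entry is a regular expression searched within the image name (which is why `maria` would select the MariaDB image) are all illustrative. It shows source references only; with `push_destination: true` the images are copied into the undercloud registry, so the references actually deployed point at that registry instead.

```python
import re

# Values copied from the ContainerImagePrepare "set:" section above.
PARAMS = {
    "namespace": "rhos-qe-mirror-tlv.usersys.redhat.com:5002/rh-osbs",
    "name_prefix": "rhosp16-openstack-",
    "name_suffix": "",
    "tag": "20200124.1",
}
INCLUDES = ["rabbitmq", "haproxy", "maria", "redis",
            "cinder-volume", "ovn-northd", "cinder-backup"]
MODIFY_APPEND_TAG = "-hotfix-4"


def image_ref(name: str, params: dict, append_tag: str = "") -> str:
    """Build <namespace>/<prefix><name><suffix>:<tag><append_tag> (assumed layout)."""
    repo = f"{params['namespace']}/{params['name_prefix']}{name}{params['name_suffix']}"
    return f"{repo}:{params['tag']}{append_tag}"


def is_included(image: str, includes: list) -> bool:
    """Assumption: each include entry is a regex searched within the image reference."""
    return any(re.search(pattern, image) for pattern in includes)


if __name__ == "__main__":
    # "mariadb" and "nova-api" are hypothetical example names, not taken from the report.
    for short in ("mariadb", "nova-api"):
        ref = image_ref(short, PARAMS)
        if is_included(ref, INCLUDES):
            # Selected images are rebuilt by the modify role under the appended tag.
            ref = image_ref(short, PARAMS, MODIFY_APPEND_TAG)
        print(ref)
```

Running it prints the MariaDB reference with the `20200124.1-hotfix-4` tag and the nova-api reference with the unmodified `20200124.1` tag. The point that matters for this bug is that the hotfix image is a new image layered on top of the original one, not an in-place replacement of it.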

The update then systematically fails with:

TASK [Remove previous galera images] *******************************************
Friday 31 January 2020 12:06:49 +0000 (0:00:03.085) 0:09:29.253 ********
fatal: [database-1]: FAILED! => {"changed": true, "cmd": "podman rmi -f 209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d", "delta": "0:00:00.084905", "end": "2020-01-31 12:06:49.825135", "msg": "non-zero return code", "rc": 2, "start": "2020-01-31 12:06:49.740230", "stderr": "Error: unable to delete \"209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d\" (cannot be forced) - image has dependent child images", "stderr_lines": ["Error: unable to delete \"209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d\" (cannot be forced) - image has dependent child images"], "stdout": "", "stdout_lines": []}

The pacemaker-managed service whose image removal fails first may vary from run to run.

I set this to Critical because it means hotfixes cannot be delivered in containers using this mechanism.

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/705270

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/705271

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/705270
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9a830255b73ec42edbdc8f7c417ebddc3ba364e4
Submitter: Zuul
Branch: master

commit 9a830255b73ec42edbdc8f7c417ebddc3ba364e4
Author: Michele Baldessari <email address hidden>
Date: Fri Jan 31 19:04:16 2020 +0100

    Remove all the "container_cli rmi -f" from HA containers

    Back in the days we had added the rmi -f container calls in order
    to try and clean up any old unused container images whenever we updated
    any HA container. Nowadays this already happens via the
    tripleo_ansible/tripleo_podman/purge role which prunes any unused
    container image.

    There is no point in keeping this code around since we already purge
    images as a post upgrade/update task. We want to remove this code also
    because it fails horribly when we update the HA containers with an image
    that is based off the previously deployed image. In fact that fails
    with:
    TASK [Remove previous galera images] *******************************************
    Friday 31 January 2020 10:34:40 +0000 (0:00:02.684) 0:02:56.021 ********
    fatal: [database-0]: FAILED! => {"changed": true, "cmd": "podman rmi -f 209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d", "delta": "0:00:00.110460", "end": "2020-01-31 10:34:40.772522", "msg": "non-zero return code", "rc": 2, "start": "2020-01-31 10:34:40.662062", "stderr": "Error: unable to delete \"209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d\" (cannot be forced) - image has dependent child images", "stderr_lines": ["Error: unable to delete \"209e952aa6cb3c212e57e5f81693eb4776c0c4b6cf96fb4faabdaa7403b2a94d\" (cannot be forced) - image has dependent child images"], "stdout": "", "stdout_lines": []}

    This is particularly important because any hotfix container
    generated with tripleo-modify-image role will be affected by this issue.

    We tested this by doing the following:
    1) Deploying an overcloud
    2) Patching all HA containers with tripleo-modify-image
    3) Running an update

    With this change the update did not fail any longer and the correct
    images were being used by pacemaker after the update process.

    Co-Authored-By: Sofer Athlan-Guyot <email address hidden>

    Change-Id: I5346b32962b8cee5c64e4f07c0b68e2512085e83
    Closes-Bug: #1861498
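The failure described in this commit message can be reproduced in a toy model. The sketch below is not podman's implementation, and the `ImageStore` class and image names are hypothetical: each image records its parent, a forced removal is still refused while a child image exists (mirroring the `image has dependent child images` error above), and a prune-style cleanup removes only images that no container uses and that no other image is built on. Under these assumptions, the hotfix image pins the previous galera image, so `rmi -f` fails, while a prune simply skips it.

```python
class ImageHasChildrenError(Exception):
    """Raised when removal is refused because other images are built on this one."""


class ImageStore:
    """Toy model of a local image store where images may be layered on a parent."""

    def __init__(self):
        self.parent = {}     # image name -> parent image name (or None)
        self.in_use = set()  # images referenced by running containers

    def add(self, image, parent=None):
        self.parent[image] = parent

    def children(self, image):
        return [name for name, par in self.parent.items() if par == image]

    def rmi_force(self, image):
        # In this model, forcing overrides "in use" but not the existence of children.
        if self.children(image):
            raise ImageHasChildrenError(
                f"unable to delete {image!r} (cannot be forced) "
                "- image has dependent child images")
        self.in_use.discard(image)
        del self.parent[image]

    def prune(self):
        """Remove unused images with no children, repeating until nothing changes."""
        removed = []
        changed = True
        while changed:
            changed = False
            for image in list(self.parent):
                if image not in self.in_use and not self.children(image):
                    del self.parent[image]
                    removed.append(image)
                    changed = True
        return removed


if __name__ == "__main__":
    store = ImageStore()
    store.add("galera:20200124.1")                    # previously deployed image
    store.add("galera:20200124.1-hotfix-4",
              parent="galera:20200124.1")             # built FROM the old image
    store.add("redis:stale")                          # unrelated unused image
    store.in_use.add("galera:20200124.1-hotfix-4")    # pacemaker now runs the hotfix

    try:
        store.rmi_force("galera:20200124.1")
    except ImageHasChildrenError as err:
        print("rmi -f failed:", err)

    print("pruned:", store.prune())
```

Here the prune removes only the unrelated stale image; the old galera image survives until the hotfix image built on it is no longer used, which is why leaving cleanup to a prune-style step avoids the failure.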

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/705271
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=73bb3149fba6125fa99832af42fe1f873c9f461e
Submitter: Zuul
Branch: stable/train

commit 73bb3149fba6125fa99832af42fe1f873c9f461e
Author: Michele Baldessari <email address hidden>
Date: Fri Jan 31 19:04:16 2020 +0100

    (cherry picked from commit 9a830255b73ec42edbdc8f7c417ebddc3ba364e4)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.1.0

This issue was fixed in the openstack/tripleo-heat-templates 12.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
