standalone upgrade job fails when upgrading mysql container

Bug #1836531 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Jose Luis Franco

Bug Description

After the docker-rm role moved to the tripleo-ansible repo and was merged with the container-rm role, the standalone upgrade job started to fail.

http://logs.openstack.org/76/670276/2/check/tripleo-ci-centos-7-standalone-upgrade/c659cd3/logs/undercloud/home/zuul/standalone_upgrade.log.txt.gz#_2019-07-12_06_12_16

2019-07-12 06:12:15 | 2019-07-12 06:12:15.824 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] TASK [Check and upgrade Mysql database after major version upgrade] ************
2019-07-12 06:12:16 | 2019-07-12 06:12:16.083 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] fatal: [standalone]: FAILED! => {"changed": true, "cmd": ["podman", "exec", "-u", "root", "mysql", "mysql_upgrade"], "delta": "0:00:00.081067", "end": "2019-07-12 06:12:16.059565", "msg": "non-zero return code", "rc": 125, "start": "2019-07-12 06:12:15.978498", "stderr": "Error: unable to exec into mysql: no container with name or ID mysql found: no such container", "stderr_lines": ["Error: unable to exec into mysql: no container with name or ID mysql found: no such container"], "stdout": "", "stdout_lines": []}
2019-07-12 06:12:16 | 2019-07-12 06:12:16.085 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] 
2019-07-12 06:12:16 | 2019-07-12 06:12:16.085 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] NO MORE HOSTS LEFT *************************************************************
2019-07-12 06:12:16 | 2019-07-12 06:12:16.087 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] 
2019-07-12 06:12:16 | 2019-07-12 06:12:16.088 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] PLAY RECAP *********************************************************************
2019-07-12 06:12:16 | 2019-07-12 06:12:16.088 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] standalone : ok=414 changed=162 unreachable=0 failed=1 skipped=155 rescued=0 ignored=0
2019-07-12 06:12:16 | 2019-07-12 06:12:16.088 108749 WARNING tripleoclient.v1.tripleo_upgrade.Upgrade [-] 
2019-07-12 06:12:16 | 2019-07-12 06:12:16.247 108749 ERROR tripleoclient.v1.tripleo_upgrade.Upgrade [-] Exception: Post Upgrade failed: DeploymentError: Post Upgrade failed
2019-07-12 06:12:16 | Traceback (most recent call last):
2019-07-12 06:12:16 | File "/usr/lib/python2.7/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1292, in _standalone_deploy
2019-07-12 06:12:16 | raise exceptions.DeploymentError('Post Upgrade failed')
2019-07-12 06:12:16 | DeploymentError: Post Upgrade failed
2019-07-12 06:12:16 | 2019-07-12 06:12:16.279 108749 ERROR tripleoclient.v1.tripleo_upgrade.Upgrade [-] None: DeploymentError: Post Upgrade failed
2019-07-12 06:12:18 | 2019-07-12 06:12:18.054 108749 ERROR tripleoclient.v1.tripleo_upgrade.Upgrade [-] ** Found ansible errors for standalone deployment! **: DeploymentError: Post Upgrade failed
2019-07-12 06:12:18 | 2019-07-12 06:12:18.055 108749 ERROR tripleoclient.v1.tripleo_upgrade.Upgrade [-] [

The relevant tasks are:

- when: step|int == 1
  import_role:
    name: tripleo-docker-rm
  vars:
    containers_to_rm:
      - mysql
- name: Check and upgrade Mysql database after major version upgrade
  command: "{{ container_cli }} exec -u root mysql mysql_upgrade"
  when: step|int == 2
https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/database/mysql-container-puppet.yaml#L327

Previously, the container removal role ran only under the condition "container_cli == 'podman'":
https://github.com/openstack/tripleo-common/blob/59f823175c29726b3cca08ea430f4b356528e434/roles/tripleo-docker-rm/tasks/main.yaml#L19

The tripleo-container-rm role, which is supposed to work for both docker and podman, had no such condition:
https://github.com/openstack/tripleo-common/blob/59f823175c29726b3cca08ea430f4b356528e434/roles/tripleo-container-rm/tasks/docker.yaml#L18

So it seems the mysql container is removed at step 1, before the upgrade task runs at step 2, and therefore cannot be found later.

P.S. The tripleo-ansible repo now contains only the container-rm role; "docker-rm" is just a link to it.
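
One way to express the intended behaviour is to pin the removal step to docker, whatever engine the deployment itself uses. This is a minimal sketch, not the merged patch: it assumes the converged role honours a `tripleo_container_cli` variable (the name is taken from the related tripleo-ansible commit message), and that setting it under `vars` overrides the engine for this role only.

```yaml
# Sketch only: force the removal role to target docker, so that the
# podman "mysql" container still exists when step 2 runs.
- when: step|int == 1
  import_role:
    name: tripleo-docker-rm
  vars:
    containers_to_rm:
      - mysql
    # Assumed variable name; not confirmed against the role's defaults.
    tripleo_container_cli: docker
- name: Check and upgrade Mysql database after major version upgrade
  command: "{{ container_cli }} exec -u root mysql mysql_upgrade"
  when: step|int == 2
```

With the removal pinned to docker, `{{ container_cli }}` in the second task can still resolve to podman without the two steps working against each other.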

Tags: ci upgrade
Changed in tripleo:
importance: Undecided → Critical
description: updated
tags: added: ci
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/670796

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/670971

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/670971
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=d2d53ab69c23f988b800e0f0faacc683b0b41b61
Submitter: Zuul
Branch: master

commit d2d53ab69c23f988b800e0f0faacc683b0b41b61
Author: Jose Luis Franco Arza <email address hidden>
Date: Tue Jul 16 09:45:58 2019 +0200

    Cover the case when the container engine is not available.

    In the case the container engine passed in tripleo_container_cli
    isn't installed (for example: if we try to remove a docker container
    when the instance has podman installed and docker isn't present) then
    the role would fail. This patch includes a guard which in the case
    the container engine passed in tripleo_container_cli doesn't exist
    then nothing is done.
    Besides, a check for the container being run in the docker-rm part
    was missing, this check is being added too.

    Change-Id: Ib5501321fef1a3921fbb17bcfabfb2f5c7c96c41
    Related-Bug: #1836531
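
The guard described in this commit message could look roughly like the tasks below. This is a hedged sketch rather than the content of the actual change: the register name `engine_present` is hypothetical, and `tripleo_container_cli` and `containers_to_rm` are assumed to be set by the caller.

```yaml
# Sketch only: do nothing when the requested engine binary is absent.
- name: Check whether the requested container engine is installed
  shell: "command -v {{ tripleo_container_cli }}"
  register: engine_present      # hypothetical variable name
  failed_when: false            # a missing binary is not an error here
  changed_when: false

- name: Remove the listed containers only when the engine exists
  command: "{{ tripleo_container_cli }} rm -f {{ item }}"
  loop: "{{ containers_to_rm }}"
  when: engine_present.rc == 0
  # A missing container makes `rm` exit non-zero; tolerated here for brevity.
  failed_when: false
```

On a host with podman but no docker, the first task returns a non-zero `rc`, so the removal task is skipped instead of failing the play.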

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/671698

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/671698
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=f56915eae7d5122706c9201fe6e7d261272fff07
Submitter: Zuul
Branch: master

commit f56915eae7d5122706c9201fe6e7d261272fff07
Author: Jose Luis Franco Arza <email address hidden>
Date: Fri Jul 19 12:09:40 2019 +0200

    Remove docker_container task from tripleo-container-rm.

    The docker_container Ansible module requires to have installed
    python2-docker in the system, which isn't the case in our current
    CI environments. Therefore, when we try to remove some docker
    container with this role we end up getting an error.

    Turning the docker_container task into two command tasks ensures
    that no new problems will occur due to missing dependencies.

    Change-Id: I8801875ca21b16de9b92d7091b6923447370a36c
    Related-Bug: #1836531
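
Replacing the module call with two command tasks might be sketched as follows. This illustrates the idea only and is not the merged change: the register names are hypothetical, `containers_to_rm` is assumed to be provided by the caller, and the "No such container" error text is assumed to match what the docker CLI prints.

```yaml
# Sketch only: plain CLI calls instead of the docker_container module,
# so python2-docker is not needed on the host.
- name: Stop the docker containers
  command: "docker stop {{ item }}"
  loop: "{{ containers_to_rm }}"
  register: stop_result         # hypothetical variable name
  # Fail only on errors other than an already-absent container.
  failed_when:
    - stop_result.rc != 0
    - "'No such container' not in stop_result.stderr"

- name: Remove the docker containers
  command: "docker rm {{ item }}"
  loop: "{{ containers_to_rm }}"
  register: rm_result           # hypothetical variable name
  failed_when:
    - rm_result.rc != 0
    - "'No such container' not in rm_result.stderr"
```

The `command` module only needs the docker binary itself, which is the trade-off the commit message describes: less module-level idempotence handling in exchange for no extra Python dependency.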

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/670796
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d1035703b79545fb018e6bdc5b4c0c392c3dcc7a
Submitter: Zuul
Branch: master

commit d1035703b79545fb018e6bdc5b4c0c392c3dcc7a
Author: Jose Luis Franco Arza <email address hidden>
Date: Mon Jul 15 11:01:39 2019 +0200

    Force removal of docker container in tripleo-docker-rm.

    The tripleo-docker-rm role has been replaced by tripleo-container-rm [0].
    This role will identify the docker engine via the container_cli variable
    and perform a deletion of that container. However, these tasks inside the
    post_upgrade_tasks section were thought to remove the old docker containers
    after upgrading from rocky to stein, in which podman starts to be the
    container engine by default.

    For that reason, we need to ensure that the container engine in which the
    containers are removed is docker, as otherwise we will be removing the
    podman container and the deployment steps will fail.

    Closes-Bug: #1836531
    [0] - https://github.com/openstack/tripleo-ansible/commit/2135446a351eb6f6d57d86eca548d583d4c8bfb1

    Depends-On: https://review.opendev.org/#/c/671698/
    Change-Id: Ib139a1d77f71fc32a49c9878d1b4a6d07564e9dc

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.2.0

This issue was fixed in the openstack/tripleo-heat-templates 11.2.0 release.
