train containerized-undercloud-upgrades failing with "error creating container storage: the container name is already in use"

Bug #1853812 reported by Marios Andreou on 2019-11-25
This bug affects 2 people
Affects: tripleo
Importance: Critical
Assigned to: Carlos Camacho

Bug Description

After the fix for https://bugs.launchpad.net/tripleo/+bug/1853183, the containerized undercloud upgrade is failing at [1]. The upgrade tasks complete, and the failure happens during the deploy tasks with a trace like:

        2019-11-23 18:17:49 | PLAY [Upgrade tasks for step 5]
        ...
        2019-11-23 18:17:49 | PLAY RECAP *********************************************************************
        2019-11-23 18:17:49 | undercloud : ok=33 changed=18 unreachable=0 failed=0 skipped=22 rescued=0 ignored=2
        2019-11-23 18:17:49 |
        2019-11-23 18:17:49 | ** Running ansible deploy tasks **
        ...
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"rabbitmq_init_logs\" is already in use by \"5e03ad52e1b683e28a48ed471c958881af7b918245cca43f9659faa52e62589f\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"mysql_init_logs\" is already in use by \"f1849d2ae3c95297fed6e9ce20d37e887595efb6abf699ab33698f394de7f370\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"memcached\" is already in use by \"ec1ebc6b0f807bbb8951d5d906074ee3e95a8e66d2049b6814b0084e93bf19ba\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"keepalived\" is already in use by \"fd55688e8d00b42352c9841d393895597f8c151e1be7a0be835db9781eb02262\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"haproxy\" is already in use by \"4cfb530f920c0045120b3927f56d059b3bbde153e37e8a9d91923c52e0944f79\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"rabbitmq_bootstrap\" is already in use by \"005039b296ffebaf34db0a7a637d1a12aa5dc2d11bfaa4cd9340e4cc23696cfd\". You have to remove that container to be able to reuse that name.: that name is already in use",
        2019-11-23 18:28:08 | "Error: error creating container storage: the container name \"rabbitmq\" is already in use by \"558d19eec782dbfde3eb5a9858a87f90ada4bb8cb21ca3fdc6fe821dabdda8d2\". You have to remove that container to be able to reuse that name.: that name is already in use"

[1] https://f675568a69743b4b6142-5f4fcc390c732de527a6558cf26ba7f3.ssl.cf5.rackcdn.com/695803/1/check/tripleo-ci-centos-7-containerized-undercloud-upgrades/a0f55a7/logs/undercloud/home/zuul/undercloud_upgrade.log.txt.gz

Changed in tripleo:
milestone: none → ussuri-2
milestone: ussuri-2 → ussuri-1
tags: added: idempotency upgrade
Bogdan Dobrelya (bogdando) wrote :

I can see no log records made by paunch about "Renaming" or "Deleting" a container. That means neither ephemeral names were involved nor were there any triggers identified by paunch, such as config data changes. So it did not recreate the containers reported as already existing, and just failed.

Jose Luis Franco (jfrancoa) wrote :

We did observe this error during the OSP15 hackfest, but that time it happened during the overcloud upgrade. It is worth mentioning, however, that the Undercloud upgrade job has been broken for the last three releases: when upgrading, we do not update the registry in containers-prepare-parameter.yaml, so we perform the upgrade to release N+1 with N+1 t-h-t/t-c/t-v packages but release-N containers.
Carlos, Mathieu and I are trying to change the job to perform the Undercloud upgrade as expected.

Bogdan Dobrelya (bogdando) wrote :

Here are the paunch logs:

2019-11-21 20:29:56.310 570111 WARNING paunch [ ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=rabbitmq', '--filter', "label=config_id=['tripleo_step1']", '--format', '{{.Names}}']" - retrying without config_id

While the correct command should be:

[zuul@undercloud ~]$ sudo podman ps -a --filter label=container_name=rabbitmq --filter label=config_id=tripleo_step1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
08a3bb987333 docker.io/tripleostein/centos-binary-rabbitmq:current-tripleo dumb-init --singl... 6 hours ago Up 6 hours ago rabbitmq
[zuul@undercloud ~]$ sudo podman ps -a --filter label=container_name=rabbitmq --filter label=config_id=['tripleo_step1']
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
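The difference between the two queries can be illustrated with a minimal Python sketch. It assumes the filter is built by %-formatting the config_id value into a `label=config_id=...` string, which is consistent with the paunch log line above; `build_ps_filter` and `label_filter_matches` are hypothetical names, not paunch's API, and the matcher is a simplified exact-match model of a podman label filter.

```python
def build_ps_filter(container_name, config_id):
    """Hypothetical helper: assemble `podman ps` args for a label lookup."""
    return [
        'podman', 'ps', '-a',
        '--filter', 'label=container_name=%s' % container_name,
        # %-formatting calls str() on the value; for a list this yields "['...']"
        '--filter', 'label=config_id=%s' % config_id,
        '--format', '{{.Names}}',
    ]


def label_filter_matches(labels, filter_arg):
    """Exact-match a 'label=KEY=VALUE' filter against a container's labels."""
    key, _, value = filter_arg[len('label='):].partition('=')
    return labels.get(key) == value


# Labels of the running rabbitmq container from the transcript above.
rabbitmq_labels = {'container_name': 'rabbitmq', 'config_id': 'tripleo_step1'}

as_list = build_ps_filter('rabbitmq', ['tripleo_step1'])[6]
as_str = build_ps_filter('rabbitmq', 'tripleo_step1')[6]
print(as_list, label_filter_matches(rabbitmq_labels, as_list))
# → label=config_id=['tripleo_step1'] False
print(as_str, label_filter_matches(rabbitmq_labels, as_str))
# → label=config_id=tripleo_step1 True
```

A one-element list passed as config_id renders as its Python repr, so the filter asks for a label value that no container carries.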

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
Marios Andreou (marios-b) wrote :

Added the alert and promotion-blocker tags just now during the CIX call so we can get a card. We want one even if this isn't a promotion blocker.

tags: added: alert
tags: added: promotion-blocker
Bogdan Dobrelya (bogdando) wrote :

Raised to Critical as this affects all the logic in paunch, which does not yet support lists passed as config-id.

Changed in tripleo:
importance: High → Critical
Marios Andreou (marios-b) wrote :

o/ folks, thanks for working on this. I am trying to understand what we need.
I just checked matbu's patch https://review.opendev.org/#/c/695889/ but the job is still red there.

So do we also need https://review.opendev.org/#/c/695929/ and the revert https://review.opendev.org/#/c/695904/1, i.e. all of the above?

Or is it still not clear?

Marios Andreou (marios-b) wrote :

13:46 < bogdando> marios|ruck: https://review.opendev.org/#/c/695929/ replaces revert

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → mathieu bultel (mat-bultel)
status: Triaged → In Progress
Bogdan Dobrelya (bogdando) wrote :

The root cause is that the paunch CLI does not honor the passed --managed-by argument, while the paunch Ansible module (as of Train) does. When invoked via the CLI, paunch filters on the default managed_by=paunch, finds the existing container, deletes it, and the deployment continues normally. When invoked via the Ansible module, it filters on the requested managed_by value and cannot find the container because of the mismatch in its managed_by label. It is also not clear why inspecting the container under test shows the wrong managed-by=paunch value, while we expect it to be tripleo-Undercloud, for example.

Here is the reproducer scenario:

- hosts: Undercloud
  tasks:
    - name: Start containers for step 1 using paunch
      environment:
        TRIPLEO_MINOR_UPDATE: false
      paunch:
        config: /var/lib/tripleo-config/hashed-container-startup-config-step_1.json
        config_id: tripleo_step1
        action: apply
        container_cli: podman
        container_log_stdout_path: /tmp/out
        healthcheck_disabled: true
        managed_by: "tripleo-Undercloud"
        debug: true

ansible-playbook -v -b -i undercloud-ansible-DFoHef/inventory.yaml test.yaml

log:
$ podman ps -a --filter label=managed_by=tripleo-Undercloud --filter label=config_id=tripleo_step1 --format .Names .Labels.container_name

sudo paunch --log-file /tmp/foo --debug apply --file /var/lib/tripleo-config/container-startup-config-step_1.json --container rabbitmq --config-id=tripleo_step1 --managed-by=tripleo_Undercloud

log:
$ podman ps -a --filter label=managed_by=paunch --filter label=config_id=tripleo_step1 --format .Names .Labels.container_name
rabbitmq rabbitmq
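The effect of the managed_by mismatch on apply can be modelled with a small Python sketch, consistent with the two podman queries in the logs above. It is a simplification under stated assumptions: a dict stands in for podman's container store, and `find_containers`, `apply_container` and `NameInUseError` are hypothetical names, not paunch code. A container labeled managed_by=paunch is found by a lookup on 'paunch' but is invisible to a lookup on 'tripleo-Undercloud', so the create step collides with the existing name.

```python
class NameInUseError(Exception):
    """Stand-in for podman's 'the container name ... is already in use' error."""


def find_containers(store, managed_by, config_id):
    """Names whose labels match both filters, like `podman ps --filter ...`."""
    return [name for name, labels in store.items()
            if labels.get('managed_by') == managed_by
            and labels.get('config_id') == config_id]


def apply_container(store, name, managed_by, config_id):
    """Simplified apply: delete the container if the lookup finds it, then create it."""
    if name in find_containers(store, managed_by, config_id):
        del store[name]  # found by the lookup, so its name can be reused
    if name in store:
        # The lookup missed an existing container; creation now collides.
        raise NameInUseError('the container name "%s" is already in use' % name)
    store[name] = {'managed_by': managed_by, 'config_id': config_id}


def fresh_store():
    # The existing container carries the default managed_by=paunch label.
    return {'rabbitmq': {'managed_by': 'paunch', 'config_id': 'tripleo_step1'}}


# CLI path: --managed-by is ignored, the lookup uses 'paunch' and succeeds.
cli_store = fresh_store()
apply_container(cli_store, 'rabbitmq', 'paunch', 'tripleo_step1')

# Ansible-module path: managed_by is honored, the lookup misses, create fails.
try:
    apply_container(fresh_store(), 'rabbitmq', 'tripleo-Undercloud', 'tripleo_step1')
except NameInUseError as exc:
    print(exc)  # → the container name "rabbitmq" is already in use
```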

Changed in tripleo:
assignee: mathieu bultel (mat-bultel) → Bogdan Dobrelya (bogdando)
mathieu bultel (mat-bultel) wrote :

Hey Bogdan,

So yes, during my tests I discovered that all the containers are managed by paunch.
That's why I tested with this review:
https://review.opendev.org/#/c/695848/

I'm retesting completely now on a new env.

The weird thing is that:
1/ with the Ansible module, I reproduced the bug;
2/ with a Python script that calls paunch apply in exactly the same way as the Ansible module, it passed successfully (at least with this review: https://review.opendev.org/#/c/695889/).

mathieu bultel (mat-bultel) wrote :

So according to my test, Train to Master is working with:
https://review.opendev.org/#/c/695921/
https://review.opendev.org/#/c/695889/

I will try Stein to Train now.

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/695904
Reason: no need to revert, we have a fix https://review.opendev.org/#/c/696589

Fix proposed to branch: master
Review: https://review.opendev.org/696678

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Carlos Camacho (ccamacho)

Change abandoned by Carlos Camacho (<email address hidden>) on branch: master
Review: https://review.opendev.org/696678

Change abandoned by Carlos Camacho (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/696304

Changed in tripleo:
assignee: Carlos Camacho (ccamacho) → Bogdan Dobrelya (bogdando)

Reviewed: https://review.opendev.org/695921
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=e88785f124ffd255bb191f8489ece977625a9ab2
Submitter: Zuul
Branch: master

commit e88785f124ffd255bb191f8489ece977625a9ab2
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Nov 25 15:47:46 2019 +0100

    Cast single value list as a string for config id

    When passing containers config-id into the paunch module,
    convert it into a single value (string) for backwards compatibility
    of paunch, and unlocking upgrade paths as well.

    Paunch as a library only "understands" such single values for
    config ids yet. This can be fixed later although.

    Related-bug: #1853812

    Change-Id: Id8985795fc8fac5a10466486d404799e9c65cc65
    Signed-off-by: Bogdan Dobrelya <email address hidden>
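The cast described in this commit can be sketched in Python as follows. This is a hypothetical helper (`normalize_config_id` is not a tripleo-ansible or paunch name), and rejecting lists with more than one element is an assumption made here, since the commit only covers the single-value case.

```python
def normalize_config_id(config_id):
    """Hypothetical helper: reduce a config id to the single string paunch expects.

    Strings pass through; a one-element list or tuple is unwrapped. Lists with
    more than one element are rejected (an assumption: the commit only covers
    the single-value case).
    """
    if isinstance(config_id, str):
        return config_id
    if isinstance(config_id, (list, tuple)) and len(config_id) == 1:
        return str(config_id[0])
    raise ValueError('expected a single config id, got %r' % (config_id,))


print(normalize_config_id(['tripleo_step1']))  # → tripleo_step1
print(normalize_config_id('tripleo_step1'))    # → tripleo_step1
```

Unwrapping before the value reaches paunch means the label filter is built from `tripleo_step1` rather than from the list's repr.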

Reviewed: https://review.opendev.org/695929
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=4a039e6a812b830cb77f59713befc54a2bc6fb51
Submitter: Zuul
Branch: stable/train

commit 4a039e6a812b830cb77f59713befc54a2bc6fb51
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Nov 25 15:47:46 2019 +0100

    Cast single value list as a string for config id

    When passing containers config-id into the paunch module,
    convert it into a single value (string) for backwards compatibility
    of paunch, and unlocking upgrade paths as well.

    Paunch as a library only "understands" such single values for
    config ids yet. This can be fixed later although.

    Related-bug: #1853812

    Change-Id: Id8985795fc8fac5a10466486d404799e9c65cc65
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 5bc81a826d11a4f6b1e8bea778f89257d6cbbab7)

tags: added: in-stable-train

Reviewed: https://review.opendev.org/696589
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=12ccf36b0e7f9b233b8b71a21fe24b241ecb639b
Submitter: Zuul
Branch: master

commit 12ccf36b0e7f9b233b8b71a21fe24b241ecb639b
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Nov 28 15:31:55 2019 +0100

    Fix action Apply ignoring managed-by arg

    From the very beginning (06036fd6dba3ba4b9a13052ea95a08ebcc97501e), the
    action apply was ignoring the passed --managed-by values and was always
    taking defaults ('paunch').

    Fix this and provide no upgrade/update impact, which is for whatever
    --managed-by value given to paunch, perform all checks and searches
    also for that historically "wrong" value 'paunch':

    * if a container needs to be (re)started by a new 'managed-by', make sure
      it can be found by the default 'paunch' value as well, then reset its
      managed-by to the desired value.

    Closes-bug: #1853812

    Change-Id: If129bbc1ff32941d06ff480f26870b10840591e0
    Signed-off-by: Bogdan Dobrelya <email address hidden>
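The lookup strategy described in this commit message can be sketched in Python under simplifying assumptions: a dict keyed by container name stands in for podman, and `find_container`, `apply_container` and `LEGACY_MANAGED_BY` are hypothetical names, not paunch code. The container is searched under the requested managed-by value and then under the historical 'paunch' default; the sketch models "reset its managed-by" as recreating the container with the desired label, which is an assumption about the mechanism.

```python
LEGACY_MANAGED_BY = 'paunch'  # the default that apply historically always used


class NameInUseError(Exception):
    """Stand-in for podman's 'container name is already in use' error."""


def find_container(store, name, managed_by, config_id):
    """Return the managed_by value `name` was found under, or None.

    Searches the requested value first, then the legacy 'paunch' default.
    """
    labels = store.get(name)
    if labels is None or labels.get('config_id') != config_id:
        return None
    for candidate in (managed_by, LEGACY_MANAGED_BY):
        if labels.get('managed_by') == candidate:
            return candidate
    return None


def apply_container(store, name, managed_by, config_id):
    """Simplified apply: remove the old container, recreate it with the new label."""
    if find_container(store, name, managed_by, config_id) is not None:
        del store[name]
    if name in store:
        raise NameInUseError('the container name "%s" is already in use' % name)
    # Recreating the container is how this sketch "resets" managed_by.
    store[name] = {'managed_by': managed_by, 'config_id': config_id}


store = {'rabbitmq': {'managed_by': 'paunch', 'config_id': 'tripleo_step1'}}
apply_container(store, 'rabbitmq', 'tripleo-Undercloud', 'tripleo_step1')
print(store['rabbitmq']['managed_by'])  # → tripleo-Undercloud
```

Falling back to the legacy value keeps containers created by older paunch versions discoverable, which is what gives the fix no upgrade/update impact.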

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/696673
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=1620395464c95bc90c97f470a8faedc6a549d52d
Submitter: Zuul
Branch: stable/train

commit 1620395464c95bc90c97f470a8faedc6a549d52d
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Nov 28 15:31:55 2019 +0100

    Fix action Apply ignoring managed-by arg

    From the very beginning (06036fd6dba3ba4b9a13052ea95a08ebcc97501e), the
    action apply was ignoring the passed --managed-by values and was always
    taking defaults ('paunch').

    Fix this and provide no upgrade/update impact, which is for whatever
    --managed-by value given to paunch, perform all checks and searches
    also for that historically "wrong" value 'paunch':

    * if a container needs to be (re)started by a new 'managed-by', make sure
      it can be found by the default 'paunch' value as well, then reset its
      managed-by to the desired value.

    Closes-bug: #1853812

    Change-Id: If129bbc1ff32941d06ff480f26870b10840591e0
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 12ccf36b0e7f9b233b8b71a21fe24b241ecb639b)

Marios Andreou (marios-b) wrote :

o/ folks,

Trying to catch up on status: I just left a comment on https://review.opendev.org/#/c/696679/. There are 3 different related fixes afaics, but none of them makes the job green? Sorry if I am missing something; please add more info if you have it, thanks.

Bogdan Dobrelya (bogdando) wrote :

IIUC, we need all dependencies of https://review.opendev.org/#/c/695242/ to close this

Changed in tripleo:
status: Fix Released → In Progress
assignee: Bogdan Dobrelya (bogdando) → Carlos Camacho (ccamacho)
Carlos Camacho (ccamacho) wrote :

Hi Marios, yep, as Bogdan said, we already have a green run of the job here: https://review.opendev.org/#/c/695242/16. I'm tracking down all the patches pending merge, so we should eventually have it stabilized and voting.

Marios Andreou (marios-b) wrote :

Thanks @ccamacho and @bogdan. Carlos, please consider referencing the bug in https://review.opendev.org/#/c/695242/ so it shows up here (well, it's already here 3 times now :D), but mainly for the git log on tht, thanks.

Marios Andreou (marios-b) wrote :

Posted the changes to make it voting at https://review.opendev.org/#/q/topic:containerized-undercloud-upgrades-voting; they depend on the 'topmost' review https://review.opendev.org/#/c/695242/, where the job is green.

Reviewed: https://review.opendev.org/696674
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=40aba310e8a6fe0c2d0d4bd9e6c437c4ede25ba8
Submitter: Zuul
Branch: stable/stein

commit 40aba310e8a6fe0c2d0d4bd9e6c437c4ede25ba8
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Nov 28 15:31:55 2019 +0100

    Fix action Apply ignoring managed-by arg

    From the very beginning (06036fd6dba3ba4b9a13052ea95a08ebcc97501e), the
    action apply was ignoring the passed --managed-by values and was always
    taking defaults ('paunch').

    Fix this and provide no upgrade/update impact, which is for whatever
    --managed-by value given to paunch, perform all checks and searches
    also for that historically "wrong" value 'paunch':

    * if a container needs to be (re)started by a new 'managed-by', make sure
      it can be found by the default 'paunch' value as well, then reset its
      managed-by to the desired value.

    Closes-bug: #1853812

    Change-Id: If129bbc1ff32941d06ff480f26870b10840591e0
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 12ccf36b0e7f9b233b8b71a21fe24b241ecb639b)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/696675
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=965e3cfecb5d20bf1929adbe3f5ef5adc069705f
Submitter: Zuul
Branch: stable/rocky

commit 965e3cfecb5d20bf1929adbe3f5ef5adc069705f
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Nov 28 15:31:55 2019 +0100

    Fix action Apply ignoring managed-by arg

    From the very beginning (06036fd6dba3ba4b9a13052ea95a08ebcc97501e), the
    action apply was ignoring the passed --managed-by values and was always
    taking defaults ('paunch').

    Fix this and provide no upgrade/update impact, which is for whatever
    --managed-by value given to paunch, perform all checks and searches
    also for that historically "wrong" value 'paunch':

    * if a container needs to be (re)started by a new 'managed-by', make sure
      it can be found by the default 'paunch' value as well, then reset its
      managed-by to the desired value.

    Closes-bug: #1853812

    Change-Id: If129bbc1ff32941d06ff480f26870b10840591e0
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 12ccf36b0e7f9b233b8b71a21fe24b241ecb639b)

tags: added: in-stable-rocky
Bogdan Dobrelya (bogdando) wrote :

Just to note, https://review.opendev.org/695921 only addresses upgrades, where there is no "bad" code in the older N-x release, so the source-to-destination upgrade executed by the old paunch code goes smoothly. For minor updates, however, the issue remains: the "old" (not yet updated) paunch still handles config-ids as lists and fails to filter anything. So a minor update from the "bad" paunch state to the "fixed" one will still fail, because it is executed by the still-affected paunch.

Bogdan Dobrelya (bogdando) wrote :

A correction: the comment above is actually about the "pre-updated" containers state; see details in bug 1855090.

Reviewed: https://review.opendev.org/695242
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a364005e03d7b5e44740354bde21d2fc185f6fab
Submitter: Zuul
Branch: stable/train

commit a364005e03d7b5e44740354bde21d2fc185f6fab
Author: Mathieu Bultel <email address hidden>
Date: Thu Nov 14 15:02:37 2019 +0100

    Check if snmpd is enabled for upgrade_tasks

    This check should prevent some error when the upgrade
    tasks checks if the snmpd is stopped in step 1

    Depends-On: https://review.opendev.org/#/c/695234/
    Depends-On: https://review.opendev.org/#/c/695419/
    Depends-On: https://review.opendev.org/#/c/695562/
    Depends-On: https://review.opendev.org/#/c/696855/
    Depends-On: https://review.opendev.org/#/c/695929/
    Depends-On: https://review.opendev.org/#/c/696673/

    Closes-Bug: #1853812

    (cherry picked from commit 709a6b78bbe3ed181ab53b6055964c8b4332946d)

    Change-Id: I5cdc9ef6a20b7d18aaa802927959b81a08334753

Marios Andreou (marios-b) wrote :

I think we can call this fix-released since https://review.opendev.org/695242 merged.

I'm trying to make the job voting now by merging these (ongoing recheck & gate dance, etc.): https://review.opendev.org/#/q/topic:containerized-undercloud-upgrades-voting, but moving this to Fix Released.

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/696870
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=3414b0108f0f1739880056cb8b2cffb1577909ce
Submitter: Zuul
Branch: stable/train

commit 3414b0108f0f1739880056cb8b2cffb1577909ce
Author: Marios Andreou <email address hidden>
Date: Mon Dec 2 14:11:16 2019 +0200

    Make containerized-undercloud-upgrades vote on train

    The Depends-On makes the job green for the related bug.
    We should immediately make that voting for train.

    Depends-On: https://review.opendev.org/695242
    Change-Id: Id992e3dafde98e504ade532bf2594c88cf49d49b
    Related-Bug: 1853812

Reviewed: https://review.opendev.org/696874
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=6fa29bbd3e82a3192e5c3ac75f9b4c7e551b52d4
Submitter: Zuul
Branch: stable/train

commit 6fa29bbd3e82a3192e5c3ac75f9b4c7e551b52d4
Author: Marios Andreou <email address hidden>
Date: Mon Dec 2 14:15:34 2019 +0200

    Make containerized-undercloud-upgrades vote on train

    The Depends-On makes the job green for the related bug.
    We should immediately make that voting for train.

    Depends-On: https://review.opendev.org/695242
    Change-Id: I15a1ceb756d6b9f393ff193213325316b998140c
    Related-Bug: 1853812

Reviewed: https://review.opendev.org/696868
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=655a0e4ce373de1e60cccb0600659c1a3b32a798
Submitter: Zuul
Branch: stable/train

commit 655a0e4ce373de1e60cccb0600659c1a3b32a798
Author: Marios Andreou <email address hidden>
Date: Mon Dec 2 13:52:34 2019 +0200

    Make containerized-undercloud-upgrades vote on train

    The Depends-On makes the job green for the related bug.
    We should immediately make that voting for train.

    Depends-On: https://review.opendev.org/695242
    Change-Id: I1268f0ccefccd0a82564792a7d62f9a7ddad4173
    Related-Bug: 1853812
