Error: error removing container - device or resource busy

Bug #1876893 reported by Harald Jensås
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sagi (Sergey) Shnaidman

Bug Description

Seeing this issue in the upgrade check and gate jobs:

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d1c/724295/2/gate/tripleo-ci-centos-7-containerized-undercloud-upgrades/d1cdbaf/logs/undercloud/home/zuul/undercloud_upgrade.log

https://85e835dca339c7a48156-f1e10e895ca9a37e1229881907aff07c.ssl.cf2.rackcdn.com/724802/3/gate/tripleo-ci-centos-7-containerized-undercloud-upgrades/903d1be/logs/undercloud/home/zuul/undercloud_upgrade.log

2020-05-05 08:40:33 | TASK [tripleo-container-rm : Stop and remove container] ************************
2020-05-05 08:40:33 | Tuesday 05 May 2020 08:40:33 +0000 (0:00:00.605) 0:24:17.359 ***********
2020-05-05 08:40:34 | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": ["podman", "container", "rm", "--force", "memcached"], "delta": "0:00:00.793761", "end": "2020-05-05 08:40:34.191716", "msg": "non-zero return code", "rc": 125, "start": "2020-05-05 08:40:33.397955", "stderr": "Error: error removing container 1120e8289ff337886dd9bbf35eba50f63090b9825dbd1c2e511d75a8c86a3222 root filesystem: unlinkat /var/lib/containers/storage/overlay/6c13e7d5cdb2bb0613b51c787f02ccbf292103df59474dda9e316832a901bacb/merged: device or resource busy", "stderr_lines": ["Error: error removing container 1120e8289ff337886dd9bbf35eba50f63090b9825dbd1c2e511d75a8c86a3222 root filesystem: unlinkat /var/lib/containers/storage/overlay/6c13e7d5cdb2bb0613b51c787f02ccbf292103df59474dda9e316832a901bacb/merged: device or resource busy"], "stdout": "", "stdout_lines": []}
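When triaging runs like these, it helps to separate this particular overlay EBUSY failure from other podman errors. Below is a minimal sketch. It assumes the JSON after `FAILED! =>` has already been extracted from the log. `is_overlay_busy` is a hypothetical helper. The markers it checks (exit code 125, `device or resource busy`, a `/merged` path) are simply what the failures quoted in this report have in common.

```python
import json


def is_overlay_busy(result_json: str) -> bool:
    """Return True if an Ansible command result looks like the podman
    'device or resource busy' failure on an overlay merged directory."""
    result = json.loads(result_json)
    # The failing "podman container rm --force" task exited with rc 125.
    if result.get("rc") != 125:
        return False
    stderr = result.get("stderr", "")
    # Both reported variants (unlinkat / remove) end with EBUSY on .../merged.
    return "device or resource busy" in stderr and "/merged" in stderr


# Shape of the failure above (container and layer IDs shortened).
sample = json.dumps({
    "rc": 125,
    "cmd": ["podman", "container", "rm", "--force", "memcached"],
    "stderr": "Error: error removing container 1120e8... root filesystem: "
              "unlinkat /var/lib/containers/storage/overlay/6c13.../merged: "
              "device or resource busy",
})
print(is_overlay_busy(sample))
```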

tags: added: alert promotion-blocker
Harald Jensås (harald-jensas) wrote :

Is it running a docker command for some reason?
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d1c/724295/2/gate/tripleo-ci-centos-7-containerized-undercloud-upgrades/d1cdbaf/logs/undercloud/var/log/paunch.log

2020-05-05 08:14:11.606 79628 WARNING paunch [ ] docker runtime is deprecated in Stein and will be removed in Train.
2020-05-05 08:14:11.616 79628 ERROR paunch [ ] [Errno 2] No such file or directory

Cédric Jeanneret (cjeanner) wrote :

Is this happening on a regular basis, or "always"? podman has some odd behaviour around locks: it sometimes gets confused and is unable to remove a container that is not completely stopped. But as far as I know, this is more of a transient error than an "always reproducible" one.

As for the docker messages, I guess they are due to the upgrade: the base is deployed with docker and we move to podman during the job. Not sure about the final "No such file or directory", though :/.

At least we clearly see the podman call in the first error snippet, and "device or resource busy" is the weird, transient issue we have hit from time to time in some really odd cases. We haven't seen it again on master, though, and it hasn't shown up on train for a long time... or at least we haven't received any reports about it.

Cédric Jeanneret (cjeanner) wrote :

Note: the first failure comes from the tasks in tripleo-ansible/roles/tripleo_container_rm/tasks/tripleo_podman_container_rm.yml.

Train doesn't call paunch anymore. I don't see any sort of retry on master.

For the record, the node is running the following podman-related package versions:
conmon.x86_64 2:2.0.8-1.el7 @quickstart-centos-extras
container-selinux.noarch 2:2.119.1-1.c57a6f9.el7 @quickstart-centos-extras
containernetworking-plugins.x86_64 0.8.1-4.el7.centos @quickstart-centos-extras
containers-common.x86_64 1:0.1.40-7.el7_8 @quickstart-centos-extras
podman.x86_64 1.6.4-16.el7_8 @quickstart-centos-extras

While on master, we have:
conmon.x86_64 2:2.0.6-1.module_el8.1.0+298+41f9343a @quickstart-centos-appstreams
container-selinux.noarch 2:2.124.0-1.module_el8.1.0+298+41f9343a @quickstart-centos-appstreams
containernetworking-plugins.x86_64 0.8.3-4.module_el8.1.0+298+41f9343a @quickstart-centos-appstreams
containers-common.x86_64 1:0.1.40-8.module_el8.1.0+298+41f9343a @quickstart-centos-appstreams
podman.x86_64 1.6.4-4.module_el8.1.0+298+41f9343a @quickstart-centos-appstreams

Maybe a regression?

Cédric Jeanneret (cjeanner) wrote :

According to Jess on IRC, the latest successful run was https://zuul.opendev.org/t/openstack/build/74799bb9b964446b899a1d49087b31dd

Here we have a different podman version, from a different source:
container-selinux.noarch 2:2.107-3.el7 @quickstart-centos-extras
containernetworking-plugins.x86_64 0.8.1-4.el7.centos @quickstart-centos-extras
containers-common.x86_64 1:0.1.37-3.el7.centos @quickstart-centos-extras
podman.x86_64 1.5.1-3.el7 @delorean-train-deps

And apparently there is no conmon package...

Bogdan Dobrelya (bogdando) wrote :
Bogdan Dobrelya (bogdando) wrote :

The first failure after the podman version had been changed:

https://099d8872cf39be9493f8-056bfb946e355d1a1a86dc411a70c5ec.ssl.cf1.rackcdn.com/724823/2/gate/tripleo-ci-centos-7-containerized-undercloud-upgrades/4812f2d/logs/undercloud/var/log/extra/package-list-installed.txt

podman.x86_64 1.6.4-16.el7_8 @quickstart-centos-extras

(upgraded from podman-1.4.4-4.el7.centos.x86_64)

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

podman-1.5.1-3.el7.x86_64.rpm - https://trunk.rdoproject.org/centos7-train/deps/latest/x86_64/
podman-1.6.4-16.el7_8.x86_64.rpm - http://mirror.bhs1.ovh.openstack.org/centos/7/extras/x86_64/Packages/

The podman in @extras won (1.6.4 is newer than 1.5.1).
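Why the extras build won can be illustrated with a simplified version comparison. This assumes yum's default behaviour, with no repo priorities or excludes in effect. In that case the newest available build of a package is installed, whichever enabled repo provides it. The comparator below is a rough approximation of RPM's segment-by-segment `rpmvercmp` rules: numeric segments are compared as integers, and a numeric segment beats an alphabetic one. Version and release are compared separately. Epochs and `~`/`^` handling are ignored, so treat this as illustrative only.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest


def _segments(s: str):
    # Split into runs of digits or letters; separators like "." and "_" are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def segment_cmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: returns 1 if a is newer, -1 if older, 0 if equal."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:
            return -1  # a ran out of segments first: b is newer
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit():
            return 1  # numeric segments count as newer than alphabetic ones
        elif y.isdigit():
            return -1
        elif x != y:
            return 1 if x > y else -1
    return 0


def pkg_cmp(a: str, b: str) -> int:
    """Compare "version-release" strings: version first, then release."""
    va, _, ra = a.rpartition("-")
    vb, _, rb = b.rpartition("-")
    return segment_cmp(va, vb) or segment_cmp(ra, rb)


# The two podman builds available to the job, keyed by repo.
candidates = {
    "quickstart-centos-extras": "1.6.4-16.el7_8",
    "delorean-train-deps": "1.5.1-3.el7",
}
version_key = cmp_to_key(pkg_cmp)
winner = max(candidates, key=lambda repo: version_key(candidates[repo]))
print(winner, candidates[winner])
```

This is why pinning podman required keeping the extras build out of the picture rather than just adding the dlrn-deps repo.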

wes hayutin (weshayutin)
Changed in tripleo:
importance: High → Critical
wes hayutin (weshayutin) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.opendev.org/725636

Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)
status: Triaged → In Progress
Changed in tripleo:
assignee: wes hayutin (weshayutin) → Sagi (Sergey) Shnaidman (sshnaidm)
Changed in tripleo:
assignee: Sagi (Sergey) Shnaidman (sshnaidm) → wes hayutin (weshayutin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/725636
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=636c3943f889e4fd7bfa6a44b0893faacd73728d
Submitter: Zuul
Branch: master

commit 636c3943f889e4fd7bfa6a44b0893faacd73728d
Author: Wes Hayutin <email address hidden>
Date: Tue May 5 09:43:04 2020 -0600

    exclude podman from centos-extras

    use podman from dlrn-deps as it's at
    a version we can control

    Closes-Bug: #1876893
    Change-Id: Ifbca398a05064f5d8aaee58c6890b7f5b6c42789
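The change above keeps the centos-extras build out so the dlrn-deps podman is used. At the yum level, one way to express "exclude podman from centos-extras" is the repo-level `exclude=` option. The sketch below only illustrates that option. It does not claim that this is how the tripleo-quickstart change implements it. The repo definition and the `exclude_package` helper are hypothetical. The repo id is borrowed from the `@quickstart-centos-extras` name in the package listings above.

```python
import configparser
import io

# Hypothetical repo definition, for illustration only.
SAMPLE_REPO = """\
[quickstart-centos-extras]
name=CentOS-7 - Extras
baseurl=http://mirror.bhs1.ovh.openstack.org/centos/7/extras/x86_64/
enabled=1
gpgcheck=0
"""


def exclude_package(repo_text: str, repo_id: str, package: str) -> str:
    """Add `package` to the space-separated exclude= list of one repo section."""
    parser = configparser.ConfigParser(interpolation=None)  # take values literally
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    excluded = parser.get(repo_id, "exclude", fallback="").split()
    if package not in excluded:  # idempotent: don't add the same package twice
        excluded.append(package)
    parser.set(repo_id, "exclude", " ".join(excluded))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(exclude_package(SAMPLE_REPO, "quickstart-centos-extras", "podman"))
```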

Changed in tripleo:
status: In Progress → Fix Released
wes hayutin (weshayutin) wrote :

2020-05-06 14:38:13 | TASK [tripleo-container-rm : Stop and remove container] ************************
2020-05-06 14:38:13 | Wednesday 06 May 2020 14:38:13 +0000 (0:00:01.563) 0:27:38.002 *********
2020-05-06 14:38:14 | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": ["podman", "container", "rm", "--force", "memcached"], "delta": "0:00:00.746729", "end": "2020-05-06 14:38:14.095980", "msg": "non-zero return code", "rc": 125, "start": "2020-05-06 14:38:13.349251", "stderr": "Error: error removing container 97dc3f43cf45a4d85256d13a58c532724a9edd03288546abda70426166475d8e root filesystem: remove /var/lib/containers/storage/overlay/9a8d06f4cc400d70372e424088da4a1bf865abd52465dad4f05ad27ea2d2c300/merged: device or resource busy", "stderr_lines": ["Error: error removing container 97dc3f43cf45a4d85256d13a58c532724a9edd03288546abda70426166475d8e root filesystem: remove /var/lib/containers/storage/overlay/9a8d06f4cc400d70372e424088da4a1bf865abd52465dad4f05ad27ea2d2c300/merged: device or resource busy"], "stdout": "", "stdout_lines": []}
2020-05-06 14:38:14 |

Still an issue after ensuring podman is at podman-1.5.1-3.el7.x86_64

Changed in tripleo:
status: Fix Released → Triaged
wes hayutin (weshayutin) wrote :
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :
wes hayutin (weshayutin) wrote :

Stopping memcached container...
May 06 14:06:10 undercloud.localdomain podman[190538]: 2020-05-06 14:06:10.605944732 +0000 UTC m=+0.330685216 container died 3a3f84b3e71edc2ee2146651d6a7e4f0a1b96e1ce7579f973ea681f79330ca84 (image=192.168.24.1:8787/tripleostein/centos-binary-memcached:687919e2bdd9e558da8af67c434a9c0e068aa4e5_9aca342f-updated-20200506122331, name=memcached)
May 06 14:06:10 undercloud.localdomain podman[190538]: 2020-05-06 14:06:10.608385991 +0000 UTC m=+0.333126371 container stop 3a3f84b3e71edc2ee2146651d6a7e4f0a1b96e1ce7579f973ea681f79330ca84 (image=192.168.24.1:8787/tripleostein/centos-binary-memcached:687919e2bdd9e558da8af67c434a9c0e068aa4e5_9aca342f-updated-20200506122331, name=memcached)
May 06 14:06:10 undercloud.localdomain podman[190538]: 3a3f84b3e71edc2ee2146651d6a7e4f0a1b96e1ce7579f973ea681f79330ca84
May 06 14:06:10 undercloud.localdomain systemd[1]: Stopped memcached container.
May 06 14:06:10 undercloud.localdomain podman[24920]: 2020-05-06 14:06:10.763622113 +0000 UTC m=+0.217426308 container cleanup 3a3f84b3e71edc2ee2146651d6a7e4f0a1b96e1ce7579f973ea681f79330ca84 (image=192.168.24.1:8787/tripleostein/centos-binary-memcached:687919e2bdd9e558da8af67c434a9c0e068aa4e5_9aca342f-updated-20200506122331, name=memcached)
May 06 14:06:11 undercloud.localdomain ansible-file[190593]: Invoked with directory_mode=None force=False remote_src=None _original_basename=None path=/etc/systemd/system/tripleo_memcached.service owner=None follow=True group=None unsafe_writes=None state=absent content=NOT_LOGGING_PARAMETER serole=None selevel=None setype=None access_time=None access_time_format=%Y%m%d%H%M.%S modification_time=None regexp=None src=None seuser=None recurse=False _diff_peek=None delimiter=None mode=None modification_time_format=%Y%m%d%H%M.%S attributes=None backup=None
May 06 14:06:11 undercloud.localdomain ansible-stat[190599]: Invoked with checksum_algorithm=sha1 get_checksum=True follow=False path=/etc/systemd/system/tripleo_memcached.service.requires get_md5=None get_mime=True get_attributes=True

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_26d/724782/8/check/tripleo-ci-centos-7-containerized-undercloud-upgrades/26de0d3/logs/undercloud/var/log/extra/journal.txt

Perhaps this is an issue with the ansible file module? Perhaps we can use the force flag?

wes hayutin (weshayutin) wrote :
Jesse Pretorius (jesse-pretorius) wrote :

Curiously, this review passed the test recently: https://review.opendev.org/#/c/722284/

Unfortunately, I can't find any data showing which version of podman was used after the upgrade.

wes hayutin (weshayutin) wrote :

Jesse, check undercloud/var/log/extras/rpmlist.txt.
podman is delivered via the baseos, so its version does NOT change, as the distro is not changing.
It should be 1.5.1 before and after.
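The question of which podman was actually installed can be answered directly from such a package listing. Below is a minimal sketch. It assumes the file uses the `yum list installed` column layout seen in the listings quoted above: `name.arch`, version, `@repo`. `find_package` is a hypothetical helper, not part of any TripleO tooling, and the sketch does not read the file from disk.

```python
from typing import Optional, Tuple

# Listing copied from an earlier comment, including the line that yum
# wrapped because the package name is long.
SAMPLE_LISTING = """\
container-selinux.noarch 2:2.107-3.el7 @quickstart-centos-extras
containernetworking-plugins.x86_64
                           0.8.1-4.el7.centos @quickstart-centos-extras
containers-common.x86_64 1:0.1.37-3.el7.centos @quickstart-centos-extras
podman.x86_64 1.5.1-3.el7 @delorean-train-deps
"""


def find_package(listing: str, name: str) -> Optional[Tuple[str, str]]:
    """Return (version, repo) for `name`, or None if it is not listed."""
    tokens = listing.split()  # splitting on all whitespace also undoes wrapping
    for i, token in enumerate(tokens):
        # The first column is "<name>.<arch>", e.g. "podman.x86_64".
        if token.rsplit(".", 1)[0] == name and i + 2 < len(tokens):
            return tokens[i + 1], tokens[i + 2].lstrip("@")
    return None


print(find_package(SAMPLE_LISTING, "podman"))
```

Running it on the pre- and post-upgrade listings and comparing the results shows directly whether the podman version changed.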

Changed in tripleo:
assignee: wes hayutin (weshayutin) → Sagi (Sergey) Shnaidman (sshnaidm)
status: Triaged → In Progress
wes hayutin (weshayutin) wrote :

Ideal fix (after backporting to train, of course): https://review.opendev.org/#/c/725939/
Backup review: https://review.opendev.org/#/c/725940/

wes hayutin (weshayutin) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/725944

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/725991

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726003

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726004

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726005

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726008

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (stable/train)

Change abandoned by Sagi Shnaidman (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/725939
Reason: in favor of https://review.opendev.org/#/c/726008/1

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/725991
Reason: https://review.opendev.org/#/c/725942/

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/725942
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=14dc45d7cd20c9bc77817284d8f0c7720a92e5de
Submitter: Zuul
Branch: master

commit 14dc45d7cd20c9bc77817284d8f0c7720a92e5de
Author: Sagi Shnaidman <email address hidden>
Date: Wed May 6 20:24:53 2020 +0300

    Stop and remove container with module

    For stopping and removing podman container use podman_container
    modules which is idempotent. Retry it a few times if it doesn't
    pass first time.

    Partial-Bug: #1876893
    Change-Id: Ic9f063eac866b25f980f20f86502653289321592
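The merged fix uses the `podman_container` Ansible module and retries. The sketch below is not that module. It is a hypothetical Python illustration of the same two ideas. First, treat an already-absent container as success, which gives idempotence. Second, retry `podman container rm --force` a few times when the error is the transient "device or resource busy". The attempt count, the delay and the `remove_container` name are assumptions. The `run` parameter exists only so the logic can be exercised without podman.

```python
import subprocess
import time
from typing import Callable, List, Optional

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def _run(cmd: List[str]) -> subprocess.CompletedProcess:
    # Capture stderr so the error text can be inspected by the caller.
    return subprocess.run(cmd, capture_output=True, text=True)


def remove_container(name: str, attempts: int = 3, delay: float = 5.0,
                     run: Optional[Runner] = None) -> bool:
    """Force-remove a podman container, retrying on the transient EBUSY error.

    Returns True if the container was removed or was already absent.
    """
    run = run or _run
    # `podman container exists` exits 0 if the container exists, 1 if not.
    if run(["podman", "container", "exists", name]).returncode == 1:
        return True  # nothing to remove: treat as success (idempotent)
    cmd = ["podman", "container", "rm", "--force", name]
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return True
        busy = "device or resource busy" in (result.stderr or "")
        if not busy or attempt == attempts:
            return False  # non-transient error, or retries exhausted
        time.sleep(delay)  # give the overlay mount a moment to be released
    return False
```

On a host with podman, `remove_container("memcached")` would try the removal up to three times. Retrying only on the EBUSY message keeps genuine failures, such as a wrong container name, from being masked by retries.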

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/725944
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=1f423fb45032fc9b373bfe987a11a29d26621372
Submitter: Zuul
Branch: stable/train

commit 1f423fb45032fc9b373bfe987a11a29d26621372
Author: Wes Hayutin <email address hidden>
Date: Wed May 6 11:35:33 2020 -0600

    standalone-upgrade was not triggering

    add upgrade coverage to tripleo-ansbile
    standalone-upgrade fires on master only
    standalone-upgrade-train is branched properly
    for stein->train

    Partial-Bug: #1876893
    Change-Id: I3720f0dc5b34d0cd451ca55508834dbe68189a9c

tags: added: in-stable-train
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

Hi,

just a thought here, but for train, for

 - tripleo-ci-centos-7-containerized-undercloud-upgrades

maybe we should use paunch instead of the ansible role, because the default, at least downstream, is paunch.

I'm saying maybe because I don't know what the defaults currently are for tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates and tripleo-ci-centos-7-scenario000-multinode-oooq-container-upgrades (maybe it's paunch).

Not sure where it is set currently, but it should be an inclusion of disable-paunch.yaml somewhere in the deployment file.

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/726008
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=f9ae624da10794c2072e4670c2e905b2a77d5e6b
Submitter: Zuul
Branch: stable/train

commit f9ae624da10794c2072e4670c2e905b2a77d5e6b
Author: Sagi Shnaidman <email address hidden>
Date: Wed May 6 20:18:39 2020 +0300

    Stop and remove container with module

    For stopping and removing podman container use podman_container
    modules which is idempotent. Retry it a few times if it doesn't
    pass first time.
    For --systemd flag should be defined true or false: "--systemd true"
    By default it's true, so can be removed.

    Partial-Bug: #1876893
    (cherry picked from commit 14dc45d7cd20c9bc77817284d8f0c7720a92e5de)
    Change-Id: Ic9f063eac866b25f980f20f86502653289321592

OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (stable/train)

Change abandoned by wes hayutin (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/726004
Reason: https://review.opendev.org/#/c/726008/2

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/train)

Change abandoned by wes hayutin (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/726003
Reason: https://review.opendev.org/#/c/726008/2

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (stable/train)

Change abandoned by wes hayutin (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/726005

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/726003
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=908280a058c445cae32122b9d46f96a061af3483
Submitter: Zuul
Branch: stable/train

commit 908280a058c445cae32122b9d46f96a061af3483
Author: Wes Hayutin <email address hidden>
Date: Wed May 6 15:30:02 2020 -0600

    tripleo-ci-centos-7-containerized-undercloud-upgrades -> NV

    Until we can work out the issues, we mark this test and non-voting
    so that we unblock other work.

    Partial-Bug: #1876893
    Change-Id: I2f43df301a64f82afe9c9b24f0177df0522306a9

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-upgrade (stable/train)

Change abandoned by Jesse Pretorius (odyssey4me) (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/725614
Reason: Superceded by https://review.opendev.org/#/c/726849/1

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/726005
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7367d9399870d69ca0af49913b04a46c0794b7f4
Submitter: Zuul
Branch: stable/train

commit 7367d9399870d69ca0af49913b04a46c0794b7f4
Author: Wes Hayutin <email address hidden>
Date: Wed May 6 15:35:14 2020 -0600

    tripleo-ci-centos-7-containerized-undercloud-upgrades -> NV

    Partial-Bug: #1876893
    Change-Id: Ic2ffbfb06fd1fac3547edfba5bb82843928bba18

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/726004
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=3b52fa2cf1f8151db8858efbee43d42e3eb6b466
Submitter: Zuul
Branch: stable/train

commit 3b52fa2cf1f8151db8858efbee43d42e3eb6b466
Author: Wes Hayutin <email address hidden>
Date: Wed May 6 15:33:11 2020 -0600

    tripleo-ci-centos-7-containerized-undercloud-upgrades -> NV

    Partial-Bug: #1876893
    Change-Id: I5647f6e3bf5ffdfb65e5c5019c4fcf4c8098f6fc

Changed in tripleo:
milestone: victoria-1 → victoria-3
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (stable/train)

Change abandoned by wes hayutin (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/725940
Reason: too_old

Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
status: In Progress → Fix Released