Upgrade fails while restarting some containers

Bug #1662598 reported by Eduardo Gonzalez
This bug affects 1 person
Affects            Status         Importance   Assigned to     Milestone
kolla-ansible      Fix Released   Undecided    Jeffrey Zhang
  Ocata            Fix Released   Undecided    Jeffrey Zhang

Bug Description

While upgrading from Newton to Ocata, the kolla_toolbox and nova_libvirt containers are not recreated.
I reproduced this behaviour in every upgrade attempt, with both CentOS binary and source images.

Instead of being recreated, the containers are removed and the task fails.
This is a serious problem because nova_libvirt stays absent until the next upgrade attempt.

Errors:

TASK [common : Starting kolla-toolbox container] *******************************
fatal: [localhost]: FAILED! => {"changed": true, "failed": true, "msg": "'Traceback (most recent call last):\\n File \"/tmp/ansible_Xixy18/ansible_module_kolla_docker.py\", line 774, in main\\n result = bool(getattr(dw, module.params.get(\\'action\\'))())\\n File \"/tmp/ansible_Xixy18/ansible_module_kolla_docker.py\", line 592, in start_container\\n self.remove_container()\\n File \"/tmp/ansible_Xixy18/ansible_module_kolla_docker.py\", line 480, in remove_container\\n force=True\\n File \"/usr/lib/python2.7/site-packages/docker/utils/decorators.py\", line 21, in wrapped\\n return f(self, resource_id, *args, **kwargs)\\n File \"/usr/lib/python2.7/site-packages/docker/api/container.py\", line 278, in remove_container\\n self._raise_for_status(res)\\n File \"/usr/lib/python2.7/site-packages/docker/client.py\", line 174, in _raise_for_status\\n raise errors.APIError(e, response, explanation=explanation)\\nAPIError: 500 Server Error: Internal Server Error (\"{\"message\":\"Unable to remove filesystem for 0a45bbf35052607f252fd6d5ec21017086544432d5b78351632b9a42f90d6313: remove /var/lib/docker/containers/0a45bbf35052607f252fd6d5ec21017086544432d5b78351632b9a42f90d6313/shm: device or resource busy\"}\")\\n'"}

RUNNING HANDLER [nova : Restart nova-libvirt container] ************************
fatal: [localhost]: FAILED! => {"changed": true, "failed": true, "msg": "'Traceback (most recent call last):\\n File \"/tmp/ansible_M3Oedo/ansible_module_kolla_docker.py\", line 774, in main\\n result = bool(getattr(dw, module.params.get(\\'action\\'))())\\n File \"/tmp/ansible_M3Oedo/ansible_module_kolla_docker.py\", line 581, in recreate_or_restart_container\\n self.remove_container()\\n File \"/tmp/ansible_M3Oedo/ansible_module_kolla_docker.py\", line 480, in remove_container\\n force=True\\n File \"/usr/lib/python2.7/site-packages/docker/utils/decorators.py\", line 21, in wrapped\\n return f(self, resource_id, *args, **kwargs)\\n File \"/usr/lib/python2.7/site-packages/docker/api/container.py\", line 278, in remove_container\\n self._raise_for_status(res)\\n File \"/usr/lib/python2.7/site-packages/docker/client.py\", line 174, in _raise_for_status\\n raise errors.APIError(e, response, explanation=explanation)\\nAPIError: 500 Server Error: Internal Server Error (\"{\"message\":\"Unable to remove filesystem for cb03dca6fd61086a321b99646f26a31b6d79cfec6a10a4155fb4ef21a35a4014: remove /var/lib/docker/containers/cb03dca6fd61086a321b99646f26a31b6d79cfec6a10a4155fb4ef21a35a4014/shm: device or resource busy\"}\")\\n'"}

Jeffrey Zhang (jeffrey4l) wrote :

Which docker-engine version are you using?
I saw this with Docker 1.13, but Docker 1.12.x is OK.

I suspect a bug in Docker 1.13: the shm file is not released by Docker or by the container.

Changed in kolla-ansible:
status: New → Confirmed
Eduardo Gonzalez (egonzalez90) wrote :

I faced a similar issue with 1.12.6 while upgrading with an instance running.
Almost all containers are restarted, but libvirt is removed and the task exits with a similar error:
RUNNING HANDLER [nova : Restart nova-libvirt container] ************************
fatal: [localhost]: FAILED! => {"changed": true, "failed": true, "msg": "'Traceback (most recent call last):\\n File \"/tmp/ansible_1FNRnQ/ansible_module_kolla_docker.py\", line 781, in main\\n result = bool(getattr(dw, module.params.get(\\'action\\'))())\\n File \"/tmp/ansible_1FNRnQ/ansible_module_kolla_docker.py\", line 588, in recreate_or_restart_container\\n self.remove_container()\\n File \"/tmp/ansible_1FNRnQ/ansible_module_kolla_docker.py\", line 487, in remove_container\\n force=True\\n File \"/usr/lib/python2.7/site-packages/docker/utils/decorators.py\", line 21, in wrapped\\n return f(self, resource_id, *args, **kwargs)\\n File \"/usr/lib/python2.7/site-packages/docker/api/container.py\", line 278, in remove_container\\n self._raise_for_status(res)\\n File \"/usr/lib/python2.7/site-packages/docker/client.py\", line 174, in _raise_for_status\\n raise errors.APIError(e, response, explanation=explanation)\\nAPIError: 500 Server Error: Internal Server Error (\"{\"message\":\"Driver devicemapper failed to remove root filesystem 2e9ca10f7ad0950fe867e1a2a6a67d1874543f6678f40877f51b9b2094adb76b: Device is Busy\"}\")\\n'"}

Jeffrey Zhang (jeffrey4l) wrote :

I tried the upgrade on Docker 1.12.6, and it works.

Weird Docker ;(

Jeffrey Zhang (jeffrey4l) wrote :

Docker upstream bug: https://github.com/moby/moby/issues/17902

When this error happens, the container is actually removed, even though Docker does not remove it from the filesystem. A workaround is to assume the container is removed if we cannot find it in docker ps -a.
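The workaround above can be sketched in Python, the language of the kolla_docker module. This is a minimal illustration under stated assumptions, not the code merged for this bug: it assumes a docker-py style low-level client (for example docker.APIClient) whose containers(all=True) mirrors docker ps -a and returns dicts with a 'Names' list of '/'-prefixed names, and whose remove_container(name, force=True) raises docker.errors.APIError on failure. The helper names and the FakeClient class are hypothetical.

```python
def container_exists(client, name):
    """Return True if `name` appears in the list of all containers.

    Assumes a docker-py style client: containers(all=True) is the
    equivalent of `docker ps -a`, and each entry's 'Names' list holds
    '/'-prefixed container names.
    """
    return any('/' + name in c.get('Names', [])
               for c in client.containers(all=True))


def remove_container(client, name, api_error=Exception):
    """Force-remove `name`, tolerating errors if the container is gone.

    With a real client, pass api_error=docker.errors.APIError.
    Returns False if there was nothing to remove, True otherwise.
    """
    if not container_exists(client, name):
        return False
    try:
        client.remove_container(name, force=True)
    except api_error:
        # Docker may fail to clean up the container's filesystem
        # ("device or resource busy") although the container itself
        # is removed. Only re-raise if it is still listed.
        if container_exists(client, name):
            raise
    return True


class FakeClient:
    """Hypothetical stand-in that reproduces the failure mode above."""

    def __init__(self, names, still_listed_after_error=False):
        self.names = set(names)
        self.still_listed = still_listed_after_error

    def containers(self, all=False):
        return [{'Names': ['/' + n]} for n in sorted(self.names)]

    def remove_container(self, name, force=False):
        if not self.still_listed:
            self.names.discard(name)  # the container really is gone...
        # ...but the API call still reports an error.
        raise RuntimeError('remove /var/lib/docker/containers/xxx/shm: '
                           'device or resource busy')


if __name__ == '__main__':
    client = FakeClient(['nova_libvirt', 'kolla_toolbox'])
    print(remove_container(client, 'nova_libvirt'))  # True: error tolerated
    print(container_exists(client, 'nova_libvirt'))  # False
```

The re-check after the exception keeps genuine failures visible: if the container is still listed, the original error propagates instead of being swallowed.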

Changed in kolla-ansible:
milestone: none → pike-3
assignee: nobody → Jeffrey Zhang (jeffrey4l)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/488346

Changed in kolla-ansible:
status: Confirmed → In Progress
Changed in kolla-ansible:
milestone: pike-3 → pike-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.openstack.org/488346
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=ea1ae405ba6255cdde2af88fb000fd3b79ea3af1
Submitter: Jenkins
Branch: master

commit ea1ae405ba6255cdde2af88fb000fd3b79ea3af1
Author: Jeffrey Zhang <email address hidden>
Date: Fri Jul 28 14:30:45 2017 +0800

    Assume the container is removed if it is not show in docker ps

    In some case, docker can not remove container and raise following error
    message:

        Unable to remove filesystem for xxx remove
        /var/lib/docker/containers/xxx/shm: device or resource busy

    But the container is removed. This patch assumes container is
    removed if only container name is not shown in docker ps.

    Closes-Bug: #1662598
    Change-Id: I079d5ec6178018403ec7a49c975f137e27eb9ad4

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ocata)

Reviewed: https://review.openstack.org/488306
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8cf51b3fe5b7bfc1fe6776bdeb07c607cd8baf39
Submitter: Jenkins
Branch: stable/ocata

commit 8cf51b3fe5b7bfc1fe6776bdeb07c607cd8baf39
Author: Jeffrey Zhang <email address hidden>
Date: Fri Jul 28 14:30:45 2017 +0800

    Assume the container is removed if it is not show in docker ps

    In some case, docker can not remove container and raise following error
    message:

        Unable to remove filesystem for xxx remove
        /var/lib/docker/containers/xxx/shm: device or resource busy

    But the container is removed. This patch assumes container is
    removed if only container name is not shown in docker ps.

    Closes-Bug: #1662598
    Change-Id: I079d5ec6178018403ec7a49c975f137e27eb9ad4
    (cherry picked from commit ea1ae405ba6255cdde2af88fb000fd3b79ea3af1)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 5.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 5.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 4.0.3

This issue was fixed in the openstack/kolla-ansible 4.0.3 release.
