tripleo_container_stop has trouble stopping docker containers

Bug #1893099 reported by Jose Luis Franco on 2020-08-26
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Jose Luis Franco

Bug Description

During the FFU procedure, when upgrading from Queens to Train, we have had trouble stopping containers on the Queens controllers.

Basically, even though Queens runs its containers with docker, the podman alias is already available, so any docker command can also be run through podman. This becomes a problem when running the tripleo_container_stop role from tripleo-ansible, because the code it runs to stop the containers looks like this:

    # We need to make sure that containers are stopped
    # as we might have different CLIs to interact with
    # them. I.e the container_cli might be setted to be podman
    # but we might have the containers running with docker.
    set -eu
    if command -v podman && podman exec {{ container }} /bin/true; then
        if systemctl status {{ container }}.service; then
            systemctl stop {{ container }}.service
        else
            podman kill {{ container }}
        fi
    fi
    if type docker &> /dev/null && docker exec {{ container }} /bin/true; then
        docker stop {{ container }}
    fi

According to this code, even though the running container engine is docker, the podman path is taken because podman exists, so the container is ended with "podman kill {{ container }}", whereas the right way to stop it would be "docker stop {{ container }}".
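To illustrate the ordering problem, here is a minimal sketch of one way the engine could be chosen: probe docker first, and probe with "inspect" rather than "exec" so that the check does not depend on the container being in a running state. This is not the tripleo-ansible implementation and not the change proposed in review; the function name stop_command_for is hypothetical, and the function only prints the command it would pick instead of running it.

```shell
#!/bin/bash
# Hypothetical helper, NOT the tripleo-ansible code.
# Prints the command that would stop the given container, checking
# docker first so docker-managed containers never take the podman path.
stop_command_for() {
    local container="$1"
    if command -v docker >/dev/null 2>&1 &&
        docker inspect --type container "$container" >/dev/null 2>&1; then
        # docker knows this container: stop it with docker.
        echo "docker stop $container"
    elif command -v podman >/dev/null 2>&1 &&
        podman inspect --type container "$container" >/dev/null 2>&1; then
        # Same podman branch as the original snippet.
        if systemctl status "$container.service" >/dev/null 2>&1; then
            echo "systemctl stop $container.service"
        else
            echo "podman kill $container"
        fi
    fi
}
```

For example, "stop_command_for ceilometer_agent_notification" would print "docker stop ceilometer_agent_notification" on a node where docker manages that container. Whether this ordering is appropriate for every upgrade path is an assumption that would need checking against nodes where both engines hold containers.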

The problems we are seeing are the following:

2020-08-26 08:41:36,766 p=20536 u=mistral n=ansible | changed: [undercloud -> 192.168.24.42] => (item=controller-1) => {"ansible_loop_var": "tripleo_delegate_to_item", "changed": true, "cmd": "# We need to make sure that containers are stopped\n# as we might have different CLIs to interact with\n# them. I.e the container_cli might be setted to be podman\n# but we might have the containers running with docker.\nset -eu\nif command -v podman && podman exec ceilometer_agent_notification /bin/true; then\n if systemctl status ceilometer_agent_notification.service; then\n systemctl stop ceilometer_agent_notification.service\n else\n podman kill ceilometer_agent_notification\n fi\nfi\nif type docker &> /dev/null && docker exec ceilometer_agent_notification /bin/true; then\n docker stop ceilometer_agent_notification\nfi", "delta": "0:00:00.193018", "end": "2020-08-26 12:41:36.733764", "rc": 0, "start": "2020-08-26 12:41:36.540746", "stderr": "Error response from daemon: Container b0fad69b6dcdb228aeb5172d2a4ad34aa4ad4a87e92b5d35ee98e4fd3049a44c is restarting, wait until the container is running", "stderr_lines": ["Error response from daemon: Container b0fad69b6dcdb228aeb5172d2a4ad34aa4ad4a87e92b5d35ee98e4fd3049a44c is restarting, wait until the container is running"], "stdout": "/bin/podman\nrpc error: code = 2 desc = oci runtime error: exec failed: container \"b0fad69b6dcdb228aeb5172d2a4ad34aa4ad4a87e92b5d35ee98e4fd3049a44c\" does not exist", "stdout_lines": ["/bin/podman", "rpc error: code = 2 desc = oci runtime error: exec failed: container \"b0fad69b6dcdb228aeb5172d2a4ad34aa4ad4a87e92b5d35ee98e4fd3049a44c\" does not exist"], "tripleo_delegate_to_item": "controller-1"}

And if we try to kill the container manually with "podman kill <container>", the command hangs, while "docker stop <container>" succeeds.

Fix proposed to branch: master
Review: https://review.opendev.org/748270

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: Triaged → In Progress

Change abandoned by Jose Luis Franco (<email address hidden>) on branch: master
Review: https://review.opendev.org/748270
Reason: change not needed
