CentOS 8 Failure of Restart elasticsearch container

Bug #1878412 reported by r3ap3r-d3v
Affects              Status        Importance  Assigned to    Milestone
kolla-ansible        Invalid       Undecided   Unassigned
kolla-ansible/train  Fix Released  High        Mark Goddard

Bug Description

I'm running a Kolla-Ansible 9.1.0 deployment of Train on CentOS 8 and I'm running into the following issue. The deployment fails at the `RUNNING HANDLER [elasticsearch : Restart elasticsearch container]` play. Below is the "log" for the play itself when it fails:

RUNNING HANDLER [elasticsearch : Restart elasticsearch container] ****************************************************************************************************************************
task path: /home/nomad/openstack/share/kolla-ansible/ansible/roles/elasticsearch/handlers/main.yml:2
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: nomad
<localhost> EXEC /bin/sh -c 'echo ~nomad && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/nomad/.ansible/tmp `"&& mkdir /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461 && echo ansible-tmp-1589322933.7655256-5122-96496832711461="` echo /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461 `" ) && sleep 0'
Using module file /home/nomad/openstack/share/kolla-ansible/ansible/library/kolla_docker.py
<localhost> PUT /home/nomad/.ansible/tmp/ansible-local-32237tzfv1ojq/tmp8b02406h TO /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461/AnsiballZ_kolla_docker.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461/ /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461/AnsiballZ_kolla_docker.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-bxddilegdiwbgoxrnbkeemcurjdwzkjn ; /usr/libexec/platform-python /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461/AnsiballZ_kolla_docker.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /home/nomad/.ansible/tmp/ansible-tmp-1589322933.7655256-5122-96496832711461/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
  File "/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 1024, in main
  File "/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 747, in recreate_or_restart_container
  File "/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 765, in start_container
  File "/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 571, in pull_image
  File "/usr/local/lib/python3.6/site-packages/docker/api/image.py", line 415, in pull
    self._raise_for_status(response)
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
fatal: [localhost]: FAILED! => {
    "changed": true,
    "invocation": {
        "module_args": {
            "action": "recreate_or_restart_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": null,
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": null,
            "detach": true,
            "dimensions": {},
            "environment": {
                "ES_JAVA_OPTS": "-Xms1G -Xmx1G",
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS"
            },
            "graceful_timeout": 10,
            "image": "kolla/centos-source-elasticsearch:train-centos8",
            "labels": {},
            "name": "elasticsearch",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "unless-stopped",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_cacert": null,
            "tls_cert": null,
            "tls_key": null,
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/elasticsearch/:/var/lib/kolla/config_files/",
                "/etc/localtime:/etc/localtime:ro",
                "",
                "elasticsearch:/var/lib/elasticsearch/data"
            ],
            "volumes_from": null
        }
    },
    "msg": "'Traceback (most recent call last):\\n File \"/usr/local/lib/python3.6/site-packages/docker/api/client.py\", line 261, in _raise_for_status\\n response.raise_for_status()\\n
 File \"/usr/local/lib/python3.6/site-packages/requests/models.py\", line 941, in raise_for_status\\n raise HTTPError(http_error_msg, response=self)\\nrequests.exceptions.HTTPError: 404 C
lient Error: Not Found for url: http+docker://localhost/v1.40/images/create?tag=train-centos8&fromImage=kolla%2Fcentos-source-elasticsearch\\n\\nDuring handling of the above exception, anoth
er exception occurred:\\n\\nTraceback (most recent call last):\\n File \"/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line
1024, in main\\n File \"/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 747, in recreate_or_restart_container\\n File \"
/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 765, in start_container\\n File \"/tmp/ansible_kolla_docker_payload_o0xmgzm2/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 571, in pull_image\\n File \"/usr/local/lib/python3.6/site-packages/docker/api/image.py\", line 415, in pull\\n
   self._raise_for_status(response)\\n File \"/usr/local/lib/python3.6/site-packages/docker/api/client.py\", line 263, in _raise_for_status\\n raise create_api_error_from_http_exception(
e)\\n File \"/usr/local/lib/python3.6/site-packages/docker/errors.py\", line 31, in create_api_error_from_http_exception\\n raise cls(e, response=response, explanation=explanation)\\ndocker.errors.NotFound: 404 Client Error: Not Found (\"manifest for kolla/centos-source-elasticsearch:train-centos8 not found: manifest unknown: manifest unknown\")\\n'"
}
META: ran handlers

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************************

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=96 changed=1 unreachable=0 failed=1 skipped=87 rescued=0 ignored=0

It appears from the message above that when Kolla-Ansible reaches out for the `kolla/centos-source-elasticsearch:train-centos8` image, it does not find it on Docker Hub. I searched Docker Hub manually and it looks like that image tag does not exist; a quick check with the Docker SDK (sketched below) shows the same 404. The environment information further below is probably irrelevant, but I am including it for completeness.
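The same 404 can be reproduced outside of Ansible with the Docker SDK for Python that kolla_docker itself uses. This is a minimal sketch; the image name and tag are taken from the log above, and a local Docker daemon is assumed:

# Minimal sketch: attempt the same pull that kolla_docker performs and report
# whether the registry knows about the tag. Assumes the Docker SDK for Python
# ("docker" package) and a reachable local Docker daemon, as in the log above.
import docker
from docker.errors import NotFound, APIError

client = docker.from_env()
image = "kolla/centos-source-elasticsearch"
tag = "train-centos8"

try:
    client.images.pull(image, tag=tag)
    print(f"{image}:{tag} pulled successfully")
except NotFound as exc:
    # Matches the failure in the handler: manifest unknown on Docker Hub.
    print(f"{image}:{tag} not found on the registry: {exc.explanation}")
except APIError as exc:
    print(f"Docker API error while pulling {image}:{tag}: {exc}")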

**Environment**
OS: CentOS 8
Kernel: Linux localhost 4.18.0-147.8.1.el8_1.x86_64 x86_64 x86_64 x86_64 GNU/Linux
Docker Client: Version: 19.03.8
Docker Server: Version: 19.03.8
Kolla-Ansible: Train: 9.1.0
Images: From Docker Hub (source)
Ansible: 2.9.7 (pinned so I don't use 2.9.8, which appears to have issues at the moment)
Deployment: All-in-One using Python 3 virtual environment
Pip Version: pip 20.1 from python 3.6

If you need any further information, feel free to let me know. Thanks.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

I set it to Critical on the default series because it blocks the migration path as well.

The problem is that the E*K images for Train CentOS 8 were added with a suffix: elasticsearch6, kibana6: https://review.opendev.org/718696

And kolla-ansible does not know about it.

This was done to allow CentOS 7 deployments to upgrade the E*K stack nicely without having to upgrade everything at once.
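A quick way to see this from the outside is to ask Docker Hub for the tag under both repository names. This is a minimal sketch; the hub.docker.com v2 tags endpoint used here is an assumption about the public Docker Hub API, not something kolla-ansible calls:

# Sketch: check which repository names actually carry the train-centos8 tag on
# Docker Hub. The hub.docker.com v2 endpoint is an assumed public API, not
# part of kolla-ansible itself.
import requests

TAG = "train-centos8"
REPOS = [
    "kolla/centos-source-elasticsearch",    # name kolla-ansible asks for
    "kolla/centos-source-elasticsearch6",   # suffixed name actually published
    "kolla/centos-source-kibana",
    "kolla/centos-source-kibana6",
]

for repo in REPOS:
    url = f"https://hub.docker.com/v2/repositories/{repo}/tags/{TAG}"
    status = requests.get(url, timeout=10).status_code
    print(f"{repo}:{TAG} -> HTTP {status}")  # 200 = tag exists, 404 = missing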

Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Mark Goddard (mgoddard) wrote :

I did start working on this last week, but other things got in the way. I'll try to continue today.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Thanks, Mark.

Revision history for this message
r3ap3r-d3v (r3ap3r-d3v) wrote :

Thanks guys, I was able to get past this by commenting out the "enable_central_logging" and "enable_monasca" options that rely on the elasticsearch container/service. I ran into the same issue when setting "nova_console" to "spice" instead of leaving the default "novnc". I am going to submit another bug for that option as soon as I can capture the logs like I did above. The tmux session I was using crapped out on me and I'm trying to figure out how to pull the logs, if that is even possible; I might just have to run another deployment, grab the failure, and then file the bug here. Thanks again.

Revision history for this message
Mark Goddard (mgoddard) wrote :

We dropped support for the spice console on CentOS 8 due to the lack of a package. It's in the release notes.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :
Changed in kolla-ansible:
status: Triaged → Invalid
importance: Critical → Undecided
Revision history for this message
r3ap3r-d3v (r3ap3r-d3v) wrote :

Yoctozepto and Mgoddard, I applied the patches in the link by hand and I was able to deploy successfully with "enable_central_logging" set in globals. Sorry it took so long; it has been crazy for me the past couple of weeks. I am still working out some Monasca details: the deployment timed out the first time I tried to deploy with both "enable_central_logging" and "enable_monasca" set. I commented "enable_monasca" back out and had to delete the ".kibana" index from elasticsearch for the deployment to proceed the second time. Attached is a screenshot of the Kibana page when I get "logged in". What I mean is that Kibana gives me the username and password prompt, I enter the kibana user and the password from passwords.yml for kibana, and it presents me with the attached screenshot. I did not do a "pip -U kolla-ansible" after applying the patches and I didn't apply any patches to Kolla proper itself, so I am not sure whether that is what is causing my issue. As for this particular bug, kolla-ansible is pulling the elasticsearch container now, so that is working. I can submit individual bugs for the other issues later so y'all have oversight into them. Thanks for your work on Kolla and Kolla-Ansible. Awesome project. If you need any more information from me, feel free to let me know. Thanks.

PS: I had to paste what was returned on the Kibana page below; I had trouble "attaching" a screenshot.

{"message":"Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [doc, config]: [illegal_argument_exception] Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [doc, config]","statusCode":400,"error":"Bad Request"}

Revision history for this message
r3ap3r-d3v (r3ap3r-d3v) wrote :

The Kibana issue I was experiencing appears to have gone away when I applied the patches from the stable/train merge listed here: https://bugs.launchpad.net/kolla-ansible/+bug/1799689. I stopped all of the containers, applied the patches, and then ran `kolla-ansible deploy` again, and now I am getting the Kibana dashboard. If you have anything else for me to try or look into for feedback, let me know. Thanks.

Revision history for this message
Mark Goddard (mgoddard) wrote :

Thanks for trying again with the patches applied. They have now been merged to the stable/train branch.
