kolla-ansible

"Running MariaDB bootstrap container" fails

Bug #1748194 reported by Jonathan Nakandala on 2018-02-08

This bug report is a duplicate of: Bug #1746748: python docker 3.0 package break the kolla-ansible.

This bug affects 9 people

Affects		Status	Importance	Assigned to	Milestone
	kolla-ansible	New	Undecided	Unassigned

Bug Description

A kolla-ansible deployment that used to work has stopped working over the last few days.

I've tried it on Ubuntu 16.04 and CentOS 7 and run into the same issue.
I have also tried both the source and binary options.
I have tried to deploy pike.
I have tried on bare metal and in a VirtualBox VM.

It fails at the following step in deployment:

TASK [mariadb : Running MariaDB bootstrap container] ***************************
fatal: [localhost]: FAILED! => {"changed": true, "msg": "Container exited with non-zero return code"}

Here is the output from the shell:
https://pastebin.com/vnFbQzTn

The output of
docker logs bootstrap_mariadb does not seem to indicate an error:
https://pastebin.com/WhMfDx8z

This is the output of:
docker start -a bootstrap_mariadb
https://pastebin.com/ei33sxxU

It crashes with the following error:
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Here is the globals.yml file I'm using:
https://pastebin.com/q7R9fFgQ

And still run into the same issue.

However if I rerun kolla-ansible -i /all-in-one deploy

The deployment manages to go further and spin up the mariadb container
But will fail on the rabbitmq deployment.
It always fails on the rabbitmq after running the deploy script again.

TASK [rabbitmq : Running RabbitMQ bootstrap container] *************************
fatal: [localhost]: FAILED! => {"changed": true, "msg": "Container exited with non-zero return code"}

Here is the output from docker logs for the crashed container:
https://pastebin.com/CsfKQcjm

Again I can try deploying the ansible playbook again.
And the containers for rabbitmq will successfully spin up.

Then the deployment will fail on the keystone bootstrap container.

TASK [keystone : Running Keystone bootstrap container] *************************
fatal: [localhost -> localhost]: FAILED! => {"changed": true, "msg": "Container exited with non-zero return code"}

Again this is the output from the docker logs and starting the container:
https://pastebin.com/9Hdgk6FM

I could go on, but essentially the deployment continues and fails randomly as you go on for a while.
But once it gets to the ceilometer container, redeploying always fails at that point.

Any ideas what could be causing it?

Revision history for this message

Etienne DUPUIS (etienned) wrote on 2018-02-08:

Hello,

I have had exactly the same issue with both all-in-one and multinode deployments since Friday 2018-02-02, and I still haven't found an answer.

I look forward to an answer as well.

Regards
Etienne

Alexandru Bogdan Pica (dtk.me) wrote on 2018-02-08:

Did you run kolla-genpwd first? The reason it fails for both of you right now is that it does not find database_password.

Jonathan Nakandala (jonathannakandala) wrote on 2018-02-08:

Hi Alexandru,

Yes. I used both a passwords.yml file that I customised and a blank one that I ran kolla-genpwd on.

The weird thing is that when the mariadb bootstrap fails and I run the deploy command again, it succeeds in deploying the mariadb container.

Nothing else is changed.

Best Regards,
Jonathan

Allan Krueger (klimber) wrote on 2018-02-16:

I'm facing the same problem: the deployment stops after MariaDB, then RabbitMQ, then keystone, then glance, and so on. Re-running kolla-ansible deploy makes it proceed to the next step. Each failure leaves behind a container named "bootstrap_[mariadb,rabbitmq,keystone,etc]", and re-running creates a new one, this time named just "[mariadb,rabbitmq,keystone,etc]" without the "bootstrap_" prefix.

For me it stops working on TASK [nova : Waiting for nova-compute service up]. The nova-compute service never comes up and it ends with:

fatal: [localhost -> localhost]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://10.10.10.254:35357", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", "reM6tKenxKogaUZbJMKJmD8Aht97IlN55Hh11ZCV", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:01.370807", "end": "2018-02-16 13:50:08.513147", "rc": 0, "start": "2018-02-16 13:50:07.142340", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]}

If I try to access the 10.10.10.254:35357 URL it shows:

{"versions": {"values": [{"status": "stable", "updated": "2017-02-22T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.8", "links": [{"href": "http://10.10.10.254:35357/v3/", "rel": "self"}]}, {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://10.10.10.254:35357/v2.0/", "rel": "self"}, {"href": "https://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}

I have no more ideas.

I have tried CentOS 7 and Ubuntu 16.04.
I have tried pike and ocata.
I have used the binary install type.
I have tried multinode and all-in-one.
I have tried kolla-ansible 5.0.1 and 6.0.0.0b3 (from pip).

Regards,
Allan

Holosian (holosian) wrote on 2018-02-22:

Hello

Exactly the same for me as what Allan Krueger (klimber) wrote on 2018-02-16.

Regards
holo

alex1231 (alex1231) wrote on 2018-02-23:

Hello

I'm facing exactly the same problem. Does anybody know what "kolla-ansible bootstrap-servers" does?

Regards

Alex

Jonathan Nakandala (jonathannakandala) wrote on 2018-02-23:

Attachment: failure at RabbitMQ, Rerun then fail at Keystone, Rerun then fail at Glance.txt (132.1 KiB, text/plain)

@alex1231
The bootstrap-servers step sets up the servers by installing and configuring prerequisite software.

I just installed a fresh Ubuntu 16.04 virtual machine and I'm still having the same issue, where running deploy fails at a different place each time.
You can search through the attached log for kolla@kolla.

The only things I've changed in the globals.yml file:

network_interface: "ens3"
This is the interface that kolla_internal_vip_address uses.
It's a static IP that the machine has.

This is a network interface with no IP configured, which should be the provider network:
neutron_external_interface: "ens9"

The only other thing I changed was disabling haproxy, since I never got that to deploy.

It's an all-in-one deployment.

Chris L (onyx4) wrote on 2018-02-25:

Same issue here on the following baremetal server:

RHEL 7.4
Docker version 17.12.0-ce, build c97c6d6
using pike, centos, binary mode in globals
kolla-ansible 5.0.1
using br0 interface for internal, veth1 for external

I did successfully deploy an all-in-one a week ago on a different server. Now I'm trying a multi-node and it fails at every bootstrap operation during the deploy stage. If I retry the deploy, it fails on the next service until it reaches nova, where it fails every time.

It'd be nice to identify the root cause.

errors:

TASK [mariadb : include] ****************************************************************************************************************************************************************************
included: /usr/share/kolla-ansible/ansible/roles/mariadb/tasks/bootstrap_cluster.yml for openstack4

TASK [mariadb : Running MariaDB bootstrap container] ************************************************************************************************************************************************
fatal: [openstack4]: FAILED! => {"changed": true, "msg": "Container exited with non-zero return code"}

Jonathan Nakandala (jonathannakandala) wrote on 2018-03-06:

In the end I just ran this little script:

#!/bin/bash
# Re-run reconfigure until it exits successfully.
until kolla-ansible -i ./all-in-one reconfigure; do
    echo "Update has failed, retrying in 3 seconds."
    sleep 3
done

It just keeps retrying, over and over, whenever the command ends in an error, until it works.
Seems to work for me.
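
Since the loop above never gives up, here is a bounded variant as a rough sketch. The function name retry_cmd and the limits of 10 attempts and 3 seconds are illustrative assumptions, not part of kolla-ansible.

```shell
#!/bin/bash
# retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_cmd() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts." >&2
            return 1
        fi
        echo "Attempt $attempt failed, retrying in $delay seconds." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example, using the same inventory path as the script above:
# retry_cmd 10 3 kolla-ansible -i ./all-in-one reconfigure
```

This only works around the symptom. It does not explain why each bootstrap container exits with a non-zero return code.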

Allan Krueger (klimber) wrote on 2018-03-06:

#10

I'll try your script Jonathan.

But why "kolla-ansible reconfigure" instead of "kolla-ansible deploy"?

Is there any major difference between them?

After kolla I tried going back to fuel 10, and I noticed it now has a few problems with running local repositories that it didn't have before. Maybe kolla is being affected by that as well.

Jonathan Nakandala (jonathannakandala) wrote on 2018-03-06:

#11

I took reconfigure from here:
https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html

If you look in the code repo:
https://github.com/openstack/kolla-ansible/tree/master/ansible/roles

Then go into <projectname>/tasks
You'll see a reconfigure.yml. For some of the projects it just runs the deploy task, but for others it checks whether there is an already running container.

However, 'deploy' doesn't seem to kill containers that have already started, so it's not as if it's doing a fresh deployment.

Chris L (onyx4) wrote on 2018-03-17:

#12

Any update on this bug? I also hit this on Ubuntu 16.04 when following the quick start guide with an all-in-one, so it doesn't seem isolated to RedHat 7.3. Maybe some pip package got updated.

What recent change could have broken this functionality?

Jeffrey Zhang (jeffrey4l) wrote on 2018-03-19:

#13

For those who encounter this issue, could you provide the output of:

1. docker ps -a
2. pip freeze | grep docker

I guess this may be related to https://bugs.launchpad.net/kolla-ansible/+bug/1746748
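
To make the second check quicker to read, the pip freeze output can be filtered for the version of the docker Python package. This is a minimal sketch: the function name check_docker_sdk is hypothetical, and the 3.0 threshold is taken from the title of bug 1746748.

```shell
#!/bin/bash
# Hypothetical helper: reads `pip freeze` output on stdin and reports whether
# the "docker" Python package is at major version 3 or newer.
# Return codes: 0 = older than 3.0, 1 = 3.0 or newer, 2 = package not found.
check_docker_sdk() {
    local line version major
    # Match only the "docker" package, not docker-py or docker-pycreds.
    line=$(grep -i '^docker==' | head -n 1)
    if [ -z "$line" ]; then
        echo "docker package not found"
        return 2
    fi
    version=${line#*==}
    major=${version%%.*}
    if [ "$major" -ge 3 ]; then
        echo "docker $version: 3.0 or newer, possibly affected by bug 1746748"
        return 1
    fi
    echo "docker $version: older than 3.0"
    return 0
}

# Usage:
# pip freeze | check_docker_sdk
```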

Chris L (onyx4) wrote on 2018-03-19:

#14

Per bug 1746748, the following command fixes the issue for me and the build completes without this error. So it's related to the python docker 3.0 update for the wait state.

Destroy your existing build to get a clean start, run the command below to downgrade the python docker package, then deploy again. Also make sure docker-py is not in use, and uninstall it if it is installed.

# pip install docker==2.7.0

# kolla-ansible -i <inventory file> deploy

Jeffrey Zhang (jeffrey4l) wrote on 2018-03-22:

#15

@Chris, thanks for confirming this.

Marking this as a duplicate.