kolla-toolbox container stuck in restart loop

Bug #1680139 reported by Ben Swartzlander on 2017-04-05
This bug affects 4 people
Affects: kolla-ansible
Importance: High
Assigned to: Chason Chan

Bug Description

I followed the quickstart guide (deployer workflow) on CentOS 7.3, and it fails at the deployment step with a problem in the toolbox container.

# kolla-ansible deploy -i /root/all-in-one
Deploying Playbooks : ansible-playbook -i /root/all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e action=deploy /usr/share/kolla-ansible/ansible/site.yml

PLAY [Gather facts for all hosts] **********************************************

TASK [setup] *******************************************************************
ok: [localhost]

PLAY [Gather facts for all hosts (if using --limit)] ***************************

TASK [setup] *******************************************************************
skipping: [localhost] => (item=localhost)

PLAY [Detect openstack_release variable] ***************************************

TASK [Get current kolla-ansible version number] ********************************
ok: [localhost -> localhost]

TASK [Set openstack_release variable] ******************************************
ok: [localhost]

PLAY [Apply role prechecks] ****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [prechecks : Checking the api_interface is present] ***********************
skipping: [localhost]

TASK [prechecks : Checking the api_interface is active] ************************
skipping: [localhost]

TASK [prechecks : Checking the api_interface configuration] ********************
skipping: [localhost]

TASK [prechecks : Checking the api_interface ip address configuration] *********
skipping: [localhost]

TASK [prechecks : Checking Docker version] *************************************
skipping: [localhost]

TASK [prechecks : Checking empty passwords in passwords.yml. Run kolla-genpwd if this task fails] ***
skipping: [localhost]

TASK [prechecks : Checking docker-py version] **********************************
skipping: [localhost]

TASK [prechecks : Checking Ansible version] ************************************
skipping: [localhost]

PLAY [Apply role chrony] *******************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [chrony : include] ********************************************************
skipping: [localhost]

PLAY [Apply role collectd] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [collectd : include] ******************************************************
skipping: [localhost]

PLAY [Apply role elasticsearch] ************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [elasticsearch : include] *************************************************
skipping: [localhost]

PLAY [Apply role influxdb] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [influxdb : include] ******************************************************
skipping: [localhost]

PLAY [Apply role telegraf] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [telegraf : include] ******************************************************
skipping: [localhost]

PLAY [Apply role haproxy] ******************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/deploy.yml for localhost

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/config.yml for localhost

TASK [common : Ensuring config directories exist] ******************************
ok: [localhost] => (item=fluentd)
ok: [localhost] => (item=fluentd/input)
ok: [localhost] => (item=fluentd/output)
ok: [localhost] => (item=fluentd/format)
ok: [localhost] => (item=fluentd/filter)
ok: [localhost] => (item=kolla-toolbox)
ok: [localhost] => (item=cron)
ok: [localhost] => (item=cron/logrotate)

TASK [common : Copying over config.json files for services] ********************
ok: [localhost] => (item=fluentd)
ok: [localhost] => (item=kolla-toolbox)
ok: [localhost] => (item=cron)

TASK [common : Copying over fluentd input config files] ************************
ok: [localhost] => (item=00-global)
ok: [localhost] => (item=01-syslog)
ok: [localhost] => (item=02-mariadb)
ok: [localhost] => (item=03-rabbitmq)

TASK [common : Copying over fluentd ouput config files] ************************
ok: [localhost] => (item={u'enabled': True, u'name': u'00-local'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'01-es'})

TASK [common : Copying over fluentd format config files] ***********************
ok: [localhost] => (item=apache_access)
ok: [localhost] => (item=wsgi_access)
ok: [localhost] => (item=wsgi_python)

TASK [common : Copying over fluentd filter config files] ***********************
ok: [localhost] => (item=00-record_transformer)
ok: [localhost] => (item=01-rewrite)

TASK [common : Copying over tg-agent.conf] *************************************
ok: [localhost] => (item=fluentd)

TASK [common : Copying over cron logrotate config files] ***********************
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'ansible'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'aodh'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'barbican'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'ceilometer'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'cinder'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'cloudkitty'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'designate'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'elasticsearch'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'freezer'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'glance'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'global'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'gnocchi'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'grafana'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'haproxy'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'heat'})
skipping: [localhost] => (item={u'enabled': False, u'name': u'iscsid'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'karbor'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'keepalived'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'keystone'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'magnum'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'manila'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'mariadb'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'mistral'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'murano'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'neutron'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'nova'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'octavia'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'panko'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'rabbitmq'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'sahara'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'searchlight'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'senlin'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'solum'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'swift'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'tacker'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'watcher'})

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/bootstrap.yml for localhost

TASK [common : Creating log volume] ********************************************
ok: [localhost]

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/start.yml for localhost

TASK [common : Starting fluentd container] *************************************
ok: [localhost]

TASK [common : Starting kolla-toolbox container] *******************************
changed: [localhost]

TASK [common : Initializing toolbox container using normal user] ***************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "-t", "kolla_toolbox", "/usr/bin/ansible", "--version"], "delta": "0:00:00.013034", "end": "2017-04-05 11:17:21.518591", "failed": true, "rc": 1, "start": "2017-04-05 11:17:21.505557", "stderr": "Error response from daemon: Container b7189fd542a7f5156792012e2b5d59736612cd660787e233cff413829df7ed53 is restarting, wait until the container is running", "stdout": "", "stdout_lines": [], "warnings": []}
 to retry, use: --limit @/usr/share/kolla-ansible/ansible/site.retry

PLAY RECAP *********************************************************************
localhost : ok=25 changed=1 unreachable=0 failed=1

Command failed ansible-playbook -i /root/all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e action=deploy /usr/share/kolla-ansible/ansible/site.yml

Listing the containers indicates that the toolbox container is stuck in a restart loop:

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
776664dba696 kolla/centos-binary-kolla-toolbox:4.0.0 "kolla_start" About a minute ago Restarting (1) 7 seconds ago kolla_toolbox
42b5712ebbb7 kolla/centos-binary-fluentd:4.0.0 "kolla_start" About a minute ago Up About a minute fluentd
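
A restart loop like this one can be picked out of the container list automatically. The following is a minimal sketch, not part of the original report: the helper name restarting_containers is hypothetical, and it assumes input lines of the form name<TAB>status.

```shell
# Hypothetical helper: read "name<TAB>status" lines on stdin and print the
# names of containers whose status starts with "Restarting".
restarting_containers() {
    awk -F'\t' '$2 ~ /^Restarting/ { print $1 }'
}

# Usage on the Docker host (not executed here):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | restarting_containers
# Docker can also do the filtering itself:
#   docker ps -a --filter status=restarting --format '{{.Names}}'
```

The function only parses text, so it can be tried against saved output without a running Docker daemon.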

shaofeng cheng (shaofeng-cheng) wrote :

Please provide more information from the logs:

docker logs kolla_toolbox

Ben Swartzlander (bswartz) wrote :

[root@scsor0010293001 ~]# docker logs kolla_toolbox
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
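
The message means sudo could not resolve uid 42401 to a user name inside the container. One way to narrow this down is to check whether the uid is listed in the image's own /etc/passwd. If it is, the failed lookup is presumably being answered from somewhere other than that file. A minimal sketch, in which the helper name uid_in_passwd and the /tmp path are hypothetical:

```shell
# Hypothetical helper: succeed if a numeric uid appears in the third field
# of a passwd-format file (default: /etc/passwd), fail otherwise.
uid_in_passwd() {
    uid="$1"
    file="${2:-/etc/passwd}"
    awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$file"
}

# Usage (not executed here). The container is restarting, so "docker exec"
# is not available; read the file from the image instead:
#   docker run --rm --entrypoint cat kolla/centos-binary-kolla-toolbox:4.0.0 \
#       /etc/passwd > /tmp/toolbox-passwd
#   uid_in_passwd 42401 /tmp/toolbox-passwd && echo "uid is defined in the image"
```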

Juise (askjuise) wrote :

I hit the same problem running kolla on Ubuntu with the image kolla/centos-binary-kolla-toolbox:4.0.0.

Erwan Le Bonniec (noisynoise) wrote :

I experienced this issue again (centos/source/5.0.0) and found that removing the /run mount on kolla_toolbox in the Ansible task makes the issue disappear. I still have not found the root cause of this behaviour.
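
To see whether a given kolla_toolbox container actually has /run mounted, its mount list can be inspected. This is a rough sketch under assumptions: the helper name has_run_mount is hypothetical, and it expects one "source destination" pair per line.

```shell
# Hypothetical helper: read "source destination" lines on stdin and succeed
# if any mount has /run as its destination inside the container.
has_run_mount() {
    awk '$2 == "/run" { found = 1 } END { exit !found }'
}

# Usage on the Docker host (not executed here):
#   docker inspect \
#       --format '{{range .Mounts}}{{println .Source .Destination}}{{end}}' \
#       kolla_toolbox | has_run_mount && echo "/run is mounted"
```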

Erwan Le Bonniec (noisynoise) wrote :

In case it helps someone: unscd seems to be the culprit. Stopping this service on the hosts with "systemctl stop unscd" solved the issue.

afrontera (afrontera) wrote :

Same problem here. Someone on IRC suggested disabling the NSCD service because of a weird bug with Docker. It works.

sean mooney (sean-k-mooney) wrote :

I hit this on a SUSE host today, also with an Ubuntu container.

Stopping NSCD on the host fixed the issue.
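
The workaround reported in these comments can be sketched as follows. This assumes a systemd host and a service named nscd or unscd, as mentioned in this thread; the helper name cache_daemons_in is hypothetical.

```shell
# Hypothetical helper: read unit names on stdin, one per line, and print
# those that are nscd or unscd (with or without the ".service" suffix).
cache_daemons_in() {
    grep -E -x '(nscd|unscd)(\.service)?' || true
}

# Usage on a systemd host, as root (not executed here):
#   systemctl list-units --type=service --state=active --no-legend --plain \
#       | awk '{ print $1 }' | cache_daemons_in
#   systemctl stop nscd            # or: systemctl stop unscd
#   docker restart kolla_toolbox
#   docker logs --tail 5 kolla_toolbox
```

Stopping the service this way does not persist across reboots unless the unit is also disabled.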

Changed in kolla-ansible:
status: New → Confirmed
Chason Chan (chen-xing) on 2017-12-01
Changed in kolla-ansible:
importance: Undecided → High
assignee: nobody → Chason Chan (chen-xing)
Jeffrey Zhang (jeffrey4l) wrote :

I hit this issue too. Stopping NSCD works. Just mark this as invalid.

Changed in kolla-ansible:
status: Confirmed → Invalid

Fix proposed to branch: master
Review: https://review.openstack.org/537348

Changed in kolla-ansible:
status: Invalid → In Progress

Change abandoned by Chason Chan on branch: master
Review: https://review.openstack.org/537348

Mark Goddard (mgoddard) on 2018-09-14
Changed in kolla-ansible:
status: In Progress → Invalid