kolla-toolbox container stuck in restart loop

Bug #1680139 reported by Ben Swartzlander
This bug affects 4 people
Affects: kolla-ansible
Status: Invalid
Importance: High
Assigned to: Chason Chan

Bug Description

I followed the quickstart guide (deployer workflow) on CentOS 7.3, and it fails at the deployment step with a problem in the toolbox container.

# kolla-ansible deploy -i /root/all-in-one
Deploying Playbooks : ansible-playbook -i /root/all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e action=deploy /usr/share/kolla-ansible/ansible/site.yml

PLAY [Gather facts for all hosts] **********************************************

TASK [setup] *******************************************************************
ok: [localhost]

PLAY [Gather facts for all hosts (if using --limit)] ***************************

TASK [setup] *******************************************************************
skipping: [localhost] => (item=localhost)

PLAY [Detect openstack_release variable] ***************************************

TASK [Get current kolla-ansible version number] ********************************
ok: [localhost -> localhost]

TASK [Set openstack_release variable] ******************************************
ok: [localhost]

PLAY [Apply role prechecks] ****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [prechecks : Checking the api_interface is present] ***********************
skipping: [localhost]

TASK [prechecks : Checking the api_interface is active] ************************
skipping: [localhost]

TASK [prechecks : Checking the api_interface configuration] ********************
skipping: [localhost]

TASK [prechecks : Checking the api_interface ip address configuration] *********
skipping: [localhost]

TASK [prechecks : Checking Docker version] *************************************
skipping: [localhost]

TASK [prechecks : Checking empty passwords in passwords.yml. Run kolla-genpwd if this task fails] ***
skipping: [localhost]

TASK [prechecks : Checking docker-py version] **********************************
skipping: [localhost]

TASK [prechecks : Checking Ansible version] ************************************
skipping: [localhost]

PLAY [Apply role chrony] *******************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [chrony : include] ********************************************************
skipping: [localhost]

PLAY [Apply role collectd] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [collectd : include] ******************************************************
skipping: [localhost]

PLAY [Apply role elasticsearch] ************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [elasticsearch : include] *************************************************
skipping: [localhost]

PLAY [Apply role influxdb] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [influxdb : include] ******************************************************
skipping: [localhost]

PLAY [Apply role telegraf] *****************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
skipping: [localhost]

TASK [common : Registering common role has run] ********************************
skipping: [localhost]

TASK [telegraf : include] ******************************************************
skipping: [localhost]

PLAY [Apply role haproxy] ******************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/deploy.yml for localhost

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/config.yml for localhost

TASK [common : Ensuring config directories exist] ******************************
ok: [localhost] => (item=fluentd)
ok: [localhost] => (item=fluentd/input)
ok: [localhost] => (item=fluentd/output)
ok: [localhost] => (item=fluentd/format)
ok: [localhost] => (item=fluentd/filter)
ok: [localhost] => (item=kolla-toolbox)
ok: [localhost] => (item=cron)
ok: [localhost] => (item=cron/logrotate)

TASK [common : Copying over config.json files for services] ********************
ok: [localhost] => (item=fluentd)
ok: [localhost] => (item=kolla-toolbox)
ok: [localhost] => (item=cron)

TASK [common : Copying over fluentd input config files] ************************
ok: [localhost] => (item=00-global)
ok: [localhost] => (item=01-syslog)
ok: [localhost] => (item=02-mariadb)
ok: [localhost] => (item=03-rabbitmq)

TASK [common : Copying over fluentd ouput config files] ************************
ok: [localhost] => (item={u'enabled': True, u'name': u'00-local'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'01-es'})

TASK [common : Copying over fluentd format config files] ***********************
ok: [localhost] => (item=apache_access)
ok: [localhost] => (item=wsgi_access)
ok: [localhost] => (item=wsgi_python)

TASK [common : Copying over fluentd filter config files] ***********************
ok: [localhost] => (item=00-record_transformer)
ok: [localhost] => (item=01-rewrite)

TASK [common : Copying over tg-agent.conf] *************************************
ok: [localhost] => (item=fluentd)

TASK [common : Copying over cron logrotate config files] ***********************
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'ansible'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'aodh'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'barbican'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'ceilometer'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'cinder'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'cloudkitty'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'designate'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'elasticsearch'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'freezer'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'glance'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'global'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'gnocchi'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'grafana'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'haproxy'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'heat'})
skipping: [localhost] => (item={u'enabled': False, u'name': u'iscsid'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'karbor'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'keepalived'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'keystone'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'magnum'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'manila'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'mariadb'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'mistral'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'murano'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'neutron'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'nova'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'octavia'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'panko'})
ok: [localhost] => (item={u'enabled': u'yes', u'name': u'rabbitmq'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'sahara'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'searchlight'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'senlin'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'solum'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'swift'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'tacker'})
skipping: [localhost] => (item={u'enabled': u'no', u'name': u'watcher'})

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/bootstrap.yml for localhost

TASK [common : Creating log volume] ********************************************
ok: [localhost]

TASK [common : include] ********************************************************
included: /usr/share/kolla-ansible/ansible/roles/common/tasks/start.yml for localhost

TASK [common : Starting fluentd container] *************************************
ok: [localhost]

TASK [common : Starting kolla-toolbox container] *******************************
changed: [localhost]

TASK [common : Initializing toolbox container using normal user] ***************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["docker", "exec", "-t", "kolla_toolbox", "/usr/bin/ansible", "--version"], "delta": "0:00:00.013034", "end": "2017-04-05 11:17:21.518591", "failed": true, "rc": 1, "start": "2017-04-05 11:17:21.505557", "stderr": "Error response from daemon: Container b7189fd542a7f5156792012e2b5d59736612cd660787e233cff413829df7ed53 is restarting, wait until the container is running", "stdout": "", "stdout_lines": [], "warnings": []}
 to retry, use: --limit @/usr/share/kolla-ansible/ansible/site.retry

PLAY RECAP *********************************************************************
localhost : ok=25 changed=1 unreachable=0 failed=1

Command failed ansible-playbook -i /root/all-in-one -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e action=deploy /usr/share/kolla-ansible/ansible/site.yml

Listing the containers indicates that the toolbox container is stuck in a restart loop:

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
776664dba696 kolla/centos-binary-kolla-toolbox:4.0.0 "kolla_start" About a minute ago Restarting (1) 7 seconds ago kolla_toolbox
42b5712ebbb7 kolla/centos-binary-fluentd:4.0.0 "kolla_start" About a minute ago Up About a minute fluentd
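
A restart loop like the one above can also be picked out without reading the whole table. The following is a minimal sketch, not part of the original report: `restarting_containers` is a hypothetical helper name, and the `|` separator is an arbitrary choice for this example. It reads "name|status" lines on stdin and prints the names whose status begins with "Restarting".

```sh
# Hypothetical helper (not from the bug report): print the names of
# containers whose status starts with "Restarting".
# Expects one "<name>|<status>" pair per line on stdin.
restarting_containers() {
    awk -F '|' '$2 ~ /^Restarting/ { print $1 }'
}
```

Assuming a Docker CLI that supports `--format`, it could be fed with `docker ps -a --format '{{.Names}}|{{.Status}}' | restarting_containers`. On the listing above this would be expected to print `kolla_toolbox`. `docker ps --filter status=restarting` is a shorter alternative when no post-processing is needed.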

shaofeng cheng (shaofeng-cheng) wrote :

Please provide more information from the container logs:

docker logs kolla_toolbox

Ben Swartzlander (bswartz) wrote :

[root@scsor0010293001 ~]# docker logs kolla_toolbox
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
sudo: unknown uid 42401: who are you?
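
The repeated sudo error says that UID 42401 could not be resolved to a user. One way to narrow this down is to check whether the image's own /etc/passwd defines that UID at all. The sketch below is an illustration under assumptions, not something taken from the report: `uid_in_passwd` is a hypothetical helper that reads passwd-format text on stdin and succeeds if the given UID appears in the third field.

```sh
# Hypothetical helper (not from the bug report): exit 0 if the numeric UID
# given as $1 appears in passwd-format text read from stdin, else exit 1.
uid_in_passwd() {
    awk -F ':' -v uid="$1" '$3 == uid { found = 1 } END { exit !found }'
}
```

Because the container is restarting, `docker exec` is not usable here. Assuming the image ships `cat`, the file could instead be read from a throwaway container, for example `docker run --rm --entrypoint cat kolla/centos-binary-kolla-toolbox:4.0.0 /etc/passwd | uid_in_passwd 42401 && echo defined`. If the UID is defined in the image, the failure is more likely in how the lookup is performed at runtime than in the image contents.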

Juise (askjuise) wrote :

Got the same problem running kolla on Ubuntu, with the image kolla/centos-binary-kolla-toolbox:4.0.0.

Erwan Le Bonniec (noisynoise) wrote :

I experienced this issue again (centos/source/5.0.0), and found that removing the /run mount from kolla_toolbox in the Ansible task makes the issue disappear. I still have not found the root cause of this behaviour.

Erwan Le Bonniec (noisynoise) wrote :

Maybe this can help someone: unscd seems to be the culprit. Stopping this service on the hosts with "systemctl stop unscd" solved the issue.

afrontera (afrontera) wrote :

Same problem here. Someone on IRC suggested disabling the NSCD service because of a weird bug with Docker. It works.

sean mooney (sean-k-mooney) wrote :

I hit this on a SUSE host today as well, with an Ubuntu container.

Stopping NSCD on the host fixed the issue.
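
The workaround reported in the comments above (stop the name-service cache daemon on the host, then let the toolbox container start again) can be sketched as a small script. This is only an illustration under assumptions: a systemd host, a service named `nscd` or `unscd`, and a container named `kolla_toolbox`. The function names and the `DRY_RUN` switch are hypothetical and exist so the commands can be previewed before anything is changed.

```sh
# Hypothetical sketch (not from the bug report).
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop each named cache service on the host, then restart the
# toolbox container so it starts without the daemon running.
stop_name_cache_and_restart() {
    for svc in "$@"; do
        run systemctl stop "$svc"
    done
    run docker restart kolla_toolbox
}
```

For example, `DRY_RUN=1; stop_name_cache_and_restart nscd` prints the two commands without running them. Pass only the service that actually exists on the host (`nscd` or `unscd`), and note that stopping it does not disable it, so it may come back after a reboot.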

Changed in kolla-ansible:
status: New → Confirmed
Chason Chan (chen-xing)
Changed in kolla-ansible:
importance: Undecided → High
assignee: nobody → Chason Chan (chen-xing)
Jeffrey Zhang (jeffrey4l) wrote :

I hit this issue too. Stopping NSCD works. Marking this as invalid.

Changed in kolla-ansible:
status: Confirmed → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/537348

Changed in kolla-ansible:
status: Invalid → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by Chason Chan (<email address hidden>) on branch: master
Review: https://review.openstack.org/537348

Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: In Progress → Invalid