[tripleo_container_manage] "Async container create/run" starts then "Check podman create status" times out

Bug #1861093 reported by Bogdan Dobrelya
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Bogdan Dobrelya

Bug Description

observed in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a52/704443/2/check/tripleo-ci-centos-7-undercloud-containers/a52e7de/logs/undercloud/home/zuul/undercloud_install.log

undercloud_install.log:

TASK [tripleo_container_manage : Async container create/run] *******************
Tuesday 28 January 2020 08:12:13 +0000 (0:00:00.334) 0:31:20.709 *******
changed: [undercloud] => (item={u'nova_db_sync': {u'start_order': 3, u'image': u'192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021', u'environment': {u'TRIPLEO_CONFIG_HASH': u'753bcbf0bb5fcd57918211e47c880e9b-753bcbf0bb5fcd57918211e47c880e9b'}, u'command': u"/usr/bin/bootstrap_host_exec nova_conductor su nova -s /bin/bash -c '/usr/bin/nova-manage db sync'", u'user': u'root', u'volumes': [u'/etc/hosts:/etc/hosts:ro', u'/etc/localtime:/etc/localtime:ro', u'/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', u'/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', u'/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', u'/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', u'/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', u'/dev/log:/dev/log', u'/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', u'/etc/puppet:/etc/puppet:ro', u'/var/log/containers/nova:/var/log/nova', u'/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro', u'/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro'], u'net': u'host', u'detach': False}})

TASK [tripleo_container_manage : Check podman create status] *******************
Tuesday 28 January 2020 08:12:14 +0000 (0:00:00.502) 0:31:21.212 *******
FAILED - RETRYING: Check podman create status (30 retries left).
FAILED - RETRYING: Check podman create status (29 retries left).
FAILED - RETRYING: Check podman create status (28 retries left).
FAILED - RETRYING: Check podman create status (27 retries left).
FAILED - RETRYING: Check podman create status (26 retries left).
FAILED - RETRYING: Check podman create status (25 retries left).
FAILED - RETRYING: Check podman create status (24 retries left).
FAILED - RETRYING: Check podman create status (23 retries left).
FAILED - RETRYING: Check podman create status (22 retries left).
FAILED - RETRYING: Check podman create status (21 retries left).
FAILED - RETRYING: Check podman create status (20 retries left).
FAILED - RETRYING: Check podman create status (19 retries left).
FAILED - RETRYING: Check podman create status (18 retries left).
FAILED - RETRYING: Check podman create status (17 retries left).
FAILED - RETRYING: Check podman create status (16 retries left).
FAILED - RETRYING: Check podman create status (15 retries left).
FAILED - RETRYING: Check podman create status (14 retries left).
FAILED - RETRYING: Check podman create status (13 retries left).
FAILED - RETRYING: Check podman create status (12 retries left).
FAILED - RETRYING: Check podman create status (11 retries left).
FAILED - RETRYING: Check podman create status (10 retries left).
FAILED - RETRYING: Check podman create status (9 retries left).
FAILED - RETRYING: Check podman create status (8 retries left).
FAILED - RETRYING: Check podman create status (7 retries left).
FAILED - RETRYING: Check podman create status (6 retries left).
FAILED - RETRYING: Check podman create status (5 retries left).
FAILED - RETRYING: Check podman create status (4 retries left).
FAILED - RETRYING: Check podman create status (3 retries left).
FAILED - RETRYING: Check podman create status (2 retries left).
FAILED - RETRYING: Check podman create status (1 retries left).
failed: [undercloud] (item={'ansible_loop_var': u'container_data', u'ansible_job_id': u'592488619076.106340', u'started': 1, 'changed': True, 'failed': False, u'finished': 0, u'results_file': u'/root/.ansible_async/592488619076.106340', u'container_data': {u'nova_db_sync': {u'start_order': 3, u'image': u'192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021', u'environment': {u'TRIPLEO_CONFIG_HASH': u'753bcbf0bb5fcd57918211e47c880e9b-753bcbf0bb5fcd57918211e47c880e9b'}, u'command': u"/usr/bin/bootstrap_host_exec nova_conductor su nova -s /bin/bash -c '/usr/bin/nova-manage db sync'", u'user': u'root', u'volumes': [u'/etc/hosts:/etc/hosts:ro', u'/etc/localtime:/etc/localtime:ro', u'/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', u'/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', u'/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', u'/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', u'/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', u'/dev/log:/dev/log', u'/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', u'/etc/puppet:/etc/puppet:ro', u'/var/log/containers/nova:/var/log/nova', u'/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro', u'/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro'], u'net': u'host', u'detach': False}}}) => {"ansible_job_id": "592488619076.106340", "ansible_loop_var": "create_async_result_item", "attempts": 30, "changed": false, "create_async_result_item": {"ansible_job_id": "592488619076.106340", "ansible_loop_var": "container_data", "changed": true, "container_data": {"nova_db_sync": {"command": "/usr/bin/bootstrap_host_exec nova_conductor su nova -s /bin/bash -c '/usr/bin/nova-manage db sync'", "detach": false, "environment": {"TRIPLEO_CONFIG_HASH": "753bcbf0bb5fcd57918211e47c880e9b-753bcbf0bb5fcd57918211e47c880e9b"}, "image": 
"192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021", "net": "host", "start_order": 3, "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro", "/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro"]}}, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/592488619076.106340", "started": 1}, "finished": 0, "started": 1}

journal:

Jan 28 08:12:14 undercloud.localdomain ansible-async_status[106354]: Invoked with jid=592488619076.106340 mode=status _async_dir=/root/.ansible_async
Jan 28 08:12:14 undercloud.localdomain ansible-podman_container[106345]: PODMAN-CONTAINER-DEBUG: podman run --name nova_db_sync --tty=False --detach=False --cpu-shares 0 --network host --conmon-pidfile /var/run/nova_db_sync.pid --label config_data={'start_order': 3, 'image': '192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021', 'environment': {'TRIPLEO_CONFIG_HASH': '753bcbf0bb5fcd57918211e47c880e9b-753bcbf0bb5fcd57918211e47c880e9b'}, 'command': "/usr/bin/bootstrap_host_exec nova_conductor su nova -s /bin/bash -c '/usr/bin/nova-manage db sync'", 'user': 'root', 'volumes': ['/etc/hosts:/etc/hosts:ro', '/etc/localtime:/etc/localtime:ro', '/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '/dev/log:/dev/log', '/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '/etc/puppet:/etc/puppet:ro', '/var/log/containers/nova:/var/log/nova', '/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro', '/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro'], 'net': 'host', 'detach': False} --label managed_by=tripleo_ansible --label config_id=tripleo_step3 --label container_name=nova_db_sync --rm=False --volume /etc/hosts:/etc/hosts:ro --volume /etc/localtime:/etc/localtime:ro --volume /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume /etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume /dev/log:/dev/log --volume 
/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume /etc/puppet:/etc/puppet:ro --volume /var/log/containers/nova:/var/log/nova --volume /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume /var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --log-opt path=/var/log/containers/stdouts/nova_db_sync.log --interactive=False --stop-timeout 0 --env TRIPLEO_CONFIG_HASH=753bcbf0bb5fcd57918211e47c880e9b-753bcbf0bb5fcd57918211e47c880e9b --privileged=False --log-driver k8s-file --user root 192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021 /usr/bin/bootstrap_host_exec nova_conductor su nova -s /bin/bash -c '/usr/bin/nova-manage db sync'

Jan 28 08:12:15 undercloud.localdomain podman[106404]: 2020-01-28 08:12:15.134506156 +0000 UTC m=+0.321840108 container create 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)
Jan 28 08:12:15 undercloud.localdomain systemd[1]: Started libpod-conmon-2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd.scope.

Jan 28 08:12:15 undercloud.localdomain systemd[1]: Started libcontainer container 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd.
Jan 28 08:12:15 undercloud.localdomain kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
Jan 28 08:12:15 undercloud.localdomain podman[106404]: 2020-01-28 08:12:15.677043634 +0000 UTC m=+0.864377105 container init 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)
Jan 28 08:12:15 undercloud.localdomain podman[106404]: 2020-01-28 08:12:15.724718783 +0000 UTC m=+0.912052267 container start 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)
Jan 28 08:12:15 undercloud.localdomain podman[106404]: 2020-01-28 08:12:15.724851531 +0000 UTC m=+0.912185020 container attach 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)

Jan 28 08:14:57 undercloud.localdomain podman[106404]: 2020-01-28 08:14:57.138694102 +0000 UTC m=+162.326027620 container died 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)
Jan 28 08:14:57 undercloud.localdomain podman[106404]: 2020-01-28 08:14:57.703203243 +0000 UTC m=+162.890536770 container remove 2d9f98158e7464ae5aba248ade54c7dbff7a7e5ba30e37a6cfe82cbd71528fdd (image=192.168.24.1:8787/tripleomaster/centos-binary-nova-conductor:6d67ae9682c5ab2fa6c9fc2e44970d8ad4a53673_3bda53e2-updated-20200128072021, name=nova_db_sync)

So the exec seems to have completed, but the async status loop times out, failing to detect the result?
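For context, the failing pattern is Ansible's fire-and-forget async combined with a polled status check. The sketch below is a simplified reconstruction, not the role's actual task file; the loop variable name `create_async_result_item` and the 30 retries come from the failed-task output above, while the other names and the `async: 300` value (mentioned in the fix's commit message) are illustrative:

```yaml
# Simplified sketch of the tripleo_container_manage pattern (hypothetical
# variable names; see the role itself for the real tasks).
- name: Async container create/run
  podman_container:
    name: "{{ container_name }}"
    image: "{{ container_image }}"
    detach: false
  async: 300   # the async window; commit below argues this was not high enough
  poll: 0      # fire and forget; results are collected by async_status
  register: create_async_results

- name: Check podman create status
  async_status:
    jid: "{{ create_async_result_item.ansible_job_id }}"
  loop: "{{ create_async_results.results }}"
  loop_control:
    loop_var: create_async_result_item
  register: create_async_poll_results
  until: create_async_poll_results.finished
  retries: 30
```

Per the Ansible docs quoted in the fix below, if the `async` value is too low the temporary status file under `~/.ansible_async/` may not be written or may no longer exist, so every `async_status` retry fails even though the container (here, the ~162-second `nova-manage db sync` run visible in the journal) exited successfully.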

Changed in tripleo:
importance: Undecided → High
milestone: none → ussuri-2
status: New → Triaged
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704626

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/704626
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=a4db9a480b9e26dbe51c26873088b92ec1db32f8
Submitter: Zuul
Branch: master

commit a4db9a480b9e26dbe51c26873088b92ec1db32f8
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Jan 28 17:28:16 2020 +0100

    Raise async value for container start commands

    Official documentation states:

    If the value of async: is not high enough, this will cause the "check
    on it later" task to fail because the temporary status file that the
    async_status: is looking for will not have been written or no longer
    exist.

    And the example listed there uses async=1000.

    Raise the value of 300 used in tripleo_container_manage for creating
    new containers so that it is "high enough"...

    Change-Id: I598d230bfbb5696f1c0799295d9e456eade56676
    Closes-bug: #1861093
    Signed-off-by: Bogdan Dobrelya <email address hidden>
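Under that reasoning, the fix amounts to widening the async window on the create task. A hedged sketch (the value 1000 is the one from the Ansible documentation example cited above; see the review for the exact value the patch chose):

```yaml
# Illustrative only -- the real change is in the tripleo_container_manage role.
- name: Async container create/run
  podman_container:
    name: "{{ container_name }}"
    image: "{{ container_image }}"
    detach: false
  async: 1000  # raised from 300 so the async status file survives long enough
  poll: 0
```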

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/706606

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/706606
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=7560cc212a38fa9618b40ff1b27ae279120a7462
Submitter: Zuul
Branch: stable/train

commit 7560cc212a38fa9618b40ff1b27ae279120a7462
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Jan 28 17:28:16 2020 +0100

    Raise async value for container start commands

    Official documentation states:

    If the value of async: is not high enough, this will cause the "check
    on it later" task to fail because the temporary status file that the
    async_status: is looking for will not have been written or no longer
    exist.

    And the example listed there uses async=1000.

    Raise the value of 300 used in tripleo_container_manage for creating
    new containers so that it is "high enough"...

    Change-Id: I598d230bfbb5696f1c0799295d9e456eade56676
    Closes-bug: #1861093
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit a4db9a480b9e26dbe51c26873088b92ec1db32f8)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 1.2.0

This issue was fixed in the openstack/tripleo-ansible 1.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.5.0

This issue was fixed in the openstack/tripleo-ansible 0.5.0 release.
