OVB jobs fail intermittently (race?) during overcloud deploy with rabbit/pcs cluster errors

Bug #1837843 reported by Marios Andreou
This bug affects 1 person
Affects    Status          Importance    Assigned to    Milestone
tripleo    Fix Released    High          Unassigned

Bug Description

This failure is intermittent, but it has been seen often enough to warrant a bug. The trace during overcloud deploy looks like this:

        2019-07-22 02:46:58 | "puppet-user: error: Could not connect to cluster (is it running?)",
        ...
        2019-07-22 02:58:18 | "stderr: Error: unable to find resource 'galera-bundle'",
        ...
        2019-07-22 02:58:18 | "Error running ['docker', 'run', '--name', 'rabbitmq_init_bundle', '--label', 'config_id=tripleo_step2', '--label', 'container_name=rabbitmq_init_bundle', '--label', 'managed_by=paunch', '--label', 'config_data={\"ipc\": \"host\", \"start_order\": 1, \"image\": \"192.168.24.1:8787/tripleomaster/centos-binary-rabbitmq:4cadc580aed3cde73c487f827f76bf7b92b4d1e5_10e135ca-updated-20190722011122\", \"environment\": [\"LANG=en_US.UTF-8\", \"LC_ALL=en_US.UTF-8\", \"TRIPLEO_DEPLOY_IDENTIFIER=1563761118\"], \"command\": [\"/container_puppet_apply.sh\", \"2\", \"file,file_line,concat,augeas,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation,rabbitmq_policy,rabbitmq_user,rabbitmq_ready\", \"include ::tripleo::profile::base::pacemaker;include ::tripleo::profile::pacemaker::rabbitmq_bundle\", \"\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/var/lib/container-config-scripts/container_puppet_apply.sh:/container_puppet_apply.sh:ro\", \"/etc/puppet:/tmp/puppet-etc:ro\", \"/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro\", \"/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro\", \"/bin/true:/bin/epmd\"], \"net\": \"host\", \"detach\": false}', '--env=LANG=en_US.UTF-8', '--env=LC_ALL=en_US.UTF-8', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1563761118', '--net=host', '--ipc=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', 
'--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/var/lib/container-config-scripts/container_puppet_apply.sh:/container_puppet_apply.sh:ro', '--volume=/etc/puppet:/tmp/puppet-etc:ro', '--volume=/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro', '--volume=/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro', '--volume=/bin/true:/bin/epmd', '192.168.24.1:8787/tripleomaster/centos-binary-rabbitmq:4cadc580aed3cde73c487f827f76bf7b92b4d1e5_10e135ca-updated-20190722011122', '/container_puppet_apply.sh', '2', 'file,file_line,concat,augeas,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation,rabbitmq_policy,rabbitmq_user,rabbitmq_ready', 'include ::tripleo::profile::base::pacemaker;include ::tripleo::profile::pacemaker::rabbitmq_bundle', '']. [6]",
        ...
        2019-07-22 02:58:18 | "Error: Facter: error while resolving custom fact \"rabbitmq_nodename\": undefined method `[]' for nil:NilClass",

Examples are at [1] and [2]; more relevant logs are at [3].

[1] http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/fc50eab/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[2] https://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/e69ad4b/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[3] https://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/e69ad4b/logs/overcloud-controller-2/var/log/extra/docker/containers/rabbitmq_init_bundle/stdout.log.txt.gz
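To check whether another job's deploy log shows the same failure, the three error messages in the trace above can be searched for directly. The following is only a minimal sketch: the function name `find_signatures` and the signature labels are hypothetical, and the patterns are copied from the trace in this report rather than taken from any TripleO tooling.

```python
import re

# Error signatures copied from the trace in this report.
# The labels on the left are hypothetical names chosen for illustration.
SIGNATURES = {
    "cluster_unreachable": re.compile(
        r"Could not connect to cluster \(is it running\?\)"
    ),
    "missing_resource": re.compile(r"unable to find resource '[\w-]+'"),
    # The quotes around the fact name may be backslash-escaped in the log.
    "rabbitmq_fact": re.compile(
        r'error while resolving custom fact \\?"rabbitmq_nodename\\?"'
    ),
}


def find_signatures(lines):
    """Return {label: [1-based line numbers]} for lines matching a signature."""
    hits = {label: [] for label in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[label].append(lineno)
    return hits
```

Any iterable of text lines works as input, so a downloaded `overcloud_deploy.log.txt.gz` can be passed as a file object opened with `gzip.open(path, "rt")`, where `path` is wherever the log was saved. A log in which all three labels have hits is a candidate for this bug, but a match only shows that the same messages appear, not that the cause is the same.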

Tags: alert ci
wes hayutin (weshayutin)
Changed in tripleo:
milestone: none → train-3
wes hayutin (weshayutin)
tags: added: alert
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released