stable/train ovb jobs failing with timeouts while paunch launches containers

Bug #1907503 reported by wes hayutin
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

2020-12-09 19:13:20 | 2020-12-09 19:13:16.749074 | fa163ef9-bbd3-a467-0c63-0000000031f4 | FATAL | Wait for containers to start for step 2 using paunch | overcloud-controller-0 | error={"ansible_job_id": "497122859850.32066", "attempts": 1200, "changed": false, "finished": 0, "started": 1}
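
The failing task is an Ansible async job that keeps polling until the step's containers report as running (note attempts=1200 and finished=0 above). A rough sketch of that kind of wait loop, using podman inspect and a made-up container name; this is an illustration only, not the actual tripleo-ansible/paunch code:

# Illustrative wait loop: ask podman whether a container is running,
# giving up after a fixed number of attempts (cf. attempts=1200 above).
# "haproxy-bundle" below is a made-up example name.
import subprocess
import time

def wait_for_container(name, attempts=1200, delay=3):
    for _ in range(attempts):
        result = subprocess.run(
            ["podman", "inspect", "--format", "{{.State.Running}}", name],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True)
        if result.returncode == 0 and result.stdout.strip() == "true":
            return True
        time.sleep(delay)
    return False

# wait_for_container("haproxy-bundle") returning False is what produces
# the FATAL "Wait for containers to start" message above.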

mistral cannot connect:
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor Traceback (most recent call last):
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor chunked=chunked,
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 421, in _make_request
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor six.raise_from(e, None)
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "<string>", line 3, in raise_from
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 416, in _make_request
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor httplib_response = conn.getresponse()
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor response.begin()
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib64/python3.6/http/client.py", line 307, in begin
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor version, status, reason = self._read_status()
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor File "/usr/lib64/python3.6/http/client.py", line 276, in _read_status
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor raise RemoteDisconnected("Remote end closed connection without"
2020-12-09 17:23:34.899 ERROR /var/log/containers/mistral/executor.log.1: 8 ERROR mistral.executors.default_executor http.client.RemoteDisconnected: Remote end closed connection without
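
The RemoteDisconnected above means the service mistral was calling closed the socket before returning a status line, i.e. the far end was down or overloaded rather than mistral itself misbehaving. For illustration only, this is the generic requests/urllib3 retry pattern that papers over transient occurrences of that error; it is not mistral's actual code, and the endpoint below is a placeholder:

# Generic retry sketch for transient connection drops (illustration only).
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])
session.mount("http://", HTTPAdapter(max_retries=retries))
session.mount("https://", HTTPAdapter(max_retries=retries))

try:
    # placeholder URL; the real target is whichever service dropped the connection
    resp = session.get("http://192.0.2.1:5000/", timeout=30)
except requests.exceptions.ConnectionError as exc:
    # RemoteDisconnected surfaces here wrapped in a ConnectionError
    print("service still unreachable: %s" % exc)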

oslo times out:
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall Traceback (most recent call last):
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 154, in _run_loop
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall idle = idle_for_func(result, self._elapsed(watch))
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall File "/usr/lib/python3.6/site-packages/oslo_service/loopingcall.py", line 351, in _idle_for
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall % self._error_time)
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 19.85 seconds
2020-12-09 17:08:07.273 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR oslo.service.loopingcall
2020-12-09 17:08:07.280 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR ironic.conductor.utils [req-3a0c7f0a-3739-44c8-bcae-e6ea892bf1dd 8167cccf17ab400fa2c65481591e2512 28478479dabc49c0a2003dd120f8f330 - default default] Timed out after 30 secs waiting for power off on node 5379a138-1bd2-414a-8265-f6b879642446.: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 19.85 seconds

ironic times out:
2020-12-09 17:08:32.362 ERROR /var/log/containers/ironic/ironic-conductor.log: 7 ERROR ironic.conductor.utils [req-8492f23f-fa4f-4627-b878-2d4705b9dfe9 8167cccf17ab400fa2c65481591e2512 28478479dabc49c0a2003dd120f8f330 - default default] Timed out after 30 secs waiting for power off on node 23741894-67ec-419e-8e2d-6fd423488b55.: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 28.18 seconds
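
Both the oslo.service and ironic messages come from the same pattern: a looping call polls the node's power state and raises LoopingCallTimeOut once the configured deadline (30 seconds here) is exceeded. A rough sketch of that wait-with-deadline pattern, with get_power_state() as a hypothetical stand-in for the ironic power driver call:

# Sketch of the wait-for-power-off pattern behind the timeouts above.
# get_power_state() is a hypothetical stand-in for the ironic driver call.
import time

class PowerOffTimeout(Exception):
    pass

def wait_for_power_off(node_uuid, get_power_state, timeout=30, interval=2):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_power_state(node_uuid) == "power off":
            return
        time.sleep(interval)
    raise PowerOffTimeout(
        "Timed out after %s secs waiting for power off on node %s"
        % (timeout, node_uuid))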

Container Status:
https://logserver.rdoproject.org/50/766250/1/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001/b9beb96/logs/undercloud/var/log/extra/podman/podman_allinfo.log.gz

Nodes have (from all_available_packages.txt):
https://logserver.rdoproject.org/62/766262/1/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001/85b987e/logs/undercloud/var/log/extra/all_available_packages.txt.gz

pacemaker-cluster-libs-0:2.0.4-6.el8.i686
pacemaker-cluster-libs-0:2.0.4-6.el8.x86_64
pacemaker-libs-0:2.0.4-6.el8.i686
pacemaker-libs-0:2.0.4-6.el8.x86_64
pacemaker-schemas-0:2.0.4-6.el8.noarch

Containers have:

pacemaker.x86_64 2.0.3-5.el8_2.1 @HighAvailability
pacemaker-cli.x86_64 2.0.3-5.el8_2.1 @HighAvailability
pacemaker-cluster-libs.x86_64 2.0.3-5.el8_2.1 @AppStream
pacemaker-libs.x86_64 2.0.3-5.el8_2.1 @AppStream
pacemaker-remote.x86_64 2.0.3-5.el8_2.1 @HighAvailability
pacemaker-schemas.noarch 2.0.3-5.el8_2.1 @AppStream
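
Lining the two lists up shows the mismatch: the nodes have pacemaker 2.0.4-6.el8 while the containers still carry 2.0.3-5.el8_2.1. A small sketch of that comparison; the file names and line parsing are assumptions for illustration, not an actual CI script:

# Sketch: compare pacemaker versions reported for the nodes vs. the containers.
# File names and line formats are assumptions for illustration.
import re

ARCHES = (".x86_64", ".i686", ".noarch")

def pacemaker_versions(path):
    versions = set()
    with open(path) as f:
        for line in f:
            if "pacemaker" not in line:
                continue
            match = re.search(r"(\d+\.\d+\.\d+-\S+)", line)
            if not match:
                continue
            version = match.group(1)
            for arch in ARCHES:
                if version.endswith(arch):
                    version = version[: -len(arch)]
            versions.add(version)
    return versions

node_versions = pacemaker_versions("node_packages.txt")            # e.g. {"2.0.4-6.el8"}
container_versions = pacemaker_versions("container_packages.txt")  # e.g. {"2.0.3-5.el8_2.1"}
if node_versions != container_versions:
    print("pacemaker rpms out of sync:", node_versions, "vs", container_versions)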

Revision history for this message
wes hayutin (weshayutin) wrote:

pacemaker rpms are out of sync between the containers and the nodes; a promotion is needed to fix it.

Launched https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable3

Changed in tripleo:
assignee: nobody → Zahid Hasan (akkim31)
status: Triaged → Confirmed
status: Confirmed → Fix Released