f28 standalone fails at timeout

Bug #1804649 reported by Quique Llorente on 2018-11-22
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Quique Llorente

Bug Description

At the beginning of the standalone deploy, the client times out while accessing the Heat API:
http://logs.openstack.org/56/618056/9/check/tripleo-ci-fedora-28-standalone/d0f031d/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-22_12_33_54
2018-11-22 12:33:54 | Exception: timed out
2018-11-22 12:33:54 | Traceback (most recent call last):
2018-11-22 12:33:54 | File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_deploy.py", line 1134, in _standalone_deploy
2018-11-22 12:33:54 | utils.wait_api_port_ready(parsed_args.heat_api_port)
2018-11-22 12:33:54 | File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 1086, in wait_api_port_ready
2018-11-22 12:33:54 | request.urlopen("http://%s:%s/" % (host, api_port), timeout=1)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
2018-11-22 12:33:54 | return opener.open(url, data, timeout)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 526, in open
2018-11-22 12:33:54 | response = self._open(req, data)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 544, in _open
2018-11-22 12:33:54 | '_open', req)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
2018-11-22 12:33:54 | result = func(*args)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 1346, in http_open
2018-11-22 12:33:54 | return self.do_open(http.client.HTTPConnection, req)
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/urllib/request.py", line 1321, in do_open
2018-11-22 12:33:54 | r = h.getresponse()
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/http/client.py", line 1331, in getresponse
2018-11-22 12:33:54 | response.begin()
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/http/client.py", line 297, in begin
2018-11-22 12:33:54 | version, status, reason = self._read_status()
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/http/client.py", line 258, in _read_status
2018-11-22 12:33:54 | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2018-11-22 12:33:54 | File "/usr/lib64/python3.6/socket.py", line 586, in readinto
2018-11-22 12:33:54 | return self._sock.recv_into(b)
2018-11-22 12:33:54 | socket.timeout: timed out
2018-11-22 12:33:54 | None

Bogdan Dobrelya (bogdando) wrote :

Some snippets with errors from the job log:

Nov 22 10:51:31 ./logs/undercloud/var/log/journal.txt.gz localhost kernel: RAS: Correctable Errors collector initialized.
Nov 22 10:51:38 ./logs/undercloud/var/log/journal.txt.gz fedora-28-rax-ord-0000652316 ntpd[766]: error resolving pool 2.fedora.pool.ntp.org: Name or service not known (-2)
Nov 22 11:01:36 ./logs/undercloud/var/log/journal.txt.gz fedora-28-rax-ord-0000652316 dnf[1369]: Failed determining last makecache time.
Nov 22 12:19:41 ./logs/undercloud/var/log/journal.txt.gz fedora-28-rax-ord-0000652316 ntpd[766]: bind(28) AF_INET6 fe80::5034:72ff:feec:4742%5#123 flags 0x11 failed: Cannot assign requested address
Nov 22 12:19:41 ./logs/undercloud/var/log/journal.txt.gz fedora-28-rax-ord-0000652316 ntpd[766]: failed to init interface for address fe80::5034:72ff:feec:4742%5

Quique Llorente (quiquell) wrote :

The podman error is here:
http://logs.openstack.org/98/619598/3/check/tripleo-ci-fedora-28-standalone/ee64c01/logs/undercloud/var/log/extra/podman/podman_allinfo.log.txt.gz

+ podman images

+ podman stats --all --no-stream
Error: GetWinsize: inappropriate ioctl for device
ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS

+ podman version
Version: 0.11.1.1
G

Quique Llorente (quiquell) wrote :

Found the podman issue:

"kernel: overlayfs: unrecognized mount option "nodev" or missing value"

http://logs.openstack.org/98/619598/3/check/tripleo-ci-fedora-28-standalone/ee64c01/logs/undercloud/var/log/journal.txt.gz#_Nov_22_16_02_35

Quique Llorente (quiquell) wrote :

A newer version of the skopeo Fedora RPM spec adds the 'nodev' mount option, which the kernel's overlayfs does not support:
https://src.fedoraproject.org/cgit/rpms/skopeo.git/commit/?id=7c8484c8b1a0fab41e5124ebf4b0d870faf2654f

Fix proposed to branch: master
Review: https://review.openstack.org/619741

Changed in tripleo:
status: Triaged → In Progress

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/619741
Reason: Clearing the gate. Do not restore this until being given the all clear. See http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000368.html

Reviewed: https://review.openstack.org/619741
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=184d5537cce6765552a4d4215225b56e233adb1e
Submitter: Zuul
Branch: master

commit 184d5537cce6765552a4d4215225b56e233adb1e
Author: Quique Llorente <email address hidden>
Date: Fri Nov 23 15:41:16 2018 +0100

    Catch socket.timeout waiting port api

    We are facing this bug [1]: retries were broken on timeouts. Catching
    timeouts lets us handle retries as expected.

    Closes-Bug: #1804649

    Change-Id: I8ec6b983dbd1b2a1586b787d05c45d1b9c195074
    Story: https://tree.taiga.io/project/tripleo-ci-board/task/403?kanban-status=1447275

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/python-tripleoclient 11.2.0 release.
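
For readers following the traceback in the bug description: the exception surfaces from h.getresponse(), so it reaches the caller as a bare socket.timeout rather than wrapped in urllib.error.URLError. A retry loop that only catches URLError would therefore abort on the first slow response. The following is a minimal sketch of a polling loop that also catches socket.timeout. It is not the actual python-tripleoclient code; the function name wait_for_api_port and its parameters (attempts, delay, timeout) are hypothetical and chosen only for illustration.

```python
import socket
import time
from urllib import error, request


def wait_for_api_port(port, host="127.0.0.1", attempts=30, delay=1.0, timeout=1.0):
    """Poll http://host:port/ and return True once the server answers.

    Hypothetical helper for illustration; returns False if every attempt fails.
    """
    url = "http://%s:%s/" % (host, port)
    for _ in range(attempts):
        try:
            request.urlopen(url, timeout=timeout)
            return True
        except error.HTTPError:
            # The server replied with an HTTP error status, so the port is up.
            # HTTPError subclasses URLError, so it must be handled first.
            return True
        except (error.URLError, socket.timeout, ConnectionError):
            # URLError: the connection could not be established (e.g. refused).
            # socket.timeout: connected, but no response arrived in time,
            # which is the case shown in the traceback.
            # ConnectionError: the peer dropped the connection mid-request.
            time.sleep(delay)
    return False
```

Called as wait_for_api_port(8006, attempts=10), this would keep retrying while the API is still starting up and return False instead of raising if the port never becomes ready; the port number here is only an example value.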
