RDO cloud is not in operational state

Bug #1783540 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: chandan kumar

Bug Description

When running the Zuul "pre" roles, the "prepare-workspace" role fails. The output contains no log because no_log was set on the task.

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-centos-7-master-promote-consistent-to-tripleo-ci-testing/1a3060a/job-output.txt.gz#_2018-07-25_04_14_15_607154

2018-07-25 04:14:15.607154 | TASK [prepare-workspace : Synchronize src repos to workspace directory.]
2018-07-25 04:20:07.725038 | primary | Output suppressed because no_log was given
2018-07-25 04:20:07.743643 |
2018-07-25 04:20:07.743791 | PLAY RECAP
2018-07-25 04:20:07.743881 | primary | ok: 1 changed: 0 unreachable: 0 failed: 1

RDO infrastructure fails in various places. For example:

http://logs.openstack.org/71/584771/18/check/tripleo-ci-centos-7-containers-multinode-pike/cbdd936/job-output.txt.gz#_2018-07-25_10_46_09_922077

2018-07-25 10:46:09.922077 | primary | RuntimeError: Failed to retrieve repo file from https://trunk.rdoproject.org/centos7-pike/current-tripleo/delorean.repo after 10 retries
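
For readers unfamiliar with this failure mode, the error above comes from a fetch-with-retries loop that gives up once its retry budget is exhausted. The following is only a minimal sketch of that pattern, not the actual tripleo-ci code: the function name `fetch_with_retries`, the injected `fetch` callable, and the linear backoff are assumptions made for illustration. Only the error message format is taken from the log.

```python
import time


def fetch_with_retries(fetch, repo_url, retries=10, delay=1.0, sleep=time.sleep):
    """Call fetch(repo_url) until it succeeds or the retry budget runs out.

    `fetch` is a hypothetical callable that returns the repo file contents
    and raises OSError (or a subclass) on network failure.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(repo_url)
        except OSError:
            # Timeouts and connection resets are OSError subclasses in Python 3.
            if attempt < retries:
                sleep(delay * attempt)  # assumed linear backoff between attempts
    raise RuntimeError(
        "Failed to retrieve repo file from {} after {} retries".format(
            repo_url, retries))
```

With a sustained outage such as the one reported here, every attempt fails and the loop ends in the RuntimeError, so retrying alone does not protect the job.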

See also:
ReadTimeoutError on mirror.regionone.rdo-cloud-tripleo.rdoproject.org
https://tree.taiga.io/project/morucci-software-factory/issue/1569

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset037-updates-queens/e7941a6/job-output.txt.gz#_2018-07-25_06_58_10_824679

primary | Cloning into 'openstack-infra/tripleo-ci'...
2018-07-25 09:23:06.588100 | RUN END RESULT_TIMED_OUT: [untrusted : review.rdoproject.org/rdo-jobs/playbooks/legacy/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset037-updates-queens/run.yaml@master]

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/legacy-periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset030-queens/3b2827a/job-output.txt.gz#_2018-07-25_06_50_49_442091

SSL read: errno -5961 (PR_CONNECT_RESET_ERROR)
2018-07-25 06:50:49.442091 | primary | 2018-07-25 06:50:49 | * TCP connection reset by peer

and other network failures.

summary: - Preparing workspace on node fails in periodic jobs
+ RDO cloud is not in operational state
description: updated
tags: added: alert
chandan kumar (chkumar246) wrote :

Some other failures as well:

2018-07-25 10:46:09.921862 | primary | "{} retries".format(repo_url, retries))
2018-07-25 10:46:09.922077 | primary | RuntimeError: Failed to retrieve repo file from https://trunk.rdoproject.org/centos7-pike/current-tripleo/delorean.repo after 10 retries

http://logs.openstack.org/71/584771/18/check/tripleo-ci-centos-7-containers-multinode-pike/cbdd936/job-output.txt.gz#_2018-07-25_10_46_09_922077

Alan Pevec (apevec) wrote :

I've opened a ticket for rdo cloud ops: [tickets.osci.io #752] AutoReply: RDO Cloud networking instability

Paul Belanger (pabelanger) wrote :

For upstream jobs, we need to stop depending on trunk.rdoproject.org. We've set up a reverse proxy cache in AFS, and all HTTP requests should use it. This helps mitigate upstream failures when rdoproject.org is down.
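
One way a job could route its requests through such a cache is to rewrite the upstream URL before fetching. The sketch below only illustrates the idea and is not the change that fixed this bug: the mirror host `mirror.example.org` and the `/rdo` path prefix are placeholders, since the real proxy address is not given in this report, and `to_mirror_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder values: the real mirror host and path prefix are not stated here.
MIRROR_HOST = "mirror.example.org"
MIRROR_PREFIX = "/rdo"


def to_mirror_url(url, upstream_host="trunk.rdoproject.org"):
    """Rewrite a URL on upstream_host to point at the proxy cache instead.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != upstream_host:
        return url
    return urlunsplit((
        parts.scheme,
        MIRROR_HOST,
        MIRROR_PREFIX + parts.path,
        parts.query,
        parts.fragment,
    ))
```

Under these assumptions, the delorean.repo URL from the failing job would be requested from the mirror host, with the original path kept under the prefix.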

Changed in tripleo:
milestone: rocky-3 → rocky-rc1
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
tags: removed: alert promotion-blocker