multiple periodic integration jobs fail configure-mirrors - Failed to connect to mirrors.centos.org port 443: No route to host

Bug #1989452 reported by Marios Andreou
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

Multiple jobs across all the lines are failing with status RETRY and a trace that looks like:

        2022-09-13 01:46:26.523015 | primary | error: Curl error (7): Couldn't connect to server for https://mirrors.centos.org/metalink?repo=centos-extras-sig-extras-common-9-stream&arch=x86_64&protocol=https,http [Failed to connect to mirrors.centos.org port 443: No route to host] (https://mirrors.centos.org/metalink?repo=centos-extras-sig-extras-common-9-stream&arch=x86_64&protocol=https,http).
2022-09-13 01:46:26.523038 | primary | error: Curl error (7): Couldn't connect to server for https://mirrors.centos.org/metalink?repo=centos-extras-sig-extras-common-9-stream&arch=x86_64&protocol=https,http [Failed to connect to mirrors.centos.org port 443: No route to host] (https://mirrors.centos.org/metalink?repo=centos-extras-sig-extras-common-9-stream&arch=x86_64&protocol=https,http).
2022-09-13 01:46:26.523061 | primary | CentOS Stream 9 - Extras packages 0.0 B/s | 0 B 04:55
2022-09-13 01:46:26.523084 | primary | Errors during downloading metadata for repository 'extras-common':
2022-09-13 01:46:26.523108 | primary | - Curl error (7): Couldn't connect to server for https://mirrors.centos.org/metalink?repo=centos-extras-sig-extras-common-9-stream&arch=x86_64&protocol=https,http [Failed to connect to mirrors.centos.org port 443: No route to host]
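
To gauge how often this failure shows up in a job log, the lines above could be parsed for the host, port and reason. The following is only a sketch: the names `CurlFailure`, `parse_curl_failure` and `count_failures` are hypothetical, and the pattern assumes the message layout seen in the trace above.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class CurlFailure(NamedTuple):
    code: int      # curl error number, e.g. 7
    host: str      # host curl tried to reach
    port: int      # TCP port
    reason: str    # text after the port, e.g. "No route to host"


# Assumed layout, taken from the trace in this report:
# "Curl error (7): ... [Failed to connect to HOST port 443: REASON]"
_CURL_RE = re.compile(
    r"Curl error \((?P<code>\d+)\):.*?"
    r"\[Failed to connect to (?P<host>\S+) port (?P<port>\d+): (?P<reason>[^\]]+)\]"
)


def parse_curl_failure(line: str) -> Optional[CurlFailure]:
    """Return the failure described by a log line, or None if it has none."""
    m = _CURL_RE.search(line)
    if m is None:
        return None
    return CurlFailure(int(m["code"]), m["host"], int(m["port"]), m["reason"])


def count_failures(lines: Iterable[str]) -> Counter:
    """Count identical failures across an iterable of log lines."""
    failures = (parse_curl_failure(line) for line in lines)
    return Counter(f for f in failures if f is not None)
```

Feeding the lines of a downloaded job log to `count_failures` would give one counter entry per distinct host, port and reason.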

This is a promotion blocker for master: there are multiple examples in the latest buildset [1]. The same applies to wallaby/centos9 [2] and wallaby/8 [3] (see the jobs in RETRY in those links).

[1] https://review.rdoproject.org/zuul/buildset/372d55679278445da538e008b4ac3018
[2] https://review.rdoproject.org/zuul/buildset/ab9b2a4e6bca4e298565a466bae71fae
[3] https://review.rdoproject.org/zuul/build/8f74197cc4e6457eb37e693006c95770

Revision history for this message
Marios Andreou (marios-b) wrote :

Noting that we had a similar bug a while back at https://bugs.launchpad.net/tripleo/+bug/1983817, but in that case the error was "Could not resolve host: mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org".

Revision history for this message
Marios Andreou (marios-b) wrote :

Removing the cix flags. I missed that the buildsets I pointed to in the description have re-runs for those failed jobs, so we are not blocked on this.

tags: removed: alert ci promotion-blocker
Changed in tripleo:
importance: Critical → Undecided
Revision history for this message
Marios Andreou (marios-b) wrote :

Re-adding the cix flags: we are still seeing this, so we want it on the cix board.

Changed in tripleo:
importance: Undecided → Critical
tags: added: alert ci promotion-blocker
Revision history for this message
Ronelle Landy (rlandy) wrote :

<apevec> 1. it still connects to external mirrors.c.o host
<apevec> 2. our VH mirror is not registered w/ metalink
<apevec> in VH, our jobs should end up using afs-mirror.vexxhost.rdoproject.org/
<apevec> actually its alias

<apevec> mirror.regionone.vexxhost IN CNAME afs-mirror.vexxhost

Revision history for this message
Ronelle Landy (rlandy) wrote :

from Alfredo Moralejo Alonso: https://review.opendev.org/c/zuul/zuul-jobs/+/857730 should fix that

Revision history for this message
Ronelle Landy (rlandy) wrote :

https://paste.openstack.org/show/bkgTOoIbPzJjK1nr2TEK/ holds the results of a traceroute from an impacted node

Revision history for this message
Alan Pevec (apevec) wrote :

mirrors.c.o is DNS round-robined, so by the time traceroute ran it had likely resolved to a different IP than the one libcurl hit when it got "No route to host" in the job

mirrors.centos.org. 600 IN CNAME mirrors.fedoraproject.org.
mirrors.fedoraproject.org. 115 IN CNAME wildcard.fedoraproject.org.
wildcard.fedoraproject.org. 40 IN A 152.19.134.142
wildcard.fedoraproject.org. 40 IN A 38.145.60.20
wildcard.fedoraproject.org. 40 IN A 38.145.60.21
wildcard.fedoraproject.org. 40 IN A 67.219.144.68
wildcard.fedoraproject.org. 40 IN A 152.19.134.198
wildcard.fedoraproject.org. 40 IN A 8.43.85.73
wildcard.fedoraproject.org. 40 IN A 8.43.85.67
wildcard.fedoraproject.org. 40 IN A 140.211.169.196
wildcard.fedoraproject.org. 40 IN A 209.132.190.2

We need to find out how to show in logs which actual IP is failing, and then ask Fedora infra team to remove it from their DNS rotation.
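
One possible way to surface the failing address is to resolve the name once and then attempt a TCP connection to each returned IP separately, so the log ties every error to a specific address. This is only a sketch, not what the jobs actually run: `resolve_ipv4`, `try_connect` and `probe_mirror` are hypothetical helper names, and the 5 second timeout is an arbitrary assumption.

```python
import socket
from typing import Callable, Dict, List, Optional


def resolve_ipv4(host: str, port: int = 443) -> List[str]:
    """Return the distinct IPv4 addresses the resolver currently gives for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # info[4] is the (address, port) pair; keep order, drop duplicates.
    return list(dict.fromkeys(info[4][0] for info in infos))


def try_connect(ip: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return None if a TCP connection succeeds, else the error text."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return None
    except OSError as exc:  # includes timeouts and "No route to host"
        return str(exc)


def probe_mirror(
    host: str,
    port: int = 443,
    resolver: Callable[[str, int], List[str]] = resolve_ipv4,
    connector: Callable[[str, int], Optional[str]] = try_connect,
) -> Dict[str, Optional[str]]:
    """Map every resolved address to None (reachable) or its error text."""
    return {ip: connector(ip, port) for ip in resolver(host, port)}
```

Printing the result of `probe_mirror("mirrors.centos.org")` on an impacted node would list each address next to either `None` or its connection error. Because the A records have a 40 second TTL, the set of addresses may still differ between runs, so the probe would need to run close to the failing step.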

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
Revision history for this message
Marios Andreou (marios-b) wrote :

Instead of the change in comment 8 above, this one was used:

https://review.opendev.org/c/zuul/zuul-jobs/+/857988 configure-mirrors: make each component in 9-stream configurable

So let's see whether this is still happening today; otherwise we can close.

[EDIT]: we still need to configure it on our side with https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/858209

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/858209
Reason: will add to vars in parent job

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/858309

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/858309

Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → Fix Released