Jobs running on vexxhost provider failing with Mirror Issues

Bug #2017992 reported by yatin
Affects: neutron
Status: Invalid
Importance: Critical
Assigned to: yatin
Milestone: (none)

Bug Description

Fails as below:
2023-04-27 12:36:47.535483 | controller | ++ functions-common:apt_get_update:1155 : timeout 300 sh -c 'while ! sudo http_proxy= https_proxy= no_proxy= apt-get update; do sleep 30; done'
2023-04-27 12:36:50.877357 | controller | Err:1 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal InRelease
2023-04-27 12:36:50.877408 | controller | Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (2604:e100:1:0:f816:3eff:fe0c:e2c0). - connect (113: No route to host) Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (199.204.45.149). - connect (113: No route to host)
2023-04-27 12:36:50.877419 | controller | Err:2 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal-updates InRelease
2023-04-27 12:36:50.877428 | controller | Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.877437 | controller | Err:3 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal-backports InRelease
2023-04-27 12:36:50.877446 | controller | Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.877454 | controller | Err:4 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal-security InRelease
2023-04-27 12:36:50.877462 | controller | Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.892401 | controller | Reading package lists...
2023-04-27 12:36:50.895427 | controller | W: Failed to fetch https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu/dists/focal/InRelease Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (2604:e100:1:0:f816:3eff:fe0c:e2c0). - connect (113: No route to host) Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (199.204.45.149). - connect (113: No route to host)
2023-04-27 12:36:50.895462 | controller | W: Failed to fetch https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu/dists/focal-updates/InRelease Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.895539 | controller | W: Failed to fetch https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu/dists/focal-backports/InRelease Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.895604 | controller | W: Failed to fetch https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu/dists/focal-security/InRelease Unable to connect to mirror.ca-ymq-1.vexxhost.opendev.org:https:
2023-04-27 12:36:50.895677 | controller | W: Some index files failed to download. They have been ignored, or old ones used instead.

2023-04-27 12:36:51.007845 | controller |
2023-04-27 12:36:51.008040 | controller | E: Unable to locate package apache2
2023-04-27 12:36:51.008098 | controller | E: Unable to locate package apache2-dev
2023-04-27 12:36:51.008157 | controller | E: Unable to locate package bc
2023-04-27 12:36:51.008218 | controller | E: Package 'bsdmainutils' has no installation candidate
2023-04-27 12:36:51.008321 | controller | E: Unable to locate package gawk
2023-04-27 12:36:51.008382 | controller | E: Unable to locate package gettext
2023-04-27 12:36:51.008438 | controller | E: Unable to locate package graphviz

Example failures:
- https://db3cff29bdb713e861e1-7db0c1fa1bd98a0adf758f7c1d49f672.ssl.cf5.rackcdn.com/881735/4/check/neutron-tempest-plugin-openvswitch-enforce-scope-new-defaults/278dce1/job-output.txt
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_944/881742/4/check/neutron-tempest-plugin-designate-scenario/9444d54/job-output.txt

It has been happening randomly for the last couple of weeks and seems to be a general issue with IPv6 external connectivity.
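The apt errors above show "No route to host" for both the IPv6 and the IPv4 address of the mirror, so a quick check from an affected node is to probe each address family separately. A minimal sketch, assuming curl is available on the node and reusing the mirror hostname from the errors above:

    MIRROR=mirror.ca-ymq-1.vexxhost.opendev.org
    # Probe the mirror over IPv6 only, then IPv4 only;
    # --connect-timeout avoids long hangs when packets are silently dropped.
    curl -6 -sSI --connect-timeout 10 "https://${MIRROR}/ubuntu/dists/focal/InRelease" -o /dev/null \
        && echo "IPv6 OK" || echo "IPv6 failed"
    curl -4 -sSI --connect-timeout 10 "https://${MIRROR}/ubuntu/dists/focal/InRelease" -o /dev/null \
        && echo "IPv4 OK" || echo "IPv4 failed"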
From OpenSearch [1] it can be seen that the following host_ids are impacted:
c984fb897502bc826ccaf0e258b6071e76c29b305bc5b31b301de76a
1b1e841d5cdc8c40a480da993c1cbf0cd64900baff612378898e72ab
6cc97bc57f540569368fcc47255180c5d21ed00a22cad83eeb600cec
86c687840cd74cd63cbb095748afa5c9cd0f6fcea898d90aa030cc68
94cd367e7821f5d74cf44c5ebafd9af18d2b6dff64a9bee067337cf6
8926aa5796637312bf5e46a0671a88021c208235fafdfcf22931eb01
70670f45d0dc4eaae28e6553525eec409dfb6f80e8d6c8dcef7d7bf5

It also seems to have started when the nodes were upgraded as part of fixing the nested-virt issue [2].

[1] https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22Package%20bsdmainutils%20is%20not%20available,%20but%20is%20referred%20to%20by%20another%20package%22'),sort:!())
[2] https://bugs.launchpad.net/neutron/+bug/1999249/comments/3

Changed in neutron:
importance: Undecided → Critical
yatin (yatinkarel)
Changed in neutron:
assignee: nobody → yatin (yatinkarel)
status: New → Triaged
Revision history for this message
yatin (yatinkarel) wrote :

It's an issue on the infra side; I have assigned myself to follow up on the fixes.

For now the impacted labels are disabled in the vexxhost-ca-ymq-1 provider with [1], and jobs are working fine again. Since these labels are now served by just two providers (ovh-bhs1 and ovh-gra1), these jobs may sometimes sit in the Zuul queue for longer.

We also raised the issue with vexxhost (ticket id 363862) to get the impacted compute nodes fixed. Once that is done, the temporary workaround can be reverted.

[1] https://review.opendev.org/c/openstack/project-config/+/881810

Revision history for this message
yatin (yatinkarel) wrote :

We got an update [1] from the vexxhost team and the vexxhost node provider is now re-enabled [2].
Since it was re-enabled we have not seen the issue. So far jobs have run on 3 of the previously impacted nodes and all of them passed, so the issue can be considered resolved. Closing the bug.
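For reference, a rough way to check which compute host a given job landed on (a minimal sketch; it assumes the job uploaded the usual zuul-info/ directory and that the nodepool host id appears in inventory.yaml, which may not hold for every job):

    # LOG_URL is a hypothetical placeholder for a job's uploaded log root.
    LOG_URL=https://example-log-server/logs/some-job
    curl -s "${LOG_URL}/zuul-info/inventory.yaml" | grep -iE 'host_id|provider'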

[1] "We have performed some optimizations on our compute pool. You can re-enable for now and see how it goes."
[2] https://review.opendev.org/c/openstack/project-config/+/882787

Changed in neutron:
status: Triaged → Invalid