tracker-bug: network lag in tripleo-infra tenant prevents container promotions

Bug #1770860 reported by wes hayutin
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Arx Cruz

Bug Description

[centos@promoter-server container-push]$ time sudo docker login -u $RDOPROJECT_USERNAME -p $RDOPROJECT_PASSWORD trunk.registry.rdoproject.org
Login Succeeded

real 0m25.453s
user 0m0.013s
sys 0m0.026s

===================

On another box outside the tripleo-infra tenant, logging into the rdo-registry is quite fast:

[root@my-f27 sprint13]# time docker login -u $RDOPROJECT_USERNAME -p $RDOPROJECT_PASSWORD trunk.registry.rdoproject.org
Login Succeeded

real 0m1.254s
user 0m0.007s
sys 0m0.016s

===================

Opening an issue w/ support
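
One way to tell whether the extra ~24 seconds come from name resolution rather than from the registry itself is to time the lookup on its own. The following is a minimal sketch, assuming GNU `date` and `getent` are available; `resolve_ms` is a hypothetical helper name, not part of any existing tooling.

```shell
# Print how long name resolution alone takes for a host, in milliseconds.
# getent goes through the system resolver, so it honours /etc/resolv.conf
# and /etc/nsswitch.conf just like docker login would.
# Assumption: GNU date (for the %N nanoseconds format).
resolve_ms() {
    local host=$1 start end
    start=$(date +%s%N)
    getent hosts "$host" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example (run on both boxes and compare):
#   resolve_ms trunk.registry.rdoproject.org
```

If the lookup alone accounts for most of the wall-clock time, the problem is likely in the resolver configuration rather than in the registry.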

Tags: ci
Matt Young (halcyondude) wrote :

tripleo-ci triage

This issue was resolved over the weekend by sagi and wes. The DNS server being used by the promoter (in RDO) was failing and/or exceedingly slow. The mitigation was to use our own DNS server on the tripleo-infra tenant.

actions

- (TC) follow up with infra team to flag that server as being slow/unresponsive
- update infra setup playbooks to include this DNS server ("dns-server, 192.168.100.15, 38.145.33.91")

tags: removed: alert promotion-blocker
Alan Pevec (apevec) wrote :

> the DNS server being used by the promoter (in RDO) was failing and/or exceedingly slow.

Which IP was that?

> - (TC) follow up with infra team to flag that server as being slow/unresponsive

Please CC me.

Matt Young (halcyondude)
Changed in tripleo:
assignee: nobody → Arx Cruz (arxcruz)
Arx Cruz (arxcruz) wrote :

We contacted the RDO infra team, and they said there is nothing wrong on their side. They pointed us to this service [1], where we saw some spikes but nothing that could identify the root cause.

[1] - https://smokeping.cloud.rdoproject.org/smokeping/sm.cgi?target=dns.dns-multihost

Changed in tripleo:
milestone: rocky-2 → rocky-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
Matt Young (halcyondude) wrote :

Details on how this was resolved/released.

1. The private promoter has been shut down and keys have been removed.
2. Changes have been made to resolve the DNS issues we were experiencing in tripleo-infra tenant.

TLDR: DHCP was overriding our locally set DNS settings and pointing to a non-existent DNS server, which added lag to every DNS lookup.

3. Work is in flight presently (arx, https://review.rdoproject.org/r/#/c/14060) to add this to our playbooks/scripts that provision tripleo-ci infra assets.

4. The promoter is back on its normal instance and running without issue.
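
The DNS problem described in item 2 can be checked for by probing each nameserver the host is actually configured to use. This is a rough sketch, not the change under review: the function names are hypothetical, and it assumes `dig` (from bind-utils) is installed. With the glibc resolver defaults (a 5 second timeout per attempt), a dead server listed first delays every lookup before the next server is tried.

```shell
# Print the nameserver addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query one specific server directly; give up after 1 second, one try.
# dig exits non-zero when the server does not answer.
probe_nameserver() {
    local server=$1 name=$2
    dig +time=1 +tries=1 +short "@$server" "$name" >/dev/null 2>&1
}

# Report which configured nameservers answer for a given name.
check_nameservers() {
    local file=${1:-/etc/resolv.conf}
    local name=${2:-trunk.registry.rdoproject.org}
    local server
    for server in $(list_nameservers "$file"); do
        if probe_nameserver "$server" "$name"; then
            echo "$server ok"
        else
            echo "$server FAILED"
        fi
    done
}

# Example:
#   check_nameservers /etc/resolv.conf trunk.registry.rdoproject.org
```

A server reported as FAILED that appears before a working one in the file would be consistent with the per-lookup lag seen here.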

---

Unrelated to this issue, but related to the promoter itself (and made during the same few days), is an update to the promoter script that ensures old docker images are untagged and removed, as the disk was (also) full.

https://github.com/rdo-infra/ci-config/commit/f801e5c6d026944cd295cde9f41d9ae8bf7eec38
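
For illustration only, a disk-full cleanup of this kind could look roughly like the sketch below. This is not the code from the commit above: it swaps in Docker's built-in `docker image prune` with the `until` filter (which needs a Docker version that supports it), and the function names, the 90% threshold, and the 24h age are assumptions.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path.
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Remove unused images older than a given age, but only when the disk
# holding Docker's data is at or above a usage threshold.
# Assumptions: /var/lib/docker is the data root; 90 and 24h are arbitrary.
cleanup_if_full() {
    local path=${1:-/var/lib/docker} threshold=${2:-90} age=${3:-24h}
    local pct
    pct=$(disk_used_pct "$path")
    [ -n "$pct" ] || return 1
    if [ "$pct" -ge "$threshold" ]; then
        docker image prune --all --force --filter "until=$age"
    fi
}

# Example:
#   cleanup_if_full /var/lib/docker 90 24h
```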
