Possible DNS issue in the CI

Bug #1721702 reported by Cédric Jeanneret deactivated on 2017-10-06
This bug affects 3 people
Affects: tripleo
Importance: High
Assigned to: Gabriele Cerami

Bug Description

Hello,

I suspect a small DNS issue in the CI for gate-tripleo-ci-centos-7-nonha-multinode-oooq:
http://logs.openstack.org/97/509397/3/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/d67f8e5/console.html#_2017-10-06_06_52_33_436723

fatal: [undercloud]: FAILED! => {"changed": false, "cmd": "/usr/bin/git clone --origin origin https://github.com/redhat-openstack/rdoinfo /home/jenkins/DLRN/rdoinfo", "failed": true, "msg": "fatal: unable to access 'https://github.com/redhat-openstack/rdoinfo/': Could not resolve host: github.com; Name or service not known", "rc": 128, "stderr": "fatal: unable to access 'https://github.com/redhat-openstack/rdoinfo/': Could not resolve host: github.com; Name or service not known\n", "stdout": "Cloning into '/home/jenkins/DLRN/rdoinfo'...\n", "stdout_lines": ["Cloning into '/home/jenkins/DLRN/rdoinfo'..."]}
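
When triaging many job logs, it can help to flag this failure signature automatically. The following is a minimal sketch, not part of the CI tooling: the helper name `is_dns_failure` and the marker list are assumptions made for illustration, and the sample is a shortened copy of the task result above.

```python
import json

# Substrings printed by git/libcurl and the resolver when name resolution
# fails. This list is an assumption for illustration, not an exhaustive set.
DNS_ERROR_MARKERS = (
    "Could not resolve host",
    "Name or service not known",
)


def is_dns_failure(ansible_result: str) -> bool:
    """Return True if an Ansible task result (JSON text) looks like a DNS failure."""
    result = json.loads(ansible_result)
    # Look at the fields where the git error message is reported.
    text = " ".join(str(result.get(key, "")) for key in ("msg", "stderr"))
    return any(marker in text for marker in DNS_ERROR_MARKERS)


# Shortened version of the failed task result quoted in the description.
sample = json.dumps({
    "changed": False,
    "failed": True,
    "rc": 128,
    "msg": "fatal: unable to access 'https://github.com/redhat-openstack/rdoinfo/': "
           "Could not resolve host: github.com; Name or service not known",
})

print(is_dns_failure(sample))
```

A check like this only classifies the symptom; it says nothing about why the resolver on the node failed.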

Tags: ci
Alex Schultz (alex-schultz) wrote :

This happens from time to time, and we need to fix unbound.log so we can see why.

Changed in tripleo:
importance: Undecided → Medium
status: New → Triaged
milestone: none → queens-2
tags: added: ci
David Moreau Simard (dmsimard) wrote :

We are not seeing any logs from unbound because the log file has an SELinux context that does not allow unbound to write to it:

https://github.com/openstack-infra/project-config/blob/544fe001f477be137725705d92086e3bb0445b29/nodepool/elements/nodepool-base/finalise.d/89-unbound#L64-L65

Oct 06 16:57:44 upstream-centos-7-rdo-cloud-tripleo-14922 unbound[1191]: Oct 06 16:57:44 unbound[1191:0] error: Could not open logfile /var/log/unbound.log: Permission denied

type=AVC msg=audit(1507309060.078:44): avc: denied { open } for pid=1058 comm="unbound" path="/var/log/unbound.log" dev="vda1" ino=2682530 scontext=system_u:system_r:named_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file

bash-4.2# ls -lZ /var/log/unbound.log
-rw-r--r--. unbound root system_u:object_r:var_log_t:s0 /var/log/unbound.log

bash-4.2# chcon -t named_log_t /var/log/unbound.log

bash-4.2# systemctl restart unbound

bash-4.2# cat /var/log/unbound.log
Oct 06 17:54:52 unbound[2483:0] info: start of service (unbound 1.4.20).
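
To make the denial easier to read, here is a small sketch that pulls the relevant fields out of the AVC record quoted above. The function names `parse_avc` and `selinux_type` are hypothetical helpers written for this illustration; they are not part of any audit tooling.

```python
import re

# The AVC record from the audit log quoted above.
AVC_LINE = (
    'type=AVC msg=audit(1507309060.078:44): avc: denied { open } for pid=1058 '
    'comm="unbound" path="/var/log/unbound.log" dev="vda1" ino=2682530 '
    'scontext=system_u:system_r:named_t:s0 '
    'tcontext=system_u:object_r:var_log_t:s0 tclass=file'
)


def selinux_type(context):
    """Return the type field of a user:role:type:level context, or None."""
    parts = context.split(":")
    return parts[2] if len(parts) >= 3 else None


def parse_avc(line):
    """Extract the denied permissions and the source/target types from an AVC line."""
    perms = re.search(r"denied\s+\{([^}]*)\}", line)
    # key=value pairs; values are either quoted strings or bare tokens.
    fields = {k: v.strip('"') for k, v in re.findall(r'(\w+)=("[^"]*"|\S+)', line)}
    return {
        "permissions": perms.group(1).split() if perms else [],
        "comm": fields.get("comm"),
        "path": fields.get("path"),
        "source_type": selinux_type(fields.get("scontext", "")),
        "target_type": selinux_type(fields.get("tcontext", "")),
    }


info = parse_avc(AVC_LINE)
print(info)
```

Read this way, the record says that a process running as named_t (unbound) was denied "open" on a file labelled var_log_t, which is why relabelling the file to named_log_t with chcon lets unbound write its log.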

David Moreau Simard (dmsimard) wrote :

It turns out that the upstream nodepool images are not supposed to use /var/log/unbound.log, but rather /var/lib/unbound/unbound.log: https://review.openstack.org/#/c/510202/

/var/log/unbound.log is likely used on review.rdoproject.org images due to the delta between upstream and downstream images since we have stopped synchronizing project-config.

Changed in tripleo:
milestone: queens-2 → queens-3
Paul Belanger (pabelanger) wrote :

Some more information: github.com only has a TTL of 30 seconds, which makes it a poor candidate for caching in unbound. We've discussed setting cache-min-ttl: 300 in unbound.conf, but that could be an issue if github.com changes its DNS records (for example within its CDN) during that cache period.
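
As a rough illustration of the trade-off, the sketch below models how a minimum TTL changes how long a short-lived record stays cached. It is a simplified model, not unbound's implementation: the function name `effective_ttl` is hypothetical, and the defaults (0 for cache-min-ttl, 86400 for cache-max-ttl) are stated here as an assumption about unbound's documented defaults.

```python
def effective_ttl(record_ttl, cache_min_ttl=0, cache_max_ttl=86400):
    """Clamp a record's TTL (seconds) between the cache minimum and maximum."""
    return min(max(record_ttl, cache_min_ttl), cache_max_ttl)


# github.com's 30-second TTL with no minimum configured: cached for 30 seconds.
without_min = effective_ttl(30)

# The same record with cache-min-ttl: 300, as discussed above.
with_min = effective_ttl(30, cache_min_ttl=300)

# Upper bound on upstream lookups per hour for one name under steady load.
lookups_per_hour_without = 3600 // without_min
lookups_per_hour_with = 3600 // with_min

print(without_min, with_min, lookups_per_hour_without, lookups_per_hour_with)
```

Fewer upstream lookups means fewer chances to hit a transient resolution failure, at the cost of serving an address that may be up to 300 seconds stale if the record changes.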

val-bogus-ttl: 60 might help with this, but it needs to be tested, which dmsimard signed up for :)

A parallel effort should be to move rdoinfo off github.com.

Alan Pevec (apevec) wrote :

rdoinfo is actually at https://review.rdoproject.org/r/rdoinfo
github.com is only a mirror, but it was argued that it is more reliable than rdoproject.org, hence it is used.
An alternative would be to mirror it into upstream infra, but I've heard that proxy-cache for git over HTTP will not work. Any other suggestions?

David Moreau Simard (dmsimard) wrote :

val-bogus-ttl is not what we expected it to be.
It sounds like what we'd like to do doesn't exist (yet). It would be the opposite of "cache-max-negative-ttl"; see: https://bugzilla.redhat.com/show_bug.cgi?id=1360222

In the meantime, we're improving the configure-unbound role to allow for configuration of a minimum TTL. Jobs impacted by frequent DNS issues will be encouraged to tweak the minimum TTL value to see if this helps.

See:
- https://review.openstack.org/#/c/523228/
- https://review.openstack.org/#/c/523178/

Alan Pevec (apevec) wrote :

A minimum TTL has been enforced in tripleo-ci since https://review.openstack.org/524018
merged on Dec 1.
Have there been any more DNS issues since then?

Alan Pevec (apevec) on 2018-01-03
Changed in tripleo:
status: Triaged → Fix Released
Changed in tripleo:
status: Fix Released → Triaged
importance: Medium → High
wes hayutin (weshayutin) on 2018-01-08
Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3

Is this still an issue?

Javier Peña (jpena-c) wrote :

I haven't seen this for a long time, so I think we could close it.

Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1