Noted this on IRC but recording it here as well. At the beginning of the job we record the ansible host info. This host info shows that 127.0.0.1 is properly set as a DNS resolver: https://zuul.opendev.org/t/openstack/build/22500697c5244d31b0687057040cf1af/log/zuul-info/host-info.primary.yaml#187-189. But then, after the job has failed and log collection runs, it grabs the resolv.conf file, which shows no resolvers are set: https://zuul.opendev.org/t/openstack/build/22500697c5244d31b0687057040cf1af/log/logs/undercloud/etc/resolv.conf.
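The broken end state is easy to check mechanically. The snippet below is a minimal sketch of that check (the sample file contents here are assumed for illustration, not copied from the build): a resolv.conf with no `nameserver` lines means the resolver configuration was lost.

```shell
# Hypothetical sample resembling the broken state the job collected:
# a resolv.conf left with no nameserver entries at all.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
# Generated by NetworkManager
search openstacklocal
EOF

# The actual check: any nameserver line present?
if grep -q '^nameserver' "$SAMPLE"; then
    STATUS=ok
else
    STATUS=broken
    echo "no resolvers configured in $SAMPLE"
fi
rm -f "$SAMPLE"
```

Running the same grep against the collected undercloud resolv.conf in the build logs would flag it as broken, matching what the host-info snapshot vs. end-of-job logs show.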
This points to something updating the DNS configuration on the node while the job is running, and when it does so it appears to break DNS. For example, in the links above we see no resolver is set, possibly because NetworkManager is attempting to set that info via DHCP, but Rackspace nodes do not do DHCP. On other clouds, DHCP may instead be overriding the DNS resolvers to point at local cloud resolvers, which could then be overwhelmed by the number of requests. In any case, this appears to be something in the job itself modifying the configuration the node starts with when the job begins.
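If the suspicion is that something rewrites resolv.conf mid-job, a cheap way to confirm it would be to snapshot the file when the job starts and diff it at log-collection time. This is a hypothetical diagnostic sketch, not something the job does today; the fallback sample file exists only so the sketch runs outside a real node.

```shell
# Hypothetical diagnostic: snapshot resolv.conf at job start and diff it
# later to catch whatever rewrites it (e.g. NetworkManager applying
# empty DHCP-provided DNS info on a cloud that does not do DHCP).
RESOLV="${RESOLV:-/etc/resolv.conf}"
# Fallback sample so the sketch also runs where /etc/resolv.conf is absent.
[ -r "$RESOLV" ] || { RESOLV="$(mktemp)"; echo "nameserver 127.0.0.1" > "$RESOLV"; }

SNAP="$(mktemp)"
cp "$RESOLV" "$SNAP"

# ... the job would run here; something may rewrite $RESOLV ...

if diff -u "$SNAP" "$RESOLV"; then
    CHANGED=no
else
    CHANGED=yes
    echo "resolv.conf was rewritten while the job ran"
fi
rm -f "$SNAP"
```

In a Zuul job the snapshot could be taken in a pre-run playbook and the diff emitted during log collection, so the diff lands in the captured logs next to the resolv.conf file itself.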