Noted this on IRC but recording it here as well. At the beginning of the job we record the ansible host info. This host info shows that 127.0.0.1 is properly set as a DNS resolver: https://zuul.opendev.org/t/openstack/build/22500697c5244d31b0687057040cf1af/log/zuul-info/host-info.primary.yaml#187-189. But then, after the job has failed and log collection runs, it grabs the resolv.conf file, which shows no resolvers are set: https://zuul.opendev.org/t/openstack/build/22500697c5244d31b0687057040cf1af/log/logs/undercloud/etc/resolv.conf.
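The broken end state is easy to check mechanically. The snippet below is a minimal sketch of that check (the sample file contents here are assumed for illustration, not copied from the build): a resolv.conf with no `nameserver` lines means the resolver configuration was lost.

```shell
# Hypothetical sample resembling the broken state the job collected:
# a resolv.conf left with no nameserver entries at all.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
# Generated by NetworkManager
search openstacklocal
EOF

# The actual check: any nameserver line present?
if grep -q '^nameserver' "$SAMPLE"; then
    STATUS=ok
else
    STATUS=broken
    echo "no resolvers configured in $SAMPLE"
fi
rm -f "$SAMPLE"
```

Running the same grep against the collected undercloud resolv.conf in the build logs would flag it as broken, matching what the host-info snapshot vs. end-of-job logs show.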
This points to something updating the DNS configuration on the node while the job is running, and when it does so it appears to break DNS. For example, in the links above we see no resolver is set, possibly because NetworkManager is attempting to set that info via DHCP, but Rackspace nodes do not do DHCP. On other clouds, DHCP may instead be overriding the DNS resolvers to point at local cloud resolvers, which could then be overwhelmed by the number of requests. In any case, this appears to be something in the job itself modifying the configuration the node starts with when the job begins.
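If the suspicion is that something rewrites resolv.conf mid-job, a cheap way to confirm it would be to snapshot the file when the job starts and diff it at log-collection time. This is a hypothetical diagnostic sketch, not something the job does today; the fallback sample file exists only so the sketch runs outside a real node.

```shell
# Hypothetical diagnostic: snapshot resolv.conf at job start and diff it
# later to catch whatever rewrites it (e.g. NetworkManager applying
# empty DHCP-provided DNS info on a cloud that does not do DHCP).
RESOLV="${RESOLV:-/etc/resolv.conf}"
# Fallback sample so the sketch also runs where /etc/resolv.conf is absent.
[ -r "$RESOLV" ] || { RESOLV="$(mktemp)"; echo "nameserver 127.0.0.1" > "$RESOLV"; }

SNAP="$(mktemp)"
cp "$RESOLV" "$SNAP"

# ... the job would run here; something may rewrite $RESOLV ...

if diff -u "$SNAP" "$RESOLV"; then
    CHANGED=no
else
    CHANGED=yes
    echo "resolv.conf was rewritten while the job ran"
fi
rm -f "$SNAP"
```

In a Zuul job the snapshot could be taken in a pre-run playbook and the diff emitted during log collection, so the diff lands in the captured logs next to the resolv.conf file itself.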