DNS servers retrieval change with systemd-resolved

Bug #1914229 reported by Rodolfo Alonso
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tempest   Fix Released   Medium       Unassigned
tripleo   Fix Released   High         Unassigned

Bug Description

On newer operating systems, the "systemd-resolved" service provides network name resolution to local applications. It supports up to four modes of handling /etc/resolv.conf [1], but the recommended way to retrieve the DNS nameservers is to run "resolvectl dns":

root@ubuntu:~# resolvectl dns
Global: 10.10.0.2 10.100.0.2
Link 4 (ens8):
Link 3 (ens7):
Link 2 (ens3):

The "Global" key contains all the nameservers used by the system.

[1] https://man.archlinux.org/man/systemd-resolved.8#/ETC/RESOLV.CONF
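
A minimal sketch of how the "resolvectl dns" output shown above could be turned into a per-scope mapping. The function name parse_resolvectl_dns and the SAMPLE constant are hypothetical and are not part of Tempest; the sketch assumes every line has the "scope: addresses" shape seen in the example output.

```python
def parse_resolvectl_dns(output):
    """Map each scope ("Global" or an interface name) to its DNS servers.

    Hypothetical helper: assumes the "scope: addresses" line layout shown
    in the bug description.
    """
    servers = {}
    for line in output.splitlines():
        # Split on the first colon only, so IPv6 addresses stay intact.
        scope, sep, addresses = line.partition(":")
        if not sep:
            continue  # not a "scope: addresses" line
        scope = scope.strip()
        # "Link 2 (ens3)" -> "ens3"
        if scope.startswith("Link ") and "(" in scope and scope.endswith(")"):
            scope = scope[scope.index("(") + 1:-1]
        servers[scope] = addresses.split()
    return servers


# Output copied from the bug description.
SAMPLE = """\
Global: 10.10.0.2 10.100.0.2
Link 4 (ens8):
Link 3 (ens7):
Link 2 (ens3):
"""

parsed = parse_resolvectl_dns(SAMPLE)
print(parsed["Global"])
```

Keeping the per-link lists separate would let a test look only at the interface attached to the subnet under test, if the image runs systemd-resolved.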

Revision history for this message
Martin Kopec (mkopec) wrote :

I don't understand how this is related to Tempest. Is there something we need to change in Tempest? A test? Which one? Please write bug descriptions that are as descriptive as possible; that makes it easier for contributors to understand the background and start working on the issue. Thanks.

Revision history for this message
Martin Kopec (mkopec) wrote :

Moving to Incomplete as it lacks important information requested a month ago. Feel free to move it back to New after providing the requested info.

Changed in tempest:
status: New → Incomplete
Revision history for this message
Candido Campos Rivas (ccamposr) wrote :

The problem occurs when the image has a DNS server preconfigured in /etc/resolv.conf, for example:

[stack@undercloud-0 ~]$ virt-copy-out -a tempest_image /etc/resolv.conf .
[stack@undercloud-0 ~]$ cat resolv.conf
# Generated by NetworkManager
nameserver 192.168.122.1

Then the test (test_subnet_details) fails at its first check, right after the VM is created:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 634, in test_subnet_details
    trgt_serv=dns_servers))
  File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: {'1.2.3.4'} != {'1.2.3.4', '192.168.122.1'}: Looking for servers: ['1.2.3.4']. Retrieved DNS nameservers: ['192.168.122.1', '1.2.3.4'] From host: 10.0.0.234.
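
The mismatch in the traceback above can be reproduced outside Tempest with a small sketch. The helper name nameservers_from_resolv_conf is hypothetical, and the RESOLV_CONF content is an assumed reconstruction of the guest file based on the "Retrieved DNS nameservers" part of the error message; it is not copied from a real guest.

```python
def nameservers_from_resolv_conf(text):
    """Return the addresses of all "nameserver" lines, in file order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines ("#" or ";") and "search" lines are skipped because
        # their first field is not "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Assumed guest file: the preconfigured entry plus the one pushed by DHCP.
RESOLV_CONF = """\
# Generated by NetworkManager
nameserver 192.168.122.1
nameserver 1.2.3.4
"""

expected = {"1.2.3.4"}
retrieved = set(nameservers_from_resolv_conf(RESOLV_CONF))

# The test compares the two sets for equality, so any leftover entry
# from the image makes it fail.
unexpected = retrieved - expected
print(sorted(unexpected))
```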

I propose this solution:

git show 47ab33174f9c4538e7ecdaab187ae7edf8774f1a
commit 47ab33174f9c4538e7ecdaab187ae7edf8774f1a (HEAD -> master)
Author: ccamposr <email address hidden>
Date: Wed Mar 10 13:31:50 2021 +0100

    Fix unstability in test_subnet_details

    when the image has some dns preconfigured in /etc/resolv.conf, for example:

    resolv.conf
    nameserver 192.168.122.1

    the test failed due to:
    testtools.matchers._impl.MismatchError: {'1.2.3.4'} != {'1.2.3.4', '192.168.122.1'}: Looking for servers: ['1.2.3.4']. Retrieved DNS nameservers: ['192.168.122.1', '1.2.3.4'] From host: 10.0.0.234

    https://bugs.launchpad.net/tempest/+bug/1914229
    Related-Bug: #1914229
    Change-Id: I9bd8b47642891a1fe42c48a0b9fe50cf9bc4e03b

diff --git a/tempest/scenario/test_network_basic_ops.py b/tempest/scenario/test_network_basic_ops.py
index e359c712f..be610905e 100644
--- a/tempest/scenario/test_network_basic_ops.py
+++ b/tempest/scenario/test_network_basic_ops.py
@@ -623,6 +623,13 @@ class TestNetworkBasicOps(manager.NetworkScenarioTest):
         ssh_client = self.get_remote_client(
             ip_address, private_key=private_key, server=server)

+        # NOTE: Server needs to renew its dhcp lease in order to get new
+        # definitions from subnet
+        # NOTE(amuller): we are renewing the lease as part of the retry
+        # because Neutron updates dnsmasq asynchronously after the
+        # subnet-update API call returns.
+        ssh_client.renew_lease(fixed_ip=floating_ip['fixed_ip_address'],
+                               dhcp_client=CONF.scenario.dhcp_client)
         dns_servers = [initial_dns_server]
         servers = ssh_client.get_dns_servers()

The idea is to force a DHCP lease renewal after the VM is up, the same way it is already done after the subnet info update. The resolv.conf is then overwritten with the DHCP info:

[root@vm1 cloud-user]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search openstackgate.local
nameserver 10.0.0.1

https://review.opendev.org/c/openstack/tempest/+/779756

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Candido:

Thank you for adding more information to this bug, which I left incomplete. As we saw in our tests, a DNS server preconfigured in the image (in /etc/resolv.conf) can be a problem when the DNS list is retrieved by reading this file.

As you commented, the DHCP renewal will clean up the resolv.conf file and delete this entry. As we also saw during testing, some images do not start the "systemd-resolved" service, so the "resolvectl dns" command fails on them. That command returns the DNS servers per interface, which could make them easier to tell apart in the tests, but your solution is easier to implement and should work with any image.

Regards.
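
The trade-off described in this comment can be sketched as a lookup that prefers "resolvectl dns" and falls back to /etc/resolv.conf when systemd-resolved is not running. Everything here is hypothetical: get_dns_servers is not the Tempest method of the same name, and run stands for any callable that executes a command on the guest, returns its stdout, and raises RuntimeError on failure.

```python
def get_dns_servers(run):
    """Return the guest's DNS servers without duplicates, in first-seen order.

    run(cmd) is an assumed callable: it executes cmd on the guest, returns
    stdout as a string and raises RuntimeError when the command fails.
    """
    servers = []

    def add(address):
        if address not in servers:
            servers.append(address)

    try:
        output = run("resolvectl dns")
    except RuntimeError:
        output = None  # systemd-resolved is not running on this image

    if output is not None:
        for line in output.splitlines():
            _scope, sep, addresses = line.partition(":")
            if sep:
                for address in addresses.split():
                    add(address)
        return servers

    # Fallback: read the nameserver lines from /etc/resolv.conf.
    for line in run("cat /etc/resolv.conf").splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            add(fields[1])
    return servers


def fake_guest_without_resolved(cmd):
    """Simulated guest where systemd-resolved is not started."""
    if cmd == "resolvectl dns":
        raise RuntimeError("systemd-resolved is not running")
    return ("; generated by /usr/sbin/dhclient-script\n"
            "search openstackgate.local\n"
            "nameserver 10.0.0.1\n")


print(get_dns_servers(fake_guest_without_resolved))
```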

Revision history for this message
Martin Kopec (mkopec) wrote :

Thank you for providing more info on this.

Moving to In Progress as there is a review related to this: https://review.opendev.org/c/openstack/tempest/+/779756

Changed in tempest:
status: Incomplete → In Progress
Revision history for this message
Martin Kopec (mkopec) wrote :

This is a duplicate of [1] (I missed that at first). [1] was marked as invalid and there were strong opinions against the fix [2] proposed in [1]. Basically, because we decided not to go with [2], bug [1] was marked invalid. However, I have reconsidered: after giving it a lot of thought, I came to the conclusion that this bug (and therefore [1] as well) is valid.

Let's see the facts I'm considering:

NOTE: 192.168.122.1 is called 'the IP' in the text below

1. Pre-configured DNS entries

These cause a lot of trouble; a web search returns plenty of results (especially when you search specifically for the IP).

CentOS images containing the IP are considered more of a bug than a feature, see:
https://bugs.centos.org/view.php?id=16948
https://bugzilla.redhat.com/show_bug.cgi?id=1623913

Ubuntu images, on the other hand, have the IP by default:
https://wiki.ubuntu.com/SecurityTeam/TestingEnvironment
https://assets.ubuntu.com/v1/f954307f-ubuntu-server-guide.pdf
It seems that in Ubuntu images having the IP set and using it just makes sense.

This shows that pre-configured DNS entries don't necessarily mean that the image itself is broken (as was stated in [1]).

Based on this I think the bug is valid, as Tempest is failing on something that is not related to the test itself. Tempest simply makes a wrong assumption about the DNS entries, an assumption based on cirros and CentOS images.

2. We discussed during PTGs that we want to support images other than just cirros

See, for example, the Wallaby PTG: https://etherpad.opendev.org/p/qa-wallaby-ptg (topic: Use different guest image for gate jobs to run tempest tests)

3. Based on my testing, renewing the DNS entries takes ~1 second

4. The solution approach
Review [2] tried to work around the additional DNS entry by leaving it untouched in resolv.conf, which could lead to several wrong situations that mess up the test and produce incorrect results. For example, if the IP is not a DNS server in the tested cloud, name resolution might be slow, which would make the test slower (and possibly lead to a timeout).
Therefore [2] was not accepted.

However now we have a different approach proposed in this bug:
https://review.opendev.org/c/openstack/tempest/+/779756

The approach renews the DNS entries, which puts the image into the (DNS) context of the tested environment and removes any pre-configured entries that are not valid in that environment.

That puts any custom image into the same context (the point of a good test is the same starting point) and solves the issue described in fact #1. It also aligns with fact #2. Fact #3 just confirms that this is not an unreasonable solution.

Based on all of the above, I agree with the proposed solution and will vote +2.

[1] https://bugs.launchpad.net/tempest/+bug/1860129
[2] https://review.opendev.org/c/openstack/tempest/+/703072/

Revision history for this message
Ghanshyam Mann (ghanshyammann) wrote :

Thanks, Martin, for the detailed investigation. I agree with the approach proposed in 779756, which is a much better way to clean up the DNS entry.

On whether to support such images in Tempest, I agree that we can, as we discussed at the Wallaby PTG.

Changed in tempest:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.opendev.org/c/openstack/tempest/+/779756
Committed: https://opendev.org/openstack/tempest/commit/3893f9db4839e38b2d720c3b1e37728a9aa0f18b
Submitter: "Zuul (22348)"
Branch: master

commit 3893f9db4839e38b2d720c3b1e37728a9aa0f18b
Author: ccamposr <email address hidden>
Date: Wed Mar 10 13:31:50 2021 +0100

    Fix unstability in test_subnet_details

    when the image has some dns preconfigured in /etc/resolv.conf,
    for example:

    resolv.conf
    nameserver 192.168.122.1

    the test failed due to:
    testtools.matchers._impl.MismatchError:
    {'1.2.3.4'} != {'1.2.3.4', '192.168.122.1'}: Looking for servers: ['1.2.3.4'].
    Retrieved DNS nameservers: ['192.168.122.1', '1.2.3.4'] From host: 10.0.0.234

    DHCP renewal is forced in the server that will cleanup the file
    /etc/resolv.conf from any other preconfigured DNS entry. When
    the DHCP server pushes the new configuration, the Neutron network
    DNS entries are applied

    Closes-Bug: #1914229
    Change-Id: I9bd8b47642891a1fe42c48a0b9fe50cf9bc4e03b

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tempest/+/794776

Revision history for this message
chandan kumar (chkumar246) wrote :

Also seen in tripleo https://9f79c46754a50dc48a25-d1b24bbabb1277ef639cf58978eb91ee.ssl.cf1.rackcdn.com/800965/3/gate/tripleo-ci-centos-8-containers-multinode/8f79636/logs/undercloud/var/log/tempest/tempest_run.log

```
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details [58.734626s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 642, in test_subnet_details
        trgt_serv=dns_servers))
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 411, in assertEqual
        self.assertThat(observed, matcher, message)
      File "/usr/lib/python3.6/site-packages/testtools/testcase.py", line 498, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: {'1.2.3.4'} != set(): Looking for servers: ['1.2.3.4']. Retrieved DNS nameservers: [] From host: 192.168.24.197.

```

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → xena-2
tags: added: alert ci promotion-blocker
Revision history for this message
chandan kumar (chkumar246) wrote :
Changed in tripleo:
milestone: xena-2 → xena-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tempest/+/802005

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest train-last

This issue was fixed in the openstack/tempest train-last release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.opendev.org/c/openstack/tempest/+/802005
Committed: https://opendev.org/openstack/tempest/commit/126fe656a976b3f46a755e83ea9950f72815a87e
Submitter: "Zuul (22348)"
Branch: master

commit 126fe656a976b3f46a755e83ea9950f72815a87e
Author: Slawek Kaplonski <email address hidden>
Date: Fri Jul 23 13:18:05 2021 +0200

    Wait couple of seconds for dns servers to be set in the guest

    In test
    tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details
    there is renewal of the DHCP lease made to configure dns nameservers.
    And sometimes this test is failing due to missing nameservers in the
    /etc/resolv.conf file in the guest VM.
    After analyzing logs from such failed jobs I think that the reason of
    that may be race between getting dns nameservers from guest VM by test
    and actually configuring it inside the guest vm.
    So this patch proposes to add wait (5 seconds by default) for non empty
    list of the dns nameservers returned from the guest VM. That should
    avoid such failures of that test.

    Closes-bug: #1914229
    Change-Id: I093ae5c11f88cc29e91285ff674788de53645b4e
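
The wait described in this commit message can be sketched as a simple polling loop. This is not the code merged in the review: wait_for_dns_servers and its parameters are hypothetical names, and only the 5 second default is taken from the commit message.

```python
import time


def wait_for_dns_servers(get_servers, timeout=5.0, interval=1.0):
    """Poll get_servers() until it returns a non-empty list or time runs out.

    get_servers is an assumed zero-argument callable that reads the
    nameservers from the guest. Returns the last result, which is an empty
    list if the guest never reported any server within the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        servers = get_servers()
        if servers:
            return servers
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)


# Simulated guest: resolv.conf is still empty for the first two reads,
# as in the race described in the commit message.
answers = iter([[], [], ["1.2.3.4"]])
result = wait_for_dns_servers(lambda: next(answers), timeout=5.0, interval=0.01)
print(result)
```

Polling for a non-empty list, rather than sleeping for a fixed time, keeps the test fast on guests that apply the lease quickly and still tolerates slow ones.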

Revision history for this message
Marios Andreou (marios-b) wrote :

        * Revert "Add TestNetworkBasicOps.test_subnet_details to skip list"
        * Change-Id: Ia9165e7107fb9fe0f80cf8a65f32ff9f68a725f5
        * https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/803432

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 28.1.0

This issue was fixed in the openstack/tempest 28.1.0 release.

Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tempest/+/794776
