OVN DNS not working as documented

Bug #2059405 reported by Martin Ananda Boeker
This bug affects 3 people
Affects: neutron
Status: New
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Env: 2023.1

As far as I can tell, I have configured OVN and DNS as documented.

In kolla.yml
kolla_enable_ovn: true

In kolla/globals.yml:
neutron_plugin_agent: ovn
neutron_enable_ovn_agent: true

It seems not to matter what I put in dns.yml, and the documentation confirms this, because OVN should answer DNS queries itself by intercepting traffic to port 53. The behavior, however, is very strange. I only have two instances, vm1 (172.30.89.175) and vm2 (172.30.89.177).

Here is the output of `ovn-sbctl list dns`:

_uuid : cdc31ab2-a363-4585-a835-c8019d4b265d
datapaths : [ca41c1b4-f4b1-4606-99e5-dc47a383accf]
external_ids : {dns_id="4c6895d8-fad3-4591-acc4-6a4ed8710d2b"}
records : {"175.89.30.172.in-addr.arpa"=vm1.aio.local, "177.89.30.172.in-addr.arpa"=vm2.aio.local, vm1="172.30.89.175", vm1.aio.local="172.30.89.175", vm2="172.30.89.177", vm2.aio.local="172.30.89.177"}
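
The record set above follows a regular pattern per instance: a short name, a fully qualified name, and a reverse (PTR) name. The Python sketch below rebuilds that pattern so it can be compared against `ovn-sbctl list dns` output. The function name `expected_records` is hypothetical, and this only illustrates the pattern visible here; it is not Neutron's actual code.

```python
import ipaddress


def expected_records(hostname: str, domain: str, ip: str) -> dict:
    """Return the three entries seen per instance in the output above."""
    fqdn = f"{hostname}.{domain}"
    # reverse_pointer gives the in-addr.arpa name for an IPv4 address
    ptr = ipaddress.ip_address(ip).reverse_pointer
    return {hostname: ip, fqdn: ip, ptr: fqdn}


records = {}
for host, ip in [("vm1", "172.30.89.175"), ("vm2", "172.30.89.177")]:
    records.update(expected_records(host, "aio.local", ip))

for name in sorted(records):
    print(f"{name} -> {records[name]}")
```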

Here's the output of trying to communicate between VMs:

admin@vm1:~$ resolvectl
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (ens3)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 172.30.89.76
       DNS Servers: 172.30.89.46 172.30.89.61 172.30.89.76
        DNS Domain: aio.local

admin@vm1:~$ ping vm2
ping: vm2: Temporary failure in name resolution

admin@vm1:~$ host vm2
Host vm2.aio.local not found: 5(REFUSED)

admin@vm1:~$ host vm2.aio.local
Host vm2.aio.local not found: 5(REFUSED)

admin@vm1:~$ host vm2 172.30.89.46
Using domain server:
Name: 172.30.89.46
Address: 172.30.89.46#53
Aliases:

vm2.aio.local has address 172.30.89.177
Host vm2.aio.local not found: 5(REFUSED)
Host vm2.aio.local not found: 5(REFUSED)

172.30.89.46, 172.30.89.61, and 172.30.89.76 are the controllers. During testing we went as far as disabling Designate, so they cannot answer. However, when we manually specify a DNS server to query, even one that does not know the answer, OVN responds with the correct address (and then we get two additional REFUSED errors).

This is very strange behavior. Are we missing something here?

Tags: dns ovn
Revision history for this message
Martin Ananda Boeker (mboeker) wrote :

Because the controllers are not doing DNS, I removed them from the OVN config and dns.yml. In the test below I'm querying the gateway, which of course does not resolve DNS either, but you can see OVN is providing the correct address. I rebuilt the VMs, so now vm2 has IP 172.30.89.175.

admin@vm1:~$ resolvectl
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (ens3)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
    DNS Domain: aio.local

admin@vm1:~$ ping vm2
ping: vm1: Temporary failure in name resolution

admin@vm1:~$ host vm2
Host vm1.aio.local not found: 2(SERVFAIL)

admin@vm1:~$ host vm1 172.30.89.46
Using domain server:
Name: 172.30.89.46
Address: 172.30.89.46#53
Aliases:

vm1.aio.local has address 172.30.89.175
Host vm1.aio.local not found: 5(REFUSED)
Host vm1.aio.local not found: 5(REFUSED)

So once again, OVN has the answer, but it's not providing it until I try to query something outside, and even then I get the correct answer in addition to two failures.

Will Szumski (willjs) wrote :

OVN will snaffle the DNS queries before forwarding them on to the DNS servers configured in the VM. If OVN has an entry, it will respond without forwarding it. Are you sure this is not what is happening here? Have you configured broken DNS servers in the VM?

Martin Ananda Boeker (mboeker) wrote :

Hi Will,

So OVN is responding, but only when I specify an external server, regardless of what's in resolvectl. And even then, we get the correct response from OVN followed by error messages.

Here is the resolvectl output, with the controllers set as DNS servers. Note above that I've also tried this without any DNS servers specified. Currently the controllers are running designate, but of course there are no entries for vm1 or vm2 specifically created:

Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: stub

Link 2 (eth0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 172.30.89.76
       DNS Servers: 172.30.89.46 172.30.89.61 172.30.89.76
        DNS Domain: aio.local

[admin@vm2 ~]$ host vm1
Host vm1 not found: 2(SERVFAIL)

[admin@vm2 ~]$ host vm1.aio.local
Host vm1.aio.local not found: 3(NXDOMAIN)

[admin@vm2 ~]$ host vm1 172.30.89.46
Using domain server:
Name: 172.30.89.46
Address: 172.30.89.46#53
Aliases:

vm1.aio.local has address 172.30.89.177
Host vm1.aio.local not found: 3(NXDOMAIN)
Host vm1.aio.local not found: 3(NXDOMAIN)

[admin@vm2 ~]$ host vm1 1.2.3.4
Using domain server:
Name: 1.2.3.4
Address: 1.2.3.4#53
Aliases:

vm1.aio.local has address 172.30.89.177
;; communications error to 1.2.3.4#53: timed out
;; communications error to 1.2.3.4#53: timed out
;; no servers could be reached

;; communications error to 1.2.3.4#53: timed out
;; communications error to 1.2.3.4#53: timed out
;; no servers could be reached

As you can see, if I specify no DNS server, the lookup of the short hostname simply fails. If I specify anything as a DNS server, even a junk address, OVN responds, but I also get errors.

Martin Ananda Boeker (mboeker) wrote :

Here is evidence that OVN is NOT actually catching the DNS traffic, even though it is reaching the DNS server (controller):

ON VM:
admin@vm1:~$ host vm2
Host vm2.aio.local not found: 5(REFUSED)

ON CONTROLLER, tcpdump -n port 53:

12:30:08.086208 IP 172.30.89.176.38733 > 172.30.89.61.53: 8954+ [1au] A? vm2.aio.local. (44)
12:30:08.086396 IP 172.30.89.61.53 > 172.30.89.176.38733: 8954 Refused- 0/0/1 (44)

The REFUSED response from the controller is expected, because there is no DNS entry in Designate for vm2. The question is why OVN did not reply, since the request clearly left the VM. Here again is the OVN config:

ubuntu@AIOTEST02:~$ ovn-sbctl list dns
_uuid : f18eeb3b-3319-4546-ad58-1549f8ed7f70
datapaths : [c36f655d-0364-45bf-a750-663ad676d607]
external_ids : {dns_id="db82ba60-c867-49eb-bb65-0de79745aafb"}
records : {"174.89.30.172.in-addr.arpa"=vm2.aio.local, "176.89.30.172.in-addr.arpa"=vm1.aio.local, vm1="172.30.89.176", vm1.aio.local="172.30.89.176", vm2="172.30.89.174", vm2.aio.local="172.30.89.174"}

Will Szumski (willjs) wrote :

I'm unsure; it looks like you have the relevant configuration (DNS extension and DNS domain). I would suggest marking this as affecting neutron, as they will likely know more about the intricate details. I'd also include your OVN version, as I know it is older in the Ubuntu images than in Rocky.

Martin Ananda Boeker (mboeker) wrote :

Kayobe config seems correct, marking as Neutron.
OVN internal version is : [23.03.1-20.27.0-70.6]

affects: kayobe → neutron
Martin Ananda Boeker (mboeker) wrote :

I saw there is also a project for "networking-ovn", but I feel like OVN itself is behaving correctly. If the feedback is that it should be there instead, I will move it again.

Brian Haley (brian-haley) wrote :

I do see something similar running Neutron from the master branch with OVN 23.03.3. I'm not sure why it fails; someone will need to triage further.

tags: added: dns ovn
Changed in neutron:
status: New → Confirmed
Bernard Cafarelli (bcafarel) wrote :

As for the question in #7: moving to neutron is correct. networking-ovn was used when the OVN mechanism driver was still a separate project.

Changed in neutron:
importance: Undecided → Medium
Martin Ananda Boeker (mboeker) wrote :

Any update on this issue?

Dr. Jens Harbott (j-harbott) wrote :

I think the issue may be related to using .local as TLD, which has special treatment, see e.g. https://en.wikipedia.org/wiki/Link-Local_Multicast_Name_Resolution

Can you also reproduce the issue when using a different domain?

Changed in neutron:
status: Confirmed → Incomplete
Martin Ananda Boeker (mboeker) wrote :

Next time I redeploy the cluster I will try with .internal

Should not be long.

Martin Ananda Boeker (mboeker) wrote :

Hi. Just redeployed with .internal and the behavior is the same.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Martin, thanks for confirming. This looks like a real issue, but we'll need to collect some data to make sure you configured it correctly. See: https://docs.openstack.org/neutron/latest/admin/config-dns-int.html for reference.

Please attach:

- neutron-server log;
- neutron configuration (/etc/neutron/*), tarball;
- ovs-ofctl dump-flows on node(s) hosting vms;
- ovn-nbctl list DNS.

Thanks. Once these are uploaded, please move the issue to New and we can take it from there. Thanks for your cooperation.

Martin Ananda Boeker (mboeker) wrote :

I've attached the following (all in a zip file). I expect the two config.tar.gz's are identical but included them anyway:
aiotest01-dump-flows
aiotest01-neutron_server-config.tar.gz
aiotest02-dump-flows
aiotest02-neutron_server-config.tar.gz
aiotest03-ovn-nb-dns

In this environment there are only three nodes, but in our bigger test environment the behavior was the same.

For these logs:
aiotest01 is the hypervisor where vm1 is
aiotest02 is the hypervisor where vm2 is
aiotest03 is the NB master

Output from within the instances:

thales@vm2:~$ host vm1
vm1.aio.local has address 172.30.89.94
Host vm1.aio.local not found: 3(NXDOMAIN)
Host vm1.aio.local not found: 3(NXDOMAIN)
thales@vm2:~$ ping -c 1 vm1
PING vm1.aio.local (172.30.89.94) 56(84) bytes of data.
64 bytes from 172.30.89.94: icmp_seq=1 ttl=64 time=1.85 ms

--- vm1.aio.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.846/1.846/1.846/0.000 ms

Note that in the bigger setup the domain is aio.internal but the behavior is exactly the same.

This only works because we have created DNS servers reachable from within this network. They have no entries, so they cannot resolve the queries, but they do send a response, and that is enough for OVN. If there are no reachable DNS servers, then OVN will not respond either.

Changed in neutron:
status: Incomplete → New
Martin Ananda Boeker (mboeker) wrote :

Status updated to NEW. Thanks Ihar!

Ihar Hrachyshka (ihar-hrachyshka) wrote :

@Martin

@Yatin Karel looked into this issue a bit, and it looks like the `host` tool by default requests A, AAAA, and MX records for the given hostname. OVN attempts to reply only to the record types it knows about (IPv4, that is A, in your case); AAAA and MX queries are forwarded to the DNS servers. If those servers don't reply, you will experience timeouts.

You could check whether requesting a single record type (`host -t a`) avoids the problematic behavior.
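
The explanation above can be illustrated with a toy model. It assumes, as described in this comment, that `host` sends A, AAAA, and MX queries, and that OVN answers only the types it holds a record for and forwards the rest. The names `OVN_RECORDS`, `upstream`, and `resolve` are hypothetical; this is not OVN's implementation.

```python
# Toy model: records OVN knows, keyed by (name, record type).
OVN_RECORDS = {("vm2.aio.local", "A"): "172.30.89.177"}

# Query types that `host` sends by default, per the comment above.
HOST_DEFAULT_TYPES = ("A", "AAAA", "MX")


def upstream(name, qtype):
    """Stand-in for the configured DNS server, which has no such record."""
    return "5(REFUSED)"


def resolve(name, qtype):
    """Answer locally when a record of this type exists, else forward."""
    answer = OVN_RECORDS.get((name, qtype))
    if answer is not None:
        return f"{name} has address {answer}"
    return f"Host {name} not found: {upstream(name, qtype)}"


results = [resolve("vm2.aio.local", t) for t in HOST_DEFAULT_TYPES]
for line in results:
    print(line)
```

Restricting the loop to `("A",)` corresponds to `host -t a` and leaves only the first line, which matches the one-answer-plus-two-failures shape of the earlier output.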

---

I wonder if you are a victim of http://patchwork<email address hidden>/

I don't think there's neutron side integration to set the new ovn-owned attribute for DNS records. (Not sure there should be - and if so, a new api may be needed.)

To confirm whether that's the culprit, could you please check with an OVN version that includes the patch (24.03+), after setting options:ovn-owned on the DNS record to true?
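
One way to set the option for such a test, sketched in Python: the generic `ovn-nbctl set` database command takes a table, a record, and a `column:key=value` assignment. The key `ovn-owned` is the name used in the related Neutron commit message in this thread; the UUID below is a placeholder and the helper name is hypothetical. The command is only printed here, not executed.

```python
import shlex


def ovn_owned_command(dns_uuid: str, owned: bool = True) -> list:
    """Build the ovn-nbctl argv that sets options:ovn-owned on one DNS row."""
    value = "true" if owned else "false"
    return ["ovn-nbctl", "set", "DNS", dns_uuid, f"options:ovn-owned={value}"]


# Placeholder UUID; take the real one from `ovn-nbctl list DNS`.
cmd = ovn_owned_command("00000000-0000-0000-0000-000000000000")
print(shlex.join(cmd))
```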

Let me know.

Martin Ananda Boeker (mboeker) wrote :

I have not seen any input on this issue in a long time, and just noticed it's still marked "new" and "unassigned".

My comment on 2024-04-03 is pretty significant, and there's been no input since I provided the additional data.

Hi @Ihar

Doing `host -t a $targethost` does get rid of the error, thank you for pointing that out!
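
To test a single record type without the `host` tool, a query can also be built by hand. The sketch below encodes a standard DNS question for one type and sends it over UDP. The helper names, the query ID, and the 2-second timeout are arbitrary choices, and response parsing is left out.

```python
import socket
import struct

QTYPES = {"A": 1, "MX": 15, "AAAA": 28}  # standard DNS type codes


def build_query(name: str, qtype: str = "A", query_id: int = 0x1234) -> bytes:
    """Encode a one-question DNS query with recursion desired."""
    # Header: ID, flags (RD bit), QDCOUNT=1, then three zero counters.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)  # class IN


def send_query(server: str, name: str, qtype: str = "A") -> bytes:
    """Send the query to server:53 and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)
        sock.sendto(build_query(name, qtype), (server, 53))
        reply, _ = sock.recvfrom(512)
        return reply


packet = build_query("vm2.aio.local", "A")
print(len(packet), packet.hex())
```

The response code is the low four bits of the fourth reply byte, for example `send_query("172.30.89.46", "vm2.aio.local")[3] & 0x0F`, where 0 is NOERROR, 2 is SERVFAIL, 3 is NXDOMAIN, and 5 is REFUSED.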

The other part of the issue persists and I don't think the link you provided is an acceptable solution for our situation, because we do still want a DNS source like designate for the same domain.

If I have the domain foo.local (or foo.internal), I want Designate to have records for 'endpoint.foo.local' and 'web.foo.local' but of course I do not want to create designate entries for server1.foo.local through server99.foo.local, those should be (and are) in OVN.

I just created two networks in my openstack cluster: ovnonline and ovnoffline. ovnonline has a router that connects to a provider network and can get to the internet, ovnoffline does not. They have these instances:

Network ovnonline: dnstest01, dnstest02
Network ovnoffline: dnstest03, dnstest04

dnstest01 and dnstest02 can resolve each other's hostnames no problem.
dnstest03 and dnstest04 can NOT find each other.

OVN DNS has the correct names and addresses, and it is the source of the information, but is only responding to the instances connected to the online network. Each of these networks has nothing else configured, there are no dns servers.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/942373

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/942373
Committed: https://opendev.org/openstack/neutron/commit/cde5580bf1f2850c90733eea2a6a7e3b46e22b63
Submitter: "Zuul (22348)"
Branch: master

commit cde5580bf1f2850c90733eea2a6a7e3b46e22b63
Author: yatinkarel <email address hidden>
Date: Thu Feb 20 20:26:54 2025 +0530

    [OVN] Add option to allow configuring dns ovn-owned

    Added a configuration option '[ovn]dns_records_ovn_owned' to
    allow setting 'ovn-owned' DNS option added as part of [1].
    The Default is False so no change in the current behavior.

    If this option is set to True for OVN version 24.03 and above,
    DNS records will be treated local to the OVN controller and it
    will respond to the queries for the records and record types
    known to it else it will forward them to the configured
    DNS Server(s).

    Also added a maintenance task to update the option in
    all the DNS records as per the config option with neutron
    restart.

    [1] https://github.com/ovn-org/ovn/commit/1622526ff

    Depends-On: https://review.opendev.org/c/openstack/requirements/+/942797
    Depends-On: https://review.opendev.org/c/openstack/ovsdbapp/+/942367
    Related-Issue: https://issues.redhat.com/browse/OSPRH-10758
    Related-Bug: #2059405
    Change-Id: Ia645e8539753c03eb6ead9a868ba5bf194e9a724
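
A sketch of what enabling this looks like, checked with Python's `configparser`. The section and option names come from the commit message; which file carries the `[ovn]` section depends on the deployment, so the inline text below stands in for it. Neutron itself reads options through oslo.config, so this is only a quick sanity check and not how Neutron loads the value.

```python
import configparser

# Stand-in for the Neutron config file that holds the [ovn] section.
SAMPLE = """
[ovn]
dns_records_ovn_owned = True
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# fallback=False mirrors the default stated in the commit message.
enabled = parser.getboolean("ovn", "dns_records_ovn_owned", fallback=False)
print("dns_records_ovn_owned:", enabled)
```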
