Server feature level is now lower than when we began our transaction. Restarting with new ID

Bug #1771353 reported by Andreas Hasenack
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
systemd (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

systemd 237-3ubuntu10 from bionic

There is no /etc/netplan/* file.

I'm getting some sort of a loop when systemd-resolved tries to resolve the hostname of my canonistack instance:

ubuntu@bip:~$ systemd-resolve bip.lcy02.canonistack
bip.lcy02.canonistack: resolve call failed: All attempts to contact name servers or networks failed

systemd-resolved debug output loops over like this:
Transaction 36341 is now 19121.
Cache miss for bip.lcy02.canonistack IN A
Transaction 19121 for <bip.lcy02.canonistack IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 19121.
Sending query packet with id 19121.
Processing incoming packet on transaction 19121. (rcode=SUCCESS)
Server feature level is now lower than when we began our transaction. Restarting with new ID.
Transaction 19121 is now 17157.
Cache miss for bip.lcy02.canonistack IN A
Transaction 17157 for <bip.lcy02.canonistack IN A> scope dns on */*.
Using feature level UDP+EDNS0 for transaction 17157.
Sending query packet with id 17157.
Processing incoming packet on transaction 17157. (rcode=SUCCESS)
Server feature level is now lower than when we began our transaction. Restarting with new ID.
...

Packet capture shows a loop like this:
13:38:32.293524 IP 10.55.32.54.53786 > 10.55.32.1.53: 46096+ [1au] A? bip.lcy02.canonistack. (50)
13:38:32.293753 IP 10.55.32.1.53 > 10.55.32.54.53786: 46096* 1/0/0 A 10.55.32.54 (55)
13:38:32.294026 IP 10.55.32.54.53786 > 10.55.32.1.53: 64494+ [1au] A? bip.lcy02.canonistack. (50)
13:38:32.294249 IP 10.55.32.1.53 > 10.55.32.54.53786: 64494* 1/0/0 A 10.55.32.54 (55)

I'm going to attach the full debug output, and the pcap file.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

/etc/hosts

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

/etc/nsswitch.conf

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

systemd-resolve --status

description: updated
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

All there is about /e/n/i

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Does the upstream DNS server actually know about bip.lcy02.canonistack?

If it doesn't, what sort of a response are you expecting?

Why /etc/hosts does not have bip & bip.lcy02.canonistack set?

Is this a fresh instance, or an upgrade?

in bionic, cloud-init should have configured your instance using netplan, not eni.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

It does. If I ask it directly, it responds just fine to my dig query. It also responds correctly to systemd (did you see the packet capture?).

Why should /etc/hosts have a bip entry? That's a hack when your dns server doesn't know about you.

The instance was an upgrade, can't remember now if it was artful or xenial. I can check if that matters.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Revision history for this message
Steve Dodd (anarchetic) wrote :

I'm see this just trying to resolve names on my local LAN:

Aug 02 17:50:40 beelink systemd-resolved[6697]: Processing incoming packet on transaction 52812. (rcode=SUCCESS)
Aug 02 17:50:40 beelink systemd-resolved[6697]: Server doesn't support EDNS(0) properly, downgrading feature level...
Aug 02 17:50:40 beelink systemd-resolved[6697]: Using degraded feature set (UDP) for DNS server 192.168.128.32.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Server feature level is now lower than when we began our transaction. Restarting with new ID.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 52812 is now 18875.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Cache miss for xu-host.lan IN AAAA
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 18875 for <xu-host.lan IN AAAA> scope dns on lan0/*.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Using feature level UDP+EDNS0 for transaction 18875.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Sending query packet with id 18875.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Processing incoming packet on transaction 18875. (rcode=SUCCESS)
Aug 02 17:50:40 beelink systemd-resolved[6697]: Server feature level is now lower than when we began our transaction. Restarting with new ID.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 18875 is now 44944.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Cache miss for xu-host.lan IN AAAA
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 44944 for <xu-host.lan IN AAAA> scope dns on lan0/*.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Using feature level UDP+EDNS0 for transaction 44944.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Sending query packet with id 44944.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Processing incoming packet on transaction 44944. (rcode=SUCCESS)
Aug 02 17:50:40 beelink systemd-resolved[6697]: Server feature level is now lower than when we began our transaction. Restarting with new ID.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 44944 is now 42801.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Cache miss for xu-host.lan IN AAAA
Aug 02 17:50:40 beelink systemd-resolved[6697]: Transaction 42801 for <xu-host.lan IN AAAA> scope dns on lan0/*.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Using feature level UDP+EDNS0 for transaction 42801.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Sending query packet with id 42801.
Aug 02 17:50:40 beelink systemd-resolved[6697]: Processing incoming packet on transaction 42801. (rcode=SUCCESS)
Aug 02 17:50:40 beelink systemd-resolved[6697]: Server feature level is now lower than when we began our transaction. Restarting with new ID.

Upstream DNS is a probably fairly ancient version of dnsmasq..

Revision history for this message
Steve Dodd (anarchetic) wrote :

Have a filed an upstream report: https://github.com/systemd/systemd/issues/9785

Revision history for this message
Steve Dodd (anarchetic) wrote :

Hmm, my case seems to be caused by a dnsmasq bug - if there are no answers, it doesn't return an EDNS0 OPT record even if there was one in the query. This seems to be confusing systemd-resolved.

Not sure what is causing @ahasenack's issue - the pcap shows the upstream DNS consistently not returning OPT records, which is expected if it doesn't support EDNS0 at all.

So different causes but both apparently tickling some issue in resolved's logic.

Revision history for this message
Steve Dodd (anarchetic) wrote :

Installing libss-resolve works around the problem for me - is there a reason this is not installed by default on Ubuntu, per upstream's recommendations?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@Steve Dodd

libnss-resolve is not the upstream recommendation any more, and it doesn't work without dbus, with non-nss based applications or containers.

by default, recent ubuntu releases use a local stub-resolver which does work more universally.

See http://manpages.ubuntu.com/manpages/bionic/en/man8/systemd-resolved.8.html /etc/resolv.conf section the four modes of handling /etc/resolv.conf of which Ubuntu uses the first (default / recommended one)

It seems to me, your issues is different to the original bug reporter, as the original bug reporter cannot resolve via $ systemd-resolve tool either, which is equivalnt to libnss-resolve. Please do not reuse somebody elses bug report for similar, but potentially different issues. And instead open a new bug report. Bug reports are cheap.

Revision history for this message
Steve Dodd (anarchetic) wrote :

@xnox:

Yes, on further investigation it looks like these might well be different bugs - OTOH the repeated issuance of queries to the upstream server, and the confusion about the server capabilities is very similar, which suggests a problem in the same code path may be being triggered. I will create a new bug once I am sure they are different.

Re. libnss-resolve, the man page still says:

"it is strongly recommended that local programs use the glibc NSS or bus APIs instead [of the stub resolver]"

But I do understand there are potential issues - might be nice to have the Ubuntu "position" documented (top of resolv.conf, perhaps).

Simon Kelley is looking at the dnsmasq issues.

Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: [Bug 1771353] Re: Server feature level is now lower than when we began our transaction. Restarting with new ID

On 3 August 2018 at 14:22, Steve Dodd <email address hidden> wrote:
> @xnox:
>
> Yes, on further investigation it looks like these might well be
> different bugs - OTOH the repeated issuance of queries to the upstream
> server, and the confusion about the server capabilities is very similar,
> which suggests a problem in the same code path may be being triggered. I
> will create a new bug once I am sure they are different.
>

Yeah, a lot of those debug output from resolved is just that debug
output, and it looks very similar for even regular normal operations.
E.g. the dns server on router is crap and I have transactions
restarting, lowering the supported level, then later it attempts to up
the supported level again only go back down again... So restarting of
transactions and changing ids is "normal" as it's simply "verbose".

> Re. libnss-resolve, the man page still says:
>
> "it is strongly recommended that local programs use the glibc NSS or bus
> APIs instead [of the stub resolver]"
>

That sounds like it predates implementation of stub-resolv.conf
generator with search stanzas and other improvements to the local stub
listener for large query sizes.

> But I do understand there are potential issues - might be nice to have
> the Ubuntu "position" documented (top of resolv.conf, perhaps).
>

I'm working on a blog post series / wiki page, and a talk for FOSDEM
about it. But it has been going slow...

> Simon Kelley is looking at the dnsmasq issues.
>

cool!

--
Regards,

Dimitri.

Revision history for this message
Steve Dodd (anarchetic) wrote :

The patch "resolved-Mitigate-DVE-2018-0001-by-retrying-NXDOMAIN-with.patch" for bug #1727237 seems to be the root cause of my problems (now reported separately in bug #1785383.)

As the patch changes the transaction restart logic it may be worth the OP rebuilding without that patch and retesting.

@xnox: my understanding is still that upstream recommendation is to use libnss-resolve, it is still there in the latest man pages. The handling of resolv.conf is to a certain extent orthogonal, as that file is only used by systemd to exchange nameserver info with "legacy" code..

Revision history for this message
Dan Streetman (ddstreet) wrote :

please reopen if this is still an issue

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.