resolver regression on ubuntu-core from #1636912

Bug #1659195 reported by Michael Vogt
This bug affects 3 people
Affects           Status     Importance  Assigned to  Milestone
systemd (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

We have had a bunch of test failures in our automated tests for snapd since about Saturday (21.01.2017). All of them are on Ubuntu Core, and all show errors like:
"""
error: cannot install "test-snapd-tools": Get https://search.apps.ubuntu.com/api/v1/snaps/details/test-snapd-tools?channel=stable&fields=anon_download_url%2Carchitecture%2Cchannel%2Cdownload_sha3_384%2Csummary%2Cdescription%2Cdeltas%2Cbinary_filesize%2Cdownload_url%2Cepoch%2Cicon_url%2Clast_updated%2Cpackage_name%2Cprices%2Cpublisher%2Cratings_average%2Crevision%2Cscreenshot_urls%2Csnap_id%2Csupport_url%2Ctitle%2Ccontent%2Cversion%2Corigin%2Cdeveloper_id%2Cprivate%2Cconfinement: dial tcp: lookup search.apps.ubuntu.com on [::1]:53: read udp [::1]:41766->[::1]:53: read: connection refused
"""

The key part is: "dial tcp: lookup search.apps.ubuntu.com on [::1]:53: read udp [::1]:41766->[::1]:53: read: connection refused". The failures are random, but of the ~150 integration tests we run for the core image, about 5 hit this error every time. Further debugging showed that /etc/resolv.conf is empty when the errors happen.
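The `[::1]:53` target in the error fits that observation. When Go's pure resolver (the one that produced this error message) finds no `nameserver` entries in /etc/resolv.conf, it falls back to querying 127.0.0.1:53 and [::1]:53. On a device with no local DNS server listening there, that gives "connection refused". A minimal sketch of a check a test could run before contacting the store follows, assuming a POSIX shell with grep. `has_nameserver` is a hypothetical name and is not part of snapd or spread:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a resolv.conf-style file (default
# /etc/resolv.conf) has at least one "nameserver <address>" line.
# Commented-out entries ("# nameserver ...") do not count. -s keeps grep
# quiet if the file is missing, which is treated as "no nameserver".
has_nameserver() {
    grep -Eqs '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' \
        "${1:-/etc/resolv.conf}"
}

# Example: warn when a Go program would fall back to localhost:53.
if ! has_nameserver; then
    echo "no nameserver in /etc/resolv.conf: Go's resolver will try 127.0.0.1:53 and [::1]:53" >&2
fi
```

An empty file, a missing file, and a file containing only commented-out entries all make `has_nameserver` fail. That matches the state observed when the lookups go to `[::1]:53`.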

It looks like the fix for https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1636912 is causing this behaviour for us. Reverting this change gave us stable tests again. I also tried updating to the latest resolvconf (the one referenced in #1636912 and in the follow-up bug #1649931). Using this resolvconf improved the situation dramatically. However, with only resolvconf updated to the version in xenial-proposed, we still got these errors from time to time. With the #1636912 fix reverted, we have had no problems in our tests so far.

If there is anything I can try in our image PPA to help with a fix I will be happy to do that.

Thanks,
 Michael

Michael Vogt (mvo)
summary: - resolver regression on ubuntu-core from #1649931
+ resolver regression on ubuntu-core from #1636912
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Dimitri John Ledkov (xnox) wrote :

Is this with zesty, xenial, or both?

Michael Vogt (mvo) wrote :

This is in xenial, more specifically in the Ubuntu Core system.

Steve Langasek (vorlon) wrote :

You say that the resolvconf SRU from #1649931 fixed it /mostly/ but not in all cases. If you are able to reproduce this problem, can you provide journalctl output from the affected environment so we can see what isn't happening in the right order? Do you have step-by-step instructions we can follow to reproduce the problem ourselves?
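Below is a sketch of how the requested output could be captured on an affected device, assuming shell access. `collect_resolver_debug` and its default output filename are hypothetical. The function records /etc/resolv.conf and the full journal for the current boot (`-b`). The journal uses `-o short-monotonic`, which prints timestamps relative to boot, so the relative ordering of service start-up messages is easier to compare:

```shell
#!/bin/sh
# Hypothetical helper: write /etc/resolv.conf and the current boot's
# journal (monotonic timestamps) into one file for attaching to the bug.
collect_resolver_debug() {
    out="${1:-resolver-debug.txt}"
    {
        echo "== /etc/resolv.conf =="
        cat /etc/resolv.conf 2>&1
        echo "== journal (current boot, monotonic timestamps) =="
        journalctl -b -o short-monotonic --no-pager
    } > "$out"
}
```

Note that resolv.conf is captured at collection time, which may be after the window in which it was empty. Running this right after a failing test, e.g. `collect_resolver_debug /tmp/resolver-debug.txt`, keeps that gap small.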

Michael Vogt (mvo) wrote :

Unfortunately there is no easy way to reproduce this without building a new core. I built a core snap with the "networkd-allow-networkd-to-start-in-early-boot.patch" included.

With that our tests fail all over the place, e.g.: https://s3.amazonaws.com/archive.travis-ci.org/jobs/224387435/log.txt - search for "dial tcp: lookup".

I created an image with the problematic core (revision 1754) and ran the test suite:

With an image generated with core r1754:
$ kvm -m 1500 -redir tcp:10022::22 ./ubuntu-core-16-amd64.img/pc.img -snapshot
$ export SPREAD_EXTERNAL_ADDRESS=localhost:10022
$ ./tests/lib/external/prepare-ssh.sh localhost 10022
$ spread -v -reuse external:ubuntu-core-16-64

On QEMU this run is fine. However, when I try the same on Linode I get:
...
error: cannot install "jq": Get https://search.apps.ubuntu.com/api/v1/snaps/details/jq?channel=stable&fields=anon_download_url%2Carchitecture%2Cchannel%2Cdownload_sha3_384%2Csummary%2Cdescription%2Cdeltas%2Cbinary_filesize%2Cdownload_url%2Cepoch%2Cicon_url%2Clast_updated%2Cpackage_name%2Cprices%2Cpublisher%2Cratings_average%2Crevision%2Cscreenshot_urls%2Csnap_id%2Csupport_url%2Ccontact%2Ctitle%2Ccontent%2Cversion%2Corigin%2Cdeveloper_id%2Cprivate%2Cconfinement: dial tcp: lookup search.apps.ubuntu.com on [::1]:53: read udp [::1]:54730->[::1]:53: read: connection refused
...
2017/04/21 20:54:24 Failed tasks: 6
    - linode:ubuntu-core-16-64:tests/main/auto-aliases
    - linode:ubuntu-core-16-64:tests/main/searching
    - linode:ubuntu-core-16-64:tests/main/snap-connect
    - linode:ubuntu-core-16-64:tests/main/snap-download
    - linode:ubuntu-core-16-64:tests/main/ubuntu-core-classic
    - linode:ubuntu-core-16-64:tests/main/ubuntu-core-create-user
2017/04/21 20:54:24 Failed task prepare: 10
    - linode:ubuntu-core-16-64:tests/main/auto-refresh
    - linode:ubuntu-core-16-64:tests/main/interfaces-content
    - linode:ubuntu-core-16-64:tests/main/interfaces-content-empty-content-attr
    - linode:ubuntu-core-16-64:tests/main/interfaces-fuse_support
    - linode:ubuntu-core-16-64:tests/main/interfaces-kernel-module-control
    - linode:ubuntu-core-16-64:tests/main/interfaces-snapd-control
    - linode:ubuntu-core-16-64:tests/main/refresh:strict_remote
    - linode:ubuntu-core-16-64:tests/main/revert-devmode:remote
    - linode:ubuntu-core-16-64:tests/main/revert:remote
    - linode:ubuntu-core-16-64:tests/main/snap-auto-mount
2017/04/21 20:54:24 Failed task restore: 1
    - linode:ubuntu-core-16-64:tests/main/snap-auto-mount

So ~10% of the tests on Linode fail. Unfortunately I don't know yet why this works in QEMU but not on Linode.
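The ~10% figure can be checked against the spread summary from the Linode run. The same test can appear under more than one heading (snap-auto-mount failed in both prepare and restore), so counting distinct task names gives 16 rather than 17. Assuming the run still contains roughly the ~150 integration tests mentioned in the description, that is 16/150 ≈ 10.7%. The sketch below counts distinct failed tasks from spread output. It assumes the summary format from that run, with one indented `- <backend>:<system>:tests/...` line per task. `count_failed_tests` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read spread output on stdin and print the number of
# distinct tasks listed in its "Failed ..." summaries. Only lines of the form
# "    - <backend>:<system>:tests/..." are counted; duplicates collapse.
count_failed_tests() {
    sed -n 's|^[[:space:]]*- \(.*:tests/.*\)$|\1|p' | sort -u | wc -l | tr -d ' '
}
```

For the summary from that run, `count_failed_tests < spread.log` prints 16.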

Steve Langasek (vorlon) wrote :

How do you provision the nodes in Linode? Do you use cloud-init? cloud-init interacts with network-online.target, so that could be related.

Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Won't Fix