[SRU] resolvconf is racy, which leads to broken resolv.conf in parallel calls

Bug #1825194 reported by Alfonso Sanchez-Beato on 2019-04-17
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
resolvconf (Ubuntu)
Critical
Unassigned
Xenial
Critical
Unassigned

Bug Description

[Impact]

The bug can lead to non-working DNS. This is a critical bug that needs to be fixed in the stable release.

The patch fixes the bug by using a file lock via the flock command.

[Test case]

This script should eventually stop with the current version, thing that should not happen after applying the patch:

while true; do
    printf "\n" | sudo resolvconf -a NetworkManager
    printf "nameserver 8.8.8.8\n" | sudo resolvconf -a networkd
    wait
    printf "nameserver 8.8.8.8\n" |
        sudo resolvconf -a NetworkManager &
    printf "\n" | sudo resolvconf -a networkd
    wait
    ping -c 1 www.google.com || break
done

[Regression Potential]

The patch is small and what it does is straightforward. It has also been thoroughly tested. However, the effect if something is wrong would be not being able to resolve DNS names, which is disastrous, so it needs to be very well tested for the SRU.

[Other Info]

It has been found that simultaneous calls to resolvconf can lead to inconsistent content in resolv.conf. For instance, no nameservers while NetworkManager has one in its record (see LP: #1824395):

$ cat /run/resolvconf/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

$ cat /run/resolvconf/interface/NetworkManager
nameserver 192.168.1.6
nameserver 192.168.1.2

This can happen easily when calling "netplan apply", which can re-start both networkd and NM. resolvconf is called at that point by the "systemd-networkd-resolvconf-update.service" service, and also directly by NetworkManager, which leads to the situation described above. This is not surprising as there is nothing preventing different instances of resolvconf to access the same files. This sort of situation can be reproduced by running in a loop commands like:

$ printf "\n" | sudo resolvconf -a NetworkManager & printf "\n" | sudo resolvconf -a networkd
$ printf "nameserver 80.58.61.250\n" | sudo resolvconf -a NetworkManager & printf "\n" | sudo resolvconf -a networkd

Eventually, this leads to a resolv.conf that is not consistent with the last run command.

description: updated

The attachment "debdiff.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Tony Espy (awe) wrote :

@Alfonso

What's the use case for calling 'netplan apply' in this scenario? Is NM the configured as the netplan renderer?

Is there ever a case where "systemd-networkd-resolvconf-update.service" can call resolvconf even though NM is the renderer from the start (i.e. there's no transition from networkd to NM)?

I've always contended that mixing two network management services on the same device was asking for trouble...

tags: added: disco
tags: added: rls-ee-incoming
Steve Langasek (vorlon) wrote :

I assume this is primarily an issue for UC16, since in UC18 we no longer use resolvconf at all, and have targeted the bug accordingly. Do you want to prepare a corresponding debdiff for xenial, and adjust the bug description to include a complete SRU template per https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template ?

Changed in resolvconf (Ubuntu):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package resolvconf - 1.79ubuntu13

---------------
resolvconf (1.79ubuntu13) disco; urgency=medium

  * bin/resolvconf: use flock so resolvconf can be called in parallel
    safely (LP: #1825194).

 -- Alfonso Sanchez-Beato (email Canonical) <email address hidden> Wed, 17 Apr 2019 16:53:53 +0200

Changed in resolvconf (Ubuntu):
status: Fix Committed → Fix Released
Simon Quigley (tsimonq2) wrote :

Unsubscribing the Sponsors Team because there is no SRU bug template yet.

Patch for xenial.

description: updated

@vorlon, @tsimonq2, patch for xenial attached, and description changed for SRU. Sponsors teamm re-subscribed.

summary: - resolvconf is racy, which leads to broken resolv.conf in parallel calls
+ [SRU] resolvconf is racy, which leads to broken resolv.conf in parallel
+ calls
Simon Quigley (tsimonq2) on 2019-04-22
Changed in resolvconf (Ubuntu):
importance: Undecided → Critical
Changed in resolvconf (Ubuntu Xenial):
importance: Undecided → Critical
Steve Langasek (vorlon) wrote :

uploaded to the SRU queue.

Changed in resolvconf (Ubuntu Xenial):
status: New → In Progress
Steve Langasek (vorlon) wrote :

(please note that the changelog had to be adjusted; the current version of resolvconf in xenial-updates is 1.78ubuntu6.)

Robie Basak (racb) wrote :

Review question: what ensures that /run/resolvconf exists before flock is called? I only see an "mkdir -p" inside the block, not outside it. flock doesn't seem to create parent directories automatically in my test, and I don't see a tmpfiles.d entry.

Steve Langasek (vorlon) wrote :

Thanks Robie, that's a very good point and something I didn't notice while sponsoring. Alfonso, do you want to adjust the patch to address this? (That will also give us a chance to reupload the SRU and remove the misapplied diff cruft.)

Steve Langasek (vorlon) wrote :

On Wed, May 01, 2019 at 05:25:22PM -0000, Robie Basak wrote:
> Will this also need addressing in the development release?

It looks like it to me.

An upload of resolvconf to xenial-proposed has been rejected from the upload queue for the following reason: "Needs adjustment; see https://bugs.launchpad.net/ubuntu/+source/resolvconf/+bug/1825194/comments/11".

Thanks Robie, good catch. Please find attached the new patch for xenial: I have moved a bit below the call to flock to ensure the dir was created - the lines now out of the lock do not change any files so that should not be a problem.

Hello Alfonso, or anyone else affected,

Accepted resolvconf into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/resolvconf/1.78ubuntu7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in resolvconf (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Robie Basak (racb) wrote :

Unsubscribing ~ubuntu-sponsors.

Are you planning on fixing Bionic, Cosmic or Disco?

On Wed, May 15, 2019 at 12:22:20PM -0000, Robie Basak wrote:
> Unsubscribing ~ubuntu-sponsors.

> Are you planning on fixing Bionic, Cosmic or Disco?

Per
https://bugs.launchpad.net/ubuntu/+source/resolvconf/+bug/1825194/comments/4,
I believe this is "uninteresting" to fix for releases other than xenial;
xenial needs it because UC16 is netplan+resolvconf and may have both
networkd and NetworkManager backends on some devices, UC18 does not use
resolvconf, and Ubuntu 18.04 classic is unlikely to have users making use of
resolvconf+netplan (since resolvconf is only present by default on upgrade
for users who have their network configured using ifupdown).

Robie Basak (racb) wrote :

On Wed, May 15, 2019 at 12:42:24PM -0000, Steve Langasek wrote:
> Per
> https://bugs.launchpad.net/ubuntu/+source/resolvconf/+bug/1825194/comments/4,
> I believe this is "uninteresting" to fix for releases other than xenial;
> xenial needs it because UC16 is netplan+resolvconf and may have both
> networkd and NetworkManager backends on some devices, UC18 does not use
> resolvconf, and Ubuntu 18.04 classic is unlikely to have users making use of
> resolvconf+netplan (since resolvconf is only present by default on upgrade
> for users who have their network configured using ifupdown).

Ah sorry. I remember reading that when I looked at that last time, but
missed that today. Fair enough.

I have tested on xenial, the problem is solved with the change.

ubuntu@xenial:~$ apt policy resolvconf
resolvconf:
  Installed: 1.78ubuntu7
  Candidate: 1.78ubuntu7
  Version table:
 *** 1.78ubuntu7 500
        500 http://archive.ubuntu.com/ubuntu xenial-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1.78ubuntu6 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
     1.78ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
ubuntu@xenial:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04

tags: added: verification-done-xenial
removed: verification-needed-xenial
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers