stale dnsmasq pid file causes network start failure

Bug #1698712 reported by Paul Collins
Affects: lxd (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Stéphane Graber

Bug Description

Today I noticed that none of my containers were getting IP addresses.

I found the following message in lxd.log:

err="readlink /proc/591/exe: no such file or directory" lvl=eror msg="Failed to bring up network" name=lxdbr0 t=2017-06-19T14:03:36+1200

PID 591:

$ ps 591
  PID TTY STAT TIME COMMAND
  591 ? S< 0:00 [loop5]
$ sudo ls -l /proc/591/exe
ls: cannot read symbolic link '/proc/591/exe': No such file or directory
lrwxrwxrwx 1 root root 0 Jun 15 10:02 /proc/591/exe
$ _

After some searching around I found https://github.com/lxc/lxd/issues/2767 which suggested a problem with the dnsmasq pid file, and indeed:

$ cat /var/lib/lxd/networks/lxdbr0/dnsmasq.pid
591
$ _

I deleted this file and restarted the lxd service, and my containers received IP addresses shortly afterwards. The file probably became stale after a recent hard reboot of the host machine. It would be best if LXD could recover from this condition itself, or at least give the operator a hint, as it is non-trivial to debug. Storing this piece of state in a location that does not survive a reboot might also work.
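
For anyone hitting the same symptom, the check done by hand above can be sketched as a small shell function. This is only an illustration, not LXD's code or the eventual fix: the function name pid_file_is_stale and the optional third argument (an alternative /proc root, handy for testing) are made up here. It assumes it is run as root, since reading another user's /proc/PID/exe link otherwise fails and would be misreported as stale.

```shell
#!/bin/sh
# pid_file_is_stale PIDFILE EXPECTED_NAME [PROC_ROOT]   (hypothetical helper)
# Returns 0 (stale) when the PID recorded in PIDFILE does not belong to a
# running process whose executable is named EXPECTED_NAME, 1 otherwise.
pid_file_is_stale() {
    pidfile=$1
    expected=$2
    proc_root=${3:-/proc}

    # A missing or unreadable pid file is not stale; there is nothing to clean up.
    [ -r "$pidfile" ] || return 1

    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 0 ;;   # empty or non-numeric content: treat as stale
    esac

    # No entry under /proc: the recorded process no longer exists.
    [ -d "$proc_root/$pid" ] || return 0

    # Kernel threads such as [loop5] have an exe link that cannot be read,
    # which is the readlink failure seen in lxd.log.
    exe=$(readlink "$proc_root/$pid/exe" 2>/dev/null) || return 0

    # The kernel appends " (deleted)" when the binary was replaced on disk.
    exe=${exe% (deleted)}

    [ "$(basename "$exe")" = "$expected" ] && return 1
    return 0
}
```

Called as `pid_file_is_stale /var/lib/lxd/networks/lxdbr0/dnsmasq.pid dnsmasq`, it should return 0 in the situation described here, where PID 591 had been reused by the loop5 kernel thread. Removing the pid file and restarting the lxd service is then the workaround already described.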

Stéphane Graber (stgraber) wrote :

What version of LXD is that?

Changed in lxd (Ubuntu):
status: New → Incomplete
Paul Collins (pjdc) wrote :

2.12-0ubuntu3

Jacek Nykis (jacekn)
Changed in lxd (Ubuntu):
status: Incomplete → New
Changed in lxd (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in lxd (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
status: Triaged → In Progress
Stéphane Graber (stgraber) wrote :

Fix sent upstream

Changed in lxd (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxd - 2.15-0ubuntu5

---------------
lxd (2.15-0ubuntu5) artful; urgency=medium

  * Cherry-pick upstream fixes:
    - 0006-lxc-config-Removal-of-multiple-devices-at-once.patch (LP: #1690299)
    - 0007-network-Don-t-fail-on-non-process-PIDs.patch (LP: #1698712)
    - 0008-config-Try-to-be-clever-about-in-snapshots.patch (LP: #1694855)
    - 0009-Fix-readonly-mode-for-directory-mount.patch
    - 0010-client-Fix-race-condition-in-operation-handling.patch
    - 0011-import-keep-volatile-keys.patch
    - 0012-import-remove-last-dependency-on-symlink.patch
    - 0013-Better-handle-errors-in-memory-reporting.patch
    - 0014-client-Don-t-live-migrate-stopped-containers.patch

 -- Stéphane Graber <email address hidden> Mon, 03 Jul 2017 18:19:16 -0400

Changed in lxd (Ubuntu):
status: Fix Committed → Fix Released