race condition updating statefile

Bug #1160490 reported by Ray Link on 2013-03-26
44
This bug affects 12 people
Affects Status Importance Assigned to Milestone
ifupdown (Ubuntu)
Medium
Chris J Arges
Precise
Medium
Chris J Arges
Quantal
Medium
Chris J Arges
Raring
Medium
Chris J Arges
Saucy
Medium
Chris J Arges

Bug Description

SRU Justification:
[Impact]
 * Users will occasionally see their network interfaces not come up due to race conditions.

[Test Case]
 * See this comment: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/996369/comments/35

[Regression Potential]
 * This fix backports a change from upstream ifupdown. Instead of locking a statefile it locks a lockfile.

--

Ubuntu 12.04.2
ifupdown 0.7~beta2ubuntu8

Symptom: Every so often, /etc/init/network-interface.conf fails to bring up the loopback interface.

> Mar 25 16:39:37 XXXXXXXX kernel: [ 28.793922] init: network-interface (lo) pre-start process (1079) terminated with status 1

/var/log/upstart/network-interface-lo shows:

> ifup: failed to overwrite statefile /run/network/ifstate: No such file or directory

Relevant section of the ifup sources, in update_state():

        if (rename (tmpstatefile, statefile)) {
                fprintf(stderr,
                        "%s: failed to overwrite statefile %s: %s\n",
                        argv0, statefile, strerror(errno));
                exit (1);
        }

update_state() opens the statefile, gets a F_SETLKW lock on it, opens a tmpfile, filters the contents of the statefile into the tmpfile, closes the tmpfile, then renames the tempfile over the statefile.

Once the rename() happens in one instance of ifup, any other blocked instances are waiting around to lock a file that no longer exists in the filesystem. Overlap enough instances of ifup just right and you have them all locking different copies of statefile, which then doesn't prevent any of them from rename()ing tmpstatefile out from underneath the others, thus causing their own rename()s to fail with ENOENT.

Example:

Process A starts, opens statefile.
Process A locks statefile.
Process B starts, opens statefile.
Process B waits for lock on statefile.
Process A renames tmpstatefile to statefile and exits.
Process B acquires lock on *outdated* statefile FILE pointer.
Process C starts, opens current statefile (written by Process A).
Process C locks current statefile.
** Two ifups now have locks **
Process B renames tmpstatefile to statefile and exits.
Process C tries to rename tmpstatefile, fails because tmpstatefile has already been renamed out from under it by Process B.

NOTE: Since Process B was operating on an outdated statefile, it has also stomped over any changes made by Process A, so simply making the tmpstatefile process-specific to avoid rename()ing out from under each other won't help.

Related bugs:
 * bug 1226067: ifquery fails with bad file descriptor

Ray Link (rlink) wrote :

Note: Exact same locking semantics are present in latest ifupdown (0.7.40) from upstream Debian.

Suggested fix: Lock a lockfile, not the statefile.

Ray Link (rlink) wrote :

Bug reported upstream (to Debian) against latest version (0.7.40). Will update with Debain bug number when accepted.

Ray Link (rlink) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ifupdown (Ubuntu):
status: New → Confirmed
Bryan Quigley (bryanquigley) wrote :

This is trivial to reproduce if you have enough interfaces:
1. Setup a multicore (I used 4 CPU) VM with 12 interfaces. I experienced similar failures with both bonded interfaces and 12 static configs.
2. Add something to boot to detect if the interfaces are correct and then reboot. I added to /etc/rc.local:
sleep 60

if [ `ifconfig | grep eth | wc -l` = 12 ]; then
 echo "is equal to 12"
 reboot
else
 echo "Things are broken!!!"
fi
3. Set a terminal to ping the vm every 60 seconds. (ping 192.168.122.95 -i 60)

Eventually the icmp_req will be in sequential order for more than 3 pings. This likely means the script has been tripped and one of the interfaces didn't come up.

I've also never got it to reproduce with a single core VM. It can take 500+ reboots to reproduce with 4 interfaces. But with 12 interfaces it only takes around 5 reboots.

Ray Link (rlink) wrote :

It can happen much more frequently than that.

I have three machines in particular that do weekly reboots. They all have two network interfaces, but only one of them is used. Every week, one or more of them will lose on an interface that matters (loopback or the used network interface), and sometimes it will continue to fail through several subsequent manual reboots before it comes back up again.

Chris J Arges (arges) on 2013-08-20
Changed in ifupdown (Ubuntu):
assignee: nobody → Chris J Arges (arges)
Chris J Arges (arges) on 2013-08-20
Changed in ifupdown (Ubuntu Precise):
assignee: nobody → Chris J Arges (arges)
Changed in ifupdown (Ubuntu Quantal):
assignee: nobody → Chris J Arges (arges)
Changed in ifupdown (Ubuntu Raring):
assignee: nobody → Chris J Arges (arges)
status: New → In Progress
Changed in ifupdown (Ubuntu Saucy):
status: Confirmed → In Progress
Changed in ifupdown (Ubuntu Quantal):
status: New → In Progress
Changed in ifupdown (Ubuntu Precise):
status: New → In Progress
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Quantal):
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Raring):
importance: Undecided → Medium
Changed in ifupdown (Ubuntu Saucy):
importance: Undecided → Medium
Chris J Arges (arges) wrote :

I've been able produce test fixes for p/q/r/s. I've tested P/S and have been able to continually reboot a machine for 12h and have the correct number of interfaces come up where before it took about 20m to reproduce the issue.

Here are the builds:
http://people.canonical.com/~arges/lp1160490/

Chris J Arges (arges) wrote :
Chris J Arges (arges) wrote :
Chris J Arges (arges) wrote :
Chris J Arges (arges) wrote :
Chris J Arges (arges) on 2013-08-21
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.44ubuntu2

---------------
ifupdown (0.7.44ubuntu2) saucy; urgency=low

  * On upgrade from 0.7.5ubuntu4, handle the fact that our unmodified conffile
    may have a mismatch in the dpkg database and manually shuffle the file
    around on upgrade. LP: #1217263.

  [ Chris Arges ]
  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/a93db3ecb8f0)
    for a race condition when updating the state file. LP: #1160490
 -- Steve Langasek <email address hidden> Sun, 01 Sep 2013 06:43:53 +0000

Changed in ifupdown (Ubuntu Saucy):
status: In Progress → Fix Released
sles (slesru) wrote :

Hello!

Just hit this bug again on 12.04.3 server.

Sep 10 13:03:39 inetgw2 kernel: [ 6.415157] init: network-interface (vlan7) pre-start process (1404) terminated with status 1

ifup: failed to overwrite statefile /run/network/ifstate: No such file or directory

Could you, please, release package with fix for 12.04?

Thank you!

Hello Ray, or anyone else affected,

Accepted ifupdown into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ifupdown/0.7~beta2ubuntu9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in ifupdown (Ubuntu Quantal):
status: In Progress → Fix Committed
Steve Langasek (vorlon) wrote :

Hello Ray, or anyone else affected,

Accepted ifupdown into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ifupdown/0.7.2ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Steve Langasek (vorlon) on 2013-09-11
Changed in ifupdown (Ubuntu Raring):
status: In Progress → Fix Committed
Bryan Quigley (bryanquigley) wrote :

Just confirmed the fix on 12.04, will check 12.10 now.

tags: added: verification-done-precise
Bryan Quigley (bryanquigley) wrote :

Confirmed fixed on 12.10 as well.

tags: added: verification-done-quantal
Scott Moser (smoser) wrote :

Bug 1226067 seems to possibly be fallout of this change.
I'm only suggesting that based on the 'statefile' and the fact that that bug exists in saucy but not (yet) in raring.

Scott Moser (smoser) wrote :

Ok. I just confirmed that this change caused bug 1226067.

 * boot fresh raring cloud image (20130914)
 * ifquery works as expected
 * enable saucy-proposed, apt-get update, apt-get install ifupdown
 * reboot
 * ifquery broken
    $ ifquery --list --allow auto
    ifquery: failed to lock lockfile /run/network/.ifstate.lock: Bad file descriptor
   $ dpkg-query --show ifupdown
   ifupdown 0.7.5ubuntu2.1

description: updated
Scott Moser (smoser) on 2013-09-16
tags: added: verification-failed-precise verification-failed-quantal verification-failed-raring
removed: verification-done-precise verification-done-quantal
Chris J Arges (arges) wrote :

Patches that fix this regression are in bug 1226067.

Brian Murray (brian-murray) wrote :

Hello Ray, or anyone else affected,

Accepted ifupdown into raring-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ifupdown/0.7.5ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Brian Murray (brian-murray) wrote :

Hello Ray, or anyone else affected,

Accepted ifupdown into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ifupdown/0.7.2ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Brian Murray (brian-murray) wrote :

Hello Ray, or anyone else affected,

Accepted ifupdown into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/ifupdown/0.7~beta2ubuntu10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-done-precise
removed: verification-failed-precise
tags: added: verification-done-raring
removed: verification-failed-raring
Bryan Quigley (bryanquigley) wrote :

Verified fixed on P, Q, and R; and also that the regression (1226067) was also fixed.

tags: added: verification-done-quantal
removed: verification-failed-quantal verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.5ubuntu2.2

---------------
ifupdown (0.7.5ubuntu2.2) raring-proposed; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/fb3055c2c4f0)
    for a regression introduced by a93db3ecb8f0. LP: #1226067

ifupdown (0.7.5ubuntu2.1) raring; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/a93db3ecb8f0)
    for a race condition when updating the state file. LP: #1160490
 -- Chris J Arges <email address hidden> Mon, 16 Sep 2013 11:37:31 -0500

Changed in ifupdown (Ubuntu Raring):
status: Fix Committed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7.2ubuntu4

---------------
ifupdown (0.7.2ubuntu4) quantal-proposed; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/fb3055c2c4f0)
    for a regression introduced by a93db3ecb8f0. LP: #1226067

ifupdown (0.7.2ubuntu3) quantal; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/a93db3ecb8f0)
    for a race condition when updating the state file. LP: #1160490
 -- Chris J Arges <email address hidden> Mon, 16 Sep 2013 11:37:22 -0500

Changed in ifupdown (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.7~beta2ubuntu10

---------------
ifupdown (0.7~beta2ubuntu10) precise-proposed; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/fb3055c2c4f0)
    for a regression introduced by a93db3ecb8f0. LP: #1226067

ifupdown (0.7~beta2ubuntu9) precise; urgency=low

  * Backport a fix from upstream mercurial
    (http://anonscm.debian.org/hg/collab-maint/ifupdown/rev/a93db3ecb8f0)
    for a race condition when updating the state file. LP: #1160490
 -- Chris J Arges <email address hidden> Mon, 16 Sep 2013 11:37:14 -0500

Changed in ifupdown (Ubuntu Precise):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.