Multiple lxc bootstraps fail because stable is leaving stale locks

Bug #1351004 reported by Curtis Hovey
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
juju-core  Fix Released  High        Ian Booth
1.20       Fix Released  Critical    Ian Booth

Bug Description

I am investigating a maddening case where stale locks for the lxc template are left behind. First, a test of stable (commit 1cd26425) failed because the services were stuck in pending. I found a stale lock left behind. I deleted the lock, replayed the test, and it passed.

Later, while testing commit 62e17263 on the devel branch, three other lxc deployment tests failed. All tests timed out because the services were stuck in pending. The previous test run was from the stable branch, implying that the locks were left after the env was destroyed. Those previous tests passed, but the test doesn't check for resources left behind. After cleaning the three machines, I re-ran the tests for devel and they passed. I confirmed that no locks were left behind. The test deploys two services; the lock is probably from the second service.

I am going to watch the next run of stable and verify whether locks are left behind. I will lower this bug's importance if this issue doesn't recur. If it does occur, I may add the regression tag because this issue went from once per week to 3 times in 4 hours.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.21-alpha1
Curtis Hovey (sinzui)
no longer affects: juju-core
Curtis Hovey (sinzui) wrote :

This is a bug in stable. It always leaves a stale lock behind, which breaks the run of other jujus. Once the environment is destroyed, the lock should be gone too, since I can remove the template container with lxc.

Curtis Hovey (sinzui) wrote :

I think I need to write a pre/post process to clean lxc machines so that when other jujus run, they have a pristine environment.
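
A minimal sketch of such a cleanup step, assuming the stale locks all live under one directory on the test machine. The function name `clean_stale_locks` and the directory passed to it are hypothetical; neither comes from Juju itself.

```python
import shutil
from pathlib import Path


def clean_stale_locks(lock_dir):
    """Remove every entry under lock_dir and return the removed names, sorted.

    lock_dir is a hypothetical location for leftover lock files or
    lock directories; a missing directory is treated as already clean.
    """
    lock_dir = Path(lock_dir)
    removed = []
    if not lock_dir.is_dir():
        return removed
    for entry in sorted(lock_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            # Directory-style locks are removed together with their contents.
            shutil.rmtree(entry)
        else:
            # Plain files and symlinks are simply unlinked.
            entry.unlink()
        removed.append(entry.name)
    return removed
```

This would only be safe to run between test runs, when no juju process is active on the machine, because removing a lock that a live process still holds defeats the lock.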

Curtis Hovey (sinzui)
tags: added: regression
summary: - Multiple lxc boostraps fail because stable or devel is leaving stale
- locks
+ Multiple lxc boostraps fail because stable is leaving stale locks
Ian Booth (wallyworld)
Changed in juju-core:
milestone: none → 1.21-alpha1
importance: Undecided → High
status: New → Triaged
Ian Booth (wallyworld) wrote :

The new flock implementation doesn't care if the lock files are still there, and I read that removing them can cause issues if other processes hold the lock. Stress tests also failed when doing so.

But leaving the files there causes older versions of Juju to get confused if a downgrade happens. So bootstrap and destroy have been changed to clean up the lock files.

Ian Booth (wallyworld) wrote :

The easiest thing for now is just to use a slightly different lock file name with flock, so as not to confuse the older fslock implementation.
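
To illustrate why a different name avoids the confusion, here is a rough sketch in Python (Unix only). It assumes the older lock treats the mere presence of the lock path as "held", which is a simplification and not a description of the real fslock code. The file names `template` and `template.flock` and all function names are hypothetical.

```python
import fcntl
import os


def exists_lock_is_held(path):
    """Assumed older-style check: any leftover path counts as a held lock."""
    return os.path.exists(path)


def try_flock(path):
    """Try to take an exclusive, non-blocking flock on path.

    Returns the open file on success, or None if another holder has it.
    The file is created if missing and is NOT deleted on release.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def release(f):
    """Drop the flock and close the file, leaving the file on disk."""
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()


def demo(lock_dir):
    old_name = os.path.join(lock_dir, "template")        # name the older lock looks at
    new_name = os.path.join(lock_dir, "template.flock")  # distinct name for flock

    # Case 1: flock shares the old name. The file survives release,
    # so the existence-based check sees a "held" lock.
    release(try_flock(old_name))
    shared_name_confuses_old = exists_lock_is_held(old_name)
    os.remove(old_name)

    # Case 2: flock uses its own name. The old name is never created.
    release(try_flock(new_name))
    separate_name_confuses_old = exists_lock_is_held(old_name)

    # The leftover flock file does not stop a later flock from succeeding.
    again = try_flock(new_name)
    leftover_blocks_flock = again is None
    if again is not None:
        release(again)

    return shared_name_confuses_old, separate_name_confuses_old, leftover_blocks_flock
```

Under these assumptions, `demo` on an empty directory reports that the shared name confuses the existence-based check, the separate name does not, and the leftover file never blocks a later flock.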

Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released