"Lock was renamed into place, but now is missing"

Bug #680529 reported by Gary Weinfurther
This bug affects 21 people
Affects: QBzr
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (not set)

Bug Description

I'm using TortoiseBzr. I created a new branch of my project, and now when I try to commit files, I get the following error:

Run command: bzr commit -m "added search icon" public_html/img/search-icon.png
bzr: ERROR: Cannot lock LockDir(file:///L:/IPD/Client/Project/2010-11/web/source/branch-php/.bzr/branch/lock): lock was renamed into place, but now is missing!

If I try to repeat the commit, I get this:

Run command: bzr commit -m "added search icon" public_html/img/search-icon.png
Unable to obtain lock file:///L:/IPD/Client/Project/2010-11/web/source/branch-php/ held by Gary <email address hidden>
at Machine [process #4996], acquired 3 minutes, 7 seconds ago.
Will continue to try until 10:29:16, unless you press Ctrl-C.
See "bzr help break-lock" for more.

I click the Cancel button to get out of the Commit dialog and get the following error:

bzr: ERROR: [Errno 9] Bad file descriptor

I then must issue a break-lock command, which is successful. But if I attempt to commit again, the same thing happens.

Tags: error lock
Revision history for this message
Gary Weinfurther (gary-weinfurther) wrote :

I just upgraded to the latest version of Bazaar: 2.3b3. I deleted a folder in my working copy and committed it. Although the commit itself was successful, as soon as I clicked the Close button on the commit window, I got the "Lock was renamed into place, but now is missing" error again. I am attaching a screenshot.

When I clicked the "Ignore" button on the error, I got the "Bad file descriptor" error again.

tags: added: error lock
Revision history for this message
Gary Weinfurther (gary-weinfurther) wrote :

Here is a screen shot of the "Bad file descriptor" error.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 680529] Re: "Lock was renamed into place, but now is missing"

This indicates some misbehaviour in the filesystem under bzr: we
rename A to B, and then find B doesn't exist. I think it is a dupe.

What is the L: drive?

--
Martin

Revision history for this message
Gary Weinfurther (gary-weinfurther) wrote :

The L: drive is a shared network drive where we keep the shared repository.

Revision history for this message
Gary Weinfurther (gary-weinfurther) wrote :

Each time it happens, I have to issue a break-lock.

Revision history for this message
Martin Pool (mbp) wrote :

What I'd suggest as a workaround is to store the repository on a local disk and use bzr+ssh for remote access.

We can work around this, I expect, by pausing for a bit to allow the rename to take place.

Vincent Ladeuil (vila)
Changed in bzr:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Guy Kroizman (kroizguy) wrote :

I got a similar message:
Run command: bzr commit -m "blalabla...

bzr: ERROR: Could not acquire lock "X:/Projects/TSA_BSS/Versions/V1/Library Functions/.bzr/checkout/dirstate": (32, 'CreateFileW', 'The process cannot access the file because it is being used by another process.')

I am using XP and the repository is on a network drive.

Revision history for this message
Daniel Bela (dirtyhawk1024) wrote :

Hi folks,

we find ourselves in the identical situation. Before finding this bug in Launchpad, we did a little bit of testing and figured out that the situation only arises (in our specific layout) if Bazaar Explorer is used to commit a single file. We do not use TortoiseBzr, so I could not test it.

Interestingly, committing via the command line works fine.

Our setup is a Windows Server 2008 R2 running terminal services, with the repository being committed to located on a mapped network drive.

I hope this can help a tiny bit.

Best regards
Bela

Revision history for this message
Daniel Bela (dirtyhawk1024) wrote :

Hi again,

one minute ago I could reproduce the behavior by committing via command line for the first time. I don't see any difference from my former tries except this: I used the absolute file path of the file to be committed when triggering the "bzr commit".

However, the problem seems to be the same but the symptom is slightly different: The error message did not tell me there were problems with "./.bzr/branch/lock" but with "./.bzr/repository/lock".

Could this be caused by the network drive's slower response speed (compared to a local disk)? Then it possibly could be solved by waiting a few milliseconds before creating/removing the lock file.

Devs: What do you think?

Best regards,
Bela

P.S.: sorry for the typos in my last message...

Revision history for this message
Martin Pool (mbp) wrote :

On 20 January 2011 05:17, Daniel Bela <email address hidden> wrote:
> one minute ago I could reproduce the behavior by committing via command
> line for the first time. I don't see any difference I made to former
> tries but this: I used the absolute file path of the file to be
> committed when triggering the "bzr commit".
>
> However, the problem seems to be the same but the symptom is slightly
> different: The error message did not tell me there were problems with
> "./.bzr/branch/lock" but "./.bzr/repository/lock".

Thanks for the data. It's the same problem, just occurring in a
different context.

> Could this be caused by the network drive's slower response speed
> (compared to a local disk)? Then it possibly could be solved by waiting
> a few milliseconds before creating/removing the lock file.

Yes, that's probably what it is, and that's probably what would fix
it. If you would like to try a patch to the point it works reliably
on your network, I will help you finish and land it. Basically you
just need to insert a loop (perhaps up to say 5 times), and a
time.sleep() in lockdir.py.

--
Martin
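
A minimal sketch of the "loop plus time.sleep()" idea, written as a standalone helper rather than as a patch to lockdir.py. The names retry and flaky_read, the default of 5 attempts, and the 0.1 s delay are illustrative assumptions, not bzrlib code; it is written for current Python rather than the Python 2 that bzr 2.x ran on.

```python
import time


def retry(operation, attempts=5, delay=0.1, exceptions=(OSError,)):
    """Call operation() until it succeeds or the attempts run out.

    Sleeps `delay` seconds between failed attempts and re-raises the
    last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


# Stand-in for "read the lock info right after the rename":
# it fails twice, then succeeds (hypothetical, for illustration only).
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("lock info not visible yet")
    return "lock info"


result = retry(flaky_read, attempts=5, delay=0.01)
```

Because the first call happens without any sleep, a filesystem that behaves correctly pays no extra cost; only the failing case is slowed down.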

Revision history for this message
Daniel Bela (dirtyhawk1024) wrote :

Hi again,

> Yes, that's probably what it is, and that's probably what would fix
> it. If you would like to try a patch to the point it works reliably
> on your network, I will help you finish and land it. Basically you
> just need to insert a loop (perhaps up to say 5 times), and a
> time.sleep() in lockdir.py.

well, I probably have to give it a try. Could you point me to the function in which I would insert the loop?
Is that lock_write(), or lock_read(), or unlock()?

I'm in no way a programmer, so this is kind of hard for me. But I'll try for sure...

Regards
Bela

Revision history for this message
John A Meinel (jameinel) wrote :

On 1/25/2011 7:06 AM, Daniel Bela wrote:
> Hi again,
>
>> Yes, that's probably what it is, and that's probably what would fix
>> it. If you would like to try a patch to the point it works reliably
>> on your network, I will help you finish and land it. Basically you
>> just need to insert a loop (perhaps up to say 5 times), and a
>> time.sleep() in lockdir.py.
>
> well, I probably have to give it a try. Could you point me to the function in which I would insert the loop?
> Is that lock_write(), or lock_read(), or unlock()?
>
> I'm in no way a programmer, so this is kind of hard for me. But I'll try
> for sure...
>
> Regards
> Bela
>

It would be in bzrlib/lockdir.py

As part of _attempt_lock (on line 251), you can see that it calls
"self.peek()" to check that we actually managed to lock correctly.

My guess is that the earlier "self.transport.rename()" call is
succeeding, but that by the time we try to peek() we have failed to find
the file.

You could potentially put the wait loop in either peek or in
_attempt_lock. Though I'll mention that there are quite a few callers of
peek, but I haven't thought through whether it would be appropriate for
them to block for a few seconds if they can't find the file (some of
them probably not).

To help in debugging this in the future, I would probably also add a
mutter('After successfully renaming the lock, we failed to find a lock
file, trying again in %.1f seconds.')

Or something along those lines.

As long as it doesn't slow down the common case of the lock file
appearing immediately, we could probably wait a second or two.

Note that if we can't obtain the lock in the first place (it is already
held), we default to waiting up to 30 seconds, polling every 1.0s to see
if the lock has been released. I think those numbers are now a bit too
high, but that does give a baseline we can think about.

I would probably peek every 100ms for at most 1.0s, but really it
depends on your filesystem consistency timeout. (If it takes 10s for a
renamed file to show up at the new location, then that's how long you
have to wait.)

I would probably set those numbers as values that you can get at
externally (like _DEFAULT_TIMEOUT_SECONDS), and we might consider having
them as a configuration setting.

John
=:->
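
The polling scheme John describes (check every 100 ms for at most 1.0 s, with externally adjustable values and a trace message) could look roughly like the sketch below. It is not bzrlib code: the constant names and wait_until_visible are hypothetical, the check callable stands in for peek(), and the standard logging module is used in place of bzrlib's mutter().

```python
import logging
import time

# Hypothetical tunables, named in the spirit of _DEFAULT_TIMEOUT_SECONDS.
_RENAME_SETTLE_TIMEOUT_SECONDS = 1.0
_RENAME_SETTLE_POLL_SECONDS = 0.1

log = logging.getLogger("lock-sketch")


def wait_until_visible(check, timeout=None, poll=None):
    """Poll check() until it returns a non-None value or timeout elapses.

    Returns the value on success, or None if nothing became visible.
    The first check happens immediately, so the common case is not delayed.
    """
    if timeout is None:
        timeout = _RENAME_SETTLE_TIMEOUT_SECONDS
    if poll is None:
        poll = _RENAME_SETTLE_POLL_SECONDS
    deadline = time.monotonic() + timeout
    while True:
        value = check()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            return None
        log.debug("After successfully renaming the lock, we failed to find "
                  "a lock file, trying again in %.1f seconds.", poll)
        time.sleep(poll)


# Stand-in checks: one becomes visible on the third poll, one never does.
answers = iter([None, None, {"holder": "example"}])
found = wait_until_visible(lambda: next(answers), timeout=1.0, poll=0.01)
never = wait_until_visible(lambda: None, timeout=0.05, poll=0.01)
```

Keeping the timeout and poll interval as module-level values makes it easy to raise them for a filesystem that needs longer to show the renamed directory.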


Martin Pool (mbp)
Changed in bzr:
importance: Low → High
Revision history for this message
foof (tidelipop) wrote :

I got this problem too and it is really annoying, so do you have a solution in sight soon?

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

I've decided to use bzr in production, but this is a pretty annoying issue. When will the patch be released?

Revision history for this message
John A Meinel (jameinel) wrote :

If you can reproduce this, a few more details would be helpful. When it happens, can you tell whether the 'held' directory is actually present? That would mean the rename does succeed, just not immediately.

The primary workaround is to not use a file system on which the following sequence fails:
 mkdir a
 echo content > a/info
 rename a b
 open b/info

For example, use bzr+ssh instead of a locally mounted filesystem.

If you do find that the directory does exist eventually, then a small wait loop in the inner function that checks after-rename would be ok. It should succeed immediately in most systems, so it won't delay anything unless you have a system which is actually either genuinely failing or just sluggish to actually handle the rename operation.
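
To check whether a given share shows this behaviour, the mkdir/write/rename/open sequence above can be scripted. The following is a rough diagnostic sketch, not part of bzr; probe_rename_visibility and its timeout and poll defaults are assumptions chosen for illustration.

```python
import os
import shutil
import tempfile
import time


def probe_rename_visibility(parent_dir, timeout=2.0, poll=0.05):
    """Run the mkdir/write/rename/open sequence inside parent_dir.

    Returns 0.0 if b/info could be opened on the first try, the seconds
    waited if it appeared later, or None if it never appeared within
    `timeout`.
    """
    work = tempfile.mkdtemp(prefix="rename-probe-", dir=parent_dir)
    src = os.path.join(work, "a")
    dst = os.path.join(work, "b")
    try:
        os.mkdir(src)
        with open(os.path.join(src, "info"), "w") as f:
            f.write("content\n")
        os.rename(src, dst)
        start = time.monotonic()
        first_attempt = True
        while True:
            try:
                with open(os.path.join(dst, "info")) as f:
                    f.read()
                return 0.0 if first_attempt else time.monotonic() - start
            except OSError:
                if time.monotonic() - start >= timeout:
                    return None
                first_attempt = False
                time.sleep(poll)
    finally:
        # Clean up the scratch directory whatever happened.
        shutil.rmtree(work, ignore_errors=True)


# Probe the local temp directory; point it at the mapped drive instead
# to test the share that shows the problem.
delay = probe_rename_visibility(tempfile.gettempdir())
```

A result of 0.0 means the renamed directory was visible at once; a positive number suggests a wait loop would help; None suggests the rename is genuinely failing rather than just slow.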

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

John, I can reproduce it because it happens every time :-(

Yes, the held directory is really present.

This was the scenario (all using Bazaar Explorer):

  Run command: bzr commit -m "Initial import." x.txt
  Committing to: Z:/test/trunk/
  added x.txt
  Committed revision 1.

Now I hit the "Close" button and get this message:

  bzr: ERROR: Cannot lock LockDir(file:///Z:/test/trunk/.bzr/branch/lock): lock was renamed into place, but now is missing.

After that I hit the "Ignore" button.

I modify the file and try to commit a change and I get:

  Run command: bzr commit -m Added. x.txt
  Unable to obtain lock file:///Z:/test/trunk/ held by ...
  at ... [process #6836], acquired 2 minutes, 15 seconds ago.
  Will continue to try until 13:33:20, unless you press Ctrl-C.
  See "bzr help break-lock" for more.

I've checked, the lock file exists at z:\test\trunk\.bzr\branch\lock\held\info.

Revision history for this message
Martin Pool (mbp) wrote :

Franjo, for the sake of having a record, can you tell us what OS,
filesystem, network filesystem, etc. you're using?

Martin

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Martin, sure, no problem.

The repository is stored on Windows Server 2008 R2 and the partition type is NTFS. I've mapped the repository directory on my local computer as the Z drive; I'm using Windows 7. Network speed (tested with LAN Speed) is 439 Mbps for writing and 405 Mbps for reading.

Do you need anything else?

Revision history for this message
Kevin Blain (kevin+) wrote :

Hi. I recently downloaded the Bazaar system to evaluate, and it looks like it fits my needs just right, however, I also get this problem.

I have also replicated the problem on the same network but from Client to Client simply using a shared folder, though I did not get the error the very first time I tried it.

Also using Windows 7 clients and SBS 2008.

I'll watch this with interest.

Revision history for this message
Mark Brown (mark-mailsolve) wrote :

Just to comment that I have this bug as well - using a network share for the repository (Server is Windows Server 2008, clients are XP & Vista).

For the moment I'll stick with the break-lock method, but hopefully there will be a quick resolution for this problem.

Revision history for this message
Barney Gwyther (barney-gwyther) wrote :

I'm also seeing this issue. Repository is on a network share (Windows Server 2008) and my clients are running a mix of Windows 7 and XP.

If I do my commits from the command line they succeed. I only see the locking problem when committing via Bazaar Explorer.

I hope a satisfactory resolution can be found soon. Apart from this issue it's proving to be a hit with my users.

Martin Pool (mbp)
Changed in bzr:
assignee: nobody → canonical-bazaar (canonical-bazaar)
Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Is there any news regarding this? Thanks!

Revision history for this message
Martin Pool (mbp) wrote :

No fix yet sorry, but it is on our shortlist. If you want to have a
go at adding a retry when this happens we will help you land it.

Martin

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Deal. How can I do it?

Revision history for this message
Martin Pool (mbp) wrote :

That's great. This seems to consistently hit some people but we can't reproduce it, so your help would be great.

The first thing I would try is just adding a sleep(10) before the code that checks the lock and raises this exception. If that fixes it, we know it's definitely timing related. In that case we can probably just sleep and retry a few times until the lock turns up.

If it is not timing related, we need to work out what is going wrong, perhaps by going into the Python debugger before that line and seeing what is present in the directory.
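
For anyone who would rather not use the debugger, one alternative is to record what is on disk at the moment the check fails. The helper below is an illustrative sketch only; snapshot_tree is a hypothetical name, and the lock/held/info layout in the demo mirrors the path reported earlier in this thread.

```python
import os
import tempfile


def snapshot_tree(root):
    """Return a sorted list of paths under root, relative to root.

    Directory entries end with '/'. Returns [] if root does not exist.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else rel.replace(os.sep, "/") + "/"
        for name in dirnames:
            entries.append(prefix + name + "/")
        for name in filenames:
            entries.append(prefix + name)
    return sorted(entries)


# Demo on a scratch directory shaped like a held lock.
lock_dir = os.path.join(tempfile.mkdtemp(prefix="lock-demo-"), "lock")
os.makedirs(os.path.join(lock_dir, "held"))
with open(os.path.join(lock_dir, "held", "info"), "w") as f:
    f.write("holder details\n")

listing = snapshot_tree(lock_dir)
missing = snapshot_tree(os.path.join(lock_dir, "does-not-exist"))
```

Writing such a listing to the log just before the exception is raised would show whether the directory is absent, still under its pending name, or present but unreadable.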

Revision history for this message
Martin Pool (mbp) wrote :

... if you want to talk about it more, or you have some more data, just ask here or on the bazaar list or in #bzr on freenode.

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Martin, sorry for the delay :(

I've prepared a modified version of lockdir.py, but I wanted to test this case further with the original bzr, and this is what I discovered.

If I do: bzr commit -m "foo" I don't get an error.

If I do: bzr qcommit -m "foo" I get the error/warning every time.

Is there a difference between the commit and qcommit implementations? There shouldn't be, since qcommit is only the GUI version?

Revision history for this message
John A Meinel (jameinel) wrote :

bzr qcommit uses different code to commit than just 'bzr commit'. I think the final step is to spawn "bzr commit" with special arguments. However, they might still be holding a lock or something else when they do so.

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

I see, so there's a problem. Anyone familiar with qbzr code? :)
Perhaps you can reproduce the bug now if you use qbzr?

Revision history for this message
Barney Gwyther (barney-gwyther) wrote :

Coincidentally, a colleague and I sat down and had a bit of a look at this yesterday. We narrowed the problem down to the commit part of qbzr too.

It appears to be related to some functionality around qbzr trying to populate the commit message text box: if you attempted a commit previously but failed, that commit message gets written to a variable called 'message' in branch.conf. The lock issue seems to be around updating that file (either to remove the message on successful commit or to add the message on commit failure).

Commenting out line 724 (self._save_or_wipe_commit_data()) and line 720 (self._save_or_wipe_commit_data()) of plugins\qbzr\lib\commit.py appears to eliminate the problem for me. Of course, I lose the functionality that stores my commit message between failed commits but this is something I can live with easier than the locking issue.

Please note that I'm not especially proficient with Python nor am I familiar with the bazaar or qbzr source. The change I made may well have an undesirable impact elsewhere!

Fingers crossed that's useful and if there's anything else I can do to help please let me know.

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Not related directly to this issue, but I think qbzr should use bzr underneath. Sorry for off-topic comment.

Revision history for this message
John A Meinel (jameinel) wrote :

On 6/9/2011 11:37 AM, Barney Gwyther wrote:
> Coincidentally, a colleague and I sat down and had a bit of a look at
> this yesterday. We narrowed the problem down to the commit part of qbzr
> too.
>
> It appears to be related to some functionality around qbzr trying to
> populate the commit message text box: if you attempted a commit
> previously but failed, that commit message gets written to a variable
> called 'message' in branch.conf. The lock issue seems to be around
> updating that file (either to remove the message on successful commit or
> to add the message on commit failure).
>
> Commenting out line 724 (self._save_or_wipe_commit_data()) and line 720
> (self._save_or_wipe_commit_data()) of plugins\qbzr\lib\commit.py appears
> to eliminate the problem for me. Of course, I lose the functionality
> that stores my commit message between failed commits but this is
> something I can live with easier than the locking issue.
>
> Please note that I'm not especially proficient with Python nor am I
> familiar with the bazaar or qbzr source. The change I made may well have
> an undesirable impact elsewhere!
>
> Fingers crossed that's useful and if there's anything else I can do to
> help please let me know.
>

If this is the cause, then probably the other open question is why qbzr
isn't sharing the branch lock. My guess is that qbzr is asynchronously
performing the commit via a subprocess, and then going on to update
branch.conf. Recently we changed the bzrlib code, so that updates to
config files take a write lock on the branch, so that you don't get
concurrent writes overwriting each other. I'm guessing that qbzr was
assuming it could write to the .conf file without a lock (which used to
be the case.)

John
=:->


affects: bzr → qbzr
Revision history for this message
Alexander Belchenko (bialix) wrote :

John, I'm not sure what you mean.

    def wipe_commit_data(self):
        if (self.tree.branch.get_physical_lock_status()
            or self.tree.branch.is_locked()):
            # XXX maybe show this in a GUI MessageBox (information box)???
            from bzrlib.trace import warning
            warning("Cannot wipe commit data because the branch is locked.")
            return
        self.ci_data.wipe()

We don't try to save commit data if the branch is still locked, because on save we have to take the lock, and that caused deadlocks in the past.

Actual wipe is:

    def wipe(self):
        """Delete saved data from branch/tree config."""
        self._set_new_commit_data({})
        # clear old data
        self._wipe_old_data()

    def _get_branch(self):
        """Return branch object if either branch or tree was specified on init.
        Raise BzrInternalError otherwise.
        """
        if self._branch:
            return self._branch
        if self._tree:
            return self._tree.branch
        # too bad
        from bzrlib import errors
        raise errors.BzrInternalError("CommitData has no saved branch or tree.")

    def _get_branch_config(self):
        return self._get_branch().get_config()

    def _set_new_commit_data(self, new_data):
        config = self._get_branch_config()
        old_data = config.get_user_option('commit_data')
        if old_data == new_data:
            return
        try:
            config.set_user_option('commit_data', new_data)
        except AttributeError:
            pass

We have a branch object and get its config. It used to work in earlier versions of bzr, and I'm sure set_user_option takes care of locking the branch; as I explained above, otherwise we would get a deadlock.

What's wrong now?

Revision history for this message
Alexander Belchenko (bialix) wrote :

I've read all the comments, and as I understand the problem: qcommit launches `bzr commit` in a subprocess, and that works fine. Then we try to update branch.conf and that fails, but only with network-mapped drives. Therefore qbzr triggers some bug inside bzrlib related to such a setup. Very interesting. Based on .bzr.log this problem has existed since bzr 2.2, but I've never hit it. Why? Is my Windows XP somehow different from your Windows systems?

@Franjo Stipanovic: we're using bzr under the hood. Your comment is not very nice, you know?

Revision history for this message
Alexander Belchenko (bialix) wrote :

@John Meinel:

As far as I can see, we're triggering this code path in bzrlib/config.py, in TreeConfig:

    def set_option(self, value, name, section=None):
        """Set a per-branch configuration option"""
        # FIXME: We shouldn't need to lock explicitly here but rather rely on
        # higher levels providing the right lock -- vila 20101004
        self.branch.lock_write()
        try:
            self._config.set_option(value, name, section)
        finally:
            self.branch.unlock()

As you can see, the branch is explicitly locked by bzrlib itself. So why did you blame qbzr for not taking the lock?

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

@Alexander Belchenko, I'm sorry for the way I said that; I didn't mean anything disrespectful. I wanted to ask why the qbzr commit window isn't executing this command: bzr commit -m message? That way bzr would handle the whole commit process (lock, write, error notification, etc.).

Revision history for this message
Alexander Belchenko (bialix) wrote :

@Barney Gwyther: this should work, but you had better also comment out our uncommit hook in qbzr/__init__.py

Branch.hooks.install_named_hook('post_uncommit', post_uncommit_hook,
    'Remember uncomitted revision data for qcommit')

Just comment or remove those lines.

Revision history for this message
Alexander Belchenko (bialix) wrote :

I meant: comment out our uncommit hook as well as disabling the saved-commit-message support in qcommit.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Franjo Stipanovic wrote:
> @Alexander Belchenko, I'm sorry for the way I said that, I didn't meant
> anything disrespectful. I wanted to ask why qbzr commit window isn't
> executing this command: bzr commit -m message? That way bzr will handle
> whole commit process (lock, write, error notification, etc).

Franjo: because qcommit *actually* runs the command you see in the
status window. qcommit runs this command, but after the command has finished
it does something else. And that *else* causes the problem. If you look
closer you should see that your actual commit always succeeds: you
should see the new revisions appear in the history (qlog), right?

Revision history for this message
Alexander Belchenko (bialix) wrote :

Guys, I'm unable to reproduce this problem with my network and a mapped drive from a Windows 2000 machine.
But here is a possible patch to fix the problem. You should be able to apply it with

bzr patch commit_data.diff

in your qbzr tree (you should have GNU patch utility in the PATH).

Please test it and tell me whether it helps.

For core devs: I'm explicitly taking the write lock for the entire branch.conf update procedure instead of leaving that to bzrlib. That way I lock the branch once for the 2 operations (instead of taking 2 locks one after another). I hope that reduces the possible race condition here.
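
The effect of taking one outer lock around two updates can be illustrated with a toy model. This is not the attached patch and not bzrlib code: CountingLock, write_locked and the option name legacy_commit_message are hypothetical, and the sketch assumes the branch lock is re-entrant within a process, so that nested lock_write() calls only bump a counter.

```python
from contextlib import contextmanager


class CountingLock:
    """Toy re-entrant lock that counts physical acquisitions.

    Only the outermost lock_write() performs the "on disk" step, which
    is the step that fails on the network drives in this bug.
    """

    def __init__(self):
        self.depth = 0
        self.physical_acquisitions = 0

    def lock_write(self):
        if self.depth == 0:
            self.physical_acquisitions += 1  # the expensive, racy step
        self.depth += 1

    def unlock(self):
        if self.depth == 0:
            raise RuntimeError("unlock() without matching lock_write()")
        self.depth -= 1


@contextmanager
def write_locked(lock):
    """Hold the lock for the duration of a with-block."""
    lock.lock_write()
    try:
        yield lock
    finally:
        lock.unlock()


def set_option(lock, store, name, value):
    """Mimic a setter that takes the lock itself, as TreeConfig.set_option does."""
    with write_locked(lock):
        store[name] = value


# Two setters, each taking the lock on its own: two physical locks.
separate, store_a = CountingLock(), {}
set_option(separate, store_a, "commit_data", {})
set_option(separate, store_a, "legacy_commit_message", None)

# Same two setters under one outer lock: a single physical lock.
grouped, store_b = CountingLock(), {}
with write_locked(grouped):
    set_option(grouped, store_b, "commit_data", {})
    set_option(grouped, store_b, "legacy_commit_message", None)
```

Halving the number of rename-into-place operations does not remove the underlying filesystem problem, but it gives it fewer chances to occur.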

Revision history for this message
Vincent Ladeuil (vila) wrote :

@bialix: isn't it just that you try to lock the branch too soon ?

Well, rather, the file system tells you can't but it's wrong so the same approach Franjo was trying should also work no ?

Revision history for this message
Alexander Belchenko (bialix) wrote :

Vincent Ladeuil wrote:
> @bialix: isn't it just that you try to lock the branch too soon ?

I don't quite understand what you mean.

> Well, rather, the file system tells you can't but it's wrong so the same
> approach Franjo was trying should also work no ?

I don't see what Franjo was trying?

--
All the dude wanted was his rug back

Revision history for this message
Alexander Belchenko (bialix) wrote :

OK, back to the original bug report by Gary Weinfurther. In his .bzr.log I see the regular error pattern:

Tue 2010-11-23 10:31:39 -0500
0.063 bazaar version: 2.2.1
0.063 bzr arguments: [u'qsubprocess', u'--bencode', u'l6:commit2:-m17:added search icon31:public_html/img/search-icon.pnge']
0.063 looking for plugins in C:/Users/Gary/AppData/Roaming/bazaar/2.0/plugins
0.063 looking for plugins in C:/Program Files (x86)/Bazaar/plugins
0.156 encoding stdout as osutils.get_user_encoding() 'cp1252'
0.203 bazaar version: 2.2.1
0.203 bzr arguments: [u'commit', u'-m', u'added search icon', u'public_html/img/search-icon.png']
0.203 encoding stdout as osutils.get_user_encoding() 'cp1252'
0.281 opening working tree 'D:/Projects/Lincoln/Digital Frontline/2010-11/web/source'
2.730 Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 912, in exception_to_return_code
  File "bzrlib\commands.pyo", line 1112, in run_bzr
  File "bzrlib\commands.pyo", line 690, in run_argv_aliases
  File "bzrlib\commands.pyo", line 705, in run
  File "bzrlib\cleanup.pyo", line 135, in run_simple
  File "bzrlib\cleanup.pyo", line 165, in _do_with_cleanups
  File "C:/Program Files (x86)/Bazaar/plugins\qbzr\lib\commands.py", line 767, in run
  File "C:/Program Files (x86)/Bazaar/plugins\qbzr\lib\subprocess.py", line 888, in run_subprocess_command
  File "bzrlib\commands.pyo", line 1112, in run_bzr
  File "bzrlib\commands.pyo", line 690, in run_argv_aliases
  File "bzrlib\commands.pyo", line 705, in run
  File "bzrlib\cleanup.pyo", line 135, in run_simple
  File "bzrlib\cleanup.pyo", line 165, in _do_with_cleanups
  File "bzrlib\builtins.pyo", line 3200, in run
  File "bzrlib\decorators.pyo", line 192, in write_locked
  File "bzrlib\workingtree_4.pyo", line 629, in lock_write
  File "bzrlib\branch.pyo", line 2452, in lock_write
  File "bzrlib\lockable_files.pyo", line 187, in lock_write
  File "bzrlib\lockdir.pyo", line 648, in lock_write
  File "bzrlib\lockdir.pyo", line 563, in wait_lock
  File "bzrlib\lockdir.pyo", line 524, in attempt_lock
  File "bzrlib\lockdir.pyo", line 254, in _attempt_lock
LockFailed: Cannot lock LockDir(file:///L:/IPD/Lincoln/Digital%20Frontline/2010-11/web/source/branch-php/.bzr/branch/lock): lock was renamed into place, but now is missing!

2.730 Transferred: 0kB (0.0kB/s r:0kB w:0kB)
2.730 return code 3
[ 4996] 2010-11-23 10:32:32.572 WARNING: Cannot save commit data because the branch is locked.
[ 4996] 2010-11-23 10:32:32.572 WARNING: Cannot save commit data because the branch is locked.
7395.142 opening working tree 'D:/Projects/Lincoln/Digital Frontline/2010-11/web/source'
60.514 None

The first part comes from `bzr commit` that we run as subprocess. Therefore this problem should be repeatable with command-line `bzr commit`. If it's not I'd like to see other .bzr.logs
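
The link between the two "bzr arguments" lines in the log can be checked by decoding the --bencode argument passed to qsubprocess. The decoder below is a minimal sketch covering only integers, byte strings and lists; it is not the implementation qbzr or bzrlib uses.

```python
def bdecode(data):
    """Decode a bencoded bytes object (integers, byte strings, lists)."""
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode_at(data, pos):
    """Decode one value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unsupported bencode token at offset %d" % pos)


# The argument exactly as it appears in the log above.
logged = b"l6:commit2:-m17:added search icon31:public_html/img/search-icon.pnge"
args = [part.decode("utf-8") for part in bdecode(logged)]
```

The decoded list matches the arguments of the second `bzr` invocation in the log, which is consistent with the commit being run as a subprocess.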

The second part, specifically
 WARNING: Cannot save commit data because the branch is locked.
that's from qcommit trying to update commit_data in branch.conf.
This warning could be triggered only from

    def save_commit_data(self):
        if (self.tree.branch.control_files.get_physical_lock_status()
            or self.tree.branch.is_locked()):
     ...


Changed in qbzr:
assignee: canonical-bazaar (canonical-bazaar) → nobody
Revision history for this message
John A Meinel (jameinel) wrote :

On 6/9/2011 3:18 PM, Alexander Belchenko wrote:
> ** Also affects: bzr
> Importance: Undecided
> Status: New
>
> ** Changed in: qbzr
> Assignee: canonical-bazaar (canonical-bazaar) => (unassigned)
>

If you are getting "Lock renamed into place, but now is missing", that
means that something might be *unlocking* the file behind our backs.

The other possibility with network systems is that we rename "pending...
to held", and that doesn't give us an error. But we follow that up
quickly with "open(held/info)", and that is failing.

It is *possible* that the network fs isn't being very atomic for us. And
it is letting the rename succeed, but not actually committing it/seeing
the update when we go to read afterwards.

What seems suspicious is that they are only seeing this behavior when
running 'bzr qcommit', and not from just 'bzr commit'.

I realize we used to deadlock. It makes me wonder, though, if the fact
that the new code tries to lock, and then aborts if it fails, isn't
causing poor behavior from the network mounted filesystem. (ie. the qbzr
logic is technically correct, but it causes the mounted filesystem to
misbehave.)

I don't know the specifics, as I've never been able to reproduce it. But
the fact that people say it reliably fails when run from 'bzr qcommit'
but reliably passes from 'bzr commit' at least hints to an interaction
causing the problem.

John
=:->

Revision history for this message
Alexander Belchenko (bialix) wrote :

John, comment #10 said that error can be reproduced with command-line too: https://bugs.launchpad.net/qbzr/+bug/680529/comments/10

Revision history for this message
Alexander Belchenko (bialix) wrote :

OK, as I understand the problem with qcommit.

When qcommit starts, it opens the tree to get the list of modified files, and it also tries to read the saved commit message from the previous time. These are preparation steps, and after them we should have left the branch/tree unlocked. Could it be that unlock is slow?

When the user presses the OK (Commit) button, we launch `bzr commit -m ...` as a subprocess; during that time qcommit performs no locking. But! If qcommit has been invoked from Bazaar Explorer, then I can't guarantee that Explorer is not trying to refresh the state (and therefore taking the lock).

If you're using Bazaar Explorer I'd recommend disabling the auto-refresh feature (Tools - Options - Behavior - Automatically refresh status report). If this workaround helps you, it would be nice to know.

So, if plain `bzr commit` crashes when invoked from qcommit, I can see only those 2 reasons for it.

Revision history for this message
Alexander Belchenko (bialix) wrote :

John, Vincent: how can we enable trace messages for locks? -Dlock? I suspect -Dlock won't propagate to subprocess. Maybe some environment variable?

Revision history for this message
Alexander Belchenko (bialix) wrote :

Also, comment #17 https://bugs.launchpad.net/qbzr/+bug/680529/comments/17 said that after this error the lock directory is still present on the disk (held is still there). IIUC, that means the branch is still locked? For some reason bzr didn't see it at some point, but later this directory magically appears where it should be.

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Alexander Belchenko, I've tried disabling "Automatically refresh status report", but the warning still appears :(

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Alexander Belchenko <email address hidden> writes:

    > John, Vincent: how can we enable trace messages for locks? -Dlock? I
    > suspect -Dlock won't propagate to subprocess. Maybe some environment
    > variable?

Well env variables are not that friendly for windows users so... Err you
know that ;)

There is 'debug_flags' in bazaar.conf !
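
A minimal sketch of what that setting might look like, assuming the `lock` flag (the one `-Dlock` refers to) is what needs tracing and that the option goes in the `[DEFAULT]` section of bazaar.conf. Because the option lives in the config file rather than on the command line, it should also apply to a `bzr commit` started as a subprocess:

```ini
[DEFAULT]
# Assumed equivalent of passing -Dlock on every bzr invocation
debug_flags = lock
```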

Revision history for this message
Daniel Bela (dirtyhawk1024) wrote :

John, Alexander: As the poster of comment #10, I can say that this happened only once, and we have been using Bazaar (via the terminal) quite regularly and stably since then.

From my point of view, it may be qbzr that is bugging me, and this one incident could have happened because of network problems or whatever.

Vincent Ladeuil (vila)
Changed in bzr:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → canonical-bazaar (canonical-bazaar)
Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Sorry, is there any news regarding this?

Revision history for this message
Dominic Bisset (dominicbisset) wrote :

Hi there,

I've got another setup you might want to try for reproducing the bug:

O/S: Arch linux, running on a VirtualBox VM, within a Windows 7 host.

I get the "...lock was renamed into place..." error when trying to bzr init-repo in a folder shared between host and guest. VirtualBox shared folders are mounted in the guest like a drive, and I imagine this will be the root of the problem. This folder is also my Dropbox folder on the Windows 7 host, though I doubt that is the cause.

I also get the same error when trying to bzr push from a repo in a non-shared folder to this folder.

On a related note, I acknowledge I might be misunderstanding the push command here. This is the first time I've used a DVCS, and my aim is to follow the Personal Version Control workflow, preferably writing stuff straight to Dropbox as a backup. My push command usage is based off a single comment on a blog post that I can now not find, describing a similar setup. Instead of doing the work in the folder directly they would "push/pull" (or some other combination of those two words) their code to dropbox periodically, if I recall correctly. I'm not totally sure how to apply that here.

Anyway, hope this information can help you. Bazaar does look good, and I do appreciate not having to faff around with servers for the small-scale projects I'm doing.

Revision history for this message
Franjo Stipanovic (fritzfs) wrote :

Sorry, any news regarding this?

Revision history for this message
Anders Rune Jensen (anders-gnulinux) wrote :

Bump.

I tried uncommitting the lines that post #38 specifies. Doesn't help.

I'm running bzr explorer (qbzr) 1.2.1 and bzr 2.4.2.

I'm reevaluating if using bzr was the right choice, I've used it before on Linux and I really like it, but seeing the bug was confirmed and changed to status High ½ a year ago and nothing has happened since starts some alarm bells for me.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Am 26/12/11 10:59, schrieb Anders Rune Jensen:
> Bump.
>
> I tried uncommitting the lines that post #38 specifies. Doesn't help.
>
> I'm running bzr explorer (qbzr) 1.2.1 and bzr 2.4.2.
>
> I'm reevaluating if using bzr was the right choice, I've used it before
> on Linux and I really like it, but seeing the bug was confirmed and
> changed to status High ½ a year ago and nothing has happened since
> starts some alarm bells for me.
Can you reproduce this with the bzr command-line, or is this specific to
bzr-explorer?

Cheers,

Jelmer

Revision history for this message
Anders Rune Jensen (anders-gnulinux) wrote :

Committing from the command line with bzr works. So it must be specific to bzr-explorer, and qbzr in particular, I think. Thanks Jelmer.

Jelmer Vernooij (jelmer)
no longer affects: bzr
Revision history for this message
Peter Wentworth (p-wentworth) wrote :

Is this just a problem with Windows Server 2008 shares?

I have 80 students in a class, all getting "lock was renamed into place, but now is missing!" consistently. (I'm using bazaar as part of a workflow to get them to download skeleton projects, and to submit their completed assignments. Each student has their own dedicated local branch, and their own dedicated parent repository where their submissions happen.)

All commits work fine from the command line, so I can confirm that we've only seen problems when using Bazaar Explorer.

But, even Bazaar Explorer works fine for some NTFS shares! Our domain has a mix of newer and older servers, and it seems to work fine if the local branches are hosted on Windows Server 2003, but fails if the students' home directories are on a Windows Server 2008. (We've tried turning the shared drive read cache settings on and off, but it doesn't seem to make any difference.)

So if you've had this Bazaar Explorer issue on Windows network shares, or if you have not had this issue and Windows shares work fine for you, please let me (or this bug list) know whether your evidence supports the conjecture that this is an error specific to the GUI tools (Bazaar Explorer) and only on Windows 2008 shares (or later?).

Thanks - some confirmation of what we think we see might help us narrow it down.

Revision history for this message
Peter Wentworth (p-wentworth) wrote :

New evidence:

I have now seen the problem occur on the command-line twice, i.e. very infrequently, when the backing share is provided by a Windows 2008 server. So it is not just a GUI tool issue, although it occurs almost constantly when we use the GUI tools.

I am in the process of getting the students to migrate their 80 repositories to a share that is hosted on Windows Server 2003 instead of Windows 2008, and to move back to the GUI tools. We will see whether things improve dramatically in the next week or two!

A deeper issue is that if this problem has to do with potential delays or caching in the file system, then John A Meinel's premise in comment #16 - that a small wait loop might solve the symptoms - might be a work-around. But what it really means is that acquiring the lock is not atomic, so we could expect race conditions in which two processes enter the critical section simultaneously. That sounds like a deeper issue ...
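
The kind of small wait loop mentioned above could be sketched roughly as follows. This is only an illustration of the work-around, not code from bzrlib or qbzr: `retry_with_delay` and its parameters are hypothetical names, and the caller would have to pass in whatever exception the lock call actually raises. It retries an action a fixed number of times and does nothing about the atomicity concern itself.

```python
import time


def retry_with_delay(action, attempts=5, delay=1.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call action() and return its result.

    If action() raises one of the exceptions in retry_on, wait `delay`
    seconds and try again, up to `attempts` calls in total. The last
    failure is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the caller see the real error.
                raise
            sleep(delay)
```

A caller might wrap the lock acquisition as `retry_with_delay(br.lock_write, retry_on=(SomeLockError,))`, where `SomeLockError` stands in for the real exception class. Masking the symptom this way still leaves open the race in which two processes both believe they hold the lock.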

Revision history for this message
Peter Wentworth (p-wentworth) wrote :

6 weeks later: having moved the shares onto Windows Server 2003 instead of 2008 has made the problem go away for the 70 students who have repositories.

It hasn't made my problems go away entirely though :-)

So my advice is as follows: Don't use shares for hosting your upstream repositories. Put your upstream repository behind a proper ssh server. Here are some problems that I ran into:

Windows 2008 shares do not work for bazaar GUI tools - this thread is all about why that is.

In order for a downstream user to push changes to an upstream repository on a share, they need write permissions on the share. So that permits them to directly edit (or wreck, or otherwise fiddle with) the upstream repo in "non-approved" ways!

Bazaar hard-codes the user's current drive mappings into the parent/push path info. So if you happen to have Z: mapped to your share \\server01\someshare\ , it remembers the path as Z:\\repo. Then it falls over on the day you have some other drive letter mapped to that share.

Some students live outside our university firewall. You cannot expose the share unless you go the server route.

So my advice would be "Pay the price of setting up an SSH server upfront. It is a more sensible transport mechanism".

Revision history for this message
Jan Thor (marfisa) wrote :

I got the same error using a Windows network drive, so although I know nothing
about the codebase of bzr or qbzr, I started some experimenting. I found that
adding a generous timeout at the start of the wipe-method of the CommitData
class within module commit_data.py made the problem disappear. At the start
of the timeout, there is a subfolder "held" within .bzr/branch/lock which gets
renamed and then deleted during the timeout.

What makes this strange is that I can't see this subdirectory using os.listdir.
I rewrote wipe like this (just a quick hack to see what's going on):

    def wipe(self):
        """Delete saved data from branch/tree config."""
        br = self._get_branch()
        # Here starts my code...
        path = br.control_url.replace("file:///", "")
        path = os.path.join(path, "lock")
        print path, os.listdir(path)
        print ("* Locked: " + str(br.is_locked()) + ";"
                            + str(br.peek_lock_mode()) + ";"
                            + str(br.get_physical_lock_status()))
        time.sleep(3)
        print os.listdir(path)
        # ...here ends my code
        br.lock_write()
        ...

Obviously, I also had to add imports for os and time at the start of the
module. What I get is something like this on the command line:

    I:\Thor\bazaarbug>bzr qcommit
    Run command: bzr commit -m blablabla

    Committing to: I:/Thor/bazaarbug/
    Committed revision 21.
    I:/Thor/bazaarbug/.bzr/branch/lock []
    * Locked: False;None;False
    []
    I:/Thor/bazaarbug/.bzr/branch/lock []
    * Locked: False;None;False
    []
    I:\Thor\bazaarbug>_

So not only does br.is_locked() fail to report the remaining lock,
os.listdir(path) also misses a subdirectory that is clearly visible in an
Explorer window and which I suspect to be the cause for br.lock_write() to fail
(unless I add the timeout).

On a remote drive with a really slow connection, I found that time.sleep(1)
instead of time.sleep(3) was insufficient to avoid failing.

Revision history for this message
Jan Thor (marfisa) wrote :

I have to correct my previous post. When I said that I have to include a line
time.sleep(3), I actually lied. What I did was adding a line with
time.sleep(5). This difference seems crucial, and I need this timeout even
when I have a fast connection to the shared drive. Some further experimenting
showed that time.sleep(4.96) works fine, while time.sleep(4.95) leads to an
error.

I suspect that there is some other mechanism at work in the shadows which does
something and then deliberately waits for exactly 5.0 seconds. The fact that
I have to wait only 4.96 seconds is probably due to Python taking 0.04 seconds
on my machine to reach this point in code after triggering said mechanism.

Another experiment I made: since the commit calls wipe two times, I added a
global counter and invoked time.sleep(5) only the first time wipe is called.
This leads to the same error message as soon as wipe is called the second time.

I searched within bzrlib for the string pattern "(5", but without success
(this snippet appears several times, but always in unsuspicious ways). But
that’s not too surprising if those 5 seconds are something Windows Server 2008
specifically introduces, not bzr or qbzr.

I guess an ugly, symptoms-only fix would be something like this:

    def wipe(self):
        """Delete saved data from branch/tree config."""
        br = self._get_branch()
        if branch_is_located_on_win_server_2008(br):
            time.sleep(5)
        br.lock_write()
        ...

Revision history for this message
Jan Thor (marfisa) wrote :

Of course, not only wipe, but also save needs a time.sleep(5) before attempting to acquire the lock.

A saner approach (instead of wasting ten seconds for a simple commit) would probably be to drop support for this functionality on unsupportive network drives. This is what I finally ended up doing in commit.py on our local installation:

    def _save_or_wipe_commit_data(self):
        # jan: dirty hack for bug https://bugs.launchpad.net/qbzr/+bug/680529
        # don't save uncommit data for certain known network drives
        for letter in ["H", "I", "M"]:
            if "file:///"+letter+":/" in self.tree.branch.control_url:
                return
        if not self.process_widget.is_running():
            if self.process_widget.finished:
                self.wipe_commit_data()
            else:
                self.save_commit_data()

I know that for us, H:, I: and M: are problematic drives, and on M:, we have a shared repository. On this repository, saving aborted commit data globally instead of per-user doesn’t make much sense anyway, I think. And if someone is maintaining a copy on C: or D:, the functionality is still there.

A more general approach would be something like this:

    def _save_or_wipe_commit_data(self):
        if self.is_on_an_evil_server:
            return
        if not self.process_widget.is_running():
            if self.process_widget.finished:
                self.wipe_commit_data()
            else:
                self.save_commit_data()

...with the flag self.is_on_an_evil_server being set during initialization.
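
One way such a flag might be computed is sketched below, under the assumption that the set of problematic drive letters is known in advance (as with H:, I: and M: above). The helper `is_on_listed_drive` is a hypothetical name, not part of qbzr; it only inspects the `file:///X:/` prefix of a control URL and does not detect the server type.

```python
import re

# Matches the drive letter in URLs such as file:///I:/Thor/bazaarbug/.bzr/branch/
_DRIVE_RE = re.compile(r"^file:///([A-Za-z]):/")


def is_on_listed_drive(control_url, letters):
    """Return True if control_url is a file:// URL on one of the given drives.

    `letters` is any iterable of single drive letters; case is ignored.
    URLs without a drive letter (e.g. other transports) return False.
    """
    match = _DRIVE_RE.match(control_url)
    if match is None:
        return False
    wanted = set(letter.upper() for letter in letters)
    return match.group(1).upper() in wanted
```

During initialization the flag could then be set with something like `self.is_on_an_evil_server = is_on_listed_drive(self.tree.branch.control_url, ["H", "I", "M"])`, with the list of letters taken from local configuration rather than hard-coded.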

I didn’t follow the advice of post #38 to get rid of the corresponding hook in __init__.py. After all, I still want to use the functionality for local drives. For network drives, as far as I can see, all that happens is that no data for aborted commits is ever saved, and it shouldn’t be problematic that the hook is still present. No saved data simply means that the next attempt to commit will start with some reasonable default values. Or am I missing something, and terrible things are about to happen to me?
