can't PUSH over an sshfs fuse file share

Bug #183948 reported by codeslinger
Affects: Bazaar
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Bazaar is a terrific design, but this bug almost caused me to abandon my plans to use it; it drove me totally raving bonkers :-) trying to isolate the problem. The symptom is that, ~apparently at random~, it fails to do a push.

os = gentoo.

bazaar = 1.0 installed from tarball because gentoo is not up to date
paramiko = 1.6.3
sshfs-fuse = 1.6

Because of bug 183705 (authentication.conf), I decided to use fuse to access/create a central store on another server. I use fuse all the time for other purposes and have found it to be reliable.

#Establish the connection
  mkdir -p /virt/svr1
  sshfs <email address hidden>:/some/path/ /virt/svr1

password = ~~~~~~

#Create a branch with some files in it and commit them, then do
  bzr push --remember --create-prefix /virt/svr1/snafu/bug/atest

Result: bzr: ERROR: [Errno 1] Operation not permitted

Partial contents do get written, but the result is invalid. This error reproduces 100% of the time; what made it look random is that I was also pushing via sftp (works) and via cifs (works, see below).

On IRC, people suggested that it was a permissions problem; however, I made the connection as root just to ensure that permissions would not be a factor.

Now what really drove me nuts is that I also have a samba cifs connection to a shared folder on a Windows computer and it works fine over there, which made it very hard to discern the cause of the apparently random failures.

#install samba cifs....

  mkdir -p /virt/svr2
  mount -t cifs //server/share /virt/svr2 -o "password=,dir_mode=0775,file_mode=0664"

  bzr push --remember --create-prefix /virt/svr2/snafu/bug/atest

Result: no errors, the branch is created

I don't know why the cifs share works, but the one thing that I noticed about the push is that if I use a symbolic link it resolves that link to an absolute path name --- use bzr info to see this. Just as a wild guess, what I suspect is happening is that sometimes it is seeing the fuse mapped share and sometimes it is seeing the underlying mount point instead. I checked the mount point and it is empty.

codeslinger (codeslinger) wrote :

Just spotted bug 32669; it's probably related.

Why is urlutils.local_path_to_url() doing a normpath? And does it deal properly with fuse mount points?

Alexander Belchenko (bialix) wrote : Re: [Bug 183948] can't PUSH over an sshfs fuse file share

codeslinger wrote:
| #Create a branch with some files in it and commit them, then do
| bzr push --remember --create-prefix /virt/svr1/snafu/bug/atest
|
| Result: bzr: ERROR: [Errno 1] Operation not permitted

sshfs by default doesn't support renaming over an existing file.
Someone suggested to me that sshfs should be run with the
flag -oworkaround=rename:

sshfs -oworkaround=rename

codeslinger (codeslinger) wrote :

Wow!!! Thank You for the Fast Response!!!! That worked!!!!

I'm not even going to ask why bzr creates a file and then tries to rename another file over the top of it.. :-o

I tried to update the manual with a note about this, but seem to lack permissions to do this. Please turn this bug into a request to add this info to the documentation.

Thank you

codeslinger (codeslinger) wrote :

The correct syntax to use when creating the connection is:

sshfs -oworkaround=rename <email address hidden>:/remote/path/ /local/path

codeslinger (codeslinger) wrote :

[quote] I'm not even going to ask why bzr creates a file and then tries to rename another file over the top of it.. :-o

On second thought, I am going to ask... because this is just not a good thing for the program to be doing, and people will be saddled with this problem from now until forever. It already cost me a full day of work; multiply that by however many people get zapped by it...

The fix is trivial and it will make the program robust; right now it is very fragile and has to be coaxed into working. Just because you can get away with doing something does not mean that you should do it. I'm not a Python programmer, but in PHP it would look like this. Simply go through the code and replace every rename with a call to this function.

// Move/rename a file. Assumes that you have already done security and sanity checks on $tgt and $src.

function robust_file_rename($src, $tgt)
{
    // There should also be a sanity check here for $tgt: you would never want to unlink '/' or NULL.

    if (file_exists($tgt)) unlink($tgt);
    if (file_exists($tgt)) {} // fail here with an informative error message about the lack of permissions

    $bResult = rename($src, $tgt);
    if (!$bResult) {} // fail here with an informative error message about the lack of permissions, etc.

    return $bResult;
}

codeslinger (codeslinger) wrote :

If, on the other hand, you have file sharing/locking issues and need the file to always exist so as to avoid race conditions, then the correct way to do it would be to truncate $tgt to zero length and copy $src into it, followed by an unlink of $src.

Alternatively, to avoid an actual copy operation, you could use some sort of locking mechanism such as a flag file.

The idea of renaming a file to something that already exists could lead to a lot of hard-to-find lurking bugs.

John A Meinel (jameinel) wrote :

Actually, your version introduces a race condition. If the network goes down or the machine crashes between the time you do "unlink()" and the time you do "rename()", then you have lost the file completely.

POSIX filesystems support atomic replace, such that "rename(a, b)" either succeeds or fails. If it succeeds, the contents are replaced; if it fails, nothing is changed. By leveraging this property, Bazaar maintains referential integrity at all times. If it is updating something and the power goes out, no partially written data is ever referenced, which generally means that the system is left in the state it was in before we started updating. (Some completed records might be recorded, but they are referentially correct anyway.)
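As an illustration of the write-then-rename pattern described above, here is a minimal sketch in modern Python. It is not Bazaar's code; the function name atomic_write and the ".tmp-" prefix are made up for the example, and it assumes a filesystem where rename over an existing file is atomic.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new content to a temporary file in the same directory,
    # so the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Replace `path` in a single step; on POSIX this is rename(2).
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file; `path` itself was never modified.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before os.replace, the old file is still intact and only a stray temporary file is left behind.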

Since not all filesystems support POSIX rename (which is unfortunate), we do have a backup function called "fancy_rename", which, instead of just doing the unlink, does something like:

  mv target => target.tmp
  mv source => target
  unlink target.tmp

There is extra handling such that if a step fails, it renames target.tmp => target. It isn't perfect, because a network disconnect would still leave target missing, but target.tmp would still exist and you could at least manually recover from that situation without data loss.
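The three steps and the rollback can be sketched roughly as follows. This is an illustration only, not the real fancy_rename; the name fancy_rename_sketch is hypothetical, and the temporary-name format is an assumption loosely modelled on a ".pid.randomnumber.tmp" suffix.

```python
import os
import random


def fancy_rename_sketch(source, target):
    """Replace `target` with `source` without renaming over an existing file."""
    # Unique backup name so concurrent processes do not collide (assumed format).
    backup = "%s.%d.%d.tmp" % (target, os.getpid(), random.randint(0, 0x7FFFFFFF))
    had_target = os.path.exists(target)
    if had_target:
        os.rename(target, backup)      # mv target => target.tmp
    try:
        os.rename(source, target)      # mv source => target
    except OSError:
        if had_target:
            os.rename(backup, target)  # roll back: target.tmp => target
        raise
    if had_target:
        os.unlink(backup)              # unlink target.tmp
```

Between the first and second rename the target briefly does not exist, which is the window that a plain POSIX rename avoids.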

What I could see is adding one more layer, such that we try a POSIX rename as the first step and, if that fails with EEXIST, fall back to trying fancy_rename.
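A rough sketch of that extra layer might look like the following. The function name rename_with_fallback and its parameters are hypothetical. Treating EPERM like EEXIST is an assumption, added only because the original report showed "[Errno 1] Operation not permitted" from sshfs.

```python
import errno
import os


def rename_with_fallback(source, target, fallback, rename=os.rename):
    """Try a plain rename first; call `fallback` only if the target is in the way."""
    try:
        rename(source, target)
    except OSError as e:
        # EEXIST is the case named in the comment above. EPERM is included
        # as an assumption based on the error seen over sshfs.
        if e.errno not in (errno.EEXIST, errno.EPERM):
            raise
        fallback(source, target)
```

The fallback is passed in as a callable (for example a fancy_rename-style function), so filesystems with a working POSIX rename never enter the slower, non-atomic path.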

Some care would still need to be taken, though. Bazaar also relies on the property that renaming a directory over another directory will fail. It is one of the few properties that work across all filesystems, and allows us to create race-condition-safe locking across multiple types of connections. (You can lock over the local filesystem, and be safe versus someone who is connecting over sftp, and someone who is connecting over ftp.)
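To illustrate the directory-rename property, here is a minimal lock sketch. It is not Bazaar's actual locking code; the names try_lock, unlock, "pending-" and "info" are invented for the example. It assumes that renaming a directory over an existing non-empty directory fails, which is why the lock directory always contains a file.

```python
import os
import shutil
import tempfile


def try_lock(lock_dir, owner):
    """Return True if the lock was taken, False if someone else holds it."""
    parent = os.path.dirname(os.path.abspath(lock_dir))
    # Build a private directory that already contains an info file, so a
    # held lock is never empty (POSIX lets a rename replace an empty directory).
    pending = tempfile.mkdtemp(dir=parent, prefix="pending-")
    with open(os.path.join(pending, "info"), "w") as f:
        f.write(owner)
    try:
        os.rename(pending, lock_dir)   # fails if lock_dir already exists
        return True
    except OSError:
        shutil.rmtree(pending)         # someone else holds the lock
        return False


def unlock(lock_dir):
    """Release the lock by moving it aside, then deleting it."""
    released = "%s.released.%d" % (lock_dir, os.getpid())
    os.rename(lock_dir, released)
    shutil.rmtree(released)
```

Whoever wins the rename holds the lock; everyone else sees the rename fail and can read the info file to find out who the holder is.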

However, it is probably better to have the filesystem support POSIX rename, which is what -oworkaround=rename is attempting to do.

So while changing it would make us more robust for filesystems that don't support atomic rename, it introduces race conditions that I would rather avoid.

codeslinger (codeslinger) wrote :

Hi John,

Thank you very much for the info. I learn something new every day, did not know that about POSIX. Dealing with all these different file systems must be quite a challenge.

if one were to use something unique for the extension such as '.@rename.[pid]' instead of a generic '.tmp' then automatic file recovery could be possible.

codeslinger (codeslinger) wrote :

[quote] if one were to use something unique for the extension such as '.@rename.[pid]'

Correction: given the distributed nature of the system, you would need to use a "session id" instead of a "pid"

John A Meinel (jameinel) wrote : Re: [Bug 183948] Re: can't PUSH over an sshfs fuse file share

codeslinger wrote:
> Hi John,
>
> Thank you very much for the info. I learn something new every day, did
> not know that about POSIX. Dealing with all these different file
> systems must be quite a challenge.
>
> if one were to use something unique for the extension such as
> '.@rename.[pid]' instead of a generic '.tmp' then automatic file
> recovery could be possible.
>

We do use a unique identifier. I was just using .tmp as a simple
example. We actually use something like .pid.randomnumber.tmp

So we are safe to recover as long as the connection is still alive. The
problem is that if we lose the connection, another process would have to
search/guess what the correct old value was.

John
=:->

codeslinger (codeslinger) wrote :

thank you for the detailed explanations, let's close this bug as won't fix.

the work-around is fine. "sshfs -oworkaround=rename"

it would be good if some mention of this issue made it into the docs.

bzr is great!!!!

Vincent Ladeuil (vila)
Changed in bzr:
status: New → Won't Fix

Callum Macdonald (chmac) wrote :

Small correction to comment 4[0], according to the sshfs man page[1]:
    usage: sshfs [user@]host:[dir] mountpoint [options]

So the correct syntax is:
    sshfs [user@]host:[dir] mountpoint -oworkaround=rename

[0] https://bugs.launchpad.net/bzr/+bug/183948/comments/4
[1] http://www.digipedia.pl/man/sshfs.1.html
