Comment 21 for bug 819604

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors; should reconnect instead

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/10/2011 6:58 AM, Andrew Bennetts wrote:
> On Sat, Oct 08, 2011 at 11:27:13AM -0000, John A Meinel wrote:
> [...]
>> 'delete', 'bzrlib.smart.vfs', 'DeleteRequest') safe, will fail if
>> first succeeded
>
> I worry a little that repeating this (and other destructive VFS ops
> like move) might be bad in a case like the “simultaneous identical
> pull” case that the Twisted buildbot encountered. e.g.:

...

>
> I expect this case is rare, and perhaps is a risk we already run on
> some network filesystems with unusual consistency guarantees, but I
> think it's a real risk. After all I'm not sure any common network
> filesystems have this particular kind of behaviour. I'm not sure
> if it's worth avoiding retries for, but I think it is worth
> thinking carefully about just in caseexpect this case is rare, and
> perhaps is a risk we already run on some network filesystems with
> unusual consistency guarantees, but I think it's a real risk. I'm
> not sure if it's worth avoiding retries for, but I think it is
> worth thinking carefully about just in case.

I'm pretty sure we already run the risk because renaming over the file
is atomic.

>
> Basically, we assume some kinds of atomicity & consistency from
> our filesystem operations, and these retries possibly erode those
> properties. So I'd be tempted to blacklist all non-readonly VFS
> ops out of paranoia. We shouldn't be issuing many VFS calls anyway
> ;)
>

I like the idea, I'm worried about practice. Specifically, we want to
add retry code so that we can get nodowntime deployments on Launchpad
without causing large headaches for existing users.

I know that in 2.1, the first thing we do after 'get_stream' is
'get_stream_for_missing_keys' which in 2.1 is implemented in VFS
operations. Now, this particular case it should all be read operations.

And in the end we have some sort of slider and at what point is 'good
enough'?

It seems like we need to handle stuff like race conditions using the
existing primatives (like locking the repository for the final
mutating operations).

If you have the case today where process A is trying to
  rename pack/foo => ../obsolete_packs/foo
and process B is trying to
  rename upload/bar => ../pack/foo

They already race, and I'm not making that race window particularly
wider. Whatever final solution we have for that case will handle this
one as well. (IMO)

For example, *today* the second rename could complete successfully
first, and then the first rename succeeds. Causing the newly uploaded
pack file to be removed. (Which is the Twisted case, AFAICT.) Which
only triggers when we are inserting the same data in two processes and
obsoleting it at the same time. 'retry' doesn't seem relevant there.

Anyway, I'm still fine adding a bit more flexibility and just saying
'semi' is not considered safe enough. Or we can move it to mutating
vfs, but then put_bytes isn't safe, and that probably affects lots of
stuff (init still uses VFS I think, etc.)

Also if we land something we can tweak it from there :).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6S06kACgkQJdeBCYSNAAMoRQCgwis6fuVEF5XTGyHHFUo7bLXw
40sAoKKUKhlQJuxtpWdFS/c1qz4F6cXp
=+EZP
-----END PGP SIGNATURE-----