Journalled upload

Bug #326302 reported by Petr Viktorin
This bug affects 3 people
Affects            | Status  | Importance | Assigned to | Milestone
Breezy             | Triaged | Wishlist   | Unassigned  |
bzr Upload plugin  | Triaged | High       | Unassigned  |

Bug Description

Currently, if the connection is lost in the middle of an upload, the remote tree can easily be left in a state that forces the user to upload the whole tree again. It would be nice if bzr-upload were able to recover from network errors.

Also, the list of changes should be prepared before the actual upload, so unsupported features (such as symlinks, bug #214825) can fail the upload before the remote tree is touched.

Also, the upload of data should be separated from modifying the tree, to reduce the time the remote tree (usually a live website) is in an inconsistent state.

This all could be solved by a more complex uploading strategy.
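
For illustration, the three requests above (plan first, transfer data separately, then modify the tree) could be sketched roughly as below. This is not the code from the attached branch: a local directory stands in for the remote tree, and the names `plan_upload`, `stage_files`, `apply_staged` and the `.staging` directory are hypothetical.

```python
import os
import shutil


class UnsupportedEntry(Exception):
    """Raised while planning, before the target tree is modified."""


def plan_upload(source_dir):
    """Phase 1: list every file to send and reject symlinks up front."""
    plan = []
    for root, dirs, files in os.walk(source_dir):
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                raise UnsupportedEntry(path)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                raise UnsupportedEntry(path)
            plan.append(os.path.relpath(path, source_dir))
    return sorted(plan)


def stage_files(source_dir, target_dir, plan, staging_name=".staging"):
    """Phase 2: copy the data into a staging area; live files stay untouched."""
    staging = os.path.join(target_dir, staging_name)
    for relpath in plan:
        dest = os.path.join(staging, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(os.path.join(source_dir, relpath), dest)
    return staging


def apply_staged(target_dir, staging, plan):
    """Phase 3: move the staged files into place with quick renames."""
    for relpath in plan:
        dest = os.path.join(target_dir, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.replace(os.path.join(staging, relpath), dest)
    shutil.rmtree(staging, ignore_errors=True)
```

The slow part (phase 2) never touches a live file, so the window in which the site is inconsistent shrinks to the renames in phase 3. Whether a given FTP server supports the renames this relies on is an assumption.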

Tags: upload
Revision history for this message
Petr Viktorin (encukou) wrote :

The attached branch should pass all existing tests, but resuming failed uploads isn't tested.

Revision history for this message
Anthony Bush (awbush) wrote :

My issue is related: I *can't* upload the full tree because the connection dies after uploading 250-500 files. Since there are more files than this in the site I'm trying to upload I never get a full working copy up; running the `bzr upload` command starts the process all over, uploading files it already uploaded only to die again.

I've noticed a correlation between the speed of the upload and how many files it's able to upload before failing, so it could be a simple keep-alive issue, although I can't confirm that.

In my case, two things would help:

1. If connection fails while uploading a file simply retry up to some constant # of times (e.g. 10-20, possibly settable through CLI argument). try/catch/loop? although I'd think "upload_file_robustly" would already do this...

        bzr: ERROR: socket.error: (54, 'Connection reset by peer')
        ...
          File "/Users/awbush/.bazaar/plugins/upload/__init__.py", line 123, in upload_file
            self.to_transport.put_bytes(relpath, self.tree.get_file_text(id), mode)
        ...

2. Be able to resume the upload starting with the last failed file.
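
A minimal sketch of the bounded retry described in point 1, assuming a callable `put(relpath, data)` that stands in for something like the `put_bytes` call in the traceback. The helper name `put_with_retries` and its parameters are hypothetical and not part of the plugin; a real transport might also need to reconnect before retrying.

```python
import socket
import time


def put_with_retries(put, relpath, data, attempts=10, delay=1.0,
                     sleep=time.sleep):
    """Call put(relpath, data), retrying on socket errors.

    Gives up and re-raises after `attempts` failed tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            return put(relpath, data)
        except socket.error:  # alias of OSError on Python 3
            if attempt == attempts:
                raise
            sleep(delay)  # brief pause before the next try
```

The `sleep` parameter is only there so the pause can be swapped out in tests.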

I've tried making a checkout of the attached branch, but I just get an "Operation not permitted" error:

    $ bzr branch ~encukou/bzr-upload/journalled bzr-upload-journalled
    $ cd bzr-upload-journalled
    $ python setup.py build
    $ mv build/lib/bzrlib/plugins/upload ~/.bazaar/plugins/upload_journalled
    $ bzr upload-j --remember ftp://user@host/upload_dir/
    No uploaded revision id found, switching to full upload
    Making control directory
    No upload directory
    Removing renames
    bzr: ERROR: Generic path error: '/upload_dir/.bzr-upload/renames': 550 /upload_dir/.bzr-upload/renames: Operation not permitted)

Revision history for this message
Petr Viktorin (encukou) wrote :

Anthony: Thanks for trying it!
I've updated the branch to handle generic path errors (instead of just NoSuchFile), see if you still get the problem.
If you do, could you please run the command with the "-Derror" option (to get a full traceback), and post the output?

(Also, you might just want to symlink ~/.bazaar/plugins/upload_journalled to the checkout to speed up the "installation"; the build step isn't necessary for this plugin)

Revision history for this message
Vincent Ladeuil (vila) wrote :

@Petr: I'm really sorry I didn't find the time to give you feedback on this :-/

Roughly (and until a more formal review of your changes), my main gripe is that you changed the actual behavior instead of starting a parallel implementation (not only as a branch as you did, but *inside* the plugin itself).

Doing so will allow proposing your implementation as an alpha alternative until it's complete.

But having the ability to *do* a journalled upload is something that we need to implement anyway.

@Anthony: In parallel to Petr's effort, there are plans to enhance the bzr ftp transport so that it handles transient errors by retrying, which, IMHO, could address your current problems (the bzr http transport does that with good results, and it retries only once...).

Revision history for this message
Anthony Bush (awbush) wrote :

Petr: Thanks for the tip, I sym-linked it, updated the branch, and re-ran the command.

It passed the "Removing renames" step quickly and started uploading the files. It eventually died with the same problem I was having (trace attached, although I didn't use -Derror; do I still need to do that?).

I re-ran the upload after the failure and instead of resuming where it left off it's actually rolling back first (removing all the files it successfully uploaded, as well as trying to remove a bunch of files it *didn't* upload). It's taking a while. I wonder if it could list the files on the remote server before trying to blindly remove them? That's missing the big picture though: it is not possible to upload a large web site with connection failures every 5 minutes because no resume is implemented.
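
The "list before removing" idea can be sketched as a simple filter. Here `remote_listing` is assumed to come from a single directory listing of the remote tree, and the function name `removals_needed` is hypothetical.

```python
def removals_needed(candidates, remote_listing):
    """Keep only the rollback candidates that exist on the remote side."""
    existing = set(remote_listing)
    return [path for path in candidates if path in existing]
```

With this, a rollback issues one delete per file that is actually present rather than one per file in the plan.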

Thanks for your time and attention!

Revision history for this message
Anthony Bush (awbush) wrote :

@Vincent: Sweet, can I subscribe to be notified anywhere? I just noticed https://bugs.launchpad.net/bzr/+bug/127164

I consider the retry feature secondary to being able to resume. Resume solves the whole problem: if the connection is dropped in the middle and does not come back (e.g. a power outage), a simple retry does not help.

It sounds like retry goes in the transport layer, and resume in the journalling layer?
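
To make the distinction concrete, resume could look roughly like the sketch below: a journal file records each path once it has been sent, so a later run skips finished work even if the process died in between. Where the journal would live (locally or in a remote control directory) is left open here, and the names `load_journal`, `resumable_upload` and `upload_one` are hypothetical.

```python
import os


def load_journal(journal_path):
    """Return the set of paths recorded as uploaded by an earlier run."""
    if not os.path.exists(journal_path):
        return set()
    with open(journal_path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def resumable_upload(relpaths, upload_one, journal_path):
    """Upload each path once, skipping those already in the journal."""
    done = load_journal(journal_path)
    with open(journal_path, "a", encoding="utf-8") as journal:
        for relpath in relpaths:
            if relpath in done:
                continue
            upload_one(relpath)  # may raise; earlier progress stays recorded
            journal.write(relpath + "\n")
            journal.flush()
    os.remove(journal_path)  # whole upload finished; nothing left to resume
```

Retrying inside `upload_one` and resuming across runs are complementary: the first handles a brief hiccup, the second handles a connection that never comes back.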

Can I look forward to resume being added anytime soon?

Thank you.

Revision history for this message
Vincent Ladeuil (vila) wrote :

bug #127164 will certainly be updated when the retry-once-on-failure is implemented in the bzr ftp transport, so subscribing to it is a sure way to get updates.

I agree that retry and resume features are different, the former being also far easier to implement :)

The missing piece for working on the retry feature was a sufficiently reliable ftp test server, for which I intend to send a patch next week.

The test server will certainly be useful to implement the resume feature though.

Revision history for this message
Anthony Bush (awbush) wrote :

Excellent, I subscribed.

In other news, my upload-j attempt with -Derror finished, but the output isn't any different than what I already attached so I'll leave it off.

In the meantime I guess I can try adding, committing, and uploading only part of the code at a time.

Revision history for this message
Petr Viktorin (encukou) wrote :

@Anthony: A quick and dirty workaround would be to "bzr upload" to a local directory, and then upload that manually. As long as you don't forget the control file (.bzr-upload.revid), you'll be able to do partial uploads later on. You'll probably be better off not using my branch now, if you do that.
Thanks for the feedback though, I'll think more about this problem as I work on this.

@Vincent: All right, I'll do that. Just to make sure I follow: you'd like me to keep the current behavior, and add my changes to a separate module? (or function?) And maybe add a command-line option to activate it?

Revision history for this message
Vincent Ladeuil (vila) wrote :

@Anthony: Petr is right. If you can upload your working tree "manually" and then put the right revision id in the remote .bzr-upload.revid file, you will then be able to upload incrementally.

Revision history for this message
Anthony Bush (awbush) wrote :

@Petr, @Vincent: Thank you, your help is much appreciated.

Consider also "bzr upload --journalled ..."

Vincent Ladeuil (vila)
Changed in bzr-upload:
status: New → Confirmed
Vincent Ladeuil (vila)
Changed in bzr-upload:
importance: Undecided → Medium
importance: Medium → High
Martin Albisetti (beuno)
Changed in bzr-upload:
status: Confirmed → Triaged
Revision history for this message
TransmogriBenno (transmogribenno) wrote :

It's been almost two years since this was triaged. Is there any progress to report?

Revision history for this message
Callum Macdonald (chmac) wrote :

Now it's nearly 3 years, any movement on this? I'm not sure I'll get any more response than last year, but figured I'd try anyway. This is a pretty big issue for me personally as I'm regularly on unreliable connections and uploading routinely fails and has to start over.

Revision history for this message
Petr Viktorin (encukou) wrote :

The short answer is, unfortunately: no.

AFAIR, what remained of the implementation was writing tests to ensure that the upload really does the right thing in all possible situations.
I became inactive in this project because I've moved, changed school, and had less free time. This project had low priority for me. Before things calmed down, most projects I use already moved to Git.

To the people I've let down: I'm sorry. Maybe someone else can take over.

Revision history for this message
Vincent Ladeuil (vila) wrote :

This feature is more complex to get right than it appears. I'm still thinking about it and may come up with a solution at some point (no ETA, sorry).

Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: upload