OK, so Danilo's suspicion was that when jobs finish at roughly the same time and their transfers overlap, a reshuffle-files run for one transfer may "clean up" files still being transferred by another job (because reshuffle-files cleaned up the entire upload area). We fixed this by explicitly passing the job name to reshuffle-files, so it operates only on that job's directory. This was deployed last Thu, and no UNSTABLE builds have popped up since, so apparently that was indeed the issue (and not random transport errors). Closing.
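
For anyone reading this later, here is a rough sketch of the race and the fix. It is not the actual reshuffle-files code: `clean_upload_area`, `make_upload_area`, the per-job directory layout and the file names are all made up for illustration. The only thing it tries to show is the difference between cleaning the whole upload area and cleaning just one job's directory.

```python
import shutil
import tempfile
from pathlib import Path


def clean_upload_area(upload_root, job_name=None):
    """Hypothetical stand-in for the cleanup step of reshuffle-files.

    Without job_name it wipes every job directory under upload_root
    (the old behaviour, as described above). With job_name it touches
    only that job's directory (the fix).
    """
    root = Path(upload_root)
    if job_name:
        targets = [root / job_name]
    else:
        targets = [p for p in root.iterdir() if p.is_dir()]

    removed = []
    for job_dir in targets:
        if job_dir.is_dir():
            shutil.rmtree(job_dir)
            removed.append(job_dir.name)
    return sorted(removed)


def make_upload_area():
    """Create a throwaway upload area with two jobs mid-transfer (assumed layout)."""
    root = Path(tempfile.mkdtemp())
    for job in ("job-a", "job-b"):
        (root / job).mkdir()
        (root / job / "artifact.tar").write_text("partial upload")
    return root


if __name__ == "__main__":
    root = make_upload_area()
    # Scoped cleanup for job-a leaves job-b's in-flight file alone.
    print(clean_upload_area(root, "job-a"))
    print((root / "job-b" / "artifact.tar").exists())
    shutil.rmtree(root)
```

With the unscoped call (`clean_upload_area(root)`), job-b's partially transferred file would be removed along with job-a's, which matches the failure mode suspected above.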