I've been investigating the issue and now have a better understanding of what happens: copying the snap data takes a long time, and when systemd wants to restart snapd, the "cp" process is not killed, because snapd has "KillMode=process" in its systemd unit file (with that setting systemd only kills the main process, not its children).
Then snapd restarts and spawns a "cp" command again while the previous one is still running, so for some time we have two "cp" processes performing the same recursive copy. I'm not sure how cp is implemented, but if it doesn't use the *at() family of functions (openat, fchownat), then I can imagine that we could get some data corruption.
I think we should not let the "cp" processes outlive snapd, so that on the next attempt the copy is restarted without interference from a leftover process.