Comment 4 for bug 1992324

Alberto Mardegan (mardy) wrote:

I'm investigating the issue and now have a better understanding of what happens. Copying the snap data takes a long time, and when systemd restarts snapd while the copy is still in progress, the "cp" process is not killed, because snapd has "KillMode=process" in its systemd unit file: with that setting systemd only kills the main snapd process, not its children.

snapd then restarts and spawns a new "cp" command while the previous one is still running, so for some time we have two "cp" processes performing the same recursive copy. I'm not sure how cp is implemented, but if it doesn't use the *at() family of functions (openat, fchownat), then I can imagine that we could get some data corruption.
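
For readers unfamiliar with the *at() functions, here is a minimal Python sketch of what they provide. It is only an illustration and says nothing about how cp is actually implemented. The helper names (read_via_dirfd, demo) and the file names are hypothetical, and it assumes Linux, where os.open() with dir_fd maps to openat(). The point is that a name resolved relative to an open directory descriptor keeps referring to the same directory even if the path is changed by another process in the meantime.

```python
import os
import shutil
import tempfile


def read_via_dirfd(dir_fd, name):
    """Open `name` relative to an already-open directory (openat-style)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo():
    """Return (bytes read via dir_fd, whether the old path still resolves)."""
    base = tempfile.mkdtemp()
    try:
        src = os.path.join(base, "data")
        os.mkdir(src)
        with open(os.path.join(src, "file.txt"), "wb") as f:
            f.write(b"payload")

        # Pin the directory itself, not its path.
        dir_fd = os.open(src, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Simulate a concurrent process moving the directory away.
            os.rename(src, os.path.join(base, "data.moved"))
            by_fd = read_via_dirfd(dir_fd, "file.txt")
            by_path = os.path.exists(os.path.join(src, "file.txt"))
        finally:
            os.close(dir_fd)
        return by_fd, by_path
    finally:
        shutil.rmtree(base)
```

Here the read through the directory descriptor still succeeds after the rename, while a lookup through the old path no longer finds the file.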

I think we should not let the "cp" processes outlive snapd, so that on the next iteration the copy is restarted without interference.
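
One way to express that idea, sketched in Python purely for illustration: snapd itself is not written in Python and this is not its actual code. The function names (start_copy, stop_copy) and the timeout value are assumptions. The parent starts the copy in its own process group and, when it is asked to shut down, signals the whole group and waits for it before exiting.

```python
import os
import signal
import subprocess


def start_copy(argv):
    """Start the helper in a new session, so it leads its own process group."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_copy(proc, timeout=5.0):
    """Terminate the helper's process group and wait until it is gone."""
    if proc.poll() is None:
        try:
            # The child is a session leader, so its pgid equals its pid.
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group exited between poll() and killpg()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Escalate if the helper ignores SIGTERM.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
    return proc.returncode
```

A caller would invoke stop_copy() from its shutdown path (for example a SIGTERM handler) before exiting. Note that this only covers an orderly shutdown: if the parent is killed with SIGKILL, the helper would still be left behind.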