Comment 313 for bug 1470250

faulpeltz (mg-h) wrote :

I spent some time investigating our issue further.
As far as I can tell, the main issue is that ioctl(FIFREEZE) can take a long time when running VSS backups, and the default timeout is 10s.
This is very noticeable under load: we have seen rare peaks of more than 5s, so exceeding 10s seems plausible.
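
For anyone who wants to check how close their own freezes come to the timeout, the freeze/thaw pair can be timed from user space. This is only a rough sketch and not part of any patch: the numeric FIFREEZE/FITHAW values assume the generic Linux ioctl encoding (as on x86 and ARM), `timed_freeze_thaw` and `TIMEOUT_S` are hypothetical names, and the 10s figure is simply the timeout mentioned above. It needs root and a filesystem that supports freezing.

```python
import fcntl
import os
import time

# Assumed values: _IOWR('X', 119, int) and _IOWR('X', 120, int) from
# <linux/fs.h> under the generic ioctl encoding (x86, ARM).
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878
TIMEOUT_S = 10.0  # timeout mentioned above; hypothetical constant name


def timed_freeze_thaw(mountpoint, ioctl=fcntl.ioctl, clock=time.monotonic):
    """Freeze and immediately thaw `mountpoint`; return seconds spent in FIFREEZE."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        start = clock()
        ioctl(fd, FIFREEZE, 0)
        elapsed = clock() - start
        # Thaw right away so the filesystem is never left frozen by this probe.
        ioctl(fd, FITHAW, 0)
        return elapsed
    finally:
        os.close(fd)
```

Calling `timed_freeze_thaw("/mnt/data")` (a placeholder path) repeatedly while the guest is under load and comparing the result with `TIMEOUT_S` should show whether freezes in the >5s range occur on a given system. The `ioctl` and `clock` parameters exist only so the function can be exercised without root.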

If the timeout is hit in the kernel module, the hv_vss_daemon doesn't recover and quits, with the FS still frozen.
This patch fixes some hv_vss_daemon behavior where it doesn't recover from a failed write if the previous request timed out (e.g. THAW takes too long).
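
To illustrate the kind of recovery meant here, the following is a simplified model of a request loop, not the daemon's actual C code and not the patch itself. It assumes that "recovering" means thawing anything still frozen and continuing to serve requests when the reply write fails; `serve` and its `freeze`, `thaw` and `reply` callbacks are hypothetical names.

```python
def serve(requests, freeze, thaw, reply):
    """Handle FREEZE/THAW requests; return the number of failed replies.

    Assumed policy: a failed reply never leaves the filesystem frozen
    and never ends the loop.
    """
    frozen = False
    failures = 0
    for op in requests:
        if op == "FREEZE" and not frozen:
            freeze()
            frozen = True
        elif op == "THAW" and frozen:
            thaw()
            frozen = False
        try:
            reply(op)
        except OSError:
            # The other side gave up, e.g. because the previous request
            # timed out. Undo any freeze and keep serving requests.
            failures += 1
            if frozen:
                thaw()
                frozen = False
    return failures
```

In this model a timed-out FREEZE costs one failed backup, but the next FREEZE/THAW cycle proceeds normally.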
We are currently running this patch, together with @AlexNg's patch (1 of 2), in the usual backup loop.
We have already hit the bug at least 5 times. Each time the VSS backup fails, but subsequent backups work without problems and the guest systems continue to work normally.