Comment 16 for bug 1814832

Oliver Kurth (okurth-1) wrote : Re: [Bug 1814832] Re: Correct and/or improve handling of certain quiesced snapshot failures

I think this is good to go and does not need any additional testing, assuming you have applied the changes from the stable-10.3.5-quiesced-snapshot branch. All these changes are also in the 10.3.10 release, which has already gone through our testing. I also tested your packages earlier with the same changes.

Thanks,
Oliver

________________________________
From: <email address hidden> <email address hidden> on behalf of Christian Ehrhardt <email address hidden>
Sent: Tuesday, April 9, 2019 11:10 PM
To: Oliver Kurth
Subject: [Bug 1814832] Re: Correct and/or improve handling of certain quiesced snapshot failures

@Oliver - any update on this testing?
That is the only thing still missing before we can release this.

--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1814832

Title:
  Correct and/or improve handling of certain quiesced snapshot failures

Status in open-vm-tools package in Ubuntu:
  Fix Released
Status in open-vm-tools source package in Bionic:
  Fix Committed
Status in open-vm-tools source package in Cosmic:
  Fix Committed
Status in open-vm-tools package in Debian:
  Fix Released

Bug description:
  [Impact]

   * Upstream identified an issue that can occur when a quiesced snapshot
     is aborted, or fails due to communication issues while in progress.

   * Backport the upstream changes as part of our work to bring the latest
     10.3.5 release to the latest Ubuntu LTS (Bionic).

  [Test Case]

   * This is hard to test, but fortunately VMware, who have the right setup
     for this, tested our change from a PPA. I'll ask for that again on SRU.
     Nevertheless, I'll roughly outline what is needed to trigger [1]:
     1. Use the host-side interface to trigger a quiesced snapshot.
     2. This is the hard part: have communication failures between vmtools
        (guest) and VMX (host) while this is ongoing.
     3. From the host's POV the operation is aborted, but vmtools
        eventually sends a manifest.
     4. Receiving this makes VMX reply with an error (as it is no longer
        waiting for a manifest).
     5. Finally, this breaks the state machine, and on subsequent snapshots
        vmtools will not send a manifest again.
   * Two further related fixes make sure that vmtoolsd gives up if VMX has
     aborted the snapshot [2], and that manifests are always sent to avoid
     any desync between VMX and vmtoolsd [3].
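
  To make steps 3 to 5 more concrete, here is a toy model of the desync in
  Python. It is a hypothetical sketch only: ToyVmx, ToyGuest and run() are
  invented names that do not correspond to the real vmbackup code in
  open-vm-tools, and the "fixed" path only loosely mirrors the described
  intent of [2] and [3], not the actual patches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyVmx:
    # Hypothetical host side: accepts a manifest only while that snapshot is active.
    active: Optional[int] = None
    received: List[int] = field(default_factory=list)

    def start(self, snap_id: int) -> None:
        self.active = snap_id

    def abort(self) -> None:
        # Step 3: from the host's POV the operation is over.
        self.active = None

    def receive_manifest(self, snap_id: int) -> bool:
        # Step 4: reply with an error (False) if no manifest is expected.
        if self.active != snap_id:
            return False
        self.received.append(snap_id)
        self.active = None
        return True


@dataclass
class ToyGuest:
    # Hypothetical guest side; 'fixed' toggles the corrected behaviour.
    fixed: bool
    stuck: bool = False  # models the broken state machine of step 5

    def finish_snapshot(self, vmx: ToyVmx, snap_id: int) -> None:
        if self.fixed and vmx.active != snap_id:
            # Loosely like [2]: give up because the host already aborted.
            # (Simplification: the toy guest peeks at host state directly.)
            return
        if self.stuck:
            # Buggy path only: after one error, never send a manifest again.
            return
        ok = vmx.receive_manifest(snap_id)
        if not ok and not self.fixed:
            self.stuck = True


def run(fixed: bool) -> List[int]:
    vmx, guest = ToyVmx(), ToyGuest(fixed=fixed)
    vmx.start(1)
    vmx.abort()                    # host aborts snapshot 1
    guest.finish_snapshot(vmx, 1)  # late manifest (buggy) or give up (fixed)
    vmx.start(2)                   # a later, otherwise healthy snapshot
    guest.finish_snapshot(vmx, 2)
    return vmx.received            # snapshots whose manifest VMX accepted


print(run(fixed=False))  # [] : snapshot 2 never gets a manifest
print(run(fixed=True))   # [2]
```

  In the buggy model a single late manifest poisons every later snapshot,
  which matches the "no further snapshots were possible" symptom described
  under Regression Potential.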

  [1]: https://github.com/vmware/open-vm-tools/commit/a1306fcbb6de6eae5344d5d74747068ea89aa5fc
  [2]: https://github.com/vmware/open-vm-tools/commit/0c9174716ba828899418ba07efc3aab0bff004cc
  [3]: https://github.com/vmware/open-vm-tools/commit/c31710b3942f48b1c11ebde36f34e7e159d1cbf0

  [Regression Potential]

   * This is a substantial change to the snapshot handling, so in theory a
     regression risk has to be assumed. Due to a lack of test cases and
     expertise on our side, testing was handed to VMware itself, who have a
     much wider matrix of tests and setups to run them on.
     This was tested and confirmed good (even before the change made
     it upstream).
   * Furthermore, this kind of snapshot is only relevant to those who use
     it (and they most likely want the fix for reliability, as one could
     get into a state where no further snapshots were possible). On the
     other hand, the majority of users of the open-vm-tools package most
     likely don't use the feature at all. Fortunately, the changes are
     confined to the vmbackup functionality.

  [Other Info]

   * n/a

  ---

  Customers may hit issues with quiesced snapshots under certain
  circumstances. This is fixed in a branch forked from 10.3.5:

  https://github.com/vmware/open-vm-tools/tree/stable-10.3.5-quiesced-snapshot

  A more detailed description of the issue can be found in the
  individual commit messages.

  Also filed at Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921470
