Comment 18 for bug 1814832

Oliver Kurth (okurth-1) wrote : Re: [Bug 1814832] Re: Correct and/or improve handling of certain quiesced snapshot failures

Hi Christian,

is this the repo for the packages: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3617

I am a little confused since these are from Feb 12th, and I believe I already tested them (but can't find the email that I sent). So I just want to make sure I am not testing the wrong ones. Also, where can I find the packages for Cosmic?

Thanks,
Oliver

________________________________
From: <email address hidden> <email address hidden> on behalf of Christian Ehrhardt  <email address hidden>
Sent: Wednesday, April 10, 2019 11:50 PM
To: Oliver Kurth
Subject: [Bug 1814832] Re: Correct and/or improve handling of certain quiesced snapshot failures

Hey Oliver,
TL;DR: sorry, but please test it from -proposed (again) in Bionic & Cosmic.

Detail:
Unfortunately, this is not how it works.
I'd like to mark it verified as you requested, but I know that this would raise SRU-team eyebrows and it would not be released.

I'm in fact already discussing a couple of cases where people pre-tested on PPAs and we later released the same change as an SRU, which means it needs to be verified again before the final release. The problem is the tradeoff on non-trivial bugs; you want to:
 - provide something for the reporter to test ASAP
 - be able to iterate on alternatives
 - run more tests, and tests by different parties
 - do all of the above without polluting -proposed yet (other builds
   there could depend on it, it could break other -proposed testing, ...)
 => This is what a PPA is used for most of the time.

But because of the above, we usually end up with a change that was
tested and verified from a PPA and is then proposed as an SRU in the
same form (same code + some paperwork).

At that point the process requires that this final build be verified
(yes, again).

This mostly stems from the past, when people didn't pre-check with PPAs, but it is also needed to cover any gap, since the PPA testing might have been:
- done weeks ago
- run against other dependencies at the time, which would fail now
- built against other dependencies at the time
- limited to one release, while the SRU covers multiple
For these reasons and more, re-testing the versions in -proposed is usually worthwhile, and it is required by the process.

I've had a few such cases by now and have started discussing
alternatives, which so far all come down to making the tradeoffs above
at a different place, or to having the SRU process accept verifications
of identical builds (as in this case). But there seems to be no easy
best answer. Resolving this is a long-term effort that ends in adapting
the SRU or developer process; nothing that is done in a minute.

Until all of that is resolved, I'd really ask you to re-test the case as
built in Bionic/Cosmic-proposed, and I hope that is not too much of a
burden for you.

--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1814832

Title:
  Correct and/or improve handling of certain quiesced snapshot failures

Status in open-vm-tools package in Ubuntu:
  Fix Released
Status in open-vm-tools source package in Bionic:
  Fix Committed
Status in open-vm-tools source package in Cosmic:
  Fix Committed
Status in open-vm-tools package in Debian:
  Fix Released

Bug description:
  [Impact]

   * Upstream identified an issue that can occur with quiesced snapshots
     that are aborted (or that hit communication issues while in progress).

   * Backport the upstream changes as part of our work to bring the latest
     10.3.5 to the latest Ubuntu LTS (Bionic).

  [Test Case]

   * This is hard to test, but fortunately VMware, who have the right setup
     for this, tested our change from a PPA. I'll ask for that again on SRU.
     Nevertheless, I'll roughly outline what is needed to trigger [1]:
     1. Use the host-side interface to trigger a quiesced snapshot.
     2. This is the hard part: have communication failures between vmtools
        (guest) and VMX (host) while this is ongoing.
     3. From the host's POV the operation is aborted, but vmtools eventually
        sends a manifest.
     4. Receiving this makes VMX reply with an error (as it wasn't waiting
        for anything like it).
     5. Finally, this breaks the state machine, and in subsequent snapshots
        vmtools will not send a manifest again.
   * Further related fixes make sure that vmtoolsd gives up if VMX aborted
     the snapshot [2], and that manifests are always sent [3] to avoid any
     desync between VMX and vmtoolsd.

  [1]: https://github.com/vmware/open-vm-tools/commit/a1306fcbb6de6eae5344d5d74747068ea89aa5fc
  [2]: https://github.com/vmware/open-vm-tools/commit/0c9174716ba828899418ba07efc3aab0bff004cc
  [3]: https://github.com/vmware/open-vm-tools/commit/c31710b3942f48b1c11ebde36f34e7e159d1cbf0

  [Regression Potential]

   * This is quite a change to the snapshot handling, so in theory a
     regression has to be assumed. Due to a lack of test cases and expertise
     on our side, testing was handed to VMware itself, who have a much wider
     matrix of tests and setups to run them on.
     This was tested and confirmed good (even before the change made
     it upstream).
   * Furthermore, this kind of snapshot is only relevant to those who use
     it (and they most likely want the fix for reliability, as you could
     get into a state where no further snapshots were possible). But
     OTOH the majority of users of the open-vm-tools package most likely
     don't use the feature at all. Fortunately, the changes are local to the
     vmbackup functionality.

  [Other Info]

   * n/a

  ---

  Customers may hit issues with quiesced snapshots under certain
  circumstances. This is fixed in a branch forked from 10.3.5:

  https://github.com/vmware/open-vm-tools/tree/stable-10.3.5-quiesced-snapshot

  A more detailed description of the issue can be found in the
  individual commit messages.

  Also filed at Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921470
