Activity log for bug #1922907

Date Who What changed Old value New value Message
2021-04-07 10:38:50 Rascal bug added bug
2021-04-07 10:38:50 Rascal attachment added Skylake https://bugs.launchpad.net/bugs/1922907/+attachment/5485199/+files/lscpu_4114.txt
2021-04-07 10:39:33 Rascal attachment added Cascadelake https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1922907/+attachment/5485200/+files/lscpu_4214.txt
2021-04-07 10:42:44 Rascal description
  Old value:
    Hi.
    We have OpenStack cluster with old Skylake (Skylake-Server-IBRS) Intel CPU, and try to extend cluster with new Cascadelake (Cascadelake-Server-noTSX) servers with backward compatibility mode in nova.conf
    cpu_mode = custom
    cpu_models = Skylake-Server-IBRS
    But got an error: CPU doesn't have compatibility
    Similar issue for Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1761678
  New value:
    Hi.
    We have OpenStack cluster (ubuntu 20.04 ussuri) with old Skylake (Skylake-Server-IBRS) Intel CPU, and try to extend cluster with new Cascadelake (Cascadelake-Server-noTSX) servers with backward compatibility mode in nova.conf
    cpu_mode = custom
    cpu_models = Skylake-Server-IBRS
    But got an error: CPU doesn't have compatibility
    Similar issue for Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1761678
2021-04-07 13:23:37 Christian Ehrhardt  nominated for series Ubuntu Groovy
2021-04-07 13:23:37 Christian Ehrhardt  bug task added libvirt (Ubuntu Groovy)
2021-04-07 13:23:37 Christian Ehrhardt  nominated for series Ubuntu Hirsute
2021-04-07 13:23:37 Christian Ehrhardt  bug task added libvirt (Ubuntu Hirsute)
2021-04-07 13:23:37 Christian Ehrhardt  nominated for series Ubuntu Focal
2021-04-07 13:23:37 Christian Ehrhardt  bug task added libvirt (Ubuntu Focal)
2021-04-07 13:23:43 Christian Ehrhardt  libvirt (Ubuntu Hirsute): status New Fix Released
2021-04-07 13:23:45 Christian Ehrhardt  libvirt (Ubuntu Groovy): status New Fix Released
2021-04-07 13:23:48 Christian Ehrhardt  libvirt (Ubuntu Focal): status New Triaged
2021-04-07 13:23:59 Christian Ehrhardt  bug added subscriber Ubuntu Server
2021-04-07 13:24:01 Christian Ehrhardt  libvirt (Ubuntu Focal): assignee Christian Ehrhardt  (paelzer)
2021-04-07 13:24:06 Christian Ehrhardt  tags server-next
2021-04-07 15:25:43 Pedro Principeza bug added subscriber Pedro Principeza
2021-04-19 10:41:59 Christian Ehrhardt  description
  Old value:
    Hi.
    We have OpenStack cluster (ubuntu 20.04 ussuri) with old Skylake (Skylake-Server-IBRS) Intel CPU, and try to extend cluster with new Cascadelake (Cascadelake-Server-noTSX) servers with backward compatibility mode in nova.conf
    cpu_mode = custom
    cpu_models = Skylake-Server-IBRS
    But got an error: CPU doesn't have compatibility
    Similar issue for Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1761678
  New value:
    [Impact]

     * To avoid bugs with newer Hardware and to allow users/admins to control
       the KVM guests correctly we usually try to backport major
       CPU-detect/control features back to at least the last LTS (currently
       Focal).
       In SRU Terms this is under the second entry in
       https://wiki.ubuntu.com/StableReleaseUpdates#Other_safe_cases
     * In this particular case it is about Skylake and Cascade Lake CPUs,
       which turned out to differ in so few details that not only does the
       new type need to be known, but the feature to parse and consider the
       CPU stepping also needs to be added.
       Only then can libvirt properly distinguish those types.

    [Test Plan]

     * First of all we'll (and have in advance) run general regression tests
     * Second and for this case in particular we expect libvirt to recognize
       Skylake/Cascade Lake chips better. So with access to those chips you'd
       before the fix see both recognized as the same (wrong) and after the
       fix as different CPU types.
       $ virsh domcapabilities
       ^^ look for the host-model section
       Comment #14 in this bug has sample output data
     * With the fixes you can define a guest with a CascadeLake based
       named type

    [Where problems could occur]

     * There are two areas to look at
       a) compat behavior on old systems - e.g. if you used to say
          "host-model" on two different chips that are related to these
          fixes you might have ended up with the same model. But now the
          newer chip would get the newer better definition.
          That is correct and good - but for others might appear as a change
          in behavior they didn't expect.
       b) Migrations between systems - in fact this is an area we fix for some
          combinations. But if e.g. (a) above applies then you can't migrate
          from the new to the old chip if the newer one has features enabled
          that don't exist on the old chip.
          Again this is what we want (better than silently failing) but for
          every scenario fixed there might be a combination of chips and use
          cases which will at first stumble over the new behavior.
        Since we only change the named types, but not the qemu implementation,
        those issues should not be much of a problem as you can do:
        "type X+Feat" == "type Y", but still those are the areas to look at.

    [Other Info]

     * The change itself that is the fix/ability to differentiate those two
       chips is part of a larger series that mostly does rewrite manual
       alloc/free code into glib based auto-free. These efforts have been
       started before the version in Focal so everything is in place.
       Backporting the fix without those was evaluated but considered more
       risky of a regression than also pulling (the then mostly cleanly
       applying) code rewrites - as they are not much functional change, but
       more new style to do the same.

    ---

    Hi.
    We have OpenStack cluster (ubuntu 20.04 ussuri) with old Skylake (Skylake-Server-IBRS) Intel CPU, and try to extend cluster with new Cascadelake (Cascadelake-Server-noTSX) servers with backward compatibility mode in nova.conf
    cpu_mode = custom
    cpu_models = Skylake-Server-IBRS
    But got an error: CPU doesn't have compatibility
    Similar issue for Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1761678
2021-04-19 11:19:02 Christian Ehrhardt  description
  Old value:
    Same text as the new value of the 2021-04-19 10:41:59 entry.
  New value:
    [Impact]

     * To avoid bugs with newer Hardware and to allow users/admins to control
       the KVM guests correctly we usually try to backport major
       CPU-detect/control features back to at least the last LTS (currently
       Focal).
       In SRU Terms this is under the second entry in
       https://wiki.ubuntu.com/StableReleaseUpdates#Other_safe_cases
     * In this particular case it is about Skylake and Cascade Lake CPUs,
       which turned out to differ in so few details that not only does the
       new type need to be known, but the feature to parse and consider the
       CPU stepping also needs to be added.
       Only then can libvirt properly distinguish those types.

    [Test Plan]

     * First of all we'll (and have in advance) run general regression tests
     * Second and for this case in particular we expect libvirt to recognize
       Skylake/Cascade Lake chips better. So with access to those chips you'd
       before the fix see both recognized as the same (wrong) and after the
       fix as different CPU types.
       $ virsh domcapabilities
       ^^ look for the host-model section
       Comment #14 in this bug has sample output data
     * With the fixes you can define a guest with a CascadeLake based
       named type

    [Where problems could occur]

     * There are two areas to look at
       a) compat behavior on old systems - e.g. if you used to say
          "host-model" on two different chips that are related to these
          fixes you might have ended up with the same model. But now the
          newer chip would get the newer better definition.
          That is correct and good - but for others might appear as a change
          in behavior they didn't expect.
       b) Migrations between systems - in fact this is an area we fix for some
          combinations. But if e.g. (a) above applies then you can't migrate
          from the new to the old chip if the newer one has features enabled
          that don't exist on the old chip.
          Again this is what we want (better than silently failing) but for
          every scenario fixed there might be a combination of chips and use
          cases which will at first stumble over the new behavior.
        Since we only change the named types, but not the qemu implementation,
        those issues should not be much of a problem as you can do:
        "type X+Feat" == "type Y", but still those are the areas to look at.

    [Other Info]

     * The change itself that is the fix/ability to differentiate those two
       chips is part of a larger series that mostly does rewrite manual
       alloc/free code into glib based auto-free. These efforts have been
       started before the version in Focal so everything is in place.
       Backporting the fix without those was evaluated but considered more
       risky of a regression than also pulling (the then mostly cleanly
       applying) code rewrites - as they are not much functional change, but
       more new style to do the same.
     * This is not the first time new chips need quite some effort to be able
       to be handled - for example bug 1828495 was similar

    ---

    Hi.
    We have OpenStack cluster (ubuntu 20.04 ussuri) with old Skylake (Skylake-Server-IBRS) Intel CPU, and try to extend cluster with new Cascadelake (Cascadelake-Server-noTSX) servers with backward compatibility mode in nova.conf
    cpu_mode = custom
    cpu_models = Skylake-Server-IBRS
    But got an error: CPU doesn't have compatibility
    Similar issue for Red Hat: https://bugzilla.redhat.com/show_bug.cgi?id=1761678
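The "look for the host-model section" step of the test plan could be scripted roughly as follows. This is a minimal sketch, assuming the usual libvirt domain capabilities layout (a `<mode name='host-model'>` element under `<cpu>` with a `<model>` child). The function name `host_model_name` and the sample XML are illustrative assumptions and are not taken from this bug or its attachments.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def host_model_name(domcaps_xml: str) -> Optional[str]:
    """Return the CPU model named in the host-model section, or None.

    Hypothetical helper: expects the XML text printed by
    `virsh domcapabilities`.
    """
    root = ET.fromstring(domcaps_xml)
    # The CPU modes are direct children of <cpu> under the root element.
    for mode in root.findall("./cpu/mode"):
        if mode.get("name") == "host-model":
            model = mode.find("model")
            if model is not None and model.text:
                return model.text.strip()
    return None


# Illustrative sample only; real output contains many more elements.
SAMPLE = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
      <vendor>Intel</vendor>
    </mode>
  </cpu>
</domainCapabilities>
"""

print(host_model_name(SAMPLE))
```

On a real host the XML text would come from running `virsh domcapabilities` and capturing its standard output. Comparing the returned name before and after the upgrade corresponds to the check described in the test plan.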
2021-04-19 12:10:18 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/libvirt/+git/libvirt/+merge/401371
2021-04-22 05:19:53 Christian Ehrhardt  attachment added lscpu and cpuid info of another test system (same correct detection pre/post upgrade) https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1922907/+attachment/5490849/+files/cascade-test-data.txt
2021-04-23 07:52:29 Robie Basak libvirt (Ubuntu Focal): status Triaged Fix Committed
2021-04-23 07:52:31 Robie Basak bug added subscriber Ubuntu Stable Release Updates Team
2021-04-23 07:52:33 Robie Basak bug added subscriber SRU Verification
2021-04-23 07:52:36 Robie Basak tags server-next server-next verification-needed verification-needed-focal
2021-04-27 05:14:33 Christian Ehrhardt  tags server-next verification-needed verification-needed-focal server-next verification-done verification-done-focal
2021-05-06 08:55:09 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2021-05-06 08:55:04 Launchpad Janitor libvirt (Ubuntu Focal): status Fix Committed Fix Released