kolla wallaby: stopping (or restarting) nova_libvirt causes vm shutdown
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla-ansible | Fix Released | Critical | Radosław Piliszek |
Wallaby | Fix Released | Critical | Radosław Piliszek |
Xena | Fix Released | Critical | Radosław Piliszek |
Bug Description
Hello, I tried to update Kolla Wallaby because new images have been released.
During the update the nova_libvirt container restarted and all VMs went down.
I then verified that every time I restart the nova_libvirt container, all VMs restart and nova_compute reports a lost connection with libvirt.
In OpenStack installations where Docker is not used, a libvirt restart does not cause VM shutdown.
Ignazio
Mark Goddard (mgoddard) wrote : | #1 |
Changed in kolla-ansible: | |
importance: | Undecided → High |
ignazio (cassano) wrote : Re: [Bug 1941706] Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #2 |
I also tested it with CentOS source images: same issue.
Ignazio
On Thu, 26 Aug 2021 at 10:10, Mark Goddard <email address hidden> wrote:
> Interesting, I have not seen this before. I tend to use CentOS, but I'm
> sure we would have heard about it before if it always happened on
> Ubuntu. Maybe a recent change in kolla or libvirt/qemu caused it.
Radosław Piliszek (yoctozepto) wrote : Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #3 |
Yes, I have already replied on the mailing list and on IRC that I have not seen this either... and I used to be a CentOS shop and am now on Debian. I also agree that we would likely have heard about it from others if it failed consistently for everyone. Though it may be very recent, as you suggest.
Together with Ignazio, we made machined oblivious to the existence of the VMs, but that did not fix the bug.
Ignazio, I see you said you used "centos source images", but have you also used CentOS as the host machine? Could you try that? Or could you try Debian as the host too?
ignazio (cassano) wrote : | #4 |
Hello, unfortunately I have not tested Kolla with CentOS on the host machines. Since CentOS has changed from what it used to be, we decided to use Ubuntu.
At this time we are using CentOS 7 in our production environment (Stein and Queens) without containers (no Kolla), and when libvirt restarts the VMs do not shut down.
I must check whether my colleagues will give me the opportunity to test Kolla on CentOS.
Please look at this Red Hat bug:
https:/
I'm wondering if it's worth testing on CentOS at this point.
Ignazio
Radosław Piliszek (yoctozepto) wrote : | #5 |
Could you first try Debian Buster? That should be closer to Ubuntu but I know it works fine.
ignazio (cassano) wrote : Re: [Bug 1941706] Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #6 |
Are you suggesting using Debian instead of Ubuntu on the compute and
controller nodes?
Ignazio
Radosław Piliszek (yoctozepto) wrote : Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #7 |
Yes, like CentOS or Ubuntu. I am just telling you what I know is working.
ignazio (cassano) wrote : | #8 |
Hello, in the support matrix I read Debian 11 is supported.
https:/
Are you sure I must try installing Debian 10 on the controller and compute nodes?
Radosław Piliszek (yoctozepto) wrote : | #9 |
Ah, yeah, 11 is recommended for Wallaby. Though 10 is what I use. I think 11 should not be breaking but who knows... I think I will set up some testing in the CI to figure out if it works there.
ignazio (cassano) wrote : Re: [Bug 1941706] Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #10 |
I had a meeting with my colleagues and they do not agree to switching the
operating system. Since Ubuntu 20.04 it should work.
Does anyone know if Juju uses Docker?
If yes, I presume they have the same issue.
Ignazio
ignazio (cassano) wrote : | #11 |
Sorry, I did not finish my sentence: since Ubuntu 20.04 is in the official
Kolla support matrix, it should work.
Ignazio
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master) | #12 |
Related fix proposed to branch: master
Review: https:/
Radosław Piliszek (yoctozepto) wrote : Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #13 |
Changed in kolla-ansible: | |
status: | New → Triaged |
Radosław Piliszek (yoctozepto) wrote : | #14 |
Debian passes, waiting for CentOS.
Radosław Piliszek (yoctozepto) wrote : | #15 |
CentOS failed like Ubuntu. Interesting...
Radosław Piliszek (yoctozepto) wrote : | #16 |
The relevant line from libvirtd logs is
error : qemuMonitorOpenUnix : failed to connect to monitor socket: Connection refused
but it's unclear if qemu died already or was forcibly killed by libvirtd due to the above (cannot be determined from CI).
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/wallaby) | #17 |
Related fix proposed to branch: stable/wallaby
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/victoria) | #18 |
Related fix proposed to branch: stable/victoria
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/victoria) | #19 |
Change abandoned by "Radosław Piliszek <email address hidden>" on branch: stable/victoria
Review: https:/
Reason: no need, it already passes in wallaby, hmm
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/wallaby) | #20 |
Change abandoned by "Radosław Piliszek <email address hidden>" on branch: stable/wallaby
Review: https:/
Reason: permasuccess, no need to keep
Radosław Piliszek (yoctozepto) wrote : Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #21 |
Ignazio, please apply the following -> https:/
That fixes the issue.
The issue was caused by Docker's behaviour with cgroupsv1: it actually cleaned them up on container exit.
That's why Debian 11, which uses cgroupsv2, was able to avoid this issue.
Changed in kolla-ansible: | |
importance: | High → Critical |
summary: |
- kolla wallaby: restarting nova_libvirt causes vm shutdown
+ kolla wallaby: stopping (or restarting) nova_libvirt causes vm shutdown
Changed in kolla-ansible: | |
status: | Triaged → In Progress |
assignee: | nobody → Radosław Piliszek (yoctozepto) |
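For operators who want to check which side of the cgroupsv1/cgroupsv2 split a host is on, here is a minimal Python sketch. It relies on the fact that a unified cgroupsv2 hierarchy exposes a `cgroup.controllers` file at its root, while a legacy or hybrid layout does not. The function names `cgroup_version` and `likely_affected` are made up for this illustration and are not part of Kolla Ansible.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a unified cgroupsv2 hierarchy, else 1."""
    # A cgroupsv2 (unified) mount has a `cgroup.controllers` file at its root.
    # Legacy v1 and hybrid layouts do not, so they are reported as 1 here.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def likely_affected(root: str = "/sys/fs/cgroup") -> bool:
    """Hypothetical helper: True when the host is not on cgroupsv2.

    Per the analysis in this bug, only non-cgroupsv2 hosts lost their VMs
    when nova_libvirt was stopped or restarted.
    """
    return cgroup_version(root) != 2
```

Calling `likely_affected()` on a compute host gives a quick hint only; it does not inspect the Kolla Ansible version or the container's mounts.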
Radosław Piliszek (yoctozepto) wrote : | #22 |
I have sent a relevant notice to the openstack-discuss mailing list to notify other operators. Inserting it here:
Dear Operators of Kolla-based deployments,
There is a critical regression in current Kolla Ansible Wallaby code
that results in an environment that shuts down VMs on each libvirtd
container stop or restart on non-cgroupsv2 distros (so CentOS, Ubuntu
and Debian Buster but not Debian Bullseye). [1]
The fix is already available. [2]
Please apply it to your Kolla Ansible installation if you are using Wallaby.
Do note the fix only takes effect after redeploying, which means the
redeployment itself will still trigger the buggy behaviour one more time!
What to do if you have already deployed Wallaby?
First of all, make sure you don't accidentally take an action that
stops nova_libvirt (including restarts: both manual and those applied
by Kolla Ansible due to user-requested changes).
Please apply the patch above but don't rush with redeploying!
Redeploy each compute node separately (or in batches if you prefer) -
using --limit commandline parameter - and always make sure you have
first migrated relevant VMs out of the nodes that are going to get
nova_libvirt restarted.
This way you can safely fix an existing deployment.
We will be working on improving the testing to avoid such issues in the future.
Acknowledgements
Thanks to Ignazio Cassano for noticing and reporting the issue.
I have triaged and analysed it, proposing a fix afterwards.
[1] https:/
[2] https:/
-yoctozepto
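The node-by-node redeployment described in the notice can be sketched as a small Python helper that only builds the command lines, one per batch of compute nodes. This is an illustration under assumptions: the `deploy` subcommand is a guess at what "redeploying" means here, and the inventory name and host names are hypothetical. The sketch does not migrate VMs; that has to happen before each command is run.

```python
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def redeploy_commands(
    hosts: Sequence[str], inventory: str, batch_size: int = 1
) -> List[List[str]]:
    """Build one kolla-ansible invocation per batch, restricted with --limit.

    Assumption: `deploy` is the action used for redeploying; adjust it to
    whatever action your environment actually uses.
    """
    return [
        ["kolla-ansible", "-i", inventory, "deploy", "--limit", ",".join(batch)]
        for batch in batches(hosts, batch_size)
    ]
```

For example, `redeploy_commands(["compute01", "compute02", "compute03"], "multinode", batch_size=2)` produces two commands, the first limited to `compute01,compute02` and the second to `compute03`. Migrate the VMs off each batch, run its command, then move on to the next batch.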
ignazio (cassano) wrote : Re: [Bug 1941706] Re: kolla wallaby: restarting nova_libvirt causes vm shutdown | #23 |
Thanks Radoslaw, tomorrow I will try to apply the patch.
Ignazio
ignazio (cassano) wrote : Re: [Bug 1941706] Re: kolla wallaby: stopping (or restarting) nova_libvirt causes vm shutdown | #24 |
Hello,
there was no need to thank me. I must also say that I have never seen
such responsiveness in a community.
Thank you very much
Ignazio
ignazio (cassano) wrote : | #25 |
Hello, I applied the patch and now, when restarting nova_libvirt, the virtual machines remain up and running.
Thanks
Ignazio
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master) | #26 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master) | #27 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 34c49b9dbed81c6
Author: Radosław Piliszek <email address hidden>
Date: Mon Aug 30 09:33:31 2021 +0000
Restore libvirtd cgroupfs mount
It was removed in [1] as part of cgroupsv2 cleanup.
However, the testing did not catch the fact that the legacy
cgroups behaviour was actually still breaking despite latest
Docker and setting to use host's cgroups namespace.
[1] 286a03bad20955a
Closes-Bug: #1941706
Change-Id: I629bb9e70a3fd6
Changed in kolla-ansible: | |
status: | In Progress → Fix Released |
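To confirm after redeploying that a container actually has the cgroup filesystem mounted, one option is to parse the JSON printed by `docker inspect`. The Python sketch below assumes the restored mount appears with the destination `/sys/fs/cgroup` inside the container; that path is an assumption based on the commit title, not something stated in this thread, and `has_cgroupfs_mount` is a made-up helper name.

```python
import json
from typing import Any, Dict, List

# Assumed destination of the restored mount inside the container.
CGROUP_PATH = "/sys/fs/cgroup"


def has_cgroupfs_mount(inspect_output: str, path: str = CGROUP_PATH) -> bool:
    """Return True if any inspected container has a mount at `path`.

    `inspect_output` is the JSON text printed by `docker inspect`, which is
    a list of container objects, each carrying a "Mounts" list.
    """
    containers: List[Dict[str, Any]] = json.loads(inspect_output)
    return any(
        mount.get("Destination") == path
        for container in containers
        for mount in container.get("Mounts", [])
    )
```

Feeding it the output of `docker inspect nova_libvirt` on a compute node shows whether the mount is present; it says nothing about whether the VMs would survive a restart, so migrating them first remains the safe approach.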
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/wallaby) | #28 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 8692c315a9b5d04
Author: Radosław Piliszek <email address hidden>
Date: Mon Aug 30 09:33:31 2021 +0000
Restore libvirtd cgroupfs mount
It was removed in [1] as part of cgroupsv2 cleanup.
However, the testing did not catch the fact that the legacy
cgroups behaviour was actually still breaking despite latest
Docker and setting to use host's cgroups namespace.
[1] 286a03bad20955a
Closes-Bug: #1941706
Change-Id: I629bb9e70a3fd6
(cherry picked from commit 34c49b9dbed81c6
ignazio (cassano) wrote : | #29 |
Hello, sorry for asking, but since I don't know the release procedure, I would like to know which version of Kolla will include this patch.
Ignazio
Radosław Piliszek (yoctozepto) wrote : | #30 |
If you are using it from git directly, then the stable/wallaby branch is already patched. Otherwise, I assume the version would be 12.2.0. Thanks for reminding about the release. ;-)
Radosław Piliszek (yoctozepto) wrote : | #31 |
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 12.2.0 | #32 |
This issue was fixed in the openstack/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master) | #33 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit daf534b4e0b7fd0
Author: Radosław Piliszek <email address hidden>
Date: Fri Aug 27 08:42:42 2021 +0000
[CI] Test instance health after upgrade
Just like I added Cinder volume upgrade testing before, let's
also test similarly for Nova and Neutron. :-)
More robust debugging and refactor included.
Related-Bug: #1941706
Depends-On: https:/
Change-Id: Id79df44254603f
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/ussuri) | #34 |
Related fix proposed to branch: stable/ussuri
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/wallaby) | #35 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/wallaby
commit 62328e7d8cfcc56
Author: Radosław Piliszek <email address hidden>
Date: Fri Aug 27 08:42:42 2021 +0000
[CI] Test instance health after upgrade
Just like I added Cinder volume upgrade testing before, let's
also test similarly for Nova and Neutron. :-)
More robust debugging and refactor included.
Related-Bug: #1941706
Depends-On: https:/
Change-Id: Id79df44254603f
(cherry picked from commit daf534b4e0b7fd0
tags: | added: in-stable-wallaby |
tags: | added: in-stable-victoria |
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/victoria) | #36 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: stable/victoria
commit e33e75c06adee1e
Author: Radosław Piliszek <email address hidden>
Date: Fri Aug 27 08:42:42 2021 +0000
[CI] Test instance health after upgrade
Just like I added Cinder volume upgrade testing before, let's
also test similarly for Nova and Neutron. :-)
More robust debugging and refactor included.
Related-Bug: #1941706
Depends-On: https:/
Change-Id: Id79df44254603f
(cherry picked from commit daf534b4e0b7fd0
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 13.0.0.0rc1 | #37 |
This issue was fixed in the openstack/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/ussuri) | #38 |
Change abandoned by "Radosław Piliszek <email address hidden>" on branch: stable/ussuri
Review: https:/
Reason: ussuri going em, thus /us not extending CI