update_available_resource will raise DiskNotFound after resize but before confirm

Bug #1774249 reported by Matthew Booth
This bug affects 4 people
Affects                     Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)    Fix Released   Medium      Matthew Booth
  Ocata                     Triaged        Medium      Unassigned
  Pike                      Fix Released   Medium      Sasha Andonov
  Queens                    Fix Committed  Medium      Lee Yarwood
  Rocky                     Fix Released   Medium      Lee Yarwood
  Stein                     Fix Released   Medium      Lee Yarwood
  Train                     Fix Released   Undecided   Unassigned
Ubuntu Cloud Archive        Invalid        Undecided   Unassigned
  Queens                    Fix Released   Undecided   Unassigned
nova (Ubuntu)               Invalid        Undecided   Unassigned
  Bionic                    Fix Released   High        Unassigned

Bug Description

Originally reported in RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1584315

Tested on OSP12 (Pike), but appears to be still present on master. Should only occur if nova compute is configured to use local file instance storage.

Create instance A on compute X

Resize instance A to compute Y
  Domain is powered off
  /var/lib/nova/instances/<uuid A> renamed to <uuid A>_resize on X
  Domain is *not* undefined

On compute X:
  update_available_resource runs as a periodic task
  First action is to update self
  rt calls driver.get_available_resource()
  ...calls _get_disk_over_committed_size_total
  ...iterates over all defined domains, including the ones whose disks we renamed
  ...fails because a referenced disk no longer exists

Results in errors in nova-compute.log:

    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager config, block_device_info)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager dk_size = disk_api.get_allocated_disk_size(path)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager return images.qemu_img_info(path).disk_size
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager raise exception.DiskNotFound(location=path)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

After this failure the resource tracker is no longer updated. We can find lots of these errors in the gate.
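
The sequence described above can be reproduced outside nova with a small simulation. This is only a sketch: `DiskNotFound`, `get_allocated_disk_size`, `disk_total` and the `uuid-a`/`uuid-b` names are local stand-ins invented for illustration, not nova's code. It shows that renaming one instance directory while its domain stays defined makes the whole disk-size loop fail, so no total is produced for the host.

```python
import os
import tempfile


class DiskNotFound(Exception):
    """Local stand-in for nova's DiskNotFound (hypothetical copy)."""


def get_allocated_disk_size(path):
    # Stand-in for the qemu-img based lookup: fail if the file is gone.
    if not os.path.exists(path):
        raise DiskNotFound("No disk at %s" % path)
    return os.path.getsize(path)


def disk_total(domains):
    # Same shape as the periodic task: a single missing disk aborts the
    # loop, so the caller gets no result for the host at all.
    total = 0
    for disk_path in domains.values():
        total += get_allocated_disk_size(disk_path)
    return total


def simulate_resize_then_periodic_task():
    with tempfile.TemporaryDirectory() as instances:
        domains = {}
        for uuid in ("uuid-a", "uuid-b"):
            os.mkdir(os.path.join(instances, uuid))
            disk = os.path.join(instances, uuid, "disk")
            with open(disk, "wb") as f:
                f.write(b"\0" * 1024)
            domains[uuid] = disk

        before = disk_total(domains)

        # Resize of instance A: the directory is renamed on the source
        # host, but the domain (and its disk path) is still defined.
        os.rename(os.path.join(instances, "uuid-a"),
                  os.path.join(instances, "uuid-a_resize"))
        try:
            disk_total(domains)
        except DiskNotFound as exc:
            return before, str(exc)
        return before, None
```

Calling `simulate_resize_then_periodic_task()` returns the total computed before the rename together with the error message raised afterwards.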

Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly mitigates this, but doesn't because task_state is not set while the instance is awaiting confirm.

=================================================================================
[Impact]

See above

[Test Plan]

Deploy OpenStack Queens with one compute node.

Create a VM instance. Eg:
openstack server create --wait --image $image_name --flavor $flavor --key-name testkey --nic net-id=${net_id} test-instance-1234

Get the details for that instance and store the instance_name in var_name. Eg:
var_name=$(openstack server show test-instance-1234 -c OS-EXT-SRV-ATTR:instance_name -f value)

Get the disk location used based on the instance name we retrieved before. Eg:
disk_location=`juju run -a nova-compute -- virsh domblklist $var_name | grep nova | awk -v N=2 '{print $N}'`

Move that file to a different location. Eg:
juju run -a nova-compute -- mv $disk_location "$disk_location"_backup

Check the nova compute logs on the compute node for a warning. Eg:
juju run -a nova-compute -- grep "DiskNotFound" /var/log/nova/nova-compute.log

The output should look like the following:
```
2021-09-22 11:07:46.009 26176 WARNING nova.virt.libvirt.driver [req-6e8eb87e-4024-4908-9b7f-0648ecd03eaf - - - - -] Periodic task is updating the host stats, it is trying to get disk info for instance-00000001, but the backing disk storage was removed by a concurrent operation such as resize. Error: No disk at /var/lib/nova/instances/3bd9578f-e7d7-48bc-bdef-d2d4cb25ea29/disk: DiskNotFound: No disk at /var/lib/nova/instances/3bd9578f-e7d7-48bc-bdef-d2d4cb25ea29/disk
```
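
The log check in the last step can also be scripted. The sketch below is an assumption-laden helper, not part of nova or the test plan: `LINE_RE` and `disk_not_found_levels` are hypothetical names, and the regular expression assumes the `<date> <time> <pid> <LEVEL> <logger>` layout seen in the log lines quoted in this report. It counts DiskNotFound lines per log level, so a fixed host should show WARNING entries rather than ERROR entries.

```python
import re

# Assumed line layout: "<date> <time> <pid> <LEVEL> <logger> ..."
LINE_RE = re.compile(r"^\S+ \S+ \d+ (?P<level>[A-Z]+) (?P<logger>\S+) ")


def disk_not_found_levels(lines):
    """Count log lines mentioning DiskNotFound, grouped by log level."""
    counts = {}
    for line in lines:
        if "DiskNotFound" not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            level = match.group("level")
            counts[level] = counts.get(level, 0) + 1
    return counts
```

For example, passing the lines of /var/log/nova/nova-compute.log to `disk_not_found_levels` gives a dictionary keyed by level, which can then be checked for the absence of an `ERROR` key.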

[Where problems could occur]

Users who were relying on the error being raised could be affected.

jichenjc (jichenjc)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → jichenjc (jichenjc)
Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: jichenjc (jichenjc) → Lee Yarwood (lyarwood)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → jichenjc (jichenjc)
Changed in nova:
assignee: jichenjc (jichenjc) → Lee Yarwood (lyarwood)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Vladyslav Drok (vdrok)
Revision history for this message
LWQ (lwqcz) wrote :

Any news regarding this issue? I've read the whole history here and on Red Hat's Bugzilla and I assume that this issue is not fixed yet, am I correct? We are seeing a significant number of log records caused by this issue. Please update the info here, thank you.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note https://review.openstack.org/#/c/553067/ is related for a race during server delete.

tags: added: libvirt resize
Changed in nova:
assignee: Vladyslav Drok (vdrok) → nobody
status: In Progress → Confirmed
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Vladyslav Drok (vdrok)
Changed in nova:
assignee: Vladyslav Drok (vdrok) → Lee Yarwood (lyarwood)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Vladyslav Drok (vdrok)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/571410
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=966192704c20d1b4e9faf384c8dafac8ea6e06ea
Submitter: Zuul
Branch: master

commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea
Author: jichenjc <email address hidden>
Date: Mon May 21 02:03:51 2018 +0800

    libvirt: Do not reraise DiskNotFound exceptions during resize

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to <instance_path>/<instance_uuid>_resize
    folder, which leads to disk not found errors if the update available
    resource periodic task on the source compute runs before resize is
    actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue but it
    will only set reraise to False when task_state is not None, that isn't
    the case when an instance is resized but resize is not yet confirmed.
    This patch adds a condition based on vm_state to ensure we don't
    reraise DiskNotFound exceptions while resize is not confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <email address hidden>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
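
The condition described in this commit message can be sketched as a small decision function. This is not the code from the patch: `should_reraise`, the dict-based `instance` and the `RESIZED` constant are hypothetical stand-ins, and it is assumed here that the VERIFY_RESIZE status corresponds to a vm_state value of "resized".

```python
RESIZED = "resized"  # assumed vm_state value while a resize awaits confirm


def should_reraise(instance):
    """Return True if DiskNotFound for this instance should propagate.

    `instance` is None when the guest is not known to this host,
    otherwise a dict with hypothetical 'task_state' and 'vm_state' keys.
    """
    if instance is None:
        return True
    if instance["task_state"] is not None:
        # Earlier mitigation: some operation is in flight.
        return False
    if instance["vm_state"] == RESIZED:
        # Condition added by this change: resized but not yet confirmed,
        # so task_state is None even though the disk has been moved.
        return False
    return True
```

With only the task_state check, an instance awaiting confirm falls through to the final `return True`, which is the gap the vm_state condition closes.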

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/660361

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/660362

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/660363

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/660361
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f1280ab849d20819791f7c4030f570a917d3e91d
Submitter: Zuul
Branch: stable/stein

commit f1280ab849d20819791f7c4030f570a917d3e91d
Author: jichenjc <email address hidden>
Date: Mon May 21 02:03:51 2018 +0800

    libvirt: Do not reraise DiskNotFound exceptions during resize

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to <instance_path>/<instance_uuid>_resize
    folder, which leads to disk not found errors if the update available
    resource periodic task on the source compute runs before resize is
    actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue but it
    will only set reraise to False when task_state is not None, that isn't
    the case when an instance is resized but resize is not yet confirmed.
    This patch adds a condition based on vm_state to ensure we don't
    reraise DiskNotFound exceptions while resize is not confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <email address hidden>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
    (cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/660362
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fd5c45473823105d8572d7940980163c6f09169c
Submitter: Zuul
Branch: stable/rocky

commit fd5c45473823105d8572d7940980163c6f09169c
Author: jichenjc <email address hidden>
Date: Mon May 21 02:03:51 2018 +0800

    libvirt: Do not reraise DiskNotFound exceptions during resize

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to <instance_path>/<instance_uuid>_resize
    folder, which leads to disk not found errors if the update available
    resource periodic task on the source compute runs before resize is
    actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue but it
    will only set reraise to False when task_state is not None, that isn't
    the case when an instance is resized but resize is not yet confirmed.
    This patch adds a condition based on vm_state to ensure we don't
    reraise DiskNotFound exceptions while resize is not confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <email address hidden>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
    (cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)
    (cherry picked from commit f1280ab849d20819791f7c4030f570a917d3e91d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/660363
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d3bdeb26155c2d3b53850b790d3800a2dd78cada
Submitter: Zuul
Branch: stable/queens

commit d3bdeb26155c2d3b53850b790d3800a2dd78cada
Author: jichenjc <email address hidden>
Date: Mon May 21 02:03:51 2018 +0800

    libvirt: Do not reraise DiskNotFound exceptions during resize

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to <instance_path>/<instance_uuid>_resize
    folder, which leads to disk not found errors if the update available
    resource periodic task on the source compute runs before resize is
    actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue but it
    will only set reraise to False when task_state is not None, that isn't
    the case when an instance is resized but resize is not yet confirmed.
    This patch adds a condition based on vm_state to ensure we don't
    reraise DiskNotFound exceptions while resize is not confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <email address hidden>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
    (cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)
    (cherry picked from commit f1280ab849d20819791f7c4030f570a917d3e91d)
    (cherry picked from commit fd5c45473823105d8572d7940980163c6f09169c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Revision history for this message
Matthew Booth (mbooth-9) wrote :

This is not fixed. We've just had a report where we appear to be hitting the race reported in review here:

https://review.opendev.org/#/c/571410/7/nova/virt/libvirt/driver.py

Changed in nova:
status: Fix Released → In Progress
Changed in nova:
assignee: Vladyslav Drok (vdrok) → Matthew Booth (mbooth-9)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/685391
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6198f317be549e6d2bd324a48f226b379556e945
Submitter: Zuul
Branch: master

commit 6198f317be549e6d2bd324a48f226b379556e945
Author: Matthew Booth <email address hidden>
Date: Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in
    change Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having the
       same cause, was inconsistent.

    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.
    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just to
    log it and ignore it, which is also consistent with the handling of
    missing volumes.

    Closes-Bug: #1774249

    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
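
The "log it and ignore it" approach from this commit message can be illustrated with a minimal sketch. The names below (`over_committed_total`, the `get_size` callable and the local `DiskNotFound` class) are hypothetical and do not reproduce the patch itself. The point is that a missing disk is logged as a warning and skipped, regardless of instance state, so the total for the remaining guests is still returned.

```python
import logging

LOG = logging.getLogger(__name__)


class DiskNotFound(Exception):
    """Local stand-in for nova's DiskNotFound (hypothetical copy)."""


def over_committed_total(guests, get_size):
    """Sum per-guest disk sizes, skipping guests whose disk has vanished.

    `guests` is an iterable of guest names; `get_size` is a hypothetical
    callable that returns a size in bytes or raises DiskNotFound.
    """
    total = 0
    for name in guests:
        try:
            total += get_size(name)
        except DiskNotFound as exc:
            # No state check: any concurrent operation may have moved
            # the disk, so log a warning and carry on with other guests.
            LOG.warning("Skipping %s while updating host stats: %s",
                        name, exc)
    return total
```

Because nothing is inferred from the instance state, this shape is unaffected by the window between fetching the instance list and reading the disk sizes.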

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/711276

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/711277

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/711278

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/711279

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/711276
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=73d9b6e5f622dc645ac6ad322c836ffbe4045072
Submitter: Zuul
Branch: stable/train

commit 73d9b6e5f622dc645ac6ad322c836ffbe4045072
Author: Matthew Booth <email address hidden>
Date: Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in
    change Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having the
       same cause, was inconsistent.

    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.
    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just to
    log it and ignore it, which is also consistent with the handling of
    missing volumes.

    Closes-Bug: #1774249

    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
    (cherry picked from commit 6198f317be549e6d2bd324a48f226b379556e945)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/711277
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4700b3658e5983a731d0da259365317e230c4a52
Submitter: Zuul
Branch: stable/stein

commit 4700b3658e5983a731d0da259365317e230c4a52
Author: Matthew Booth <email address hidden>
Date: Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in
    change Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having the
       same cause, was inconsistent.

    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.
    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just to
    log it and ignore it, which is also consistent with the handling of
    missing volumes.

    Closes-Bug: #1774249

    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
    (cherry picked from commit 6198f317be549e6d2bd324a48f226b379556e945)
    (cherry picked from commit 73d9b6e5f622dc645ac6ad322c836ffbe4045072)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/711278
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1962633328dc7227dd040c1cf3a9cbe97b36ea37
Submitter: Zuul
Branch: stable/rocky

commit 1962633328dc7227dd040c1cf3a9cbe97b36ea37
Author: Matthew Booth <email address hidden>
Date: Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in
    change Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having the
       same cause, was inconsistent.

    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.
    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just to
    log it and ignore it, which is also consistent with the handling of
    missing volumes.

    Closes-Bug: #1774249

    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
    (cherry picked from commit 6198f317be549e6d2bd324a48f226b379556e945)
    (cherry picked from commit 73d9b6e5f622dc645ac6ad322c836ffbe4045072)
    (cherry picked from commit 4700b3658e5983a731d0da259365317e230c4a52)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/711279
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=05f4bf0e6738093d79c6a8ffb9ca3ccb189c6658
Submitter: Zuul
Branch: stable/queens

commit 05f4bf0e6738093d79c6a8ffb9ca3ccb189c6658
Author: Matthew Booth <email address hidden>
Date: Fri Sep 27 16:51:02 2019 +0100

    libvirt: Ignore DiskNotFound during update_available_resource

    There was a previous attempt to fix this in
    change Id687e11e235fd6c2f99bb647184310dfdce9a08d. However, there were 2
    problems with the previous fix:

    1. The handling of missing volumes and disks, while typically having the
       same cause, was inconsistent.

    2. It failed to consider the very wide race opportunity in
       _get_disk_over_committed_size_total between initially fetching the
       instance list from the DB and later getting disk sizes.

    Because _get_disk_over_committed_size_total() can be a very long
    operation, we found that we were reliably hitting this race in CI.
    It might be possible to fix the race, but this would add unnecessary
    complication to code which isn't critical. It's far more robust just to
    log it and ignore it, which is also consistent with the handling of
    missing volumes.

    Closes-Bug: #1774249

    Change-Id: I48719c02713113a41176b8f5cc3c5831f1284a39
    (cherry picked from commit 6198f317be549e6d2bd324a48f226b379556e945)
    (cherry picked from commit 73d9b6e5f622dc645ac6ad322c836ffbe4045072)
    (cherry picked from commit 4700b3658e5983a731d0da259365317e230c4a52)
    (cherry picked from commit 1962633328dc7227dd040c1cf3a9cbe97b36ea37)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/742181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/742181
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9202b2e6cfeb5dd67b54d7eaa6187725a34aeae6
Submitter: Zuul
Branch: stable/pike

commit 9202b2e6cfeb5dd67b54d7eaa6187725a34aeae6
Author: jichenjc <email address hidden>
Date: Mon May 21 02:03:51 2018 +0800

    libvirt: Do not reraise DiskNotFound exceptions during resize

    When an instance has VERIFY_RESIZE status, the instance disk on the
    source compute host has moved to <instance_path>/<instance_uuid>_resize
    folder, which leads to disk not found errors if the update available
    resource periodic task on the source compute runs before resize is
    actually confirmed.

    Icec2769bf42455853cbe686fb30fda73df791b25 almost fixed this issue but it
    will only set reraise to False when task_state is not None, that isn't
    the case when an instance is resized but resize is not yet confirmed.
    This patch adds a condition based on vm_state to ensure we don't
    reraise DiskNotFound exceptions while resize is not confirmed.

    Closes-Bug: 1774249
    Co-Authored-By: Vladyslav Drok <email address hidden>
    Change-Id: Id687e11e235fd6c2f99bb647184310dfdce9a08d
    (cherry picked from commit 966192704c20d1b4e9faf384c8dafac8ea6e06ea)
    (cherry picked from commit f1280ab849d20819791f7c4030f570a917d3e91d)
    (cherry picked from commit fd5c45473823105d8572d7940980163c6f09169c)
    (cherry picked from commit d3bdeb26155c2d3b53850b790d3800a2dd78cada)

description: updated
Revision history for this message
Corey Bryant (corey.bryant) wrote :

This is already fixed in bionic and the queens cloud archive for Ubuntu.

Changed in cloud-archive:
status: New → Invalid
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Apologies, it looks like:
https://review.opendev.org/660363
is fixed in Ubuntu bionic and queens, but
https://review.opendev.org/c/openstack/nova/+/711279/
is not yet fixed in bionic (nor queens or rocky in case we need it for upgrades).

Changed in nova (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Changed in nova (Ubuntu):
status: New → Invalid
Revision history for this message
Corey Bryant (corey.bryant) wrote :

An updated package for nova in Ubuntu has been uploaded to the bionic unapproved queue:
https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=nova

Revision history for this message
Robie Basak (racb) wrote :

It looks like the patch being applied here is applied in Ubuntu Impish, so would "Fix Released" for the Ubuntu development release task be more accurate? Either way, this looks good from an SRU perspective, so accepting - thanks.

Changed in nova (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Matthew, or anyone else affected,

Accepted nova into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:17.0.13-0ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Alin-Gabriel Serdean (alin-serdean) wrote :

Hello Robie,

I have tested the proposed package using the test case.

We need to keep in mind that this patch does not allow the exception
to be raised, but we still see the error logged as a WARNING.

I am adding the packages used for testing together with the logs.

ubuntu@test:~/home$ juju run -a nova-compute -- "apt list nova*" | grep installed
nova-api-metadata/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-common/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed,automatic]
nova-compute/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-compute-kvm/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed]
nova-compute-libvirt/bionic-proposed,now 2:17.0.13-0ubuntu4 all [installed,automatic]

ubuntu@test:~/home$ juju run -a nova-compute -- grep "DiskNotFound" /var/log/nova/nova-compute.log
2021-10-19 09:22:11.422 19060 WARNING nova.virt.libvirt.driver [req-60b887da-6da1-4463-b754-6d389d7f5df3 - - - - -] Periodic task is updating the host stats, it is trying to get disk info for instance-00000001, but the backing disk storage was removed by a concurrent operation such as resize. Error: No disk at /var/lib/nova/instances/5f5d2c95-afb0-4e8c-9fd1-fc3f29d3a5f8/disk: DiskNotFound: No disk at /var/lib/nova/instances/5f5d2c95-afb0-4e8c-9fd1-fc3f29d3a5f8/disk

tags: removed: verification-needed verification-needed-bionic
tags: added: verification-done-bionic verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:17.0.13-0ubuntu4

---------------
nova (2:17.0.13-0ubuntu4) bionic; urgency=medium

  * d/p/libvirt-Ignore-DiskNotFound-during-update_available.patch: Ignore
    DiskNotFound during update_available_resource (LP: #1774249).

 -- Alin-Gabriel Serdean <email address hidden> Tue, 21 Sep 2021 18:29:56 +0000

Changed in nova (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Matthew, or anyone else affected,

Accepted nova into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova pike-eol

This issue was fixed in the openstack/nova pike-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova rocky-eol

This issue was fixed in the openstack/nova rocky-eol release.
