I was able to confirm that nova (I'm not sure whether it is nova-api) quiesces the domain before calling volume_snapshot_create. Then, inside volume_snapshot_create, nova calls libvirt's virDomainSnapshotCreateXML() with the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag set, but this call fails because the domain is already frozen. I believe that's why the snapshot ends up failing.
Here is an example log (I manually added the first line to debug the issue):
2022-10-28 14:22:49.968 7 INFO nova.virt.libvirt.driver [req-b642987d-b4ce-4a84-a080-18dd6e3a3c62 c83b20bf05a74781aed3f71d5754016e 09db9c0e63c14e8284d2bd0c25808adb - default default] [instance: b2d46a43-c6c7-4dd7-b524-0b018b689d98] Quiesce called from quiesce_instance
2022-10-28 14:22:51.481 7 DEBUG nova.virt.libvirt.driver [req-98e61afa-5ef6-4717-aea4-c80193880df2 899761a635c847e483536855cd6a9af9 3bfb8905aa84474c9e8611749f5f5329 - default default] [instance: b2d46a43-c6c7-4dd7-b524-0b018b689d98] volume_snapshot_create: create_info: {'type': 'qcow2', 'new_file': 'volume-07df969b-b237-498b-8bbb-0b064e273f07.077fe32d-f93c-4fd0-8a1d-c48e13b71dfd', 'snapshot_id': '077fe32d-f93c-4fd0-8a1d-c48e13b71dfd'} volume_snapshot_create /var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:3401
2022-10-28 14:22:51.483 7 DEBUG nova.virt.libvirt.driver [req-98e61afa-5ef6-4717-aea4-c80193880df2 899761a635c847e483536855cd6a9af9 3bfb8905aa84474c9e8611749f5f5329 - default default] [instance: b2d46a43-c6c7-4dd7-b524-0b018b689d98] snap xml: <domainsnapshot>
<disks>
<disk name="/var/lib/nova/mnt/e4ffe64b9de2f7551ebb601d06e7891b/volume-07df969b-b237-498b-8bbb-0b064e273f07" snapshot="external" type="file">
<source file="/var/lib/nova/mnt/e4ffe64b9de2f7551ebb601d06e7891b/volume-07df969b-b237-498b-8bbb-0b064e273f07.077fe32d-f93c-4fd0-8a1d-c48e13b71dfd"/>
</disk>
</disks>
</domainsnapshot>
_volume_snapshot_create /var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:3341
2022-10-28 14:22:51.483 7 DEBUG nova.objects.instance [req-98e61afa-5ef6-4717-aea4-c80193880df2 899761a635c847e483536855cd6a9af9 3bfb8905aa84474c9e8611749f5f5329 - default default] Lazy-loading 'system_metadata' on Instance uuid b2d46a43-c6c7-4dd7-b524-0b018b689d98 obj_load_attr /var/lib/kolla/venv/lib/python3.6/site-packages/nova/objects/instance.py:1101
2022-10-28 14:22:51.508 7 ERROR nova.virt.libvirt.driver [req-98e61afa-5ef6-4717-aea4-c80193880df2 899761a635c847e483536855cd6a9af9 3bfb8905aa84474c9e8611749f5f5329 - default default] [instance: b2d46a43-c6c7-4dd7-b524-0b018b689d98] Error occurred during volume_snapshot_create, sending error status to Cinder.: libvirt.libvirtError: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
The snapshot succeeds if I temporarily disable the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag, but I don't know whether this is safe. What would be the best approach to avoid this double quiesce attempt?
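To make the double quiesce concrete, here is a minimal sketch of how the snapshot flags could be built so that QUIESCE is only requested when the guest has not been frozen already. This is not nova's actual code: the function name snapshot_flags and the already_quiesced parameter are hypothetical, the choice of the other three flags is my assumption about what a disk-only external snapshot would use, and the numeric values are copied from libvirt's virDomainSnapshotCreateFlags enum as I understand it (real code should take them from the libvirt module).

```python
# Values mirror libvirt's virDomainSnapshotCreateFlags enum as I understand it;
# in real code, use libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_* instead.
VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA = 1 << 2
VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY = 1 << 4
VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT = 1 << 5
VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE = 1 << 6


def snapshot_flags(already_quiesced):
    """Build flags for virDomainSnapshotCreateXML() (hypothetical helper).

    already_quiesced: True if the caller froze the guest filesystems
    earlier (for example through quiesce_instance), in which case asking
    libvirt to freeze again would trigger a second guest-fsfreeze-freeze.
    """
    flags = (VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA
             | VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY
             | VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT)
    if not already_quiesced:
        # Only let libvirt freeze the guest when nobody has done so yet.
        flags |= VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE
    return flags


if __name__ == "__main__":
    print(bin(snapshot_flags(already_quiesced=True)))
    print(bin(snapshot_flags(already_quiesced=False)))
```

Whether dropping the flag is safe depends on the guest filesystems still being frozen when the snapshot is taken, which I have not verified.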