Comment 5 for bug 1874032

Alexandre arents (aarents) wrote :

Thanks for the link https://review.opendev.org/#/c/609180/. It helps avoid overload by limiting the number of concurrent IO tasks.

But it does not solve my issue. Let me explain my use case further:

Imagine a compute host with 100 VMs, where customers run backups (instance snapshots) at night.
We want to allow 5 to 10 simultaneous snapshots even if the snapshot directory is busy.
As backup is best effort, it is not an issue for us if it takes hours.

Unfortunately, I can still hit the issue with max_concurrent_disk_ops=1 if the snapshot directory is on a network shared filesystem, as the network drive can be busy regardless of the activity on my compute host.

-> Resource contention on the snapshot directory should just slow down the upload, not make nova-compute hang.

Currently the nova-compute process performs the IO itself during the upload to Glance (it uses the glance client).

To scale nova-compute, we need to offload this IO from the nova-compute process.
We work around this by running the snapshot upload to Glance through utils.execute(*curl), so that it is forked into a separate process.
This prevents us from having hundreds of nova-compute services flapping, as described in the bug report.
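
A minimal sketch of this kind of workaround, for discussion only. It is not the actual patch: the function names build_upload_cmd and upload_in_child are hypothetical, plain subprocess.run stands in for nova's utils.execute, and the endpoint, image id and token are assumed to be already known. The URL follows the Glance v2 image data upload call (PUT /v2/images/{image_id}/file).

```python
import subprocess
from typing import Callable, List


def build_upload_cmd(endpoint: str, image_id: str, token: str, path: str) -> List[str]:
    """Build a curl command that PUTs a local file to the Glance v2 image data URL.

    All four arguments are assumptions for this sketch; a real deployment
    would obtain them from its own context.
    """
    url = f"{endpoint.rstrip('/')}/v2/images/{image_id}/file"
    return [
        "curl", "--fail", "--silent", "--show-error",
        # Caveat: a token passed on the command line is visible in the process list.
        "-H", f"X-Auth-Token: {token}",
        "-H", "Content-Type: application/octet-stream",
        "--upload-file", path,  # curl uses PUT for --upload-file over HTTP
        url,
    ]


def upload_in_child(cmd: List[str], run: Callable = subprocess.run) -> int:
    """Run the upload in a child process and return its exit code.

    subprocess.run is a stand-in here for whatever process helper is used.
    """
    return run(cmd, check=False).returncode
```

With this shape, the reads of the snapshot file happen in the curl process rather than in the calling Python process.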

As you suggested, we can discuss this during the VPTG!