Activity log for bug #1932127

Date Who What changed Old value New value Message
2021-06-16 08:33:29 Olaf Seibert bug added bug
2021-06-16 08:33:29 Olaf Seibert attachment added Log file extracts https://bugs.launchpad.net/bugs/1932127/+attachment/5504996/+files/Bugreport
2021-06-16 09:40:46 Olaf Seibert description
Old value:
Reproduction scenario:
- create a large base image (50 GB or so). This could be done for example by creating a VM, filling its ephemeral storage (which is 50 GB in our case) with lots of junk data, shutting down the VM, and creating an image from that.
- create a VM based on this image.
While the image is downloaded by nova-compute from Glance, other threads seem to be locked out for too long. Network connection failures get logged, and if this lasts long enough, creating the VM often fails.
We first saw this on Queens, but I reproduced the same issue on Ussuri. As hypervisor we use Libvirt + KVM. For storage we use Quobyte (shared storage). For networking we use Midonet (on Queens) and OVS (on Ussuri).
My solution was to put "greenthreads.sleep(0)" in the inner loop, like so:

From: Olaf Seibert <o.seibert@syseleven.de>
Date: Thu, 10 Jun 2021 11:38:16 +0000
Subject: Allow other threads to run.

While downloading a base image from Glance, other threads don't get enough of a chance to run, and network connections start to time out. See os-9400.
---
 nova/image/glance.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 13bfd90..8231351 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -30,6 +30,7 @@ import time
 import cryptography
 from cursive import exception as cursive_exception
 from cursive import signature_utils
+from eventlet import greenthread
 import glanceclient
 import glanceclient.exc
 from glanceclient.v2 import schemas
@@ -384,6 +385,7 @@ class GlanceImageServiceV2(object):
                 try:
                     for chunk in image_chunks:
                         verifier.update(chunk)
+                        greenthread.sleep(0)
                     verifier.verify()
 
                     LOG.info('Image signature verification succeeded '
@@ -400,6 +402,7 @@ class GlanceImageServiceV2(object):
                     if verifier:
                         verifier.update(chunk)
                     data.write(chunk)
+                    greenthread.sleep(0)
                 if verifier:
                     verifier.verify()
                     LOG.info('Image signature verification succeeded '

However, the download happens in chunks of only 64 KB, so sleep(0) is called extremely frequently. Maybe there is a better solution, but it should not be too complicated for such a tight loop.
I have attached some log files, since this bug tracker thinks the bug description is too long.
New value:
Reproduction scenario:
- create a large base image (50 GB or so). This could be done for example by creating a VM, filling its ephemeral storage (which is 50 GB in our case) with lots of junk data, shutting down the VM, and creating an image from that.
- create a VM based on this image.
While the image is downloaded by nova-compute from Glance, other threads seem to be locked out for too long. Network connection failures get logged, and if this lasts long enough, creating the VM often fails.
We first saw this on Queens, but I reproduced the same issue on Ussuri. As hypervisor we use Libvirt + KVM. For storage we use Quobyte (shared storage). For networking we use Midonet (on Queens) and OVS (on Ussuri).
My solution was to put "greenthread.sleep(0)" in the inner loop, like so:

From: Olaf Seibert <o.seibert@syseleven.de>
Date: Thu, 10 Jun 2021 11:38:16 +0000
Subject: Allow other threads to run.

While downloading a base image from Glance, other threads don't get enough of a chance to run, and network connections start to time out. See os-9400.
---
 nova/image/glance.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 13bfd90..8231351 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -30,6 +30,7 @@ import time
 import cryptography
 from cursive import exception as cursive_exception
 from cursive import signature_utils
+from eventlet import greenthread
 import glanceclient
 import glanceclient.exc
 from glanceclient.v2 import schemas
@@ -384,6 +385,7 @@ class GlanceImageServiceV2(object):
                 try:
                     for chunk in image_chunks:
                         verifier.update(chunk)
+                        greenthread.sleep(0)
                     verifier.verify()
 
                     LOG.info('Image signature verification succeeded '
@@ -400,6 +402,7 @@ class GlanceImageServiceV2(object):
                     if verifier:
                         verifier.update(chunk)
                     data.write(chunk)
+                    greenthread.sleep(0)
                 if verifier:
                     verifier.verify()
                     LOG.info('Image signature verification succeeded '

However, the download happens in chunks of only 64 KB, so sleep(0) is called extremely frequently. Maybe there is a better solution, but it should not be too complicated for such a tight loop.
I have attached some log files, since this bug tracker thinks the bug description is too long.
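nova-compute runs on eventlet, and eventlet schedules greenthreads cooperatively. Another greenthread only gets to run when the current one yields to the hub. Calling greenthread.sleep(0) is one way to yield; blocking on a monkey-patched socket is another. The toy model below is built from plain Python generators, not eventlet itself. It sketches why a download loop that never yields holds off every other greenthread until it finishes, and what a per-chunk yield changes. The scheduler, `download()` and `heartbeat()` are hypothetical stand-ins, not nova code.

```python
from collections import deque


def run(tasks):
    """Round-robin cooperative scheduler (toy model, not eventlet).

    Each task is a generator; a bare ``yield`` hands control back to the
    scheduler, playing the role that greenthread.sleep(0) plays in eventlet.
    """
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it yields or finishes
        except StopIteration:
            continue            # finished tasks are dropped
        ready.append(task)      # a task that yielded goes to the back


def download(n_chunks, events, yield_per_chunk):
    """Hypothetical stand-in for the chunk loop of the image download."""
    for i in range(n_chunks):
        events.append(("chunk", i))  # stands in for data.write(chunk)
        if yield_per_chunk:
            yield                    # analogous to greenthread.sleep(0)


def heartbeat(n_beats, events):
    """Hypothetical stand-in for another greenthread, such as one that
    keeps a network connection alive."""
    for i in range(n_beats):
        events.append(("heartbeat", i))
        yield


def chunks_before_first_heartbeat(yield_per_chunk, n_chunks=1000):
    events = []
    run([download(n_chunks, events, yield_per_chunk), heartbeat(3, events)])
    # Every event recorded before the first heartbeat is a downloaded chunk.
    return next(i for i, (kind, _) in enumerate(events) if kind == "heartbeat")


print(chunks_before_first_heartbeat(yield_per_chunk=False))  # 1000
print(chunks_before_first_heartbeat(yield_per_chunk=True))   # 1
```

In this model, the stall without a yield grows with the number of chunks, and therefore with the image size. With a yield after every chunk, the other task runs after each chunk.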
2021-06-16 09:44:42 Olaf Seibert description
Old value:
Reproduction scenario:
- create a large base image (50 GB or so). This could be done for example by creating a VM, filling its ephemeral storage (which is 50 GB in our case) with lots of junk data, shutting down the VM, and creating an image from that.
- create a VM based on this image.
While the image is downloaded by nova-compute from Glance, other threads seem to be locked out for too long. Network connection failures get logged, and if this lasts long enough, creating the VM often fails.
We first saw this on Queens, but I reproduced the same issue on Ussuri. As hypervisor we use Libvirt + KVM. For storage we use Quobyte (shared storage). For networking we use Midonet (on Queens) and OVS (on Ussuri).
My solution was to put "greenthread.sleep(0)" in the inner loop, like so:

From: Olaf Seibert <o.seibert@syseleven.de>
Date: Thu, 10 Jun 2021 11:38:16 +0000
Subject: Allow other threads to run.

While downloading a base image from Glance, other threads don't get enough of a chance to run, and network connections start to time out. See os-9400.
---
 nova/image/glance.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 13bfd90..8231351 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -30,6 +30,7 @@ import time
 import cryptography
 from cursive import exception as cursive_exception
 from cursive import signature_utils
+from eventlet import greenthread
 import glanceclient
 import glanceclient.exc
 from glanceclient.v2 import schemas
@@ -384,6 +385,7 @@ class GlanceImageServiceV2(object):
                 try:
                     for chunk in image_chunks:
                         verifier.update(chunk)
+                        greenthread.sleep(0)
                     verifier.verify()
 
                     LOG.info('Image signature verification succeeded '
@@ -400,6 +402,7 @@ class GlanceImageServiceV2(object):
                     if verifier:
                         verifier.update(chunk)
                     data.write(chunk)
+                    greenthread.sleep(0)
                 if verifier:
                     verifier.verify()
                     LOG.info('Image signature verification succeeded '

However, the download happens in chunks of only 64 KB, so sleep(0) is called extremely frequently. Maybe there is a better solution, but it should not be too complicated for such a tight loop.
I have attached some log files, since this bug tracker thinks the bug description is too long.
New value:
Reproduction scenario:
- create a large base image (50 GB or so). This could be done for example by creating a VM, filling its ephemeral storage (which is 50 GB in our case) with lots of junk data, shutting down the VM, and creating an image from that.
- create a VM based on this image.
While the image is downloaded by nova-compute from Glance, other threads seem to be locked out for too long. Network connection failures get logged, and if this lasts long enough, creating the VM often fails.
We first saw this on Queens, but I reproduced the same issue on Ussuri. As hypervisor we use Libvirt + KVM. For storage we use Quobyte (shared storage), but these tests were done on compute nodes with local storage (using LVM). For networking we use Midonet (on Queens) and OVS (on Ussuri).
My solution was to put "greenthreads.sleep(0)" in the inner loop, like so:

From: Olaf Seibert <o.seibert@syseleven.de>
Date: Thu, 10 Jun 2021 11:38:16 +0000
Subject: Allow other threads to run.

While downloading a base image from Glance, other threads don't get enough of a chance to run, and network connections start to time out. See os-9400.
---
 nova/image/glance.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nova/image/glance.py b/nova/image/glance.py
index 13bfd90..8231351 100644
--- a/nova/image/glance.py
+++ b/nova/image/glance.py
@@ -30,6 +30,7 @@ import time
 import cryptography
 from cursive import exception as cursive_exception
 from cursive import signature_utils
+from eventlet import greenthread
 import glanceclient
 import glanceclient.exc
 from glanceclient.v2 import schemas
@@ -384,6 +385,7 @@ class GlanceImageServiceV2(object):
                 try:
                     for chunk in image_chunks:
                         verifier.update(chunk)
+                        greenthread.sleep(0)
                     verifier.verify()
 
                     LOG.info('Image signature verification succeeded '
@@ -400,6 +402,7 @@ class GlanceImageServiceV2(object):
                     if verifier:
                         verifier.update(chunk)
                     data.write(chunk)
+                    greenthread.sleep(0)
                 if verifier:
                     verifier.verify()
                     LOG.info('Image signature verification succeeded '

However, the download happens in chunks of only 64 KB, so sleep(0) is called extremely frequently. Maybe there is a better solution, but it should not be too complicated for such a tight loop.
I have attached some log files, since this bug tracker thinks the bug description is too long.
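The report notes that yielding on every 64 KB chunk makes sleep(0) run extremely often. One possible middle ground is to wrap the chunk iterator and yield only after a byte budget has been used up. This sketch assumes that yielding every few MiB keeps other greenthreads responsive enough, which has not been measured. `cooperative_chunks()`, `yields_needed()` and the 8 MiB default are all hypothetical. The yield function is passed in, so the sketch runs without eventlet. It also treats "50 GB" as 50 GiB when working out how many yields each approach needs.

```python
from typing import Callable, Iterable, Iterator

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def cooperative_chunks(chunks: Iterable[bytes],
                       yield_fn: Callable[[], None],
                       every_bytes: int = 8 * MIB) -> Iterator[bytes]:
    """Pass chunks through unchanged, calling yield_fn after every_bytes.

    Hypothetical helper. In nova-compute, yield_fn might be
    ``lambda: greenthread.sleep(0)`` (an assumption, untested there).
    """
    pending = 0
    for chunk in chunks:
        yield chunk                 # the caller writes/verifies the chunk first
        pending += len(chunk)
        if pending >= every_bytes:  # budget used up: let other greenthreads run
            yield_fn()
            pending = 0


def yields_needed(total_bytes: int, every_bytes: int) -> int:
    """Number of yield_fn calls for total_bytes, assuming uniform chunks
    whose size divides every_bytes."""
    return total_bytes // every_bytes


# Arithmetic for a 50 GiB image read in 64 KiB chunks.
print(yields_needed(50 * GIB, 64 * KIB))  # 819200: one yield per chunk
print(yields_needed(50 * GIB, 8 * MIB))   # 6400: one yield per 8 MiB

# Small check: 10 MiB in 64 KiB chunks, yielding every 1 MiB.
calls = []
data = [b"x" * (64 * KIB)] * 160
out = list(cooperative_chunks(data, lambda: calls.append(1), every_bytes=MIB))
print(len(calls), out == data)            # 10 True
```

In the patched loop, this might read `for chunk in cooperative_chunks(image_chunks, lambda: greenthread.sleep(0)):`. A time-based budget would be an alternative: yield once a few milliseconds have passed since the last yield. The byte budget is used here only because it is deterministic and easy to check.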
2021-06-16 12:04:38 Silvan Kaiser bug added subscriber Silvan Kaiser
2021-06-16 15:40:36 Matthias Bernhardt bug added subscriber Matthias Bernhardt
2022-04-27 02:32:28 melanie witt nova: importance Undecided Medium
2022-04-27 02:32:28 melanie witt nova: status New Triaged