Bug #1847225 “Same container image layers are fetched multiple t...” : Bugs : tripleo

Bogdan Dobrelya (bogdando) on 2019-10-08

tags:	added: containers
Changed in tripleo:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → train-rc1

Bogdan Dobrelya (bogdando) on 2019-10-08

summary:

- Same container image layers are fetched multiple times and linked too
- late in the image-uploader pipeline
+ Same container image layers are fetched multiple times in the image-
+ uploader pipeline

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-10-08: Fix proposed to tripleo-common (master)

#1

Fix proposed to branch: master
Review: https://review.opendev.org/687288

Changed in tripleo:
assignee:	nobody → Bogdan Dobrelya (bogdando)
status:	Triaged → In Progress

Revision history for this message

Alex Schultz (alex-schultz) wrote on 2019-10-08:

#2

Not sure this is actually an issue as it's just finding the existing layers later and linking them to the existing file rather than re-fetching. Not sure we gain a significant time by changing this part. The problem in upstream CI is really the yum update taking forever due to how long it takes to install the rpms.

Changed in tripleo:
milestone:	train-rc1 → ussuri-1

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2019-10-09:

#3

Download full text (5.6 KiB)

I checked if bug is still relevant after this change, and it seems so, see the results I collected here https://pastebin.com/JPxiWCFz

tl;dr many of the layers with the size of 80-180 Mb are still getting read/fetched multiple times, like 5-28 (!!) times:

44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46

11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba11...

I checked if bug is still relevant after this change, and it seems so, see the results I collected here https://pastebin.com/JPxiWCFz

tl;dr many of the layers with the size of 80-180 Mb are still getting read/fetched multiple times, like 5-28 (!!) times:

44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46
+44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46 Read 135085240 bytes for sha256:44497292b8dc302ffe9c183df957f6d2c7f42c692a9b9e983981ed198d6b3a46

11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c
+11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c Read 83089391 bytes for sha256:11784319bf28d1ba118c27f3aa10dfebdee531cf1043df30e05131bcfcd3ad8c

Powered by Gerrit Code Review (2.13.12-11-g1707fec) | Get Help | Press '?' to view keyboard shortcuts

Bogdan Dobrelya (bogdando) on 2019-10-16

tags:

added: stein-backport-potential

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-10-31: Change abandoned on tripleo-common (master)

#6

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/687288
Reason: this doesn't work, I ran out of ideas

Bogdan Dobrelya (bogdando) on 2019-10-31

Changed in tripleo:
status:	In Progress → Triaged
assignee:	Bogdan Dobrelya (bogdando) → nobody
tags:	added: tech-debt

OpenStack Infra (hudson-openstack) on 2019-11-04

Changed in tripleo:
assignee:	nobody → Bogdan Dobrelya (bogdando)
status:	Triaged → In Progress

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2019-11-13:

#7

Benchmarking Image Uploader.pdf Edit (99.6 KiB, application/pdf)

Here is benchmarking results for the proposed change

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-18: Fix merged to tripleo-common (master)

#8

Reviewed: https://review.opendev.org/687288
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=46f81298948865a2c15c4a5035e95a0c77adb5d5
Submitter: Zuul
Branch: master

commit 46f81298948865a2c15c4a5035e95a0c77adb5d5
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers

    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.

    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.

Store such information about already processed layers in global view
shared for all workers to speed-up data transfering jobs they execute.

    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.

Based on the dynamically picked multi-workers mode, provide the global
view as a graf with its MP/MT state synchronization as the following:

    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.

As layers locking and unlocking becomes a popular action, reduce the
noise of the debug messages it produces.

Closes-bug: #1847225
Related-bug: #1844446

Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
Signed-off-by: Bogdan Dobrelya <email address hidden>

Reviewed:  https://review.opendev.org/687288
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=46f81298948865a2c15c4a5035e95a0c77adb5d5
Submitter: Zuul
Branch:    master

commit 46f81298948865a2c15c4a5035e95a0c77adb5d5
Author: Bogdan Dobrelya <bdobreli@redhat.com>
Date:   Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers
    
    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.
    
    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.
    
    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.
    
    Store such information about already processed layers in global view
    shared for all workers to speed-up data transfering jobs they execute.
    
    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.
    
    Based on the dynamically picked multi-workers mode, provide the global
    view as a graf with its MP/MT state synchronization as the following:
    
    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.
    
    Ultimately, all that optimizes:
    
    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.
    
    As layers locking and unlocking becomes a popular action, reduce the
    noise of the debug messages it produces.
    
    Closes-bug: #1847225
    Related-bug: #1844446
    
    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>

Changed in tripleo:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-19: Fix proposed to tripleo-common (stable/train)

#9

Fix proposed to branch: stable/train
Review: https://review.opendev.org/694963

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-19: Fix proposed to tripleo-common (stable/stein)

#10

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/694965

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-20: Fix merged to tripleo-common (stable/train)

#11

Reviewed: https://review.opendev.org/694963
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a5768f0d75638325230a673427a6f554cca05d47
Submitter: Zuul
Branch: stable/train

commit a5768f0d75638325230a673427a6f554cca05d47
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers

    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.

    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.

Store such information about already processed layers in global view
shared for all workers to speed-up data transfering jobs they execute.

    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.

Based on the dynamically picked multi-workers mode, provide the global
view as a graf with its MP/MT state synchronization as the following:

    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.

As layers locking and unlocking becomes a popular action, reduce the
noise of the debug messages it produces.

Closes-bug: #1847225
Related-bug: #1844446

    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)

Reviewed:  https://review.opendev.org/694963
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a5768f0d75638325230a673427a6f554cca05d47
Submitter: Zuul
Branch:    stable/train

commit a5768f0d75638325230a673427a6f554cca05d47
Author: Bogdan Dobrelya <bdobreli@redhat.com>
Date:   Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers
    
    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.
    
    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.
    
    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.
    
    Store such information about already processed layers in global view
    shared for all workers to speed-up data transfering jobs they execute.
    
    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.
    
    Based on the dynamically picked multi-workers mode, provide the global
    view as a graf with its MP/MT state synchronization as the following:
    
    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.
    
    Ultimately, all that optimizes:
    
    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.
    
    As layers locking and unlocking becomes a popular action, reduce the
    noise of the debug messages it produces.
    
    Closes-bug: #1847225
    Related-bug: #1844446
    
    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)

tags:	added: in-stable-train
tags:	added: in-stable-stein

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-20: Fix merged to tripleo-common (stable/stein)

#12

Reviewed: https://review.opendev.org/694965
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Submitter: Zuul
Branch: stable/stein

commit 7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers

    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.

    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.

Store such information about already processed layers in global view
shared for all workers to speed-up data transfering jobs they execute.

    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.

Based on the dynamically picked multi-workers mode, provide the global
view as a graf with its MP/MT state synchronization as the following:

    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.

As layers locking and unlocking becomes a popular action, reduce the
noise of the debug messages it produces.

Closes-bug: #1847225
Related-bug: #1844446

    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)

Reviewed:  https://review.opendev.org/694965
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Submitter: Zuul
Branch:    stable/stein

commit 7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Author: Bogdan Dobrelya <bdobreli@redhat.com>
Date:   Tue Oct 15 11:33:16 2019 +0200

Make upload workers faster on processing layers
    
    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.
    
    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.
    
    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.
    
    Store such information about already processed layers in global view
    shared for all workers to speed-up data transfering jobs they execute.
    
    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant and now will be
    executed concurently with other images.
    
    Based on the dynamically picked multi-workers mode, provide the global
    view as a graf with its MP/MT state synchronization as the following:
    
    * use globally shared locking info also containing global layers view
      for MP-workers. With the shared global view state we can no longer
      use local locking objects individual for each task.
    * if cannot use multi-process workers, like when executing it via
      Mistral by monkey patched eventlet greenthreads, choose threadinglock
      and multi-threads-safe standard dictionary in the shared class
      namespace to store the global view there
    * if it can do MP, pick processlock also containing a safe from data
      races Manager().dict() as the global view shared among cooperating OS
      processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.
    
    Ultimately, all that optimizes:
    
    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic check executed against the
      in-memory cache firstly.
    
    As layers locking and unlocking becomes a popular action, reduce the
    noise of the debug messages it produces.
    
    Closes-bug: #1847225
    Related-bug: #1844446
    
    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-11-27: Fix included in openstack/tripleo-common 12.0.0

#13

This issue was fixed in the openstack/tripleo-common 12.0.0 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-12-12: Fix included in openstack/tripleo-common 10.8.2

#14

This issue was fixed in the openstack/tripleo-common 10.8.2 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-01-06: Fix included in openstack/tripleo-common 11.3.2

#15

This issue was fixed in the openstack/tripleo-common 11.3.2 release.

tripleo

Same container image layers are fetched multiple times in the image-uploader pipeline

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches