multiple tripleo jobs timing out upstream causing gate resets in train

Bug #1844446 reported by wes hayutin
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz

Bug Description

http://zuul.openstack.org/builds?job_name=tripleo-ci-centos-7-containers-multinode&result=TIMED_OUT

Possible root causes:

* the patch that increased the verbosity of container download / upload to resolve OVH failures: https://review.opendev.org/#/c/674919
* the patch to close httpd sessions: https://review.opendev.org/#/c/676387/

tags: added: alert

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/682905

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/682943

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/683001

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/683001
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=97e0c940f317f3ab880bf5b63887ef0654676686
Submitter: Clark Boylan (<email address hidden>)
Branch: master

commit 97e0c940f317f3ab880bf5b63887ef0654676686
Author: Emilien Macchi <email address hidden>
Date: Wed Sep 18 17:25:46 2019 -0400

    fs010: disable validations on the undercloud

    The validations cause timeouts, so we should test them outside of
    gate/check for now.

    Change-Id: I198939058b9d09b18e914db42b1bf896dc243d63
    Related-Bug: #1844446

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/682905
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=89be0d17c5d380777070e8f3dd6369289697d2a9
Submitter: Clark Boylan (<email address hidden>)
Branch: master

commit 89be0d17c5d380777070e8f3dd6369289697d2a9
Author: Emilien Macchi <email address hidden>
Date: Wed Sep 18 10:07:59 2019 -0400

    Disable inflight validations by default

    The inflight validations caused timeouts in the TripleO CI.
    They should not be enabled by default. In general, new features
    should be optional rather than enabled by default.

    So this patch changes the option to --inflight-validations and
    makes sure the inflight validations are disabled by default
    everywhere it is called.

    Change-Id: I082628a9480686d6a7801056c3b4bf332b4e3d95
    Related-Bug: #1844446
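
The opt-in behaviour described in this commit can be sketched with a plain argparse flag. This is only an illustration, not the python-tripleoclient code: the parser, the program name and the helper function are hypothetical, and only the --inflight-validations flag name comes from the commit message.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the flag name is taken from the commit message.
    parser = argparse.ArgumentParser(prog="deploy-sketch")
    parser.add_argument(
        "--inflight-validations",
        action="store_true",
        default=False,
        help="Run in-flight validations during the deploy (off by default).",
    )
    return parser


def validations_enabled(argv):
    """Return True only when the caller explicitly opts in."""
    return build_parser().parse_args(argv).inflight_validations
```

With store_true and a False default, a job that does not pass the flag never runs the validations, which is the behaviour the CI needed.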

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/682943
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=64af3ae1fb9adda02efcfd93d99cf5deca1104cb
Submitter: Clark Boylan (<email address hidden>)
Branch: master

commit 64af3ae1fb9adda02efcfd93d99cf5deca1104cb
Author: Emilien Macchi <email address hidden>
Date: Wed Sep 18 12:15:39 2019 -0400

    Introduce --inflight-validations for standalone / undercloud

    Like we do for the overcloud, add the --inflight-validations option,
    disabled by default. While it is disabled, we skip the
    "opendev-validations" Ansible tags when running the playbooks.

    Related-Bug: #1844446
    Change-Id: Ia37b3d4cc657d994b6a63412d5792930d54a14dd

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/683027

OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (stable/stein)

Reviewed: https://review.opendev.org/683027
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=499b487c51aa91e5b035e9fadcb2e148ad4e8508
Submitter: Zuul
Branch: stable/stein

commit 499b487c51aa91e5b035e9fadcb2e148ad4e8508
Author: Cédric Jeanneret <email address hidden>
Date: Thu Aug 15 11:44:32 2019 +0200

    (squash) Clean backport of inflight-validations for UC/OC/standalone

    Adds new "--no-inflight-validations" option to deploy CLI

    This provides an independent way to enable or disable the in-flight
    validations within a deploy.
    The default is to have them running, and this option allows
    deactivating the in-flight ones.

    Change-Id: I81e934e2978cad4e2713d54e19a57c84a6ac0b52
    (cherry picked from commit bf48dbc84405208dd86ae3dd4879fc7735b99838)

    Disable inflight validations by default

    The inflight validations caused timeouts in the TripleO CI.
    They should not be enabled by default. In general, new features
    should be optional rather than enabled by default.

    So this patch changes the option to --inflight-validations and
    makes sure the inflight validations are disabled by default
    everywhere it is called.

    Change-Id: I082628a9480686d6a7801056c3b4bf332b4e3d95
    Related-Bug: #1844446
    (cherry picked from commit 89be0d17c5d380777070e8f3dd6369289697d2a9)

    Introduce --inflight-validations for standalone / undercloud

    Like we do for the overcloud, add the --inflight-validations option,
    disabled by default. While it is disabled, we skip the
    "opendev-validations" Ansible tags when running the playbooks.

    Related-Bug: #1844446
    Change-Id: Ia37b3d4cc657d994b6a63412d5792930d54a14dd
    (cherry picked from commit 64af3ae1fb9adda02efcfd93d99cf5deca1104cb)

tags: added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/683936

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/684391

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/684411

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/684713

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/684786

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/684391
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=5cbb6bee6a15ed8f6cc249ea5b9d9157cac40a2a
Submitter: Zuul
Branch: master

commit 5cbb6bee6a15ed8f6cc249ea5b9d9157cac40a2a
Author: Alex Schultz <email address hidden>
Date: Tue Sep 24 09:58:55 2019 -0600

    Remove chunk size for url stream

    Related-Bug: #1844446
    Change-Id: I0eea178cc4fe037dd7478c512c41ea75eae719a1

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/684824

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Cédric Jeanneret (Tengu) (<email address hidden>) on branch: master
Review: https://review.opendev.org/684713
Reason: abandoned in favor of https://review.opendev.org/684786/

Bogdan Dobrelya (bogdando) wrote :

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/684824
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c5729bb84704dfc7dbd46f6d93b85f4aca114d43
Submitter: Zuul
Branch: stable/stein

commit c5729bb84704dfc7dbd46f6d93b85f4aca114d43
Author: Alex Schultz <email address hidden>
Date: Tue Sep 24 09:58:55 2019 -0600

    Remove chunk size for url stream

    Related-Bug: #1844446
    Change-Id: I0eea178cc4fe037dd7478c512c41ea75eae719a1
    (cherry picked from commit 5cbb6bee6a15ed8f6cc249ea5b9d9157cac40a2a)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/685134

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/685175

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/685134

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/685134
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=3adfefa13aebd7bd9e48c952df9107d51a557011
Submitter: Zuul
Branch: master

commit 3adfefa13aebd7bd9e48c952df9107d51a557011
Author: Alex Schultz <email address hidden>
Date: Thu Sep 26 11:05:15 2019 -0600

    Randomize the container list for uploads

    When we work through the list of containers in an alphabetical fashion,
    we end up duplicating much of the layer fetching because it can occur at
    the same time. Things like cinder-api, cinder-backup, cinder-volume
    share many of the same layers. Since we don't ensure that each layer
    hash is fetched only once during the multiprocessing, we end
    up duplicating the fetches of layers. By randomizing the fetches, we
    reduce the likelihood that we'll be fetching the same family of service
    containers concurrently.

    Change-Id: Ifbcd55de52c9e2283203b1c6e2adeb266d43eca6
    Related-Bug: #1844446
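
The randomization described in this commit can be sketched as below. This is a minimal illustration, not the tripleo-common implementation: the function name and the seed parameter are hypothetical, and the seed exists only to make the sketch reproducible.

```python
import random


def randomized_upload_order(images, seed=None):
    """Return a shuffled copy of the image list, leaving the input untouched."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    shuffled = list(images)
    rng.shuffle(shuffled)
    return shuffled
```

Shuffling does not prevent two workers from fetching the same layer; it only makes it less likely that images from the same service family, such as the cinder ones, are processed side by side.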

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/684411
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a1d89c7d6359a388f46dc3e71b16ebb11851f571
Submitter: Zuul
Branch: master

commit a1d89c7d6359a388f46dc3e71b16ebb11851f571
Author: Alex Schultz <email address hidden>
Date: Tue Sep 24 12:04:42 2019 -0600

    Improve ThreadPoolExecutor usage

    This change addresses a few different issues with our ThreadPoolExecutor
    usage.

    Previously we were inefficiently looping on the job results, which was a
    blocking call and not asynchronous. This change switches the threadpool
    usage to as_completed so we'll handle jobs as they complete rather
    than blocking until the specific job we're looking at completes.

    Additionally this switches to use a with statement for the executor so
    the threads get cleaned up correctly when we're done with the block.

    See example comments:
    https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example

    The _inspect call was using a thread pool executor to run session calls
    but was essentially a serial executing function so it has been updated
    to remove the executor and just call the requests as needed.

    Change-Id: Ia7abec997f4e503e1a2db82e05d4fe6e8696defc
    Related-Bug: #1844446
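
The pattern this commit describes (a with statement around the executor plus as_completed) follows the example in the Python documentation linked above. A minimal sketch, assuming a hypothetical upload_all helper and a caller-supplied worker function, neither of which is taken from tripleo-common:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(tasks, worker, max_workers=4):
    """Run worker(task) for every task and collect results as they finish."""
    results = {}
    # The with block shuts the pool down and joins its threads on exit.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, task): task for task in tasks}
        # as_completed yields each future when it is done, in completion order,
        # instead of blocking on the futures in submission order.
        for future in as_completed(futures):
            task = futures[future]
            results[task] = future.result()  # re-raises any worker exception
    return results
```

For example, upload_all(["a", "b"], str.upper) returns a dictionary mapping each task to its result, whichever task happens to finish first.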

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/682731
Reason: sounds like these tweaks aren't needed now, and they didn't seem to help.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/685175
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=e57116d9c45c53ab5395025a32076c7c5bedbe45
Submitter: Zuul
Branch: master

commit e57116d9c45c53ab5395025a32076c7c5bedbe45
Author: Alex Schultz <email address hidden>
Date: Thu Sep 26 14:30:13 2019 -0600

    Implement threading locks around layers

    When we fetch layers, we shouldn't fetch the same layers multiple times.
    This change adds some locking based on layer hashes to prevent multiple
    threads from trying to fetch the same layer at the same time.

    Change-Id: I477219b7dca1e6cfa02a278c55a0cc1a9833d007
    Related-Bug: #1844446
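
One way to picture locking keyed on layer hashes is a guarded set of hashes that are currently being fetched. This is a sketch under that assumption, not the locking code from the commit; the LayerLocks class and its method names are hypothetical.

```python
import threading


class LayerLocks:
    """Track which layer hashes are currently being fetched."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set below
        self._in_flight = set()

    def try_acquire(self, layer_hash):
        """Return True if the caller now owns the layer, False if it is taken."""
        with self._guard:
            if layer_hash in self._in_flight:
                return False
            self._in_flight.add(layer_hash)
            return True

    def release(self, layer_hash):
        """Mark the layer as no longer being fetched."""
        with self._guard:
            self._in_flight.discard(layer_hash)
```

A worker that gets False from try_acquire would wait or move on to another layer rather than start a second download of the same hash.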

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/684786
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=577334235a49abb895c8236b142a874a7f076ee5
Submitter: Zuul
Branch: master

commit 577334235a49abb895c8236b142a874a7f076ee5
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Sep 25 17:22:31 2019 +0200

    Always close src/dst sessions and raise to retry

    When something goes awry in the image upload process, always
    ensure that the sessions get closed and that something gets logged.
    Whenever necessary, re-raise the exception so that the tenacity
    wrappers can retry.

    Change-Id: Id0602b0f51d4376b94e0bf5ae4e7fb34f085ed6c
    Related-Bug: #1844446
    Signed-off-by: Bogdan Dobrelya <email address hidden>
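
The close-log-and-re-raise behaviour can be sketched with try/except/finally. Everything here is illustrative: FakeSession, copy_layer and with_retries are hypothetical names, and the small retry loop only stands in for the tenacity wrappers mentioned in the commit so that the sketch needs nothing beyond the standard library.

```python
import logging

LOG = logging.getLogger(__name__)


class FakeSession:
    """Stand-in for an HTTP session; it only records whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def copy_layer(source, target, transfer):
    """Run transfer(source, target); always close both sessions."""
    try:
        return transfer(source, target)
    except Exception:
        LOG.exception("layer transfer failed; re-raising so the caller can retry")
        raise
    finally:
        # Runs on success and on failure, so no session is left open.
        source.close()
        target.close()


def with_retries(func, attempts=3):
    """Tiny retry loop standing in for a tenacity-style wrapper."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
    raise last_error
```

Re-raising matters because a retry wrapper only triggers when the exception actually reaches it; swallowing the error would turn a failed transfer into a silent success.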

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/686181

Bogdan Dobrelya (bogdando) wrote :

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/686756

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/686785

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/686786

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/686787

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/686789

Changed in tripleo:
status: Triaged → In Progress
assignee: nobody → Alex Schultz (alex-schultz)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/687288

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/686181
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=60afc0eec44f698dd95d9d6ec80dad94a4b07329
Submitter: Zuul
Branch: master

commit 60afc0eec44f698dd95d9d6ec80dad94a4b07329
Author: Alex Schultz <email address hidden>
Date: Wed Oct 2 09:05:58 2019 -0600

    Make executor type dynamic

    When we run the tripleo-container-image-prepare script, it performs
    better under python2 when the process leverages a ProcessPoolExecutor.
    Rather than using threading, we should be using processes to handle the
    image upload processing. Currently when we're processing the images, we
    end up being single threaded due to the GIL when processing the data. By
    switching to the ProcessPoolExecutor, we eliminate the locking that is
    occurring during the data processing as it'll be handled in each process.

    Unfortunately, we cannot leverage the ProcessPoolExecutor when the same
    code is run under Mistral. In order to make the code work for both
    methods, we need to make the execution type dynamic. This change creates
    two types of lock objects that are used to determine what type of
    executor to ultimately use when processing the images for uploading.

    Additionally this change limits the number of concurrent image upload
    processes to 4 if using the ProcessPoolExecutor and caps the number of
    threads at a max of 8 based on (cpu count / 2)

    Change-Id: I60507eba9884a0660fe269da5ad27b0e57a70ca8
    Related-Bug: #1844446
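
The executor selection and worker limits described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the tripleo-common code: pick_executor and its parameters are hypothetical, the process count is treated as a fixed 4, and the floor of one thread is an assumption that the commit message does not mention.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor(use_processes, cpu_count=None):
    """Return (executor_class, max_workers) for the chosen mode."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    if use_processes:
        # The commit limits concurrent upload processes to 4.
        return ProcessPoolExecutor, 4
    # Threads: cpu count / 2, capped at 8; the floor of 1 is an assumption.
    return ThreadPoolExecutor, max(1, min(8, cpu_count // 2))
```

A caller would then write executor_class, workers = pick_executor(...) and open the pool with executor_class(max_workers=workers), passing use_processes=False where processes cannot be used, as in the Mistral case named above.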

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/686756
Reason: We no longer execute the subject code via Mistral, so there is no longer a need to tweak multi-threading cooperation. Multi-process execution has shown much better results (as expected for Python, given its GIL).

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/688201

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/686785
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=0c3c2623de8418ad7470d2369c26f27ef651aefc
Submitter: Zuul
Branch: stable/stein

commit 0c3c2623de8418ad7470d2369c26f27ef651aefc
Author: Alex Schultz <email address hidden>
Date: Thu Sep 26 11:05:15 2019 -0600

    Randomize the container list for uploads

    When we work through the list of containers in an alphabetical fashion,
    we end up duplicating much of the layer fetching because it can occur at
    the same time. Things like cinder-api, cinder-backup, cinder-volume
    share many of the same layers. Since we don't ensure that each layer
    hash is fetched only once during the multiprocessing, we end
    up duplicating the fetches of layers. By randomizing the fetches, we
    reduce the likelihood that we'll be fetching the same family of service
    containers concurrently.

    Change-Id: Ifbcd55de52c9e2283203b1c6e2adeb266d43eca6
    Related-Bug: #1844446
    (cherry picked from commit 3adfefa13aebd7bd9e48c952df9107d51a557011)

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/686786
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c882901f992f3253936281ae510e07fdbbe3fb71
Submitter: Zuul
Branch: stable/stein

commit c882901f992f3253936281ae510e07fdbbe3fb71
Author: Alex Schultz <email address hidden>
Date: Tue Sep 24 12:04:42 2019 -0600

    Improve ThreadPoolExecutor usage

    This change addresses a few different issues with our ThreadPoolExecutor
    usage.

    Previously we were inefficiently looping on the job results, which was a
    blocking call and not asynchronous. This change switches the threadpool
    usage to as_completed so we'll handle jobs as they complete rather
    than blocking until the specific job we're looking at completes.

    Additionally this switches to use a with statement for the executor so
    the threads get cleaned up correctly when we're done with the block.

    See example comments:
    https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example

    The _inspect call was using a thread pool executor to run session calls
    but was essentially a serial executing function so it has been updated
    to remove the executor and just call the requests as needed.

    Change-Id: Ia7abec997f4e503e1a2db82e05d4fe6e8696defc
    Related-Bug: #1844446
    (cherry picked from commit a1d89c7d6359a388f46dc3e71b16ebb11851f571)

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/686787
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=55df4dec4efdc938e3fb685cd038aa6f9c099e65
Submitter: Zuul
Branch: stable/stein

commit 55df4dec4efdc938e3fb685cd038aa6f9c099e65
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Sep 25 17:22:31 2019 +0200

    Always close src/dst sessions and raise to retry

    When something goes awry in the image upload process, always
    ensure that the sessions get closed and that something gets logged.
    Whenever necessary, re-raise the exception so that the tenacity
    wrappers can retry.

    Change-Id: Id0602b0f51d4376b94e0bf5ae4e7fb34f085ed6c
    Related-Bug: #1844446
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 577334235a49abb895c8236b142a874a7f076ee5)

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/686789
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=74abbd1f0e6eb0e62ccade1cb08865ec84896393
Submitter: Zuul
Branch: stable/stein

commit 74abbd1f0e6eb0e62ccade1cb08865ec84896393
Author: Alex Schultz <email address hidden>
Date: Thu Sep 26 14:30:13 2019 -0600

    Implement threading locks around layers

    When we fetch layers, we shouldn't fetch the same layers multiple times.
    This change adds some locking based on layer hashes to prevent multiple
    threads from trying to fetch the same layer at the same time.

    Change-Id: I477219b7dca1e6cfa02a278c55a0cc1a9833d007
    Related-Bug: #1844446
    (cherry picked from commit e57116d9c45c53ab5395025a32076c7c5bedbe45)

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/688201
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7994173916ec2b4fd5885944120ce3896b2bd413
Submitter: Zuul
Branch: stable/stein

commit 7994173916ec2b4fd5885944120ce3896b2bd413
Author: Alex Schultz <email address hidden>
Date: Wed Oct 2 09:05:58 2019 -0600

    Make executor type dynamic

    When we run the tripleo-container-image-prepare script, it performs
    better under python2 when the process leverages a ProcessPoolExecutor.
    Rather than using threading, we should be using processes to handle the
    image upload processing. Currently when we're processing the images, we
    end up being single threaded due to the GIL when processing the data. By
    switching to the ProcessPoolExecutor, we eliminate the locking that is
    occurring during the data processing as it'll be handled in each process.

    Unfortunately, we cannot leverage the ProcessPoolExecutor when the same
    code is run under Mistral. In order to make the code work for both
    methods, we need to make the execution type dynamic. This change creates
    two types of lock objects that are used to determine what type of
    executor to ultimately use when processing the images for uploading.

    Additionally this change limits the number of concurrent image upload
    processes to 4 if using the ProcessPoolExecutor and caps the number of
    threads at a max of 8 based on (cpu count / 2)

     Conflicts:
     tripleo_common/image/image_uploader.py

    Change-Id: I60507eba9884a0660fe269da5ad27b0e57a70ca8
    Related-Bug: #1844446
    (cherry picked from commit 60afc0eec44f698dd95d9d6ec80dad94a4b07329)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/683936
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=746d0700c7487f2e4e10da4c2a3060593127c61c
Submitter: Zuul
Branch: master

commit 746d0700c7487f2e4e10da4c2a3060593127c61c
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Sep 23 14:01:32 2019 +0200

    Use cache for yum update in containers prepare

    For releases later than Stein, where a better podman version is
    used, use the cached package manager contents.
    In order to enable the container-image-prepare caching, we need to
    specify a cache directory. Let's use
    /var/tmp/tripleo-container-image-prepare-cache as a persistent directory
    for our container modification process in CI.

    Related-Bug: #1844446
    Change-Id: I5ce878205ddb1854552937d99bb60a421091bd54
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/690056

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/690061

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/690389

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/690061

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/690056
Reason: this doesn't work

Changed in tripleo:
milestone: train-rc1 → ussuri-1

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/687288
Reason: this doesn't work, I ran out of ideas

wes hayutin (weshayutin) wrote :

Closing this out. Let's open new bugs for more specific issues.

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/687288
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=46f81298948865a2c15c4a5035e95a0c77adb5d5
Submitter: Zuul
Branch: master

commit 46f81298948865a2c15c4a5035e95a0c77adb5d5
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

    Make upload workers faster on processing layers

    Make upload workers process each image layer only once (on a
    best-effort basis). This also reworks and simplifies lock management
    for individual tasks, which is now handled in the PythonImageUploader
    class namespace only.

    When fetching a source layer, cross-link it for the target local
    image whenever that source already exists. When pushing a layer to
    a target registry, do not transfer the same data again if it was
    already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) become a reference for future
    cross-referencing by other images.

    Store such information about already processed layers in a global view
    shared by all workers to speed up the data transfer jobs they execute.

    With that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant, so it is now
    executed concurrently with other images.

    Based on the dynamically picked multi-workers mode, provide the global
    view as a graph with its MP/MT state synchronization as follows:

    * use globally shared locking info that also contains the global layers
      view for MP-workers. With the shared global view state we can no
      longer use local locking objects individual for each task.
    * if multi-process workers cannot be used, as when executing via
      Mistral with monkey-patched eventlet greenthreads, choose
      threadinglock and a thread-safe standard dictionary in the shared
      class namespace to store the global view there.
    * if MP can be used, pick processlock, which also contains a
      Manager().dict() that is safe from data races, as the global view
      shared among cooperating OS processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

    Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic checks, which are executed
      against the in-memory cache first.

    As layers locking and unlocking becomes a popular action, reduce the
    noise of the debug messages it produces.

    Closes-bug: #1847225
    Related-bug: #1844446

    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <email address hidden>
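
The shared "global view" of layers accessed through classmethods can be sketched for the multi-thread case as below. This is only an illustration, not the PythonImageUploader code: UploaderSketch, claim_layer, known_location and push_layer are hypothetical names, and the multi-process variant, which the commit says uses a Manager().dict(), is not covered here.

```python
import threading


class UploaderSketch:
    """Hypothetical class keeping one layer view shared by all thread workers."""

    _lock = threading.Lock()
    _global_view = {}  # layer hash -> first known location of that layer

    @classmethod
    def claim_layer(cls, layer_hash, location):
        """Return (reference_location, is_first) for the given layer hash."""
        with cls._lock:
            if layer_hash in cls._global_view:
                return cls._global_view[layer_hash], False
            cls._global_view[layer_hash] = location
            return location, True

    @classmethod
    def known_location(cls, layer_hash):
        """Return the reference location for a layer, or None if unseen."""
        with cls._lock:
            return cls._global_view.get(layer_hash)


def push_layer(layer_hash, image, transfers):
    """Transfer the layer only if no image has already pushed it."""
    reference, is_first = UploaderSketch.claim_layer(layer_hash, image)
    if is_first:
        transfers.append((layer_hash, image))  # stand-in for the real upload
    return reference
```

The first image to push a layer becomes the reference for it; later images that share the layer get the reference back and skip the transfer.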

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/694963

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/694965

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/694963
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a5768f0d75638325230a673427a6f554cca05d47
Submitter: Zuul
Branch: stable/train

commit a5768f0d75638325230a673427a6f554cca05d47
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

    Make upload workers faster on processing layers

    Make upload workers processing image layers only once (as the best
    effort). This also reworks and simplifies locks management for
    individual tasks now managed for the PythonImageUploader class
    namespace only.

    When fetching source layer, cross-link it for the target
    local image, whenever that source is already exists. When pushing a
    layer to a target registry, do not repeat transfering the same data,
    if already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.

    Store such information about already processed layers in a global view
    shared by all workers to speed up the data transfer jobs they execute.

    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant, and it will now be
    executed concurrently with other images.

    Based on the dynamically picked multi-workers mode, provide the global
    view as a graph, with its MP/MT state synchronized as follows:

    * use globally shared locking info that also contains the global layers
      view for MP-workers. With the shared global view state we can no
      longer use local locking objects individual for each task.
    * if multi-process workers cannot be used, such as when executing via
      Mistral with monkey-patched eventlet greenthreads, choose
      threadinglock and a thread-safe standard dictionary in the shared
      class namespace to store the global view
    * if MP is available, pick processlock, which also contains a
      Manager().dict() that is safe from data races, as the global view
      shared among cooperating OS processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

    Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic checks, which are executed
      against the in-memory cache first.

    As layer locking and unlocking becomes a frequent action, reduce the
    noise of the debug messages it produces.

    Closes-bug: #1847225
    Related-bug: #1844446

    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)
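
The layer bookkeeping described in this commit message can be illustrated with a minimal Python sketch. It is not the tripleo-common code: the `LayerView` class, its `claim` method, and the digest and image names are hypothetical, chosen only to show a lock-protected shared dictionary that is backed either by `threading` or by `multiprocessing.Manager()` depending on the worker mode.

```python
import multiprocessing
import threading


class LayerView:
    """Hypothetical global view of layers already handled by any worker."""

    def __init__(self, use_processes):
        if use_processes:
            # Manager-backed proxies can be shared between OS processes.
            self._manager = multiprocessing.Manager()
            self._lock = self._manager.Lock()
            self._layers = self._manager.dict()
        else:
            # Threads share memory, so a plain dict guarded by a lock is enough.
            self._manager = None
            self._lock = threading.Lock()
            self._layers = {}

    def claim(self, layer_digest, image_ref):
        """Register image_ref as the reference for layer_digest if it is new.

        Returns None when the caller has to transfer the layer itself,
        otherwise the image that already holds the layer.
        """
        with self._lock:
            known = self._layers.get(layer_digest)
            if known is None:
                self._layers[layer_digest] = image_ref
            return known


# Thread mode, as would be needed when multi-process workers are unavailable.
view = LayerView(use_processes=False)
first = view.claim("sha256:aaa", "image-one")   # None: layer not seen before
second = view.claim("sha256:aaa", "image-two")  # "image-one": reuse that copy
```

The second caller learns which image already holds the layer, so it could cross-link or skip the transfer instead of repeating it. Checking and recording under one lock keeps two workers from both deciding that they are the first to see a layer.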

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/694965
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Submitter: Zuul
Branch: stable/stein

commit 7b46f4d70b3e36b9d5b4c6f8eaca439404f7480e
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Oct 15 11:33:16 2019 +0200

    Make upload workers faster on processing layers

    Make upload workers process each image layer only once (on a
    best-effort basis). This also reworks and simplifies lock management:
    locks for individual tasks are now managed only in the
    PythonImageUploader class namespace.

    When fetching a source layer, cross-link it for the target
    local image whenever that source already exists. When pushing a
    layer to a target registry, do not transfer the same data again
    if it was already pushed earlier for another image.

    The 1st time a layer gets uploaded/fetched for an image, that image and
    its known path (local or remote) becomes a reference for future
    cross-referencing by other images.

    Store such information about already processed layers in a global view
    shared by all workers to speed up the data transfer jobs they execute.

    Having that global view, uploading the 1st image in the tasks list as a
    separate (and non-concurrent) job becomes redundant, and it will now be
    executed concurrently with other images.

    Based on the dynamically picked multi-workers mode, provide the global
    view as a graph, with its MP/MT state synchronized as follows:

    * use globally shared locking info that also contains the global layers
      view for MP-workers. With the shared global view state we can no
      longer use local locking objects individual for each task.
    * if multi-process workers cannot be used, such as when executing via
      Mistral with monkey-patched eventlet greenthreads, choose
      threadinglock and a thread-safe standard dictionary in the shared
      class namespace to store the global view
    * if MP is available, pick processlock, which also contains a
      Manager().dict() that is safe from data races, as the global view
      shared among cooperating OS processes.
    * use that global view in a transparent fashion, provided by a special
      classmethod proxying access to the internal state shared for workers.

    Ultimately, all that optimizes:

    * completion time
    * re-fetching of the already processed layers
    * local deduplication of layers
    * the amount of outbound HTTP requests to registries
    * if-layer-exists and other internal logic checks, which are executed
      against the in-memory cache first.

    As layer locking and unlocking becomes a frequent action, reduce the
    noise of the debug messages it produces.

    Closes-bug: #1847225
    Related-bug: #1844446

    Change-Id: Ie5ef4045b7e22c06551e886f9f9b6f22c8d4bd21
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 46f81298948865a2c15c4a5035e95a0c77adb5d5)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/690389
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=3ca472c24fde83bbb48cab33f12af63d815c992c
Submitter: Zuul
Branch: master

commit 3ca472c24fde83bbb48cab33f12af63d815c992c
Author: Alex Schultz <email address hidden>
Date: Tue Oct 22 13:41:05 2019 -0600

    Switch to use process executor

    We switched the tripleo-container-image-prepare script in
    tripleo-common, but a user will likely run the 'openstack tripleo
    container image prepare' command. Currently it uses the default which is
    the threading executor.

    Poke tripleo common lower/requirements as well.

    Change-Id: Ifc5b46633a1f9fc9378eaa17170f6664d566c3c4
    Related-Bug: #1844446
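
The difference between a threading executor and a process executor can be sketched with Python's standard `concurrent.futures` module. This is only an illustration under stated assumptions: `make_executor`, `upload`, and `run` are hypothetical names and do not reflect how python-tripleoclient or tripleo-common select their executor.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(use_processes, max_workers=4):
    """Return a process pool when requested, otherwise a thread pool."""
    if use_processes:
        return ProcessPoolExecutor(max_workers=max_workers)
    return ThreadPoolExecutor(max_workers=max_workers)


def upload(image):
    # Placeholder for the real per-image work (hypothetical).
    return "uploaded " + image


def run(images, use_processes=False):
    """Process all images concurrently and return results in input order."""
    with make_executor(use_processes) as pool:
        return list(pool.map(upload, images))
```

Both pool types expose the same `map` and `submit` interface, so the calling code stays the same and only the choice of executor changes. With a process pool the worker function and its arguments must be picklable, and on platforms that spawn new interpreters the call should sit under an `if __name__ == "__main__":` guard.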

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/695871

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/695871
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=51d589b766f9854e8e23bf40ed1b1b44f3b0517b
Submitter: Zuul
Branch: stable/train

commit 51d589b766f9854e8e23bf40ed1b1b44f3b0517b
Author: Alex Schultz <email address hidden>
Date: Tue Oct 22 13:41:05 2019 -0600

    Switch to use process executor

    We switched the tripleo-container-image-prepare script in
    tripleo-common, but a user will likely run the 'openstack tripleo
    container image prepare' command. Currently it uses the default which is
    the threading executor.

    Poke tripleo common lower/requirements as well.

    Change-Id: Ifc5b46633a1f9fc9378eaa17170f6664d566c3c4
    Related-Bug: #1844446

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/701570

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (stable/stein)

Reviewed: https://review.opendev.org/701570
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=9206ff9a853ae62aee29750d0aaf88f1b1dd1d04
Submitter: Zuul
Branch: stable/stein

commit 9206ff9a853ae62aee29750d0aaf88f1b1dd1d04
Author: Alex Schultz <email address hidden>
Date: Tue Oct 22 13:41:05 2019 -0600

    Switch to use process executor

    We switched the tripleo-container-image-prepare script in
    tripleo-common, but a user will likely run the 'openstack tripleo
    container image prepare' command. Currently it uses the default which is
    the threading executor.

    Poke tripleo common lower/requirements as well.

    Change-Id: Ifc5b46633a1f9fc9378eaa17170f6664d566c3c4
    Related-Bug: #1844446
