CI: undercloud takes long time which causes job fail with timeout

Bug #1799895 reported by Sagi (Sergey) Shnaidman on 2018-10-25
This bug affects 2 people
Affects: tripleo
Importance: Critical
Assigned to: Unassigned
Bogdan Dobrelya (bogdando) wrote :

This is a known issue, https://bugs.launchpad.net/bugs/1797525; I moved the critical priority and the alert tag there.

tags: removed: alert
Bogdan Dobrelya (bogdando) wrote :

4:58:58 PM GMT+2 - sshnaidm|ruck: bogdando, I think it's different reason, as I see in logs the yum update takes sometimes up to 10 minutes for container
4:59:20 PM GMT+2 - sshnaidm|ruck: bogdando, let's separate bugs, but leave comment in last one that it could be related

tags: added: alert
Bogdan Dobrelya (bogdando) wrote :

It is possibly a dup of bug 1797525.

As I can see from the logs:
the most expensive operations are the "yum update" runs in every container; they can take from 1 to 10(!) minutes per container. Downloading the updates is not the issue: it takes no longer than 2 seconds, and we use the proxy properly.
So what remains is the actual installation and the removal of the yum cache. I'd like to add timestamps to the "yum update" script to be sure which of its operations are expensive. It looks like installing the packages sometimes takes a long time. Maybe it could be a disk issue(?)
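
To illustrate the timestamp idea, here is a minimal sketch of timing each step of an update script separately. It is not the actual container update script; the helper names (`shell_step`, `time_steps`, `slowest`) and the step list in the usage comment are assumptions made up for this example.

```python
import subprocess
import time


def shell_step(argv):
    """Return a callable that runs one command and raises if it fails."""
    def run():
        subprocess.run(argv, check=True)
    return run


def time_steps(steps):
    """Run (name, callable) pairs in order and return {name: seconds}."""
    timings = {}
    for name, step in steps:
        start = time.monotonic()
        step()
        timings[name] = time.monotonic() - start
        # Print a wall-clock timestamp and the duration of the finished step.
        print(f"{time.strftime('%H:%M:%S')} {name}: {timings[name]:.1f}s")
    return timings


def slowest(timings):
    """Name of the step that took the longest."""
    return max(timings, key=timings.get)


# Hypothetical usage inside a container (step names and commands are assumed):
# timings = time_steps([
#     ("update", shell_step(["yum", "-y", "update"])),
#     ("clean cache", shell_step(["yum", "clean", "all"])),
# ])
# print("slowest step:", slowest(timings))
```

Splitting the install and the cache cleanup into separately timed steps would show which of the two accounts for the 1 to 10 minutes seen in the logs.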

Bogdan Dobrelya (bogdando) wrote :

See my previous attempt at adding **individual** caches to the containers' update: https://review.openstack.org/#/c/575742/4/scripts/container-update.py@146

But I'm not sure it would be a good idea to use a common cache with concurrent workers doing updates there. Serializing the container updates would allow a single cache shared by all of them... But updating 50+ containers one by one, even with a shared cache, might take even more time in the end.
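
The serialized versus concurrent trade-off can be estimated with a rough model. This is only a sketch: the per-container durations and the worker count are hypothetical inputs, not measurements from the job, and the effect of a shared cache would have to be modelled by passing shorter durations.

```python
import heapq


def serial_wall_time(durations):
    """Total time when containers are updated one by one."""
    return sum(durations)


def parallel_wall_time(durations, workers):
    """Total time when each update goes to the worker that frees up first."""
    finish = [0.0] * workers
    heapq.heapify(finish)
    for d in durations:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + d)
    return max(finish)


# Hypothetical numbers: 50 containers at 5 minutes each, 4 workers.
# serial_wall_time([5.0] * 50)
# parallel_wall_time([5.0] * 50, 4)
```

With these assumed numbers the serialized run takes 250 minutes against 65 minutes for 4 workers, so a shared cache would need to cut the per-container time to under 1.3 minutes before serializing breaks even.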

Bogdan Dobrelya (bogdando) wrote :

Another idea is to rebuild all kolla containers right in the job, from the tags we have after the original images are prepared. Then perhaps we would only need to install yum updates for the base layer?

Bogdan Dobrelya (bogdando) wrote :

A shared yum cache for individual containers' update retries, with improved fallback for retries, should help as well (https://review.openstack.org/575742)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/613577
Reason: https://review.openstack.org/613640

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
assignee: Rafael Folco (rafaelfolco) → nobody
tags: removed: alert

Reviewed: https://review.openstack.org/613575
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=c073cdb80d8351606accc722c7d18eac1d3dabca
Submitter: Zuul
Branch: master

commit c073cdb80d8351606accc722c7d18eac1d3dabca
Author: Emilien Macchi <email address hidden>
Date: Fri Oct 26 09:02:09 2018 -0400

    undercloud/hieradata: configure nova-scheduler workers too

    The new parameter in puppet-nova will allow us to reduce the number of
    workers to 1 in our CI and eventually reduce the timeouts.

    Change-Id: I9c2ea60960f1652f62f7f05879ebebddf3f8664e
    Related-Bug: #1799895

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/575742

Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Invalid