CI/oooq: some Ansible tasks run multiple times (causing job timeouts)

Bug #1736634 reported by Emilien Macchi on 2017-12-06
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: wes hayutin
tags: added: alert quickstart
summary: - CI/oooq: some Ansible tasks run multiple times
+ CI/oooq: some Ansible tasks run multiple times (causing job timeouts)
wes hayutin (weshayutin) wrote :
tags: removed: alert ci
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin) wrote :

We may want to open a new bug on tripleo-ci-centos-7-ovb-ha-oooq timing out and work it from that angle.

Emilien Macchi (emilienm) wrote :

Wes, we have had this discussion 10 times: a "timeout" isn't a bug report. This report explains why the timeouts happen.

Changed in tripleo:
status: Incomplete → Triaged
Emilien Macchi (emilienm) wrote :

Close this bug if you think some roles have to be executed multiple times, but I'll need to understand why because right now I have no idea why we do that. Thanks!

wes hayutin (weshayutin) wrote :

OK.. we can work it from here.
The root cause provided is, afaict, not correct. modify-image is a general-purpose role that can perform various functions on images via libvirt. It is invoked with different scripts and variables to perform different tasks, which is why you see it called multiple times.

The original commit is https://github.com/openstack/tripleo-quickstart-extras/commit/f6e1500631e7f1201ee2574149656e39474985ea

Some data points should include the same job run in rdo-cloud. We can understand more completely the run time and look for regressions by analyzing the build time trend in [1].

I also think we have some Grafana data we can use to help us here. [2]

[1] https://review.rdoproject.org/jenkins/job/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master-nv/buildTimeTrend
[2] https://graphite.tripleo.org/dashboard/
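
One way to look for regressions in a build time trend like [1] is to compare each build against the median of the builds just before it. The following is only a minimal sketch under assumptions: the function name, window size, threshold, and sample durations are all hypothetical and are not taken from the real job history.

```python
from statistics import median


def find_regressions(durations_min, window=5, threshold=1.2):
    """Return indices of builds slower than `threshold` times the
    median of the previous `window` builds."""
    flagged = []
    for i in range(window, len(durations_min)):
        baseline = median(durations_min[i - window:i])
        if durations_min[i] > threshold * baseline:
            flagged.append(i)
    return flagged


# Hypothetical build durations in minutes, oldest first (not real CI data).
sample = [150, 152, 149, 151, 150, 153, 185, 190, 188]
print(find_regressions(sample))
```

A rolling median is used here instead of a mean so that a single slow build does not shift the baseline much; whether a 20% threshold is sensible for these jobs is a guess.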

Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)

The modify-image role actually runs twice and is skipped twice. When it is skipped, it takes no time.
The first time it runs, we install the required repos on the image and run yum update there to bring the image up to the latest packages from those repos (the former tripleo.sh --repo-setup task).
The second time, we install the repo with the patched project on the image and install that project there if needed.

These tasks have always been executed this way, for at least the last 2 years, including in tripleo.sh jobs.

We can think about optimizing here by installing both the repos and the patched project together, though it should be done without breaking the logic of oooq. But it's not a bug or something new. It could save us about 5 minutes (one modify-image task).

So 10 minutes is not a "waste", it's actual job run.
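
The arithmetic behind these numbers can be sketched as follows. This is only an illustration: the class and function names are made up, the order of the invocations is arbitrary, and the cost of about 5 minutes per real run is the rough figure from this comment, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    purpose: str
    skipped: bool
    minutes: float  # assumed cost when the invocation actually runs


# Four invocations of modify-image: two run, two are skipped.
# The order shown is illustrative only.
invocations = [
    Invocation("install repos and run yum update", skipped=False, minutes=5.0),
    Invocation("install patched project", skipped=False, minutes=5.0),
    Invocation("skipped", skipped=True, minutes=0.0),
    Invocation("skipped", skipped=True, minutes=0.0),
]


def total_minutes(items):
    """Sum the time of invocations that actually ran; skipped ones cost nothing."""
    return sum(item.minutes for item in items if not item.skipped)


current = total_minutes(invocations)

# Assumption: a merged run costs about as much as a single run today.
merged = [Invocation("repos and patched project together", False, 5.0)]
saving = current - total_minutes(merged)

print(f"current: {current} min, saving if merged: {saving} min")
```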

I'm not sure it's that critical for job timeouts. To know better, we should finally merge these patches [1] and start looking at what takes time in the jobs.

[1] https://review.openstack.org/#/c/479882/
https://review.openstack.org/#/c/480121/
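
As a rough idea of what "looking at what takes time" could mean once per-task timings are available, here is a small sketch that counts how often each task ran and how long it took in total. It says nothing about what the patches in [1] actually do; the record format, the function name, and every task name and duration other than modify-image are hypothetical.

```python
from collections import defaultdict


def summarize(records):
    """records: iterable of (task_name, seconds).
    Returns (task_name, runs, total_seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for name, seconds in records:
        totals[name][0] += 1        # number of times the task ran
        totals[name][1] += seconds  # accumulated run time
    rows = [(name, runs, total) for name, (runs, total) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical timing records (not real job data).
records = [
    ("modify-image", 300),
    ("undercloud-deploy", 1200),
    ("modify-image", 310),
    ("overcloud-deploy", 2400),
]

for name, runs, total in summarize(records):
    print(f"{name}: {runs} run(s), {total:.0f}s total")
```

A table like this shows both repeated tasks and where the time really goes, so a task that runs twice but costs little is not mistaken for the cause of a timeout.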

wes hayutin (weshayutin) on 2017-12-06
tags: added: alert
Emilien Macchi (emilienm) wrote :

Removing alert: we don't have many timeouts today, so there is no need to pollute the IRC channel with it. The bug is triaged and some discussion is happening.

tags: removed: alert
Changed in tripleo:
milestone: queens-3 → queens-rc1
Alex Schultz (alex-schultz) wrote :

Sounds like this wasn't actually the issue. Closing this bug out for now. Feel free to reopen with new details.

Changed in tripleo:
status: Triaged → Invalid