tripleo upgrade jobs timeout on stable/ocata

Bug #1702955 reported by Emilien Macchi on 2017-07-07
This bug affects 3 people
Affects: tripleo
Importance: Critical
Assigned to: mathieu bultel

Bug Description

Alex and I noticed that TripleO upgrade jobs time out quite often on stable/ocata (and probably also on master).
We found that this happens when the jobs run on the RAX public cloud (iad and ord AZs).

Here are some numbers:
https://www.diffchecker.com/14g29Xoh

We can see that the slowdown affects not only the steps where packages are installed (in which case we could say it's networking related) but also basic workflows such as setting up SSH keys (see SshHostPubKey). We think the hypervisors are probably overcommitted, which would cause poor performance on the VMs.

Any help from RAX folks is welcome to figure out if we can improve the situation.

Tags: ci
Changed in tripleo:
milestone: pike-3 → pike-rc1
summary: - tripleo upgrade jobs timeout when running in RAX cloud
+ tripleo upgrade jobs timeout on stable/ocata
tags: added: alert
Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
Changed in tripleo:
assignee: Emilien Macchi (emilienm) → nobody
Changed in tripleo:
assignee: nobody → mathieu bultel (mat-bultel)
Changed in tripleo:
milestone: pike-rc1 → pike-rc2
tags: removed: alert
Changed in tripleo:
milestone: pike-rc2 → queens-1

We need to use the tripleo upgrade role for upgrades; then we can continue to work on this.

Tom Barron (tpb) wrote :

Is there an ETA for the tripleo upgrade role, if that is what blocks the fix for this bug? I'm asking because I have an important backport for manila ( https://bugs.launchpad.net/tripleo/+bug/1712842 ) that is stuck on stable/ocata due to timeout failures in the gate-tripleo-ci-centos-7-multinode-upgrades job.

Tom Barron (tpb) wrote :

Got lucky on a recheck of the fix for LP:1712842, so it's no longer blocked.

wes hayutin (weshayutin) wrote :

Starting to migrate these jobs to rdo sf: https://review.rdoproject.org/r/#/c/9831/

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Alex Schultz (alex-schultz) wrote :

We dropped the upgrade jobs from upstream infra and moved them to rdo cloud. Closing this bug, as it's no longer an issue.

Changed in tripleo:
status: Triaged → Invalid
status: Invalid → Fix Released