tripleo upgrade jobs timeout on stable/ocata

Bug #1702955 reported by Emilien Macchi
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: mathieu bultel

Bug Description

Alex and I noticed that TripleO upgrade jobs time out quite often on stable/ocata (and probably on master as well).
We found that the timeouts happen when the jobs run on the RAX public cloud (iad and ord AZs).

Here are some numbers:
https://www.diffchecker.com/14g29Xoh

The slowdown is not limited to the steps where packages are installed (in which case we could call it networking related); it also shows up in basic workflows such as setting up SSH keys (see SshHostPubKey). We think the likely cause is overcommitment on the hypervisor, which would degrade performance on the VMs.
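
The reasoning above can be sketched as a per-step comparison between a fast run and a slow run: if every step slows down by a similar factor, including steps that do no package downloads, the cause is more likely the host than the network. This is only a rough sketch. The durations are made-up illustrative values, not the numbers from the diff linked above; `PackageUpdate` and `ConfigStep` are hypothetical step names, and the 2.0 threshold is an arbitrary assumption.

```python
# Hypothetical per-step durations in seconds. These are illustrative values,
# NOT measurements from the jobs discussed in this bug. Only SshHostPubKey
# is a step name taken from the report; the others are invented.
FAST_RUN = {"PackageUpdate": 600.0, "SshHostPubKey": 10.0, "ConfigStep": 120.0}
SLOW_RUN = {"PackageUpdate": 1500.0, "SshHostPubKey": 30.0, "ConfigStep": 300.0}


def slowdown_ratios(fast, slow):
    """Return {step: slow/fast} for steps present in both runs with fast > 0."""
    return {
        step: slow[step] / fast[step]
        for step in fast
        if step in slow and fast[step] > 0
    }


def uniformly_slow(ratios, threshold=2.0):
    """True if every compared step slowed down by at least `threshold`."""
    return bool(ratios) and all(r >= threshold for r in ratios.values())


ratios = slowdown_ratios(FAST_RUN, SLOW_RUN)
# Print the steps from most to least slowed down.
for step, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{step}: {ratio:.1f}x slower")
print("uniform slowdown:", uniformly_slow(ratios))
```

A slowdown concentrated in package steps would point at networking instead; a uniform one is consistent with, but does not prove, hypervisor overcommitment.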

Any help from RAX folks is welcome to figure out if we can improve the situation.

Tags: ci
Emilien Macchi (emilienm) wrote :
Alex Schultz (alex-schultz) wrote :
Changed in tripleo:
milestone: pike-3 → pike-rc1
summary: - tripleo upgrade jobs timeout when running in RAX cloud
+ tripleo upgrade jobs timeout on stable/ocata
tags: added: alert
Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
Changed in tripleo:
assignee: Emilien Macchi (emilienm) → nobody
Changed in tripleo:
assignee: nobody → mathieu bultel (mat-bultel)
Changed in tripleo:
milestone: pike-rc1 → pike-rc2
tags: removed: alert
Changed in tripleo:
milestone: pike-rc2 → queens-1
Tom Barron (tpb) wrote :
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

We need to use the tripleo upgrade role for upgrades; then we can continue to work on this.

Tom Barron (tpb) wrote :

Is there an ETA for the tripleo upgrade role, if that blocks the fix for this one? I'm asking because I have an important backport for manila ( https://bugs.launchpad.net/tripleo/+bug/1712842 ) that is stuck in stable/ocata on timeout failures in the gate-tripleo-ci-centos-7-multinode-upgrades job.

Tom Barron (tpb) wrote :

Got lucky on a recheck of the fix for LP:1712842, so it's no longer blocked.

wes hayutin (weshayutin) wrote :

Starting to migrate these jobs to rdo sf: https://review.rdoproject.org/r/#/c/9831/

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Alex Schultz (alex-schultz) wrote :

So we dropped the upgrade jobs from upstream infra and moved them to rdo cloud. Closing this bug as it's no longer an issue.

Changed in tripleo:
status: Triaged → Invalid
status: Invalid → Fix Released