tracker bug: post_failures in upstream jobs. Storage issues atm.

Bug #1915415 reported by wes hayutin
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

jobs die here:

2021-02-10 10:49:20.634194 | TASK [upload-logs-swift : Upload logs to swift]

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ebc/774696/1/gate/tripleo-ci-centos-7-containers-multinode/ebc4ab1/job-output.txt

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_230/774072/1/gate/openstack-tox-linters/2304e12/job-output.txt

working w/ infra to resolve.
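
For triage, the quoted failure line can be located automatically in a downloaded job-output.txt. The following is a minimal sketch, assuming every task line has the same shape as the single example quoted above (timestamp, " | ", then "TASK [...]"); the names TASK_RE, parse_tasks and last_task are hypothetical helpers and not part of Zuul or any existing tool.

```python
import re
from datetime import datetime

# Assumed line shape, inferred only from the one line quoted in this report:
# "<YYYY-MM-DD HH:MM:SS.ffffff> | TASK [<role> : <task name>]"
TASK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \| TASK \[(?P<name>.+?)\]\s*$"
)


def parse_tasks(lines):
    """Return (timestamp, task name) pairs for every TASK line, in file order."""
    tasks = []
    for line in lines:
        match = TASK_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            tasks.append((ts, match.group("name")))
    return tasks


def last_task(lines):
    """Return the final (timestamp, task name) pair, or None if no TASK line exists."""
    tasks = parse_tasks(lines)
    return tasks[-1] if tasks else None
```

Calling last_task(open("job-output.txt")) on a local copy would return the final task seen in the console log. A final task of "upload-logs-swift : Upload logs to swift" is not proof of a failed upload on its own, since the console log is expected to end at that step.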

[07:55:31] <arxcruz|ruck> weshay|ruck: o/
[07:56:11] <weshay|ruck> zbr, fungi quick question... we have some jobs failing in the gate w/ post_failure.. the upload logs to swift is timing out. Is that something we can fix on our own by reducing the log size?
[07:56:24] <weshay|ruck> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ebc/774696/1/gate/tripleo-ci-centos-7-containers-multinode/ebc4ab1/job-output.txt
[07:56:27] <weshay|ruck> for example
[07:56:49] <weshay|ruck> or.. is there something else going on?
[07:57:07] <zbr> weshay|ruck: did the log size increase recently? last time I saw this it was caused by some swift issues, not log size.
[07:57:12] <fungi> weshay|ruck: probably, or by increasing the post-run timeout, but i'll see if the output gives me any other suspicions
[07:57:18] <weshay|ruck> zbr, checking
[07:58:21] <weshay|ruck> zbr I do track the log size.. but not a historical trend
[07:58:23] <weshay|ruck> but I should do that
[08:00:00] <weshay|ruck> ya.. noticing post_failures in a lot of non-tripleo jobs as well
[08:01:36] <zbr> weshay|ruck: last execution on collect logs looks like ~9.5min, which is a lot but far below the 30min limit i know for post.
[08:01:53] <fungi> there may be a problem with one of our swift donors, i'm trying to correlate
[08:05:22] <weshay|ruck> tox example https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_230/774072/1/gate/openstack-tox-linters/2304e12/job-output.txt
[08:05:28] <weshay|ruck> fungi++
[08:05:29] <weshay|ruck> thanks
[08:06:57] <fungi> yeah, i'm digging in the executor debug log to see if there were any obvious errors emitted during that part of the log upload
[08:08:44] <fungi> that's actually where the console log normally ends because that's when the console log is uploaded
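
The discussion above mentions tracking log size but not as a historical trend. One possible way to start such a trend is sketched below, assuming the collected logs are available as a local directory; the function names, the CSV layout and the file paths are all hypothetical and do not describe any existing TripleO CI tooling.

```python
import csv
import os
from datetime import datetime, timezone


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def record_size(history_csv, job_name, log_dir, when=None):
    """Append one (timestamp, job, size_bytes) row to history_csv and return the size."""
    when = when or datetime.now(timezone.utc)
    size = dir_size_bytes(log_dir)
    new_file = not os.path.exists(history_csv)
    with open(history_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            # Write the header only once, when the history file is first created.
            writer.writerow(["timestamp", "job", "size_bytes"])
        writer.writerow([when.isoformat(), job_name, size])
    return size
```

Run once per job, this would build up a CSV that can be plotted or diffed to see whether log size grew before upload timeouts started. It would not by itself distinguish a log size problem from a storage backend problem.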

wes hayutin (weshayutin) wrote :

Thanks Infra!

Changed in tripleo:
status: Triaged → Fix Released