Tracker bug: POST_FAILURE in upstream jobs. Storage issues at the moment.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Unassigned |
Bug Description
Jobs die here:
2021-02-10 10:49:20.634194 | TASK [upload-logs-swift : Upload logs to swift]
Working with infra to resolve.
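The failing task is part of Zuul's post-run log upload, which is why the jobs end as POST_FAILURE rather than a normal failure. One mitigation raised in the IRC log below is increasing the post-run timeout. A minimal sketch of what that looks like in a Zuul job definition; the job name, parent, and the 3600-second value are illustrative only, not the actual tripleo configuration:

```yaml
# .zuul.yaml (sketch): give the post-run playbooks, including the
# upload-logs-swift step, more time before Zuul aborts them.
# Job name, parent, and the 3600-second value are assumptions.
- job:
    name: tripleo-ci-example-job
    parent: tripleo-ci-base
    # post-timeout bounds the post-run playbooks separately from the
    # main run timeout; the swift log upload happens in post-run.
    post-timeout: 3600
```

The other mitigation mentioned, shrinking the collected logs, would instead reduce the time the upload task needs in the first place.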
[07:55:31] <arxcruz|ruck> weshay|ruck: o/
[07:56:11] <weshay|ruck> zbr, fungi quick question... we have some jobs failing in the gate w/ post_failure.. the upload logs to swift is timing out. Is that something we can fix on our own by reducing the log size?
[07:56:24] <weshay|ruck> https:/
[07:56:27] <weshay|ruck> for example
[07:56:49] <weshay|ruck> or.. is there something else going on?
[07:57:07] <zbr> weshay|ruck: did the log size increase recently? last time I saw this it was caused by some swift issues, not log size.
[07:57:12] <fungi> weshay|ruck: probably, or by increasing the post-run timeout, but i'll see if the output gives me any other suspicions
[07:57:18] <weshay|ruck> zbr, checking
[07:58:21] <weshay|ruck> zbr I do track the log size.. but not a historical trend
[07:58:23] <weshay|ruck> but I should do that
[08:00:00] <weshay|ruck> ya.. noticing post_failures in a lot of non-tripleo jobs as well
[08:01:36] <zbr> weshay|ruck: last execution on collect logs looks like ~9.5min, which is a lot but far below the 30min limit I know for post.
[08:01:53] <fungi> there may be a problem with one of our swift donors, i'm trying to correlate
[08:05:22] <weshay|ruck> tox example https:/
[08:05:28] <weshay|ruck> fungi++
[08:05:29] <weshay|ruck> thanks
[08:06:57] <fungi> yeah, i'm digging in the executor debug log to see if there were any obvious errors emitted during that part of the log upload
[08:08:44] <fungi> that's actually where the console log normally ends because that's when the console log is uploaded
Thanks Infra!
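On the side note in the IRC log above about tracking log size but not its historical trend, a minimal sketch of recording the staged log size per build so the trend can be plotted later; the log directory and output CSV paths are assumptions, not the actual CI layout:

```python
#!/usr/bin/env python3
"""Hypothetical helper: append the staged log size for a build to a CSV
so the trend over time can be plotted. Paths are assumptions only."""
import csv
import datetime
import os
import pathlib

LOG_DIR = pathlib.Path(os.environ.get("LOG_DIR", "/var/log/ci-logs"))
TREND_FILE = pathlib.Path(os.environ.get("TREND_FILE", "log-size-trend.csv"))


def dir_size_bytes(root: pathlib.Path) -> int:
    """Sum the sizes of every regular file under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def main() -> None:
    # One row per build: UTC timestamp plus total staged log size in bytes.
    timestamp = datetime.datetime.utcnow().isoformat(timespec="seconds") + "Z"
    with TREND_FILE.open("a", newline="") as handle:
        csv.writer(handle).writerow([timestamp, dir_size_bytes(LOG_DIR)])


if __name__ == "__main__":
    main()
```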