Various periodic promotion jobs failing - NODE_FAILURE or POST_FAILURE

Bug #1995851 reported by Marios Andreou
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
tripleo   Invalid   Low          Unassigned

Bug Description

At [1] periodic-tripleo-ci-build-containers-ubi-8-push-train and at [2] periodic-tripleo-centos-8-wallaby-component-glance-promote-consistent-to-component-ci-testing fail with NODE_FAILURE.

At [3] periodic-tripleo-centos-8-buildimage-ironic-python-agent-train and at [4] periodic-tripleo-centos-8-train-promote-promoted-components-to-tripleo-ci-testing fail with POST_FAILURE.

In all cases there are no logs, so I am not sure what or where the problem is.

So far only seen in C8 jobs so that may be a clue.

It may also be transient, but it is blocking the train integration line ([4] and then [1][2] when I tried manually rekicking train) hence this bug.

[1] https://review.rdoproject.org/zuul/build/f84b2b8149f540da8745658ade34d3bf
[2] https://review.rdoproject.org/zuul/build/085003897faf4cb8bd3a168e57617223
[3] https://review.rdoproject.org/zuul/build/3a7bd0f0f3cc4641b0b75fc0ac65b8a1
[4] https://review.rdoproject.org/zuul/buildset/fd4b023bde1648629572902bf7658075

Marios Andreou (marios-b) wrote :

seen this a few times today so I filed it.

It may be a transient issue, but it is really hard to tell what the problem is because there are no logs.

Is it a capacity issue? I saw this today when I tried to manually rekick the stable/train periodic line.

However, looking at [1], it is *not* an uncommon issue and definitely not limited to centos-8: we see lots of c9 jobs in that list too.

[1] https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE&result=POST_FAILURE&skip=0
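
The filter in [1] can also be expressed as a query against the Zuul REST API, which makes it easier to check whether the failures really are spread across c8 and c9 jobs. The following is only a rough sketch: the `API_BASE` path, the `limit` parameter, and the `job_name` field are assumptions about this Zuul deployment, and `builds_url`, `fetch_builds` and `count_by_distro` are hypothetical helper names, not part of any existing tool.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: REST endpoint of this Zuul deployment (not confirmed in this report).
API_BASE = "https://review.rdoproject.org/zuul/api"


def builds_url(results, limit=100, skip=0, base=API_BASE):
    """Build a builds query that repeats the `result` parameter, as the web UI link does."""
    params = [("result", r) for r in results] + [("limit", limit), ("skip", skip)]
    return f"{base}/builds?{urlencode(params)}"


def fetch_builds(url):
    """Fetch and decode the JSON list of builds (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def count_by_distro(builds):
    """Group builds into c8 / c9 / other based on substrings of the job name."""
    counts = Counter()
    for build in builds:
        name = build.get("job_name", "")  # assumed field name
        if "centos-8" in name or "ubi-8" in name:
            counts["c8"] += 1
        elif "centos-9" in name or "ubi-9" in name:
            counts["c9"] += 1
        else:
            counts["other"] += 1
    return counts
```

Something like `count_by_distro(fetch_builds(builds_url(["NODE_FAILURE", "POST_FAILURE"])))` would then give a quick c8 versus c9 split, assuming the endpoint behaves as guessed here.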

Rafael Castillo (rafaelcastillo) wrote :
Marios Andreou (marios-b) wrote :

Still seeing this at least once today, 8th November, at [1] for the train periodic line (hitting two jobs there):

        * periodic-tripleo-ci-centos-8-scenario003-standalone-train NODE_FAILURE
        * periodic-tripleo-ci-centos-8-scenario010-kvm-standalone-train NODE_FAILURE

[1] https://review.rdoproject.org/zuul/buildset/286e15de335d4e3bafcb6f2e272bea20

Rafael Castillo (rafaelcastillo) wrote :

This has kept going on and is blocking the centos 8 wallaby line pretty hard [1].

[1] https://review.rdoproject.org/zuul/build/e80ae3f2cdc2482eab3506d029bf2fcd

Marios Andreou (marios-b) wrote :

More on the 9th: example at [1], which killed the entire train periodic line again.

[1] https://review.rdoproject.org/zuul/buildset/bcdc7981065248b3aa006df05c072e24

daniel.pawlik (daniel-pawlik) wrote :

It would be good to move some tasks from the "pre-run" to the "run" part of the job definition so that we can see why it fails.

Marios Andreou (marios-b) wrote :

Looking better today, we can only see one POST_FAILURE at [1]:

Job:        periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-baremetal-zed
Project:    openstack/tripleo-ci
Branch:     master
Pipeline:   openstack-component-baremetal
Duration:   2 hrs 54 mins 25 secs
Start time: 2022-11-10 00:08:47
Result:     POST_FAILURE

[1] https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE&result=POST_FAILURE&skip=0

Marios Andreou (marios-b) wrote :

Follow-up from comment #7: on Friday the 11th we had more than 20 examples at [1], but then very few on the 12th and 13th.

Let's watch this for a few more days before closing out.

[1] https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE&result=POST_FAILURE&skip=0
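
To keep watching the trend without eyeballing the builds page, the per-day counts could be tallied from the build records. This is a minimal sketch, assuming each record is a dict with an ISO-formatted `start_time` and a `result` field (as the Zuul builds API appears to return); `failures_per_day` is a hypothetical helper and the sample records other than the POST_FAILURE from comment #7 are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def failures_per_day(builds):
    """Count builds per (day, result); records without a start time are skipped."""
    per_day = Counter()
    for build in builds:
        start = build.get("start_time")
        if not start:
            # Assumption: some failed builds may carry no start time at all.
            continue
        day = datetime.fromisoformat(start).date().isoformat()
        per_day[(day, build.get("result", "UNKNOWN"))] += 1
    return per_day


# Illustrative input: the first record mirrors comment #7, the rest are invented.
sample = [
    {"start_time": "2022-11-10T00:08:47", "result": "POST_FAILURE"},
    {"start_time": "2022-11-11T03:00:00", "result": "NODE_FAILURE"},
    {"start_time": "2022-11-11T05:30:00", "result": "NODE_FAILURE"},
    {"start_time": None, "result": "NODE_FAILURE"},
]

for (day, result), count in sorted(failures_per_day(sample).items()):
    print(day, result, count)
```

A drop to zero over a few consecutive days would support closing the bug out.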

Amol Kahat (amolkahat) wrote :

In the last two days no new node failures were observed.

Changed in tripleo:
status: Triaged → Invalid
importance: Critical → Low