CentOS 9 Wallaby periodic jobs failing in container image prepare: "read |0: i/o timeout"

Bug #1964129 reported by Marios Andreou
This bug affects 2 people

Affects | Status       | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical   | Unassigned  |

Bug Description

Jobs in the CentOS 9 Wallaby integration line are failing during container image prepare [1][2][3][4]. The trace looks like:

  FATAL | Capture the update repos and installed rpms | localhost | error={"changed": true, "cmd": "buildah run 192.168.24.1-working-container yum list installed > /var/log/container_info.log\\n", "delta": "0:00:26.144137", "end": "2022-03-04 15:29:01.827487", "msg": "non-zero return code", "rc": 1, "start": "2022-03-04 15:28:35.683350", "stderr": "time=\\"2022-03-04T15:28:50Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"\\nerror running container: write containercreatepipe: broken pipe\\nerror while running runtime: exit status 1", "stderr_lines": ["time=\\"2022-03-04T15:28:50Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"", "error running container: write containercreatepipe: broken pipe", "error while running runtime: exit status 1"], "stdout": "", "stdout_lines": []}
  2022-03-04 15:29:01.855610 | fa163e35-531f-de0e-de63-000000000043 | TIMING | tripleo-modify-image : Capture the update repos and installed rpms | localhost | 0:01:24.806600 | 26.88s

  PLAY RECAP *********************************************************************
  localhost : ok=17 changed=8 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0

  2022-03-04 15:29:01.858737 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2022-03-04 15:29:01.859073 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 31

This is a promotion blocker hitting multiple different jobs.

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-wallaby/7f66f67/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz
[2] https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-wallaby/989e1ff/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz
[3] https://logserver.rdoproject.org/c3/c3e616831cd2836019dc418b0915f1e669269326/openstack-periodic-integration-stable1-cs9/periodic-tripleo-ci-centos-9-scenario003-standalone-wallaby/2f264a3/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz
[4] https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario001-standalone-wallaby/ea277f2/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz

Ronelle Landy (rlandy) wrote:

We are not seeing this error anymore; tests are now failing in tempest.

Ronelle Landy (rlandy) wrote:

Ronelle Landy (rlandy) wrote:

Closing this out to concentrate on the tempest failures.

Changed in tripleo:
status: Triaged → Fix Released
Marios Andreou (marios-b) wrote:

Per comment #3: FYI, the latest failure cited in the description was from yesterday, 8 March [1]:

  d984-000000000043 | FATAL | Capture the update repos and installed rpms | localhost | error={"changed": true, "cmd": "buildah run 192.168.24.1-working-container-3 yum list installed > /var/log/container_info.log\\n", "delta": "0:00:34.862180", "end": "2022-03-08 13:31:28.390859", "msg": "non-zero return code", "rc": 1, "start": "2022-03-08 13:30:53.528679", "stderr": "time=\\"2022-03-08T13:31:17Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"\\nerror running container: write containercreatepipe: broken pipe\\nerror while running runtime: exit status 1", "stderr_lines": ["time=\\"2022-03-08T13:31:17Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"", "error running container: write containercreatepipe: broken pipe", "error while running runtime: exit status 1"], "stdout": "", "stdout_lines": []}
  2022-03-08 13:31:28.423958 | fa163e4c-b1a2-1faf-d984-000000000043 | TIMING | tripleo-modify-image : Capture the update repos and installed rpms | localhost | 0:01:17.489513 | 37.72s

  PLAY RECAP

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-wallaby/7f66f67/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz

Marios Andreou (marios-b) wrote:

OK, having reviewed the latest buildset [1], I see the failures are exclusively tempest-related (https://bugs.launchpad.net/tripleo/+bug/1964131), so let's leave this closed for now; we can reopen it if we see it again.

It *was* hitting multiple different jobs.

[1] https://review.rdoproject.org/zuul/buildset/3ae5ce4e79784fab81260584860766a2

Marios Andreou (marios-b) wrote:

Moving this back to in progress.

It was seen again in yesterday's periodic integration line run (buildset at [1]), so this is an intermittent issue.

Failing job at [2], periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby:

  2022-03-10 00:24:43.119167 | fa163e34-8197-f334-8ca8-000000000043 | FATAL | Capture the update repos and installed rpms | localhost | error={"changed": true, "cmd": "buildah run 192.168.24.1-working-container-1 yum list installed > /var/log/container_info.log\\n", "delta": "0:00:38.547115", "end": "2022-03-10 00:24:43.093634", "msg": "non-zero return code", "rc": 1, "start": "2022-03-10 00:24:04.546519", "stderr": "time=\\"2022-03-10T00:24:38Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"\\nerror running container: write containercreatepipe: broken pipe\\nerror while running runtime: exit status 1", "stderr_lines": ["time=\\"2022-03-10T00:24:38Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"", "error running container: write containercreatepipe: broken pipe", "error while running runtime: exit status 1"], "stdout": "", "stdout_lines": []}
  2022-03-10 00:24:43.120362 | fa163e34-8197-f334-8ca8-000000000043 | TIMING | tripleo-modify-image : Capture the update repos and installed rpms | localhost | 0:00:53.303564 | 38.92s

  PLAY RECAP *********************************************************************
  localhost : ok=17 changed=8 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0

[1] https://review.rdoproject.org/zuul/buildset/14370ba46f1b4eb3abd5623010905187
[2] https://logserver.rdoproject.org/48/48dcad8c810111579a41db528c5e490dd2dcc559/openstack-periodic-integration-stable1-cs9/periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby/0cca4d0/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz

Changed in tripleo:
status: Fix Released → Triaged
Ronelle Landy (rlandy) wrote:

Marios Andreou (marios-b) wrote:

Not sure what to do here. This was an intermittent issue, possibly some kind of race condition, but based on the comments here we haven't seen it in about 10 days now.

I am tempted to move it to Incomplete if we continue not to see it this week.

Marios Andreou (marios-b) wrote:

Moving to Incomplete. If you see this again, please add a comment with a pointer to the logs and move the bug status back to Triaged.
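If the failure reappears, one quick way to confirm that a downloaded log (for example a `tripleo-container-image-prepare.log.txt.gz` from the logserver) carries this exact signature is to scan it for the buildah error message quoted in the traces above. The following is a minimal Python sketch under that assumption; `read_log` and `find_buildah_timeouts` are hypothetical helpers, not part of tripleo-ci, and the regex assumes the `time=... level=error msg=...` layout seen in this report, allowing zero or more escaping backslashes before each quote.

```python
import gzip
import re
import sys
from typing import List

# Hypothetical helper (not part of tripleo-ci): matches the buildah error seen
# in this bug. "\\*" tolerates the varying levels of backslash-escaping that
# appear in the Ansible output (\" or \\" or a bare ").
SIGNATURE = re.compile(
    r'time=\\*"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\\*"\s+level=error\s+'
    r'msg=\\*"did not get container create message from subprocess: '
    r'read \|0: i/o timeout'
)


def read_log(path: str) -> str:
    """Read a plain or gzip-compressed log file as text."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def find_buildah_timeouts(log_text: str) -> List[str]:
    """Return the distinct error timestamps, in order of first appearance.

    Each Ansible FATAL record repeats the message in both "stderr" and
    "stderr_lines", so duplicate timestamps are collapsed.
    """
    seen: List[str] = []
    for match in SIGNATURE.finditer(log_text):
        ts = match.group("ts")
        if ts not in seen:
            seen.append(ts)
    return seen


if __name__ == "__main__":
    # Usage: pass one or more log paths (plain text or .gz) as arguments.
    for log_path in sys.argv[1:]:
        hits = find_buildah_timeouts(read_log(log_path))
        status = "HIT" if hits else "clean"
        print(f"{log_path}: {status} {', '.join(hits)}")
```

Run as a script, it prints `HIT` plus the error timestamps for each matching log, or `clean` otherwise; a hit would be the pointer worth adding to the bug before moving it back to Triaged.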

Changed in tripleo:
status: Triaged → Incomplete
Marios Andreou (marios-b) wrote:

Spoke too soon :/ Moving back to Triaged.

We have examples from yesterday:

        * https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-network-wallaby/bfad10b/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz

          ...": "2022-03-29 12:56:37.708842", "stderr": "time=\\"2022-03-29T12:56:53Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"\\nerror running container: write containercreatepipe: broken pipe\\nerror while running runtime: exit status 1", "stderr_lines": ["time=\\"2022-03-29T12:56:53Z\\" level=error msg=\\"did not get container create message from subprocess: read |0: i/o timeout\\"", "error running container: write containercreatepipe: broken pipe", "error while running runtime: exit status 1"], "stdout": "", "stdout_lines": []}
          2022-03-29 12:57:04.818561 | fa163e8a-6d5b-ad2a-9551-000000000043 | TIMING | tripleo-modify-image : Capture the update repos and installed rpms | localhost | 0:02:13.990545 | 27.45s
          PLAY RECAP *********************************************************************

        * https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-network-wallaby-validation/f0582ea/logs/undercloud/var/log/tripleo-container-image-prepare.log.txt.gz

          2022-03-29 13:02:48.712267 | fa163e4a-5da8-70f8-6211-000000000043 | FATAL | Capture the update repos and installed rpms | localhost | error={"changed": true, "cmd": "buildah run 192.168.24.1-working-container yum list installed > /var/log/container_info.log\n", "delta": "0:00:47.014302", "end": "2022-03-29 13:02:48.686853", "msg": "non-zero return code", "rc": 1, "start": "2022-03-29 13:02:01.672551", "stderr": "time=\"2022-03-29T13:02:27Z\" level=error msg=\"did not get container create message from subprocess: read |0: i/o timeout\"\nerror running container: write containercreatepipe: broken pipe\nerror while running runtime: exit status 1", "stderr_lines": ["time=\"2022-03-29T13:02:27Z\" level=error msg=\"did not get container create message from subprocess: read |0: i/o timeout\"", "error running container: write containercreatepipe: broken pipe", "error while running runtime: exit status 1"], "stdout": "", "stdout_lines": []}

Changed in tripleo:
status: Incomplete → Triaged
Douglas Viroel (dviroel) wrote:

chandan kumar (chkumar246) wrote:

buildah-1.25.1-1.el9 was built a week ago (https://gitlab.com/redhat/centos-stream/rpms/buildah/-/commit/af56ba256ddf85322323701d53a8e25df2284e8a); hopefully it will show up in the CentOS Stream 9 compose soon.
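To check whether a node already has the buildah build mentioned above, one option is to compare the installed package's upstream version against 1.25.1. This is a simplified sketch that assumes buildah-1.25.1-1.el9 is the build carrying the fix, as the comment above suggests; it compares only the numeric upstream version (an optional epoch is skipped, the release field is ignored, and `~` pre-release versions are rejected), so it is not a substitute for rpm's own version comparison. `MIN_FIXED_VERSION`, `buildah_version`, and `has_expected_fix` are hypothetical names.

```python
import re
from typing import Tuple

# Assumption: the fix is expected in buildah-1.25.1-1.el9 (see comment above).
MIN_FIXED_VERSION: Tuple[int, ...] = (1, 25, 1)


def buildah_version(nvr: str) -> Tuple[int, ...]:
    """Extract the numeric upstream version from a buildah NVR string.

    Accepts e.g. 'buildah-1.25.1-1.el9' or 'buildah-1.25.1-1.el9.x86_64';
    an optional 'N:' epoch prefix is skipped. Raises ValueError otherwise.
    """
    match = re.match(r"^buildah-(?:\d+:)?(\d+(?:\.\d+)*)-", nvr.strip())
    if match is None:
        raise ValueError(f"not a buildah NVR: {nvr!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_expected_fix(nvr: str) -> bool:
    """True if the upstream version is at least MIN_FIXED_VERSION.

    Tuple comparison orders versions component by component,
    so (1, 26, 0) > (1, 25, 1) > (1, 24, 2).
    """
    return buildah_version(nvr) >= MIN_FIXED_VERSION
```

For example, the string printed by `rpm -q buildah` (a name-version-release.arch string such as `buildah-1.25.1-1.el9.x86_64`) can be passed straight to `has_expected_fix`.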

Douglas Viroel (dviroel) wrote:
Changed in tripleo:
status: Triaged → Fix Released