periodic featureset001 failing with timeout due to being stuck in mount-image task

Bug #1988808 reported by Arx Cruz
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Harald Jensås

Bug Description

https://logserver.rdoproject.org/50/44150/12/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master/340140a/job-output.txt

The job is stuck in the "Mount image" task:

https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/modify-image/tasks/manual.yml#L43

2022-09-05 14:19:29.178790 | primary | TASK [modify-image : Mount image] **********************************************
2022-09-05 14:19:29.178838 | primary | Monday 05 September 2022 14:19:29 -0400 (0:00:05.739) 0:06:30.607 ******
2022-09-05 21:58:41.418583 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
2022-09-05 21:58:41.454823 | POST-RUN START: [trusted : review.rdoproject.org/config/playbooks/tripleo-ci-periodic-base/post.yaml@master]

The same happens with other jobs, like featureset035 and featureset039

chandan kumar (chkumar246) wrote :

In this buildset https://review.rdoproject.org/zuul/buildset/1704aea5d2d54db0bed68dc388b416bc

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/667e6b6/job-output.txt

This is the last passing fs035 job, from 2022-09-04.

Changes recently merged in tripleo-common:
- 838972: Add chroot support to tripleo-mount-image | https://review.opendev.org/c/openstack/tripleo-common/+/838972 was merged on 5 September. I am not sure it is related, but it is a new feature.

Arx Cruz (arxcruz) wrote :

We ran a testproject job and held the node at the mount image step. After killing the mount image process and then running tripleo-mount-image manually, we got this output:

[zuul@node-0003003071 ~]$ umount /tmp/tmp.s6qdHltSvU
umount: /tmp/tmp.s6qdHltSvU: must be superuser to unmount.
[zuul@node-0003003071 ~]$ sudo umount /tmp/tmp.s6qdHltSvU
umount: /tmp/tmp.s6qdHltSvU: target is busy.
[zuul@node-0003003071 ~]$ sudo tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.s6qdHltSvb
mountpoint: /tmp/tmp.s6qdHltSvb: No such file or directory
+ qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw
+ grep '"format": "raw"'
    "format": "raw",
+ image_format='--format raw'
+ qemu-nbd --format raw --connect /dev/nbd4 /home/zuul/overcloud-hardened-uefi-full.raw
+ vgscan
  WARNING: Not using device /dev/nbd1p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd2p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd3p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd4p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  Found volume group "vg" using metadata type lvm2
+ vgchange --refresh
  WARNING: Not using device /dev/nbd1p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd2p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd3p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd4p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
+ vgchange -ay
  WARNING: Not using device /dev/nbd1p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd2p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd3p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: Not using device /dev/nbd4p4 for PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh prefers device /dev/nbd0p4 because device is used by LV.
  WARNING: PV t9T6FU-lPQj-YkLM-Nsps-PAQr-xY4s-cCmijh pr...


Arx Cruz (arxcruz) wrote :

After the mount image process was killed, the running job threw this output:

2022-09-06 05:24:29.654119 | primary |
2022-09-06 05:24:29.654166 | primary | TASK [modify-image : Mount image] **********************************************
2022-09-06 05:24:29.654252 | primary | Tuesday 06 September 2022 05:24:29 -0400 (0:00:07.493) 0:07:36.630 *****
2022-09-06 05:50:24.176605 | primary | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -ex\nif type tripleo-mount-image >/dev/null; then\n tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.s6qdHltSvU\nelse\n # stable branches do not have tripleo-mount-image, and only use\n # partition images\n modprobe nbd\n if qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '\"format\": \"raw\"' ; then\n image_format='--format raw'\n elif qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '\"format\": \"qcow2\"' ; then\n image_format='--format qcow2'\n else\n image_format=''\n fi\n qemu-nbd $image_format --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw\n mount /dev/nbd0 /tmp/tmp.s6qdHltSvU\nfi\n", "delta": "0:25:52.440895", "end": "2022-09-06 05:50:23.819050", "msg": "non-zero return code", "rc": -9, "start": "2022-09-06 05:24:31.378155", "stderr": "+ type tripleo-mount-image\n+ tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.s6qdHltSvU\n+ qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw\n+ grep '\"format\": \"raw\"'\n+ image_format='--format raw'\n+ qemu-nbd --format raw --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw\n+ vgscan\n+ vgchange --refresh\n+ vgchange -ay\n+ root_device=\n+ boot_device=\n+ efi_device=\n+ timeout 5 sh -c 'while ! 
ls /dev/nbd0p* ; do sleep 1; done'\n+ set +e\n++ ls /dev/nbd0p1 /dev/nbd0p2 /dev/nbd0p3 /dev/nbd0p4\n+ devices='/dev/nbd0p1\n/dev/nbd0p2\n/dev/nbd0p3\n/dev/nbd0p4'\n+ set -e\n++ echo /dev/nbd0p1 /dev/nbd0p2 /dev/nbd0p3 /dev/nbd0p4\n++ wc -w\n+ device_count=4\n+ '[' 4 == 0 ']'\n+ '[' 4 == 1 ']'\n+ for device in ${devices}\n+ lsblk --nodeps -P --output-all /dev/nbd0p1\n++ lsblk --all --nodeps --noheadings --output FSTYPE /dev/nbd0p1\n+ fstype=vfat\n++ lsblk --all --nodeps --noheadings --output LABEL /dev/nbd0p1\n+ label=MKFS_ESP\n++ lsblk --all --nodeps --noheadings --output PARTTYPENAME /dev/nbd0p1\n+ part_type_name='EFI System'\n++ lsblk --all --nodeps --noheadings --output PARTTYPE /dev/nbd0p1\n+ part_type=c12a7328-f81f-11d2-ba4b-00a0c93ec93b\n+ '[' -z vfat ']'\n+ '[' -z '' ']'\n+ [[ c12a7328-f81f-11d2-ba4b-00a0c93ec93b == c12a7328-f81f-11d2-ba4b-00a0c93ec93b ]]\n+ efi_device=/dev/nbd0p1\n+ continue\n+ for device in ${devices}\n+ lsblk --nodeps -P --output-all /dev/nbd0p2\n++ lsblk --all --nodeps --noheadings --output FSTYPE /dev/nbd0p2\n+ fstype=\n++ lsblk --all --nodeps --noheadings --output LABEL /dev/nbd0p2\n+ label=\n++ lsblk --all --nodeps --noheadings --output PARTTYPENAME /dev/nbd0p2\n+ part_type_name='BIOS boot'\n++ lsblk --all --nodeps --noheadings --output PARTTYPE /dev/nbd0p2\n+ part_type=21686148-6449-6e6f-744e-656564454649\n+ '[' -z '' ']'\n+ con...

chandan kumar (chkumar246) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/856154
Committed: https://opendev.org/openstack/tripleo-common/commit/0366b14d0b540500fb2051a39962717e44df3038
Submitter: "Zuul (22348)"
Branch: master

commit 0366b14d0b540500fb2051a39962717e44df3038
Author: Harald Jensås <email address hidden>
Date: Tue Sep 6 17:32:33 2022 +0200

    Fix condition for CHROOT, missing quotes

    The condition controlling if start_chroot should run was
    always evaluating to true, so the start_chroot function was
    called even when --chroot option was not used.

    Adding quotes, and curly braces, on the $CHROOT variable should
    fix the issue.

    Closes-Bug: #1988808
    Change-Id: I318809454924a342750c6044f56ec05893a8da25
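
For illustration, here is a minimal sketch of the shell pitfall the commit message describes. The exact line in tripleo-mount-image is not quoted in this report, so this assumes the check had the form `[ -n $CHROOT ]`. CHROOT is the variable named in the commit; the two helper functions are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Illustration only: CHROOT is the variable named in the commit message.
# The functions below are hypothetical and not taken from tripleo-mount-image.
CHROOT=""   # simulates a run where --chroot was not passed

unquoted_check() {
    # With CHROOT empty, the unquoted expansion disappears and the test
    # becomes `[ -n ]`: a one-argument test, which is true because the
    # string "-n" itself is non-empty.
    # shellcheck disable=SC2086
    if [ -n $CHROOT ]; then
        echo "start_chroot would run"
    else
        echo "start_chroot skipped"
    fi
}

quoted_check() {
    # Quoted, the test receives an explicit empty string, so -n is false.
    if [ -n "${CHROOT}" ]; then
        echo "start_chroot would run"
    else
        echo "start_chroot skipped"
    fi
}

echo "unquoted: $(unquoted_check)"
echo "quoted:   $(quoted_check)"
```

Run with an empty CHROOT, the unquoted form takes the "would run" branch and the quoted form takes the "skipped" branch, which matches the always-true condition described in the commit.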

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 17.0.0

This issue was fixed in the openstack/tripleo-common 17.0.0 release.
