tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001 failed due to qemu-nbd error

Bug #1950137 reported by Juan Badia Payno
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

RDO third-party jobs are failing with the following error:

TASK [modify-image : Mount image] **********************************************
Monday 08 November 2021 08:07:36 +0000 (0:00:02.273) 0:18:09.537 *******
fatal: [undercloud]: FAILED! => {
    "changed": true,
    "cmd": "set -ex\nif type tripleo-mount-image >/dev/null; then\n tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.jE3LZJVamk\nelse\n # stable branches do not have tripleo-mount-image, and only use\n # partition images\n modprobe nbd\n if qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '\"format\": \"raw\"' ; then\n image_format='--format raw'\n elif qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '\"format\": \"qcow2\"' ; then\n image_format='--format qcow2'\n else\n image_format=''\n fi\n qemu-nbd $image_format --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw\n mount /dev/nbd0 /tmp/tmp.jE3LZJVamk\nfi\n",
    "delta": "0:00:00.083689",
    "end": "2021-11-08 08:07:36.671384",
    "rc": 1,
    "start": "2021-11-08 08:07:36.587695"
}

STDOUT:

    "format": "raw",

STDERR:

+ type tripleo-mount-image
+ tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.jE3LZJVamk
+ qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw
+ grep '"format": "raw"'
+ image_format='--format raw'
+ qemu-nbd --format raw --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw
qemu-nbd: Failed to set NBD socket
qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to socket: Broken pipe

MSG:

non-zero return code
...ignoring

TASK [modify-image : Debug image mount] ****************************************
Monday 08 November 2021 08:07:36 +0000 (0:00:00.511) 0:18:10.049 *******
fatal: [undercloud]: FAILED! => {}

MSG:

{'stdout': ' "format": "raw",', 'stderr': '+ type tripleo-mount-image\n+ tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.jE3LZJVamk\n+ qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw\n+ grep \'"format": "raw"\'\n+ image_format=\'--format raw\'\n+ qemu-nbd --format raw --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw\nqemu-nbd: Failed to set NBD socket\nqemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write to socket: Broken pipe'}

PLAY RECAP *********************************************************************
localhost : ok=18 changed=7 unreachable=0 failed=0 skipped=74 rescued=0 ignored=0
undercloud : ok=158 changed=57 unreachable=0 failed=1 skipped=160 rescued=0 ignored=5

Juan Badia Payno (jbadiapa) wrote :
tags: added: ci
Ronelle Landy (rlandy) wrote :
tags: added: promotion-blocker
Changed in tripleo:
milestone: none → yoga-1
importance: Undecided → Critical
status: New → Triaged
Harald Jensås (harald-jensas) wrote :

Nov 08 08:07:52 node-0001752850 kernel: block nbd0: NBD_DISCONNECT
Nov 08 08:07:52 node-0001752850 kernel: block nbd0: Disconnected due to user request.
Nov 08 08:07:52 node-0001752850 kernel: block nbd0: shutting down sockets

Nov 08 08:07:52 node-0001752850 kernel: blk_update_request: I/O error, dev nbd0, sector 51200 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Nov 08 08:07:52 node-0001752850 kernel: blk_update_request: I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Nov 08 08:07:52 node-0001752850 kernel: blk_update_request: I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 08 08:07:52 node-0001752850 kernel: blk_update_request: I/O error, dev nbd0, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

Nov 08 08:12:58 node-0001752850 ansible-command[53254]: Invoked with _raw_params=set -ex
                                                        if type tripleo-mount-image >/dev/null; then
                                                          tripleo-mount-image -a /home/zuul/overcloud-hardened-uefi-full.raw -m /tmp/tmp.6vVNrg7FbC
                                                        else
                                                          # stable branches do not have tripleo-mount-image, and only use
                                                          # partition images
                                                          modprobe nbd
                                                          if qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '"format": "raw"' ; then
                                                              image_format='--format raw'
                                                          elif qemu-img info --output json /home/zuul/overcloud-hardened-uefi-full.raw |grep '"format": "qcow2"' ; then
                                                              image_format='--format qcow2'
                                                          else
                                                              image_format=''
                                                          fi
                                                          qemu-nbd $image_format --connect /dev/nbd0 /home/zuul/overcloud-hardened-uefi-full.raw
                                                          mount /dev/nbd0 /tmp/tmp.6vVNrg7FbC
                                                        fi
                                                         _uses_shell=True warn=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Nov 08 08:12:59 node-0001752850 kernel: block nbd0: Device being setup by another task
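
The final kernel message, "Device being setup by another task", suggests that /dev/nbd0 was still claimed when the next mount attempt ran. As a minimal diagnostic sketch (not part of tripleo-mount-image), the helper below reports the first nbd device with no client attached. It assumes the standard Linux sysfs layout, where the nbd driver exposes /sys/block/nbdN/pid only while the device is connected. The function name first_free_nbd and the SYSFS_ROOT override are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helper: print the first nbd device that has no client attached.
# SYSFS_ROOT is an assumption added for testability; it defaults to /sys.
first_free_nbd() {
    sysfs_root=${SYSFS_ROOT:-/sys}
    for dev in "$sysfs_root"/block/nbd*; do
        # Skip the unexpanded glob when the nbd module is not loaded.
        [ -d "$dev" ] || continue
        # The nbd driver exposes a "pid" attribute only while connected.
        if [ ! -e "$dev/pid" ]; then
            echo "/dev/$(basename "$dev")"
            return 0
        fi
    done
    # No free device found (or no nbd devices present at all).
    return 1
}
```

Something like `qemu-nbd --connect "$(first_free_nbd)" image.raw` would avoid the hard-coded /dev/nbd0, although it would only work around a stale connection rather than explain why the earlier disconnect failed.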

Juan Badia Payno (jbadiapa) wrote :

We might be able to fix this issue by passing --shared=3 to qemu-nbd, but I don't know what the implications would be.

$ qemu-nbd --help
Usage: qemu-nbd [OPTIONS] FILE
  or: qemu-nbd -L [OPTIONS]
QEMU Disk Network Block Device Utility

  -h, --help display this help and exit
  -V, --version output version information and exit

Connection properties:
  -p, --port=PORT port to listen on (default `10809')
  -b, --bind=IFACE interface to bind to (default `0.0.0.0')
  -k, --socket=PATH path to the unix socket
                            (default '/var/lock/qemu-nbd-DEVICE')
  -e, --shared=NUM device can be shared by NUM clients (default '1')
  -t, --persistent don't exit on the last connection
  -v, --verbose display extra debugging information
  -x, --export-name=NAME expose export by name (default is empty string)
  -D, --description=TEXT export a human-readable description

Harald Jensås (harald-jensas) wrote :

I was able to reproduce the issue by manually running tripleo-mount-image and tripleo-unmount-image against a whole-disk image:

$ curl -O https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20210210.0.x86_64.qcow2
$ export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean* /etc/yum.repos.d/tripleo-centos-*"
$ export DIB_LOCAL_IMAGE={{ ansible_env.HOME }}/images/CentOS-Stream-GenericCloud-8-20210210.0.x86_64.qcow2
$ export DIB_LOCAL_IMAGE=/home/centos/whole-disk-images/CentOS-Stream-GenericCloud-8-20210210.0.x86_64.qcow2
$ export DIB_DNF_MODULE_STREAMS='container-tools:3.0'
$ openstack overcloud image build --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-hardened-images-uefi-python3.yaml --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-hardened-images-uefi-centos8.yaml

## Mount image

$ sudo tripleo-mount-image -a overcloud-hardened-uefi-full.qcow2 -m /mnt/

Nov 10 15:44:42 undercloud.rdocloud kernel: nbd0: p1 p2 p3
Nov 10 15:44:42 undercloud.rdocloud systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Nov 10 15:44:42 undercloud.rdocloud systemd[1]: Starting LVM event activation on device 43:3...
Nov 10 15:44:42 undercloud.rdocloud lvm[365333]: pvscan[365333] PV /dev/nbd0p3 online, VG vg is complete.
Nov 10 15:44:42 undercloud.rdocloud lvm[365333]: pvscan[365333] VG vg run autoactivation.
Nov 10 15:44:42 undercloud.rdocloud lvm[365333]: PVID VdbWs1-MVI7-J5zn-pqAI-uCuI-duHx-fm82kj read from /dev/nbd0p3 last written to /dev/mapper/loop0p3.
Nov 10 15:44:42 undercloud.rdocloud lvm[365333]: pvscan[365333] VG vg not using quick activation.

Nov 10 15:44:43 undercloud.rdocloud kernel: XFS (dm-0): Mounting V5 Filesystem
Nov 10 15:44:43 undercloud.rdocloud kernel: XFS (dm-0): Ending clean mount
Nov 10 15:44:43 undercloud.rdocloud kernel: XFS (dm-2): Mounting V5 Filesystem
Nov 10 15:4...

Harald Jensås (harald-jensas) wrote :

unmount_image() {

    set -x

    if mountpoint "$MOUNT_DIR"; then
        for m in $REVERSE_MOUNTS; do
            path=${m#*:}
            unmount_volume $MOUNT_DIR$path
        done
        unmount_volume $MOUNT_DIR/boot/efi
        unmount_volume $MOUNT_DIR
    fi
    # `--activate n` makes LVs inactive, or unavailable
    vgchange --activate n vg # <--- Adding this line to deactivate the LVs fixes the issue for me.
    qemu-nbd --disconnect $NBD_DEVICE
    vgchange --refresh vg || true

    for m in $REVERSE_MOUNTS; do
        device=${m%:*}
        remove_device $device
    done
    remove_device vg-lv_root
}
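
The key point in the function above is the ordering: the volume group is deactivated before qemu-nbd --disconnect runs. A minimal sketch of just that ordering follows, so the sequence can be inspected without root privileges. The function name teardown_nbd and the RUN hook are hypothetical and not part of tripleo-unmount-image; only the two commands themselves come from the snippet above.

```shell
#!/bin/sh
# Hypothetical sketch of the teardown order discussed above.
# RUN is an illustrative hook: set RUN=echo to print the commands
# instead of executing them; leave it unset to run them for real.
teardown_nbd() {
    vg=$1
    nbd_device=$2
    # 1. Deactivate the LVs so device-mapper releases the nbd partitions.
    ${RUN:-} vgchange --activate n "$vg"
    # 2. Only then detach the image from the nbd device.
    ${RUN:-} qemu-nbd --disconnect "$nbd_device"
}
```

Running `RUN=echo teardown_nbd vg /dev/nbd0` prints the two commands in the order they would execute.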

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/817456
Committed: https://opendev.org/openstack/tripleo-common/commit/cc2a64572ddf90401476b1f1ce52b7c62b3b24ad
Submitter: "Zuul (22348)"
Branch: master

commit cc2a64572ddf90401476b1f1ce52b7c62b3b24ad
Author: Harald Jensås <email address hidden>
Date: Wed Nov 10 17:14:05 2021 +0100

    Deactivate LVs before disconnecting nbd device

    In tripleo-unmount-image, ensure the LVs are inactive
    prior to disconnecting the NBD device. If the LVs are
    active the device is locked, causing the disconnect to
    fail and subsequent attempts to mount using the same nbd
    device to fail.

    Change-Id: I25be71542df1e738002063170138d5e66fabdaf4
    Closes-Bug: #1950137
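
The commit message attributes the failure to active LVs holding the nbd device. One way to check for this condition, as a sketch only, is to list the sysfs "holders" entries of the device and its partitions. This assumes the usual Linux layout (/sys/class/block/DEVICE/holders/) and the nbdNpM partition naming seen in the logs above; the function name nbd_holders and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list holders (for example dm-N devices created by LVM)
# of an nbd device and its partitions, one "<block device> <holder>" per line.
# SYSFS_ROOT is an assumption added for testability; it defaults to /sys.
nbd_holders() {
    name=$(basename "$1")
    sysfs_root=${SYSFS_ROOT:-/sys}
    # Match the whole device and its partitions (nbd0, nbd0p1, ...) without
    # also matching nbd10, nbd11, and so on.
    for holder in "$sysfs_root/class/block/$name"/holders/* \
                  "$sysfs_root/class/block/${name}p"*/holders/*; do
        # Skip patterns that did not expand to an existing entry.
        [ -e "$holder" ] || continue
        part=$(basename "$(dirname "$(dirname "$holder")")")
        echo "$part $(basename "$holder")"
    done
}
```

If `nbd_holders /dev/nbd0` prints anything before the disconnect, something is still stacked on top of the device, which would be consistent with the locking described in the commit message.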

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-common/+/817889

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/817889
Committed: https://opendev.org/openstack/tripleo-common/commit/87c6a1aaaaada9ad9bc178dfa39289b7a08c5c13
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 87c6a1aaaaada9ad9bc178dfa39289b7a08c5c13
Author: Harald Jensås <email address hidden>
Date: Wed Nov 10 17:14:05 2021 +0100

    Deactivate LVs before disconnecting nbd device

    In tripleo-unmount-image, ensure the LVs are inactive
    prior to disconnecting the NBD device. If the LVs are
    active the device is locked, causing the disconnect to
    fail and subsequent attempts to mount using the same nbd
    device to fail.

    Change-Id: I25be71542df1e738002063170138d5e66fabdaf4
    Closes-Bug: #1950137
    (cherry picked from commit cc2a64572ddf90401476b1f1ce52b7c62b3b24ad)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 16.3.0

This issue was fixed in the openstack/tripleo-common 16.3.0 release.
