wipe_osds.sh fails due to race condition
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Erickson Silva de Oliveira |
Bug Description
Brief Description
-----------------
The first Ansible bootstrap attempt failed with the following output:
2024-03-01 17:21:12,542 p=6063 u=sysadmin n=ansible | TASK [common/
2024-03-01 17:21:12,542 p=6063 u=sysadmin n=ansible | Friday 01 March 2024 17:21:12 +0000 (0:00:00.032) 0:00:39.271 **********
2024-03-01 17:21:12,553 p=6063 u=sysadmin n=ansible | skipping: [localhost]
2024-03-01 17:21:12,556 p=6063 u=sysadmin n=ansible | TASK [common/
2024-03-01 17:21:12,556 p=6063 u=sysadmin n=ansible | Friday 01 March 2024 17:21:12 +0000 (0:00:00.014) 0:00:39.285 **********
2024-03-01 17:21:15,051 p=6063 u=sysadmin n=ansible | fatal: [localhost]: FAILED! => changed=true
msg: non-zero return code
rc: 1
stderr: |-
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sda
+ lsblk --nodeps --pairs /dev/sda
+ grep -q 'TYPE="disk"'
+ multipath -c /dev/sda
+ set -e
+ wipe_if_ceph_disk /dev/sda 4FBD7E29-
+ __dev=/dev/sda
+ __osd_guid=
+ __journal_
+ __is_multipath=0
+ ceph_disk=false
++ flock /dev/sda sfdisk -q -l /dev/sda
++ awk '$1 == "Device" {i=1; next}; i {print $1}'
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sda1
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
+ part_no=1
++ flock /dev/sda sfdisk --part-type /dev/sda 1
+ guid=BA5EBA11-
+ '[' BA5EBA11-
+ '[' BA5EBA11-
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sda2
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
+ part_no=2
++ flock /dev/sda sfdisk --part-type /dev/sda 2
+ guid=C12A7328-
+ '[' C12A7328-
+ '[' C12A7328-
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sda3
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
+ part_no=3
++ flock /dev/sda sfdisk --part-type /dev/sda 3
+ guid=0FC63DAF-
+ '[' 0FC63DAF-
+ '[' 0FC63DAF-
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sda4
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
+ part_no=4
++ flock /dev/sda sfdisk --part-type /dev/sda 4
+ guid=E6D6D379-
+ '[' E6D6D379-
+ '[' E6D6D379-
+ '[' false = true ']'
+ set +e
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sda1
+ lsblk --nodeps --pairs /dev/sda1
+ grep -q 'TYPE="disk"'
+ continue
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sda2
+ lsblk --nodeps --pairs /dev/sda2
+ grep -q 'TYPE="disk"'
+ continue
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sda3
+ lsblk --nodeps --pairs /dev/sda3
+ grep -q 'TYPE="disk"'
+ continue
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sda4
+ lsblk --nodeps --pairs /dev/sda4
+ grep -q 'TYPE="disk"'
+ continue
+ for f in /dev/disk/by-path/*
+ '[' '!' -e /dev/disk/
++ readlink -f /dev/disk/
+ dev=/dev/sdb
+ lsblk --nodeps --pairs /dev/sdb
+ grep -q 'TYPE="disk"'
+ multipath -c /dev/sdb
+ set -e
+ wipe_if_ceph_disk /dev/sdb 4FBD7E29-
+ __dev=/dev/sdb
+ __osd_guid=
+ __journal_
+ __is_multipath=0
+ ceph_disk=false
++ flock /dev/sdb sfdisk -q -l /dev/sdb
++ awk '$1 == "Device" {i=1; next}; i {print $1}'
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sdb1
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
+ part_no=1
++ flock /dev/sdb sfdisk --part-type /dev/sdb 1
+ guid=4FBD7E29-
+ '[' 4FBD7E29-
+ echo 'Found Ceph OSD partition #1 /dev/sdb1, erasing!'
+ dd if=/dev/zero of=/dev/sdb1 bs=512 count=34
++ blockdev --getsz /dev/sdb1
+ seek_end=1873285741
+ dd if=/dev/zero of=/dev/sdb1 bs=512 count=34 seek=1873285741
+ parted -s /dev/sdb rm 1
+ ceph_disk=true
+ parted /dev/sdb p
+ grep 'ceph data'
+ for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}')
++ udevadm info /dev/sdb2
++ grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*'
Unknown device "/dev/sdb2": No such file or directory
+ part_no=
stderr_lines: <omitted>
stdout: |-
DM_
DM_
Found Ceph OSD partition #1 /dev/sdb1, erasing!
stdout_lines: <omitted>
2024-03-01 17:21:15,052 p=6063 u=sysadmin n=ansible | PLAY RECAP *******
2024-03-01 17:21:15,053 p=6063 u=sysadmin n=ansible | localhost : ok=138 changed=24 unreachable=0 failed=1 skipped=176 rescued=0 ignored=0
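From the trace, the partition loop works off the list returned by the initial sfdisk call. After the script wipes and deletes the Ceph OSD partition (/dev/sdb1), the kernel re-reads the partition table and udev transiently removes and re-adds the remaining partition nodes, so the udevadm call for the next entry (/dev/sdb2) can land on a node that is momentarily gone; grep then matches nothing, the pipeline returns non-zero, and with set -e active the script aborts. A minimal sketch of that loop, reconstructed from the trace above (not the verbatim wipe_osds.sh), shows where the race surfaces:

# Minimal sketch reconstructed from the trace above; not the verbatim wipe_osds.sh.
wipe_if_ceph_disk() {
    local __dev="$1" __osd_guid="$2"
    # The partition list is captured once, before any partition is deleted.
    for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}'); do
        # After the "parted rm" below, the kernel re-reads the partition table and
        # udev transiently removes/re-adds the remaining nodes. On the next
        # iteration this udevadm call can then hit a node that is momentarily gone
        # ("Unknown device ... No such file or directory"); grep matches nothing,
        # the pipeline exits non-zero, and with "set -e" active the whole script
        # aborts -- the failure seen on /dev/sdb2 in the log.
        part_no=$(udevadm info "${part}" | grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*')
        guid=$(flock "${__dev}" sfdisk --part-type "${__dev}" "${part_no}")
        if [ "${guid}" = "${__osd_guid}" ]; then
            echo "Found Ceph OSD partition #${part_no} ${part}, erasing!"
            dd if=/dev/zero of="${part}" bs=512 count=34   # the trace also zeroes the partition tail
            parted -s "${__dev}" rm "${part_no}"           # triggers the partition table re-read
            ceph_disk=true
        fi
    done
}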
Severity
--------
Minor: Prevents the first Ansible bootstrap attempt from succeeding in any lab installation performed on top of a disk that holds an older installation.
Steps to Reproduce
------------------
Install an ISO image onto a server in the All-in-One Simplex configuration.
Expected Behavior
------------------
The installation should succeed on the first Ansible bootstrap attempt.
Actual Behavior
----------------
The first Ansible bootstrap attempt fails due to the wipe_osds.sh script issue shown in the logs quoted above.
Reproducibility
---------------
Intermittent
System Configuration
--------------------
AIO-SX
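For reference only (this is not necessarily what the proposed fix linked below does): loops like this are commonly hardened against udev churn by waiting for udev to settle after the partition table changes and by skipping partition nodes that have disappeared in the meantime, for example:

# Illustrative hardening only; not the change proposed in the review linked below.
__dev="$1"   # e.g. /dev/sdb
for part in $(flock "${__dev}" sfdisk -q -l "${__dev}" | awk '$1 == "Device" {i=1; next}; i {print $1}'); do
    udevadm settle --timeout=10          # wait for udev to finish re-processing the disk
    if [ ! -e "${part}" ]; then          # node removed after an earlier partition deletion
        continue
    fi
    part_no=$(udevadm info "${part}" | grep -oP -m1 'E: PARTN=\K.*|E: DM_PART=\K.*' || true)
    [ -n "${part_no}" ] || continue      # tolerate a node that vanished between the checks
    # ... GUID check and wipe as in the sketch above ...
done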
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.storage
Changed in starlingx:
assignee: nobody → Erickson Silva de Oliveira (esilvade)
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/912463