IPv6 Distributed Cloud: ansible-playbook ‘Wipe ceph osds’ does not support re-play / re-entrance

Bug #1847336 reported by Peng Peng on 2019-10-08
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
-----------------
To install the distributed cloud system controller, the ansible-playbook bootstrap is run first; a subsequent run of 'ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"' then failed at the 'Wipe ceph osds' task.

Severity
--------
Major

Steps to Reproduce
------------------
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml
ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"

TC-name:

Expected Behavior
------------------
The Ansible 'Wipe ceph osds' task should support re-running the playbook.

Actual Behavior
----------------
The playbook fails at the 'Wipe ceph osds' task with "privilege output closed while waiting for password prompt" (see logs below).
Reproducibility
---------------
Reproducible

System Configuration
--------------------
DC system
IPv6

Lab-name: DC WCP_90

Branch/Pull Time/Commit
-----------------------
19.10 master as of 2019-10-06_20-00-00

Last Pass
---------

Timestamp/Logs
--------------
TASK [bootstrap/validate-config : Wipe ceph osds] ***********************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "privilege output closed while waiting for password prompt:\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0\n+ dev=/dev/sda\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sda\n+ '[' 0 -ne 0 ']'\n+ set -e\n+ part_no=1\n+ ceph_disk=false\n++ awk '$1 == \"part\" {print $2}'\n++ lsblk -rip /dev/sda -o TYPE,NAME\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sda sgdisk -i 1 /dev/sda\nInvalid partition data!\n++ true\n+ sgdisk_part_info=\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo ''\n+ guid=\n+ '[' '' == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' '' == 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=2\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sda sgdisk -i 2 /dev/sda\nInvalid partition data!\n++ true\n+ sgdisk_part_info=\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo ''\n+ guid=\n+ '[' '' == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' '' == 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=3\n+ '[' false == true ']'\n+ set +e\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0-part1 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0-part1\n+ dev=/dev/sda1\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sda1\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0-part2 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:14.0-usb-0:12:1.0-scsi-0:0:0:0-part2\n+ dev=/dev/sda2\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sda2\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' 
-e /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0\n+ dev=/dev/sdb\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdb\n+ '[' 0 -ne 0 ']'\n+ set -e\n+ part_no=1\n+ ceph_disk=false\n++ awk '$1 == \"part\" {print $2}'\n++ lsblk -rip /dev/sdb -o TYPE,NAME\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sdb sgdisk -i 1 /dev/sdb\n+ sgdisk_part_info='Partition GUID code: 21686148-6449-6E6F-744E-656564454649 (BIOS boot partition)\nPartition unique GUID: 4C20D594-3538-4890-A8FC-49B897430427\nFirst sector: 2048 (at 1024.0 KiB)\nLast sector: 4095 (at 2.0 MiB)\nPartition size: 2048 sectors (1024.0 KiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo 'Partition GUID code: 21686148-6449-6E6F-744E-656564454649 (BIOS boot partition)\nPartition unique GUID: 4C20D594-3538-4890-A8FC-49B897430427\nFirst sector: 2048 (at 1024.0 KiB)\nLast sector: 4095 (at 2.0 MiB)\nPartition size: 2048 sectors (1024.0 KiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n+ guid=21686148-6449-6E6F-744E-656564454649\n+ '[' 21686148-6449-6E6F-744E-656564454649 == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' 21686148-6449-6E6F-744E-656564454649 == 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=2\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sdb sgdisk -i 2 /dev/sdb\n+ sgdisk_part_info='Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)\nPartition unique GUID: C4279531-45D9-47B0-ADA7-F592B3781FE5\nFirst sector: 4096 (at 2.0 MiB)\nLast sector: 1028095 (at 502.0 MiB)\nPartition size: 1024000 sectors (500.0 MiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo 'Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)\nPartition unique GUID: C4279531-45D9-47B0-ADA7-F592B3781FE5\nFirst sector: 4096 (at 2.0 MiB)\nLast sector: 1028095 (at 502.0 MiB)\nPartition size: 1024000 sectors (500.0 MiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n+ guid=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7\n+ '[' EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 == 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=3\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sdb sgdisk -i 3 /dev/sdb\n+ sgdisk_part_info='Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)\nPartition unique GUID: 403088A0-2690-4946-8578-CDD58DCA364A\nFirst sector: 1028096 (at 502.0 MiB)\nLast sector: 41988095 (at 20.0 GiB)\nPartition size: 40960000 sectors (19.5 GiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo 'Partition GUID code: EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (Microsoft basic data)\nPartition unique GUID: 403088A0-2690-4946-8578-CDD58DCA364A\nFirst sector: 1028096 (at 502.0 MiB)\nLast sector: 41988095 (at 20.0 GiB)\nPartition size: 40960000 sectors (19.5 GiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n+ guid=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7\n+ '[' EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 == 
45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=4\n+ for part in '$(lsblk -rip ${dev} -o TYPE,NAME | awk '\\''$1 == \"part\" {print $2}'\\'')'\n++ flock /dev/sdb sgdisk -i 4 /dev/sdb\n+ sgdisk_part_info='Partition GUID code: E6D6D379-F507-44C2-A23C-238F2A3DF928 (Linux LVM)\nPartition unique GUID: 219FE0A2-734D-45C3-998E-7B5A5B359431\nFirst sector: 41988096 (at 20.0 GiB)\nLast sector: 522235903 (at 249.0 GiB)\nPartition size: 480247808 sectors (229.0 GiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n++ awk '{print $4;}'\n++ grep 'Partition GUID code'\n++ echo 'Partition GUID code: E6D6D379-F507-44C2-A23C-238F2A3DF928 (Linux LVM)\nPartition unique GUID: 219FE0A2-734D-45C3-998E-7B5A5B359431\nFirst sector: 41988096 (at 20.0 GiB)\nLast sector: 522235903 (at 249.0 GiB)\nPartition size: 480247808 sectors (229.0 GiB)\nAttribute flags: 0000000000000000\nPartition name: '\\'''\\'''\n+ guid=E6D6D379-F507-44C2-A23C-238F2A3DF928\n+ '[' E6D6D379-F507-44C2-A23C-238F2A3DF928 == 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D ']'\n+ '[' E6D6D379-F507-44C2-A23C-238F2A3DF928 == 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 ']'\n+ part_no=5\n+ '[' false == true ']'\n+ set +e\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part1 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part1\n+ dev=/dev/sdb1\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdb1\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part2 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part2\n+ dev=/dev/sdb2\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdb2\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part3 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part3\n+ dev=/dev/sdb3\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdb3\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part4 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part4\n+ dev=/dev/sdb4\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdb4\n+ '[' 1 -ne 0 ']'\n+ continue\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-2.0 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-2.0\n+ dev=/dev/sdc\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdc\n+ '[' 0 -ne 0 ']'\n+ set -e\n+ part_no=1\n+ ceph_disk=false\n++ awk '$1 == \"part\" {print $2}'\n++ lsblk -rip /dev/sdc -o TYPE,NAME\n+ '[' false == true ']'\n+ set +e\n+ for f in '/dev/disk/by-path/*'\n+ '[' '!' -e /dev/disk/by-path/pci-0000:00:1f.2-ata-3.0 ']'\n++ readlink -f /dev/disk/by-path/pci-0000:00:1f.2-ata-3.0\n+ dev=/dev/sdd\n+ grep -q 'TYPE=\"disk\"'\n+ lsblk --nodeps --pairs /dev/sdd\n+ '[' 0 -ne 0 ']'\n+ set -e\n+ part_no=1\n+ ceph_disk=false\n++ awk '$1 == \"part\" {print $2}'\n++ lsblk -rip /dev/sdd -o TYPE,NAME\n+ '[' false == true ']'\n+ set +e\n"}

PLAY RECAP **************************************************************************************************************************************************************************************************
localhost : ok=88 changed=12 unreachable=0 failed=1

Test Activity
-------------
installation

Peng Peng (ppeng) wrote :
Yang Liu (yliu12) on 2019-10-08
tags: added: stx.retestneeded
Changed in starlingx:
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
Ovidiu Poncea (ovidiu.poncea) wrote :

I ran ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml 3 times in a row and it does not reproduce, so re-playing in a normal deployment works.

Peng, a few questions:
1. Can you upload lab-install-playbook.yaml and local-install-overrides.yaml? I want to try with them as well.

2. Is this 100% reproducible?

3. Was this run locally? It seems so, but I need confirmation.

================================

Looking over the logs above, it seems that the wipe-ceph-osds script executes correctly, yet it throws this: "privilege output closed while waiting for password prompt:"

The error happens here:

- block:
  - name: Wipe ceph osds
    script: wipe_osds.sh
    register: results
    become: yes
    become_user: root

Which is called from here:
    - { role: bootstrap/validate-config, when: not skip_play, become: yes }

Notice that both have become: yes, which means that Ansible escalates to root privileges twice.
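
For illustration, a minimal stand-alone reproducer of this double escalation would look roughly like the sketch below; the playbook, role, and script names are hypothetical placeholders, not the actual StarlingX files, and whether this stripped-down form actually reproduces the failure is unconfirmed:

# reproduce-double-become.yml (hypothetical)
- hosts: localhost
  connection: local
  roles:
    # role-level escalation, like bootstrap/validate-config above
    - { role: double-become-test, become: yes }

# roles/double-become-test/tasks/main.yml (hypothetical)
- name: Wipe ceph osds (stand-in)
  script: dummy_wipe.sh      # placeholder for wipe_osds.sh
  register: results
  become: yes                # second escalation on top of the role-level one
  become_user: root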

Now, searching for similar issues, I stumbled upon a 2016 Ansible bug, https://github.com/ansible/ansible/issues/13763, which has an identical error. Although that bug got fixed and the conditions are different, one comment is important:

"I am going to guess that perhaps this is because on attempt one, sudo password is asked, and by attempt 2 sudo doesnt need a password. Thus the command completes but looks like an error and ansible stops processing the rest"

That seems to be the case here: we are asking for root privileges twice. This shouldn't be a problem, but it is, which points to a possible Ansible issue.

For more details I need to reproduce it. The fix may be as simple as removing
    become: yes
    become_user: root
from the code or as bad as fixing Ansible itself.
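
For reference, a sketch of the simpler option, assuming the surrounding block stays unchanged and the role-level escalation is kept:

- block:
  - name: Wipe ceph osds
    script: wipe_osds.sh
    register: results
    # become/become_user removed: the role-level 'become: yes' on
    # bootstrap/validate-config already runs this task as root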

Ovidiu

Peng Peng (ppeng) wrote :

I ran it twice. Both runs hit the same issue.

Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority - this affects the ability to re-run Ansible playbooks, which is a supported operation.

tags: added: stx.3.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.storage
Ovidiu Poncea (ovidiu.poncea) wrote :

Retried with lab-install-playbook.yaml and it does not reproduce. I need the setup where it reproduces; when you get time, please retest and ping me once it reproduces.

Thank you!

Fix proposed to branch: master
Review: https://review.opendev.org/689730

Changed in starlingx:
status: Triaged → In Progress

Reviewed: https://review.opendev.org/689730
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=7b9d8c9a0d828ad3621596923301fe7a9ce4bb8d
Submitter: Zuul
Branch: master

commit 7b9d8c9a0d828ad3621596923301fe7a9ce4bb8d
Author: Ovidiu Poncea <email address hidden>
Date: Mon Oct 21 15:02:56 2019 +0300

    B&R: Remove unnecessary privilege escalation

    The whole role gets privilege escalation, no need to do it
    one more time for wipe_ceph_osds.

    Change-Id: Ib2a6b90934c1f8ac0178e2bf0e7af9203d00751c
    Closes-Bug: 1847336
    Signed-off-by: Ovidiu Poncea <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

The same issue was not seen in recent DC installations.

tags: removed: stx.retestneeded
Yosief Gebremariam (ygebrema) wrote :

This issue still appears to exist in the distributed cloud lab on replay. Logs are attached.

Frank Miller (sensfan22) wrote :

Changing state from Fix Released back to Triaged as the issue has occurred again in a load that includes this commit: https://review.opendev.org/689730

Additional investigation is required. Will request access to the lab where the issue occurred in order to debug since Ovidiu and Tyler cannot reproduce in their lab.

Changed in starlingx:
status: Fix Released → Triaged
Yang Liu (yliu12) on 2019-11-19
tags: added: stx.retestneeded
Austin Sun (sunausti) wrote :

Hi, Yang:
    Has this issue been reproduced recently?

Thanks.
BR
Austin Sun.
