Storage node fails to unlock when installing

Bug #1800889 reported by Daniel Badea
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Daniel Badea

Bug Description

Brief Description
-----------------
Applying the puppet manifest fails on the storage node when running the ceph-disk prepare command, with the following log messages:

  2018-10-18T18:52:11.347 Debug: 2018-10-18 18:52:11 +0000 Executing: '/usr/sbin/ceph-disk prepare --cluster ceph --cluster-uuid eabae35e-db1f-4b51-90b2-65558c268441 --osd-uuid 3c994f0f-310f-413c-9df4-edc286b89118 --fs-t$
  2018-10-18T18:52:11.349 '''
  2018-10-18T18:52:25.894 Error: 2018-10-18 18:52:25 +0000 Caution: invalid backup GPT header, but valid main header; regenerating
  2018-10-18T18:52:25.897 backup header from main header.
  ...
  2018-10-18T18:52:25.940 The operation has completed successfully.
  2018-10-18T18:52:25.942 sh: line 1: : command not found
  ...
  2018-10-18T18:52:26.040 Error: 2018-10-18 18:52:25 +0000 /Stage[main]/Platform::Ceph::Storage/Platform_ceph_osd[stor-4]/Ceph::Osd[/dev/disk/by-path/pci-0000:86:00.0-nvme-1]/Exec[ceph-osd-prepare-/dev/disk/by-path/pci-0000:86:00.0-nvme-1]/returns: change from notrun to 0 failed: Caution: invalid backup GPT header, but valid main header; regenerating
  ...

Severity
--------
Major (2nd unlock is successful)

Steps to Reproduce
------------------
1. Install a setup with dedicated storage.
2. Unlock a storage node.

Expected Behavior
-----------------
Storage node should be able to configure OSDs and unlock successfully.

Actual Behavior
---------------
Applying the puppet manifest fails on Exec[ceph-osd-prepare-...] and the storage node reboots.
Locking and then unlocking the storage node seems to fix the issue.

Reproducibility
---------------
Depends on the state of the nodes before the install (probably on whether the OSDs were wiped).

System Configuration
--------------------
Dedicated storage.

Branch/Pull Time/Commit
-----------------------
master

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Daniel Badea (daniel.badea)
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/614627
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=2936d5d56892ee98b7c5613714f7e89545614123
Submitter: Zuul
Branch: master

commit 2936d5d56892ee98b7c5613714f7e89545614123
Author: Daniel Badea <email address hidden>
Date: Wed Oct 31 19:34:14 2018 +0000

    ceph-disk prepare invalid data disk value

    ceph-disk prepare data OSD parameter contains a new line causing
    puppet manifest to fail:

    1. $data = generate('/bin/bash','-c',"/bin/readlink -f ${name}")

       is expanded together with a new line in:

       exec { $ceph_prepare:
         command => "/usr/sbin/ceph-disk prepare ${cluster_option}
                        ${cluster_uuid_option} ${uuid_option}
                        --fs-type xfs --zap-disk ${data} ${journal}"

       just before ${journal} is expanded. Puppet reports:

         sh: line 1: : command not found

       when trying to run '' (default journal value).

    2. 'readlink' should be called when running ceph-disk prepare
       command, not when the puppet resource is defined. Let
       exec's shell call readlink instead of using puppet's
       generate() . See also:

         https://github.com/openstack/puppet-ceph/commit/ff2b2e689846dd3d980c7c706c591e8cfb8f33a9

    Added --verbose and --log-stdout options to log commands executed
    by 'ceph-disk prepare' and identify where it fails.

    Closes-Bug: 1800889
    Change-Id: I6b71147706edb97d5a1e6579924d45b999efe98f
    Signed-off-by: Daniel Badea <email address hidden>
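
The failure mode described in the commit message above can be reproduced without touching a disk. The following is only a minimal sketch under stated assumptions: 'echo' stands in for ceph-disk, '/dev/example-disk' is a hypothetical path, and 'printf' stands in for readlink. It shows that a data value carrying a trailing newline splits the command line in two, so the shell then tries to run '' on its own, while resolving the path inside the shell keeps everything on one line because command substitution strips trailing newlines.

```shell
#!/bin/sh
# Hypothetical device path, used only for illustration.
dev='/dev/example-disk'

# Simulate a data value that kept its trailing newline, as the commit
# message describes for the generate()-based value.
data_with_newline="${dev}
"
journal="''"   # default journal value quoted in the commit message

# 'echo' stands in for ceph-disk so that no real disk is modified.
broken_cmd="echo prepare --zap-disk ${data_with_newline} ${journal}"
broken_out=$(sh -c "$broken_cmd" 2>&1) && broken_rc=0 || broken_rc=$?

# Shell command substitution strips trailing newlines, so resolving the
# path inside the command line keeps all arguments on a single line.
# 'printf' stands in for readlink here.
fixed_cmd="echo prepare --zap-disk \$(printf '%s\n' ${dev}) ${journal}"
fixed_out=$(sh -c "$fixed_cmd" 2>&1) && fixed_rc=0 || fixed_rc=$?

printf 'broken (rc=%s):\n%s\n' "$broken_rc" "$broken_out"
printf 'fixed  (rc=%s):\n%s\n' "$fixed_rc" "$fixed_out"
```

In the broken case the first line still runs, which matches the log above where the disk operation completes before the "command not found" error. The exact error text and exit code for the empty command depend on which shell provides 'sh'.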

Changed in starlingx:
status: New → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.2019.03 stx.distro.openstack
tags: added: stx.config
removed: stx.distro.openstack
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to stx-integ (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/616129

OpenStack Infra (hudson-openstack) wrote : Related fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/616129
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=2994a26b9edbd1ea038855dde5e1bf5492179ee7
Submitter: Zuul
Branch: master

commit 2994a26b9edbd1ea038855dde5e1bf5492179ee7
Author: Robert Church <email address hidden>
Date: Wed Nov 7 00:41:17 2018 -0500

    Fix exec check which restricts ceph-prepare execution

    Minor patch update to the command that checks to see if an OSD should be
    erased.

    The recent bug fix attempts to perform command expansion within a
    literal string. This will always fail and cause the OSD to be
    reformatted.

    Change the single quotes to double quotes to allow the command expansion
    to execute successfully.

    Change-Id: I522207e7e5f7a428fb3a962ab6a13574f73bb3c9
    Related-Bug: #1800889
    Signed-off-by: Robert Church <email address hidden>
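
The quoting issue described in the commit message above is ordinary shell behaviour and can be shown in isolation. This is a sketch under assumptions, not the actual check from the manifest: the listing line and device path are hypothetical, and 'printf' stands in for whatever command the real check expands. Inside single quotes the shell passes $(...) through as literal text, so the pattern never matches; inside double quotes the command is expanded first.

```shell
#!/bin/sh
# Hypothetical listing line and device path, for illustration only.
listing='/dev/example-disk1 ceph data, prepared'
dev='/dev/example-disk'

# Single quotes: the text $(...) reaches grep unexpanded, so no match.
if printf '%s\n' "$listing" | grep -q '$(printf %s "$dev")1 .*ceph data'; then
    single_quoted=match
else
    single_quoted=no-match
fi

# Double quotes: the shell runs the command substitution first, so grep
# receives the resolved device path and the line matches.
if printf '%s\n' "$listing" | grep -q "$(printf %s "$dev")1 .*ceph data"; then
    double_quoted=match
else
    double_quoted=no-match
fi

echo "single quotes: $single_quoted"
echo "double quotes: $double_quoted"
```

A check that never matches looks the same as "no prepared OSD found", which is consistent with the commit's note that the OSD would always be reformatted.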

Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05