Error with Ceph OSD

Bug #1844332 reported by Mariano Ucha
Affects: StarlingX
Status: Invalid
Importance: High
Assigned to: chen haochuan

Bug Description

Brief Description
-----------------

After unlocking controller-0 in an AIO-Simplex deployment, I am seeing problems with Ceph: the Ceph OSD is not
working properly.

Severity
--------

Major

Steps to Reproduce
------------------

Following the guide provided at:

https://docs.starlingx.io/deploy_install_guides/current/bare_metal_aio_simplex.html

I unlocked controller-0. It rebooted and I had no services, so I rebooted the node again. The system
then appeared to be running, but I found the Ceph OSD error shown below.
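
For reference, the unlock step was roughly the following (a sketch of the standard AIO-SX commands from the linked guide; the exact sequence is in that document):

source /etc/platform/openrc          # load the keystone_admin credentials
system host-unlock controller-0      # triggers the reboot after which the problem appears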

controller-0:/var/log/ceph$ ceph -s
  cluster:
    id: 6cbe0ddd-f791-4226-8530-7a8347f12437
    health: HEALTH_WARN
            Reduced data availability: 64 pgs inactive

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 1 osds: 0 up, 0 in

  data:
    pools: 1 pools, 64 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs: 100.000% pgs unknown
             64 unknown
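
To dig into why the OSD is reported down, the daemon and CRUSH state can be checked with something like the following (a sketch; ceph osd tree is standard, and the systemd unit name assumes Ceph's stock packaging rather than any StarlingX-specific service wrapper):

ceph osd tree                     # shows osd.0 and whether it is up/down, in/out
sudo systemctl status ceph-osd@0  # checks whether the ceph-osd daemon is actually running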

 [sysadmin@controller-0 ceph(keystone_admin)]$ system host-stor-list controller-0

+--------------------------------------+----------+-------+------------+--------------------------------------+----------------------------------+--------------+------------------+-----------+
| uuid | function | osdid | state | idisk_uuid | journal_path | journal_node | journal_size_gib | tier_name |
+--------------------------------------+----------+-------+------------+--------------------------------------+----------------------------------+--------------+------------------+-----------+
| c04938b2-cb80-411b-af2a-1a6b82d13df4 | osd | 0 | configured | c99b1eb9-789e-4f55-ae65-96d3bb147224 | /dev/disk/by-path/pci-0000:03:00 | /dev/sdb2 | 1 | storage |
| | | | | | .0-scsi-0:1:0:1-part2 | | | |
| | | | | | | | | |
+--------------------------------------+----------+-------+------------+--------------------------------------+----------------------------------+--------------+------------------+-----------+
[sysadmin@controller-0 ceph(keystone_admin)]$
[sysadmin@controller-0 ceph(keystone_admin)]$ system host-disk-list controller-0
+--------------------------------------+-----------+---------+---------+-------+------------+--------------+----------+-------------------------------------------------+
| uuid | device_no | device_ | device_ | size_ | available_ | rpm | serial_i | device_path |
| | de | num | type | gib | gib | | d | |
+--------------------------------------+-----------+---------+---------+-------+------------+--------------+----------+-------------------------------------------------+
| d04d36cf-abc2-4b0b-b911-028b9eaebf82 | /dev/sda | 2048 | HDD | 300.0 | 16.977 | Undetermined | PCQVU0CR | /dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:0 |
| | | | | | | | H4J1HN | |
| | | | | | | | | |
| c99b1eb9-789e-4f55-ae65-96d3bb147224 | /dev/sdb | 2064 | HDD | 538. | 0.0 | Undetermined | PCQVU0CR | /dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1 |
| | | | | 33 | | | H4J1HN | |
| | | | | | | | | |
+--------------------------------------+-----------+---------+---------+-------+------------+--------------+----------+-------------------------------------------------+

Expected Behavior
------------------
The Ceph OSD is expected to be up and running.

Actual Behavior
----------------

The Ceph OSD is not up.

Reproducibility
---------------
Reproducible

I installed the official image of R2 and have the same problem.

System Configuration
--------------------

AIO-Simplex with IPv4 on a ProLiant BL460c blade node

Branch/Pull Time/Commit
-----------------------

###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190915T230000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="251"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-09-15 23:00:00 +0000"

Timestamp/Logs
--------------

Attached are logs from /var/log and /etc.

Test Activity
-------------

Evaluation

Tags: stx.2.0
Revision history for this message
Mariano Ucha (marianoucha) wrote :
tags: added: stx.2.0
Changed in starlingx:
assignee: nobody → chen haochuan (martin1982)
Revision history for this message
Mariano Ucha (marianoucha) wrote :

Martin, here are the outputs of the commands you asked me to run.

controller-0:~$ sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
mount: /dev/sdb1 is already mounted or /var/lib/ceph/osd/ceph-0 busy
       /dev/sdb1 is already mounted on /var/lib/ceph/osd/ceph-0
controller-0:~$ ls /var/lib/ceph/osd/ceph-0 -l
total 20
-rw-r--r-- 1 root root 37 Sep 16 18:49 ceph_fsid
-rw-r--r-- 1 root root 37 Sep 16 18:49 fsid
lrwxrwxrwx 1 root root 58 Sep 16 18:49 journal -> /dev/disk/by-partuuid/18b04f21-9511-45b5-bc83-bf9b0678c617
-rw-r--r-- 1 root root 37 Sep 16 18:49 journal_uuid
-rw-r--r-- 1 root root 21 Sep 16 18:49 magic
-rw-r--r-- 1 root root 10 Sep 16 18:49 type
controller-0:~$
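
A follow-up check that may help is comparing the cluster fsid recorded on the mounted OSD partition with the fsid of the running cluster (a sketch; both commands are standard Ceph, and a mismatch would explain the OSD never joining):

cat /var/lib/ceph/osd/ceph-0/ceph_fsid   # cluster fsid stamped on the OSD data partition
ceph fsid                                # fsid of the running cluster; should match the value above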

controller-0:/home/sysadmin# /usr/sbin/ceph-disk list | grep -v 'unknown cluster' | grep " *$(readlink -f /dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1).*ceph data" | grep -v unprepared | grep 'osd uuid c04938b2-cb80-411b-af2a-1a6b82d13df4'
/usr/lib/python2.7/site-packages/ceph_disk/main.py:5707: UserWarning:
*******************************************************************************
This tool is now deprecated in favor of ceph-volume.
It is recommended to use ceph-volume for OSD deployments. For details see:

    http://docs.ceph.com/docs/master/ceph-volume/#migrating

*******************************************************************************

  warnings.warn(DEPRECATION_WARNING)
/usr/lib/python2.7/site-packages/ceph_disk/main.py:5739: UserWarning:
*******************************************************************************
This tool is now deprecated in favor of ceph-volume.
It is recommended to use ceph-volume for OSD deployments. For details see:

    http://docs.ceph.com/docs/master/ceph-volume/#migrating

*******************************************************************************

  warnings.warn(DEPRECATION_WARNING)
 /dev/sdb1 ceph data, active, cluster ceph, osd uuid c04938b2-cb80-411b-af2a-1a6b82d13df4, journal /dev/sdb2
controller-0:/home/sysadmin# echo $?
0
controller-0:/home/sysadmin#
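
Since ceph-disk itself warns that it is deprecated, the same OSD metadata could also be inspected with ceph-volume (a sketch, assuming the "simple" subcommand for ceph-disk-prepared OSDs is available in this Ceph release):

sudo ceph-volume simple scan /dev/sdb1   # reads the ceph-disk OSD metadata and writes it to a JSON file under /etc/ceph/osd/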

Revision history for this message
Mariano Ucha (marianoucha) wrote :

New output; everything looks the same as before.

controller-0:~$ sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
mount: /dev/sdb1 is already mounted or /var/lib/ceph/osd/ceph-0 busy
       /dev/sdb1 is already mounted on /var/lib/ceph/osd/ceph-0
controller-0:~$
controller-0:~$ ls /var/lib/ceph/osd/ceph-0 -l
total 20
-rw-r--r-- 1 root root 37 Sep 19 11:21 ceph_fsid
-rw-r--r-- 1 root root 37 Sep 19 11:21 fsid
lrwxrwxrwx 1 root root 58 Sep 19 11:21 journal -> /dev/disk/by-partuuid/cd1ceaaf-2cce-44cd-b631-9e75d2b0fcd6
-rw-r--r-- 1 root root 37 Sep 19 11:21 journal_uuid
-rw-r--r-- 1 root root 21 Sep 19 11:21 magic
-rw-r--r-- 1 root root 10 Sep 19 11:21 type
controller-0:~$
controller-0:~$ sudo /usr/sbin/ceph-disk list | grep -v 'unknown cluster' | grep " *$(readlink -f /dev/disk/by-path/pci-0000:03:00.0-scsi-0:1:0:1).*ceph data" | grep -v unprepared
/usr/lib/python2.7/site-packages/ceph_disk/main.py:5707: UserWarning:
*******************************************************************************
This tool is now deprecated in favor of ceph-volume.
It is recommended to use ceph-volume for OSD deployments. For details see:

    http://docs.ceph.com/docs/master/ceph-volume/#migrating

*******************************************************************************

  warnings.warn(DEPRECATION_WARNING)
/usr/lib/python2.7/site-packages/ceph_disk/main.py:5739: UserWarning:
*******************************************************************************
This tool is now deprecated in favor of ceph-volume.
It is recommended to use ceph-volume for OSD deployments. For details see:

    http://docs.ceph.com/docs/master/ceph-volume/#migrating

*******************************************************************************

  warnings.warn(DEPRECATION_WARNING)
 /dev/sdb1 ceph data, active, cluster ceph, osd uuid 20640458-c15b-45cc-8ca1-1fa1203f2261, journal /dev/sdb2
controller-0:~$ echo $?
0

Cindy Xie (xxie1)
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Mariano Ucha (marianoucha) wrote :

Workaround for this:

1) dd if=/dev/zero of=/dev/sdb <-- destroys all data on the OSD and journal disk
2) Reboot
3) Lock the host
4) Unlock the host (command sketch below)

But I still have no idea what the root cause is. :-(
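
As commands, the workaround above is roughly the following (a sketch; the dd wipes the entire OSD/journal disk, so only do this on a deployment whose Ceph data is disposable):

sudo dd if=/dev/zero of=/dev/sdb     # destroys all data on the OSD and journal disk
sudo reboot
source /etc/platform/openrc          # after the node comes back up
system host-lock controller-0
system host-unlock controller-0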

Changed in starlingx:
status: Triaged → Invalid