Ceph osd process was not recovered after lock and unlock on storage node with journal disk

Bug #1830736 reported by Anujeyan Manokeran
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: chen haochuan
Milestone:

Bug Description

Brief Description
-----------------
After storage node storage-0 was locked and unlocked, it never recovered because its 'ceph (osd.0, osd.1, )' process failed. Auto recovery was not successful, and storage-0 remained failed.

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show storage-0'
[2019-05-26 02:02:28,248] 387 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------------------------------------------------------------+
| Property | Value |
+---------------------+---------------------------------------------------------------+
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-4963-486c-9d0b-a978245be1ef |
| config_status | None |
| config_target | bfa21b40-4963-486c-9d0b-a978245be1ef |
| console | ttyS0,115200 |
| created_at | 2019-05-25T16:15:52.115877+00:00 |
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| serialid | None |
| software_load | 19.05 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-26T02:02:22.902168+00:00 |
| uptime | 33217 |
| uuid | a56c9e15-ca13-4c26-b9b2-c6d93e14c8da |
| vim_progress_status | services-disabled |
+---------------------+---------------------------------------------------------------+
[wrsroot@controller-0 ~(keystone_admin)]$

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock storage-0'
[2019-05-26 02:03:03,616] 387 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------------------------------------------------------------+
| Property | Value |
+---------------------+---------------------------------------------------------------+
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-4963-486c-9d0b-a978245be1ef |
| config_status | None |
| config_target | bfa21b40-4963-486c-9d0b-a978245be1ef |
| console | ttyS0,115200 |
| created_at | 2019-05-25T16:15:52.115877+00:00 |
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| serialid | None |
| software_load | 19.05 |
| task | Unlocking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-26T02:02:22.902168+00:00 |
| uptime | 33217 |
| uuid | a56c9e15-ca13-4c26-b9b2-c6d93e14c8da |
| vim_progress_status | services-disabled |
+---------------------+---------------------------------------------------------------+

DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-05-26 02:25:41,372] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-show storage-0'
[2019-05-26 02:25:43,014] 387 DEBUG MainThread ssh.expect :: Output:
+---------------------+---------------------------------------------------------------+
| Property | Value |
+---------------------+---------------------------------------------------------------+
| action | none |
| administrative | unlocked |
| availability | failed |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-4963-486c-9d0b-a978245be1ef |
| config_status | None |
| config_target | bfa21b40-4963-486c-9d0b-a978245be1ef |
| console | ttyS0,115200 |
| created_at | 2019-05-25T16:15:52.115877+00:00 |
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/by-path/pci-0000:83:00.0-nvme-1 |
| serialid | None |
| software_load | 19.05 |
| task | Service Failure, threshold reached, Lock/Unlock to retry |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-26T02:24:51.733772+00:00 |
| uptime | 841 |
| uuid | a56c9e15-ca13-4c26-b9b2-c6d93e14c8da |
| vim_progress_status | services-disabled |
+---------------------+---------------------------------------------------------------+
[wrsroot@controller-0 ~(keystone_admin)]$

[2019-05-26 03:09:00,908] 387 DEBUG MainThread ssh.expect :: Output:
fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid
+--------------------------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| 94c0c118-4cb0-45af-a72e-429058555b9b | 200.006 | storage-0 is degraded due to the failure of its 'ceph (osd.0, osd.1, )' process. Auto recovery of this major process is in progress. | host=storage-0.process=ceph (osd.0, osd.1, ) | major | 2019-05-26T02:14:47.616363 |
| 574b7d0e-426d-4d34-9733-dff2d693b1dd | 200.004 | storage-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=storage-0 | critical | 2019-05-26T02:07:44.159670 |
| 484d6b73-fe37-459c-80dd-f943dd6cc81e | 800.011 | Loss of replication in replication group group-0: OSDs are down | cluster=bafbb0f1-b04c-42b6-a492-03911e6c21f3.peergroup=group-0.host=storage-0 | major | 2019-05-26T02:02:27.949793 |
| e8a473ea-1e6d-4f34-93a7-89f253c1ccc7 | 800.001 | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or undersized]. Please check 'ceph -s' for more details. | cluster=bafbb0f1-b04c-42b6-a492-03911e6c21f3 | warning | 2019-05-26T02:02:27.676319 |
+--------------------------------------+----------+-----------------------------------

Severity
--------
Major
Steps to Reproduce
------------------
1. Install a storage lab with the OpenStack application as per the install procedure.
2. Lock and unlock the storage node (see the command sketch after this list).
3. Observe that the storage node does not return to the available state, as described above.
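
A minimal command sketch for steps 2-3, based on the logs above (the --os-* authentication options shown in the logs are elided; credentials and endpoints are site-specific):

$ system host-lock storage-0
$ system host-unlock storage-0
$ system host-show storage-0        # in this report the node stayed unlocked/disabled/failed
$ fm alarm-list --nowrap --uuid     # shows the 200.006 'ceph (osd.0, osd.1, )' process alarm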
Expected Behavior
------------------
No OSD failure after unlock; the storage node returns to the available state.

Actual Behavior
----------------
storage-0 did not recover; it remained in the failed state.
Reproducibility
---------------

System Configuration
--------------------
Regular system
Branch/Pull Time/Commit
-----------------------
BUILD_DATE=":2019-05-24_17-39-51"

Last Pass
---------
20190503T013000Z

Timestamp/Logs
--------------
2019-05-26T02:02:22.902168+00:00

Test Activity
-------------
Regression test

description: updated
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; appears related to ceph and requires further investigation.

tags: added: stx.storage
tags: added: stx.2.0
Changed in starlingx:
importance: Undecided → High
importance: High → Medium
status: New → Triaged
assignee: nobody → Cindy Xie (xxie1)
Numan Waheed (nwaheed)
tags: added: stx.retestneeded
yong hu (yhu6)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → nobody
assignee: nobody → chen haochuan (martin1982)
Revision history for this message
chen haochuan (martin1982) wrote :

I checked storage-0 lock and unlock with BUILD_ID="20190528T202529Z". After unlock, storage-0 turned to available with no error; check the uploaded attachment.

Revision history for this message
Yang Liu (yliu12) wrote :

This is reproduced in the same lab in WR; FYI, the storage node that failed has a journal disk configured.

storage-0:/home/wrsroot# /etc/init.d/ceph status
=== mon.storage-0 ===
{"version":"13.2.2","release":"mimic","release_type":"stable"}
mon.storage-0: running.
=== osd.0 ===
osd.0: not running.
=== osd.1 ===
osd.1: not running.
storage-0:/home/wrsroot#
storage-0:/home/wrsroot# /etc/init.d/ceph stop
=== osd.1 ===
storage-0:/home/wrsroot# /etc/init.d/ceph stop
=== osd.1 ===
Stopping Ceph osd.1 on storage-0...done
=== osd.0 ===
2019-05-30 17:58:23.946 7efc5f5751c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected f89591cf-a74f-4cff-85ed-8f93103267e0, invalid (someone else's?) journal
2019-05-30 17:58:23.946 7efc5f5751c0 -1 filestore(/var/lib/ceph/osd/ceph-1) mount(1871): failed to open journal /var/lib/ceph/osd/ceph-1/journal: (22) Invalid argument
2019-05-30 17:58:23.947 7efc5f5751c0 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-1/journal for object store /var/lib/ceph/osd/ceph-1: (22) Invalid argument
Stopping Ceph osd.0 on storage-0...done
=== mon.storage-0 ===
2019-05-30 17:58:24.105 7f7d57a5e1c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 6a6674e1-3450-4b0e-9618-b414bffff153, invalid (someone else's?) journal
2019-05-30 17:58:24.105 7f7d57a5e1c0 -1 filestore(/var/lib/ceph/osd/ceph-0) mount(1871): failed to open journal /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument
2019-05-30 17:58:24.105 7f7d57a5e1c0 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0: (22) Invalid argument
storage-0:/home/wrsroot# /etc/init.d/ceph stop
=== osd.1 ===
Stopping Ceph osd.1 on storage-0...done
=== osd.0 ===
2019-05-30 18:00:15.760 7f70637011c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected f89591cf-a74f-4cff-85ed-8f93103267e0, invalid (someone else's?) journal
2019-05-30 18:00:15.760 7f70637011c0 -1 filestore(/var/lib/ceph/osd/ceph-1) mount(1871): failed to open journal /var/lib/ceph/osd/ceph-1/journal: (22) Invalid argument
2019-05-30 18:00:15.761 7f70637011c0 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-1/journal for object store /var/lib/ceph/osd/ceph-1: (22) Invalid argument
Stopping Ceph osd.0 on storage-0...done
=== mon.storage-0 ===
2019-05-30 18:00:15.923 7fcc881e71c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 6a6674e1-3450-4b0e-9618-b414bffff153, invalid (someone else's?) journal
2019-05-30 18:00:15.923 7fcc881e71c0 -1 filestore(/var/lib/ceph/osd/ceph-0) mount(1871): failed to open journal /var/lib/ceph/osd/ceph-0/journal: (22) Invalid argument
2019-05-30 18:00:15.923 7fcc881e71c0 -1 ** ERROR: error flushing journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0: (22) Invalid argument
Stopping Ceph mon.storage-0 on storage-0...kill 45253...
done

summary: Ceph osd process was not recovered after lock and unlock on storage node
+ with journal disk
Revision history for this message
Tingjie Chen (silverhandy) wrote :

Was the journal device purged correctly, with any Ceph metadata removed from it?
There is a similar issue in the Ceph community; the way to avoid it is to ensure there are no partitions or signatures left on the NVMe device.
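
A hedged sketch of how one might check for and clear such leftovers (the device name /dev/nvme1n1 is illustrative only, and the last two commands are destructive, so they must only be run on a disk that is safe to erase):

lsblk /dev/nvme1n1             # list any leftover partitions
wipefs --no-act /dev/nvme1n1   # report filesystem/partition-table signatures without erasing
sgdisk --zap-all /dev/nvme1n1  # destructive: remove GPT/MBR partition tables
wipefs --all /dev/nvme1n1      # destructive: remove remaining signatures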

Revision history for this message
chen haochuan (martin1982) wrote :

Difficult to reproduce, and FileStore will be replaced by BlueStore in a later Ceph release.

Changed in starlingx:
status: Triaged → Incomplete
Revision history for this message
Cindy Xie (xxie1) wrote :

@Yang, do you have a manual way to recover the node once it is in failure state?

Revision history for this message
Cindy Xie (xxie1) wrote :

If it can be recovered manually, then we are OK with Medium; otherwise, we'd like to raise the priority to High.

Revision history for this message
Yang Liu (yliu12) wrote :

No, it is not recoverable; the system needs to be reinstalled after that. It is 100% reproducible in the WR environment.
I'm not sure if the journal device was purged successfully, but Ceph was healthy on that storage node before the lock/unlock. Are there any commands we can run to check?

Below is the configuration of that node.
[sysadmin@controller-0 ~(keystone_admin)]$ system host-stor-list storage-0
+--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------------------------+----------------+------------------+-----------+
| uuid | function | osdid | state | idisk_uuid | journal_path | journal_node | journal_size_gib | tier_name |
+--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------------------------+----------------+------------------+-----------+
| 618b6370-792a-49e2-951a-ab9418179904 | journal | None | configured | d3a6d5a7-06db-47d0-a7b1-c24b19f6d584 | None | None | None | None |
| 73e40f40-d400-484c-858d-f53db7755ad9 | osd | 1 | configured | 29dce56f-dfbe-4127-a89c-5803fe5b18c8 | /dev/disk/by-path/pci-0000:84:00. | /dev/nvme1n1p2 | 1 | storage |
| | | | | | 0-nvme-1-part2 | | | |
| | | | | | | | | |
| d824517a-6d37-41a9-9c6d-750af02a510c | osd | 0 | configured | dbf30bdc-d25f-4f33-92be-c639b4a82b82 | /dev/disk/by-path/pci-0000:84:00. | /dev/nvme1n1p1 | 1 | storage |
| | | | | | 0-nvme-1-part1 | | | |
| | | | | | | | | |
+--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------------------------+----------------+------------------+-----------+

[sysadmin@controller-0 ~(keystone_admin)]$ system host-disk-list storage-0
+--------------------------------------+--------------+---------+---------+-------+------------+-----+--------------+-------------------------------------------+
| uuid | device_node | device_ | device_ | size_ | available_ | rpm | serial_id | device_path |
| | | num | type | gib | gib | | | ...


Changed in starlingx:
status: Incomplete → Confirmed
Revision history for this message
chen haochuan (martin1982) wrote :

Hi Liu Yang,

If you reproduce this, please help to check the following first. Thanks.

"ls -l /var/lib/ceph/osd/ceph-0"
"ls -l /var/lib/ceph/osd/ceph-1"

And please share this info from the storage node, in /tmp/puppet/hieradata/host.yaml:

platform::ceph::osds::journal_config: {}
platform::ceph::osds::osd_config:
  stor-1:
    data_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:03.0-ata-2.0-part1'
    disk_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:03.0-ata-2.0'
    journal_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:03.0-ata-2.0-part2'
    osd_id: 0
    osd_uuid: !!python/unicode '70b3ad94-94b1-437f-b85c-064d9a89ad68'
    tier_name: !!python/unicode 'storage'

Revision history for this message
Cindy Xie (xxie1) wrote :

make it high as the failure cannot be recovered according to Liu Yang.

Changed in starlingx:
importance: Medium → High
Revision history for this message
Tingjie Chen (silverhandy) wrote :

It seems the issue logs are the same as in https://bugs.launchpad.net/starlingx/+bug/1837242, but this one is on an NVMe device.

Revision history for this message
Cindy Xie (xxie1) wrote :

Found out the provision steps are different from Martin's; we need to reproduce it in the Shanghai lab. Martin is working on aligning with Yang's provision steps.

Revision history for this message
chen haochuan (martin1982) wrote :

The issue is reproduced in Shanghai, with the storage node provisioned with these commands:

system host-stor-add storage-0 journal 7cbc9885-476c-4ad2-9058-466f1e0f9667
system host-stor-add storage-0 osd 46393030-acbf-43f4-8ca9-f705f65bf457 --tier-uuid 4c672ca9-7c4b-472a-b049-eac115c8aef9

Revision history for this message
chen haochuan (martin1982) wrote :

$ system host-lock storage-0 -f

Then quickly correct this file on the active controller, /opt/platform/puppet/<storage-0 ip>.yaml:

platform::ceph::osds::osd_config:
  stor-2:
    data_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-6.0-part1'
    disk_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-6.0'
    journal_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-2.0-part1'
    osd_id: 0
    osd_uuid: !!python/unicode 'fa90317d-3a7e-40db-9459-097e17e1eb90'
    tier_name: !!python/unicode 'storage'

change to

platform::ceph::osds::osd_config:
  stor-2:
    data_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-6.0-part1'
    disk_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-6.0'
    journal_path: !!python/unicode '/dev/disk/by-path/pci-0000:00:17.0-ata-6.0-part2'
    osd_id: 0
    osd_uuid: !!python/unicode 'fa90317d-3a7e-40db-9459-097e17e1eb90'
    tier_name: !!python/unicode 'storage'

Then storage-0 becomes available.

The storage disk path and journal path appear to be wrong in the database; they are written by the storage node provision command "system host-stor-add".

Will check the host-stor-add command.
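
A sketch of the workaround sequence described above (the hieradata path and <storage-0 ip> are as reported; hand-editing hieradata is only a temporary recovery step, not a supported configuration method):

$ system host-lock storage-0 -f
  (on the active controller, edit /opt/platform/puppet/<storage-0 ip>.yaml and point
   journal_path at the journal partition on the OSD's own data disk, as shown above)
$ system host-unlock storage-0
$ system host-show storage-0    # availability should return to 'available'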

Revision history for this message
chen haochuan (martin1982) wrote :

Confirmed in ceph.pp:

    Ceph::Osd {
      cluster => $cluster_name,
      cluster_uuid => $cluster_uuid,
    }

the journal setting is missing.

Studying how to get journal_path from osd_config and pass it to class ceph::osd.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

Hello, the problem is not about calling Ceph::Osd with the right journal path. The question is: are the DB entries correct for all external journals on the node?

The process is more complex. Journal partitions are (1) created in the system then (2) each partition is associated with an OSD.

Journal partitions are created in puppet through:
class platform::ceph::osds(
  $osd_config = {},
  $journal_config = {},
) inherits ::platform::ceph::params {
    ...
    create_resources('platform_ceph_journal', $journal_config)
    ...
}

and are assigned to OSDs through:

define platform_ceph_osd(
  $osd_id,
  $osd_uuid,
  $disk_path,
  $data_path,
  $journal_path,
  $tier_name,
) {

...
  -> exec { "configure journal location ${name}":
    logoutput => true,
    command => template('platform/ceph.journal.location.erb')
  }
...
}

Check ceph.journal.location.erb to see what is called to create the journals.

Now, the first step is to check whether the journals are configured correctly in the DB (they must be one after the other: part1, part2 ... partN, in consecutive order with no overlapping), and then check that they are also OK in hieradata.
If they are right, you should look in the OSD mount folder (check 'mount | grep osd') and see where the journal symbolic link is pointing. If it points correctly, then 'everything' is OK here.

If this breaks on lock/unlock without us doing any configuration change, then I can only suspect that something changed either in the DB or in the generated yamls.
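
A hedged sketch of those checks on the storage node (osd.1 is used as an example; the hieradata path is the one quoted earlier in this report):

mount | grep osd                                       # locate the OSD mount folders
ls -l /var/lib/ceph/osd/ceph-1/journal                 # where does the journal symlink point?
readlink -f /var/lib/ceph/osd/ceph-1/journal
grep -A 6 osd_config /tmp/puppet/hieradata/host.yaml   # compare with the generated hieradata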

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

So, given that the process is different there is no need to add journal in:
Ceph::Osd {
      cluster => $cluster_name,
      cluster_uuid => $cluster_uuid,
    }

And please don't change it, as it would then no longer create the default collocated journal partition on the OSD, which is still needed even if users configure external journals (for failback).

Revision history for this message
chen haochuan (martin1982) wrote :

Hi Ovidiu

Yes, ceph.journal.location.erb will create the journal partition. But look at the "ceph-disk prepare" invocation below (all of the commands below are excerpted from the execution of osd.pp):

ceph-disk --verbose --log-stdout prepare --filestore --cluster-uuid 2b7f8105-78f0-4660-b110-ca2b90cfb3de --osd-uuid 47779da5-36c3-49b9-96c9-912c1f6bb70a --osd-id 1 --fs-type xfs --zap-disk /dev/sdb $(readlink -f '')

Since no journal disk is specified, ceph-disk will create /dev/sdb1 and /dev/sdb2, with /dev/sdb2 as the journal partition.

Then
# mount /dev/sdb1 /var/lib/ceph/osd/ceph-1
# ceph-osd --id 1 --mkfs --mkkey --mkjournal # it will initialize /dev/sdb2 as journal disk, write journal head to /dev/sdb2
# ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-1/keyring
# umount /var/lib/ceph/osd/ceph-1

# ceph-disk activate /dev/sdb1

Then "/usr/sbin/ceph-manage-journal location" in ceph.journal.location.erb to create /dev/sdc1 for journal partition. But journal header and keyring is no write to /dev/sdc1

So on the next lock and unlock, "/etc/init.d/ceph start" reads the /dev/sdc1 header, finds the header missing, and fails with the error message below:

2019-05-30 17:58:23.946 7efc5f5751c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected f89591cf-a74f-4cff-85ed-8f93103267e0, invalid (someone else's?) journal
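
For comparison, a hedged sketch of the same prepare call with the external journal partition passed explicitly as the second positional device argument (here /dev/sdc1 stands in for the pre-created journal partition of this OSD; this illustrates the ceph-disk interface, not the exact code change under review):

ceph-disk --verbose --log-stdout prepare --filestore \
    --cluster-uuid 2b7f8105-78f0-4660-b110-ca2b90cfb3de \
    --osd-uuid 47779da5-36c3-49b9-96c9-912c1f6bb70a \
    --osd-id 1 --fs-type xfs --zap-disk \
    /dev/sdb /dev/sdc1    # data device, then journal partition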

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/679185

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

Hi Chen,

The journal header should be repaired when Ceph is started; unless something changed with Mimic, the message you see should be treated as a warning (2019-05-30 17:58:23.946 7efc5f5751c0 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected f89591cf-a74f-4cff-85ed-8f93103267e0, invalid (someone else's?) journal).
The journal partition is empty, ceph-osd sees this and initializes it as a journal => the OSD should start correctly. The question is, why does it sometimes fail? It starts correctly most of the time...

So, if DB/manifest entries are OK, and the proposed fix works when moving the journal from collocated to external on an existing OSD, please check one more thing: when adding a new OSD with an external journal (do not unlock, add a new OSD, collocate it to /dev/sdc1, then unlock), will you still see /dev/sdb1 and /dev/sdb2? They have to be there, as the user should be able to fall back to a collocated journal in case the journal drive breaks (this was a product requirement at the time of implementation).
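
For reference, a sketch of the generic FileStore procedure for re-pointing an OSD at a new or empty journal partition (not verified on this setup; assumes osd.1 with external journal /dev/sdc1, and accepts the risk of losing journaled-but-unflushed writes):

/etc/init.d/ceph stop osd.1
ceph-osd -i 1 --flush-journal     # may fail if the currently linked journal was never initialized
ln -sf /dev/sdc1 /var/lib/ceph/osd/ceph-1/journal
ceph-osd -i 1 --mkjournal         # writes a fresh journal header for this OSD
/etc/init.d/ceph start osd.1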

Revision history for this message
chen haochuan (martin1982) wrote :

Hi Ovidiu

Yes, I still see /dev/sdc2 and /dev/sdd2. For this deployment, /dev/sdb is the journal disk.

storage-0:~$ ls /dev/sd*
/dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4 /dev/sdb /dev/sdb1 /dev/sdb2 /dev/sdc /dev/sdc1 /dev/sdc2 /dev/sdd /dev/sdd1 /dev/sdd2

And for the first puppet apply, there is this error log. This is maybe what you referred to:
"Error initializing journal node: /dev/sdb2 for osd id: 1 ceph-osd return code: 250 "

2019-08-30T06:23:29.879 ^[[0;36mDebug: 2019-08-30 06:23:29 +0000 Exec[configure journal location stor-22](provider=posix): Executing '/usr/sbin/ceph-manage-journal location '{"osdid": 1, "journal_path": "/dev/disk/by-path/pci-0000:00:17.0-ata-2.0-part2", "data_path": "/dev/disk/by-path/pci-0000:00:17.0-ata-6.0-part1"}''^[[0m
2019-08-30T06:23:29.881 ^[[0;36mDebug: 2019-08-30 06:23:29 +0000 Executing: '/usr/sbin/ceph-manage-journal location '{"osdid": 1, "journal_path": "/dev/disk/by-path/pci-0000:00:17.0-ata-2.0-part2", "data_path": "/dev/disk/by-path/pci-0000:00:17.0-ata-6.0-part1"}''^[[0m
2019-08-30T06:23:30.127 ^[[mNotice: 2019-08-30 06:23:30 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-22]/Exec[configure journal location stor-22]/returns: Fixing journal location for OSD id: 1^[[0m
2019-08-30T06:23:30.129 ^[[mNotice: 2019-08-30 06:23:30 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-22]/Exec[configure journal location stor-22]/returns: Symlink created: /var/lib/ceph/osd/ceph-1/journal -> /dev/disk/by-partuuid/4f3e224e-0f97-4a8b-85b8-fa512a8ec042^[[0m
2019-08-30T06:23:30.130 ^[[mNotice: 2019-08-30 06:23:30 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-22]/Exec[configure journal location stor-22]/returns: Error initializing journal node: /dev/sdb2 for osd id: 1 ceph-osd return code: 250 reason: 2019-08-30 06:23:30.119 7f464315c1c0 -1 open: failed to lock pidfile /var/run/ceph/osd.1.pid because another process locked it.^[[0m

And after puppet apply, /var/lib/ceph/osd/ceph-1/journal already links to /dev/sdb2 (journal disk, partition 2; for this deployment /dev/sdb is the journal disk, /dev/sdb1 is for osd 0).

You can check log I uploaded.

Revision history for this message
chen haochuan (martin1982) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/680897

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by chen haochuan (<email address hidden>) on branch: master
Review: https://review.opendev.org/679185

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/680897
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=3a387d729b73c23e584f2430a42b8155b225130f
Submitter: Zuul
Branch: master

commit 3a387d729b73c23e584f2430a42b8155b225130f
Author: Martin, Chen <email address hidden>
Date: Thu Aug 29 20:55:50 2019 +0800

    Fix journal disk path incorrect initialize with dedicate journal disk

    When user provision a storage node with dedicated storage add for
    journal, it is designated to create journal partition on this dedicated
    disk for every osd.
        $ system host-stor-add storage-0 journal <disk uuid>
        $ system host-stor-add storage-0 osd <disk uuid>

    For above case, "ceph-disk prepare" request correct journal partition
    for initialize and write header and keyring to journal disk.

    Closes-Bug: 1830736

    Change-Id: I70ae1a3bc049ad3842e0ec22851b148de5671781
    Signed-off-by: Martin, Chen <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Martin, when the change has had sufficient soak in master and sanity is green, please cherry-pick to r/stx.2.0.

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

Tested on build 2019-09-22 20:01:30. Lock and unlock worked on both storage nodes.

tags: removed: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Martin, Given that Jeyan verified the fix in master, please cherry-pick the change to r/stx.2.0 asap.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/684438

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.2.0)

Reviewed: https://review.opendev.org/684438
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=3e5e741f47cb4d226e73b0632741af6446e606b7
Submitter: Zuul
Branch: r/stx.2.0

commit 3e5e741f47cb4d226e73b0632741af6446e606b7
Author: Martin, Chen <email address hidden>
Date: Thu Aug 29 20:55:50 2019 +0800

    Fix journal disk path incorrect initialize with dedicate journal disk

    When user provision a storage node with dedicated storage add for
    journal, it is designated to create journal partition on this dedicated
    disk for every osd.
        $ system host-stor-add storage-0 journal <disk uuid>
        $ system host-stor-add storage-0 osd <disk uuid>

    For above case, "ceph-disk prepare" request correct journal partition
    for initialize and write header and keyring to journal disk.

    Closes-Bug: 1830736

    Change-Id: I70ae1a3bc049ad3842e0ec22851b148de5671781
    Signed-off-by: Martin, Chen <email address hidden>

Ghada Khalil (gkhalil)
tags: added: in-r-stx20