Ceph OSD process did not recover after lock and unlock of a storage node with a journal disk
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | chen haochuan |
Bug Description
Brief Description
-----------------
After the storage node (storage-0) was locked and unlocked, it never recovered because its Ceph OSD processes (osd.0, osd.1) failed. Auto-recovery was not successful, and storage-0 remained in the failed state.
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-26 02:02:28,248] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-
| config_status | None |
| config_target | bfa21b40-
| console | ttyS0,115200 |
| created_at | 2019-05-
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | 19.05 |
| task | |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-
| uptime | 33217 |
| uuid | a56c9e15-
| vim_progress_status | services-disabled |
+------
[wrsroot@
Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-26 02:03:03,616] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-
| config_status | None |
| config_target | bfa21b40-
| console | ttyS0,115200 |
| created_at | 2019-05-
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | 19.05 |
| task | Unlocking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-
| uptime | 33217 |
| uuid | a56c9e15-
| vim_progress_status | services-disabled |
+------
DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-05-26 02:25:41,372] 262 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2019-05-26 02:25:43,014] 387 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| action | none |
| administrative | unlocked |
| availability | failed |
| bm_ip | 128.224.64.220 |
| bm_type | bmc |
| bm_username | root |
| boot_device | /dev/disk/
| capabilities | {u'stor_function': u'monitor'} |
| config_applied | bfa21b40-
| config_status | None |
| config_target | bfa21b40-
| console | ttyS0,115200 |
| created_at | 2019-05-
| hostname | storage-0 |
| id | 6 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | 192.168.204.191 |
| mgmt_mac | 90:e2:ba:c6:95:ec |
| operational | disabled |
| peers | {u'hosts': [u'storage-1', u'storage-0'], u'name': u'group-0'} |
| personality | storage |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | 19.05 |
| task | Service Failure, threshold reached, Lock/Unlock to retry |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2019-05-
| uptime | 841 |
| uuid | a56c9e15-
| vim_progress_status | services-disabled |
+------
[wrsroot@
[2019-05-26 03:09:00,908] 387 DEBUG MainThread ssh.expect :: Output:
fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
URL --os-region-name RegionOne alarm-list --nowrap --uuid
+------
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 94c0c118-
| 574b7d0e-
| 484d6b73-
| e8a473ea-
+------
Severity
--------
Major
Steps to Reproduce
------------------
1. Install a storage lab with the OpenStack application, following the installation procedure.
2. Lock and unlock the storage node.
3. Observe that the storage node does not return to the available state, as described above.
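Steps 2 and 3 amount to an unlock followed by a wait for the host to settle. The Python sketch below shows one way a test harness might poll for that outcome; it is not the harness that produced this report. `wait_for_recovery`, `fetch_state`, the default timeout and the poll interval are hypothetical. The early stop on "threshold reached" is taken from the `task` value in the final `host-show` output above.

```python
import time
from typing import Callable, Dict

# Healthy end state for an unlocked StarlingX host (assumption stated in lead-in).
RECOVERED = {"administrative": "unlocked", "operational": "enabled", "availability": "available"}


def wait_for_recovery(
    fetch_state: Callable[[], Dict[str, str]],
    timeout_s: float = 1800.0,  # hypothetical default
    poll_s: float = 30.0,       # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll a host until it recovers, maintenance gives up, or time runs out.

    fetch_state is a hypothetical callable returning the host's properties
    as a dict (for example, parsed from `system host-show storage-0`).
    The last observed state is returned so the caller can report it.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if all(state.get(k) == v for k, v in RECOVERED.items()):
            return state  # expected outcome after unlock
        if "threshold reached" in state.get("task", ""):
            return state  # terminal task text seen in this report's log
        if clock() >= deadline:
            return state  # timed out; caller decides how to report it
        sleep(poll_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.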
Expected Behavior
------------------
No OSD failure after unlock, and the storage node returns to the available state.
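The expected end state maps onto three fields of the `system host-show` output in the log: `administrative`, `operational` and `availability`. A healthy unlocked host is normally reported as unlocked/enabled/available; the report itself only names "available", so treating `operational = enabled` as part of the expectation is an assumption. The following Python sketch checks captured output against that expectation, assuming the `| key | value |` row layout seen in the log. `parse_host_show` and `recovery_mismatches` are hypothetical helpers, not part of the StarlingX client.

```python
from typing import Dict

# Expected state after unlock (operational=enabled is an assumption).
EXPECTED = {"administrative": "unlocked", "operational": "enabled", "availability": "available"}


def parse_host_show(output: str) -> Dict[str, str]:
    """Turn the `| Property | Value |` table printed by `system host-show`
    into a dict; border lines and the header row are skipped."""
    props: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skips "+-----" borders and non-table text
        # Split on the first inner pipe only, so values containing "|" survive.
        key, _, value = line.strip("|").partition("|")
        key = key.strip()
        if key and key != "Property":
            props[key] = value.strip()
    return props


def recovery_mismatches(props: Dict[str, str]) -> Dict[str, str]:
    """Return expected fields whose actual value differs (empty dict = recovered)."""
    return {k: props.get(k, "") for k, v in EXPECTED.items() if props.get(k) != v}


if __name__ == "__main__":
    # Abridged from the final host-show output in this report.
    sample = """+------
| Property | Value |
+------
| administrative | unlocked |
| availability | failed |
| operational | disabled |
+------"""
    print(recovery_mismatches(parse_host_show(sample)))
    # {'operational': 'disabled', 'availability': 'failed'}
```

Applied to the final output in this report, the check flags exactly the two fields that show storage-0 did not recover.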
Actual Behavior
----------------
storage-0 did not recover; it remained in the failed state.
Reproducibility
---------------
System Configuration
--------------------
Regular system
Branch/Pull Time/Commit
-----------------------
BUILD_DATE=
Last Pass
---------
20190503T013000Z
Timestamp/Logs
--------------
2019-05-
Test Activity
-------------
Regression test
description: | updated |
tags: | added: stx.retestneeded |
Changed in starlingx: | |
assignee: | Cindy Xie (xxie1) → nobody |
assignee: | nobody → chen haochuan (martin1982) |
tags: | added: in-r-stx20 |
Marking as release gating; this appears to be related to Ceph and requires further investigation.