Just to be clear, the logs Peng Peng attached in comment #7 are for the verification failure.
Email thread pasted:
Hi Dan,
Please put your comment in the ticket. What is your suggestion for the next step? Should we reopen it for more investigation?
Thanks,
Peng
From: Voiculeasa, Dan
Sent: Monday, February 24, 2020 12:32 PM
To: Peng, Peng
Cc: Liu, Yang (YOW)
Subject: Re: LP-1856064 is reproduced
Hello,
Yes, the fix for the identified issue when investigating LP-1856064 is in that load.
It correctly detects stuck peering OSDs that are not false positives determined by host-lock operation.
Not sure if the issue at hand is related to lock/unlock. It seems an OSD is in a wrong state.
var/log/bash.log:2020-02-21T15:16:12.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0
var/log/bash.log:2020-02-21T15:17:11.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1
var/log/bash.log:2020-02-21T15:25:49.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0
log/bash.log:2020-02-21T15:19:10.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1
log/bash.log:2020-02-21T15:27:39.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0
# Successful restart on osd.1 of controller-0
2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds
2020-02-21 15:39:02.427 /etc/init.d/ceph-init-wrapper osd.1 INFO: Restarting OSD stuck peering
2020-02-21 15:39:02.947 /etc/init.d/ceph osd.1 INFO: Stopping process
2020-02-21 15:39:04.012 /etc/init.d/ceph osd.1 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:04.151 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:39:04.569 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:39:05.473 /etc/init.d/ceph-init-wrapper - INFO: Ceph START command received
2020-02-21 15:39:05.474 /etc/init.d/ceph-init-wrapper - INFO: Grab service locks
2020-02-21 15:39:05.477 /etc/init.d/ceph-init-wrapper - INFO: Lock service status
2020-02-21 15:43:40.340 /etc/init.d/ceph osd.1 WARN: /var/lib/ceph/osd/ceph-1/sysvinit file is missing
2020-02-21 15:43:40.346 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:40.599 /etc/init.d/ceph mon.controller-0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:43:43.227 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:43.661 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:44:01.448 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:45:02.095 /etc/init.d/ceph mon.controller-0 INFO: Process is OPERATIONAL
2020-02-21 15:45:02.857 /etc/init.d/ceph osd.1 INFO: Process is OPERATIONAL
# osd.0 of controller-1 seems
2020-02-21 15:35:03.145 /etc/init.d/ceph osd.0 WARN: Process went down!
2020-02-21 15:35:36.774 /etc/init.d/ceph osd.0 WARN: Process went up, flapping status or busy process?
2020-02-21 15:35:38.401 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:00.150 /etc/init.d/ceph osd.0 WARN: Detected stuck peering for 204 seconds
2020-02-21 15:39:00.162 /etc/init.d/ceph-init-wrapper osd.0 INFO: Restarting OSD stuck peering
2020-02-21 15:39:00.682 /etc/init.d/ceph osd.0 INFO: Stopping process
2020-02-21 15:39:01.760 /etc/init.d/ceph osd.0 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:01.901 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:02.328 /etc/init.d/ceph osd.0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
On Fri, 2020-02-21 at 21:11 +0200, Peng, Peng wrote:
Hi Dan,
LP-1856064 seems to be reproduced on:
Lab: WCP_63_66
Load: 2020-02-19_20-00-00
Log attached
Can you help to check whether the fix is in the load?
Thanks,
Peng Peng
Tel: 613-963-1420
Skype ID: pengp1978
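
For anyone triaging the attached logs, the stuck-peering restarts quoted in the thread can be pulled out with a small script. The following is only a rough sketch, not part of the fix itself. It assumes the line layout seen in the excerpt above (timestamp, script path, daemon name, level, message) and treats the next "Process is OPERATIONAL" line for the same daemon as the end of recovery. The names `parse` and `recovery_times` and the `SAMPLE` list are hypothetical and exist only for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt in this thread:
# "<date> <time.ms> <script path> <daemon> <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<script>\S+) (?P<daemon>\S+) (?P<level>INFO|WARN): (?P<msg>.*)$"
)
STUCK_RE = re.compile(r"Detected stuck peering for (\d+) seconds")


def parse(lines):
    """Yield (timestamp, daemon, level, message) for lines matching the assumed layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["daemon"], m["level"], m["msg"]


def recovery_times(lines):
    """Map daemon -> (stuck seconds, seconds from detection to OPERATIONAL or None)."""
    pending = {}
    results = {}
    for ts, daemon, _level, msg in parse(lines):
        stuck = STUCK_RE.search(msg)
        if stuck:
            pending[daemon] = (ts, int(stuck.group(1)))
            results[daemon] = (int(stuck.group(1)), None)
        elif msg == "Process is OPERATIONAL" and daemon in pending:
            detected_at, stuck_for = pending.pop(daemon)
            results[daemon] = (stuck_for, (ts - detected_at).total_seconds())
    return results


# A few lines copied from the excerpt above.
SAMPLE = [
    "2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds",
    "2020-02-21 15:39:05.473 /etc/init.d/ceph-init-wrapper - INFO: Ceph START command received",
    "2020-02-21 15:45:02.857 /etc/init.d/ceph osd.1 INFO: Process is OPERATIONAL",
    "2020-02-21 15:39:00.150 /etc/init.d/ceph osd.0 WARN: Detected stuck peering for 204 seconds",
    "2020-02-21 15:39:02.328 /etc/init.d/ceph osd.0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL",
]

if __name__ == "__main__":
    for daemon, (stuck_for, recovered_after) in recovery_times(SAMPLE).items():
        print(daemon, stuck_for, recovered_after)
```

A daemon whose second value is None has no later "Process is OPERATIONAL" line in the input, as is the case for osd.0 in the excerpt quoted here. That alone does not prove the OSD never recovered, since the excerpt may simply end early.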