Just to be clear, the logs Peng Peng attached in comment #7 are for the verification failure.

Email thread pasted:

Hi Dan,

Please put your comment in the ticket.
What is your suggestion for next step? Should we reopen it for more investigation?

Thanks,
Peng

From: Voiculeasa, Dan
Sent: Monday, February 24, 2020 12:32 PM
To: Peng, Peng
Cc: Liu, Yang (YOW)
Subject: Re: LP-1856064 is reproduced

Hello,

Yes, the fix for the identified issue when investigating LP-1856064 is in that load. It correctly detects stuck peering OSDs that are not false positives determined by host-lock operation.

Not sure if the issue at hand is related to lock-unlock. Seems an osd is in a wrong state.

var/log/bash.log:2020-02-21T15:16:12.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0
var/log/bash.log:2020-02-21T15:17:11.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1
var/log/bash.log:2020-02-21T15:25:49.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0

log/bash.log:2020-02-21T15:19:10.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1
log/bash.log:2020-02-21T15:27:39.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0

# Successful restart on osd.1 of controller-0
2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds
2020-02-21 15:39:02.427 /etc/init.d/ceph-init-wrapper osd.1 INFO: Restarting OSD stuck peering
2020-02-21 15:39:02.947 /etc/init.d/ceph osd.1 INFO: Stopping process
2020-02-21 15:39:04.012 /etc/init.d/ceph osd.1 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:04.151 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:39:04.569 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:39:05.473 /etc/init.d/ceph-init-wrapper - INFO: Ceph START command received
2020-02-21 15:39:05.474 /etc/init.d/ceph-init-wrapper - INFO: Grab service locks
2020-02-21 15:39:05.477 /etc/init.d/ceph-init-wrapper - INFO: Lock service status
2020-02-21 15:43:40.340 /etc/init.d/ceph osd.1 WARN: /var/lib/ceph/osd/ceph-1/sysvinit file is missing
2020-02-21 15:43:40.346 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:40.599 /etc/init.d/ceph mon.controller-0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:43:43.227 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:43.661 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:44:01.448 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:45:02.095 /etc/init.d/ceph mon.controller-0 INFO: Process is OPERATIONAL
2020-02-21 15:45:02.857 /etc/init.d/ceph osd.1 INFO: Process is OPERATIONAL

# osd.0 of controller-1 seems
2020-02-21 15:35:03.145 /etc/init.d/ceph osd.0 WARN: Process went down!
2020-02-21 15:35:36.774 /etc/init.d/ceph osd.0 WARN: Process went up, flapping status or busy process?
2020-02-21 15:35:38.401 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:00.150 /etc/init.d/ceph osd.0 WARN: Detected stuck peering for 204 seconds
2020-02-21 15:39:00.162 /etc/init.d/ceph-init-wrapper osd.0 INFO: Restarting OSD stuck peering
2020-02-21 15:39:00.682 /etc/init.d/ceph osd.0 INFO: Stopping process
2020-02-21 15:39:01.760 /etc/init.d/ceph osd.0 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:01.901 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:02.328 /etc/init.d/ceph osd.0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL

On Fri, 2020-02-21 at 21:11 +0200, Peng, Peng wrote:

Hi Dan,

LP-1856064 Seems reproduced on
Lab: WCP_63_66
Load: 2020-02-19_20-00-00
Log attached

Can you help to check whether the fix is in the load?

Thanks,
Peng Peng
Tel: 613-963-1420
Skype ID: pengp1978
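
For anyone triaging similar logs, the comparison made above between osd.1 and osd.0 can be scripted. This is only a minimal sketch, assuming the line layout seen in the ceph init excerpts in this comment (timestamp, script path, daemon name, level, message). The function name `summarize_stuck_peering` and the `SAMPLE` list are hypothetical helpers for illustration, not part of Ceph or StarlingX, and the sample lines are copied from the excerpts above.

```python
import re
from datetime import datetime

# Assumed layout: "<date> <time.ms> <script> <daemon> <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<script>\S+) (?P<daemon>\S+) (?P<level>INFO|WARN): (?P<msg>.*)$"
)
STUCK_RE = re.compile(r"Detected stuck peering for (\d+) seconds")


def summarize_stuck_peering(lines):
    """Return one dict per 'Detected stuck peering' line.

    'recovered_after' is the number of seconds until the same daemon next
    logs 'Process is OPERATIONAL', or None if no such line follows.
    """
    events = []
    pending = {}  # daemon name -> event still waiting for OPERATIONAL
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not fit the assumed layout
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        daemon, msg = m["daemon"], m["msg"]
        stuck = STUCK_RE.search(msg)
        if stuck:
            event = {
                "daemon": daemon,
                "detected_at": ts,
                "stuck_seconds": int(stuck.group(1)),
                "recovered_after": None,
            }
            events.append(event)
            pending[daemon] = event
        elif msg == "Process is OPERATIONAL" and daemon in pending:
            event = pending.pop(daemon)
            event["recovered_after"] = (ts - event["detected_at"]).total_seconds()
    return events


# Lines copied from the excerpts in this comment (each host has its own log).
SAMPLE = [
    "2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds",
    "2020-02-21 15:39:04.569 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL",
    "2020-02-21 15:45:02.857 /etc/init.d/ceph osd.1 INFO: Process is OPERATIONAL",
    "2020-02-21 15:39:00.150 /etc/init.d/ceph osd.0 WARN: Detected stuck peering for 204 seconds",
    "2020-02-21 15:39:02.328 /etc/init.d/ceph osd.0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL",
]

if __name__ == "__main__":
    for e in summarize_stuck_peering(SAMPLE):
        print(e["daemon"], e["stuck_seconds"], e["recovered_after"])
```

On the sample lines, osd.1 is reported as OPERATIONAL about 360 seconds after the stuck-peering detection, while osd.0 has no OPERATIONAL line in the pasted portion, so its `recovered_after` stays `None`. That only reflects what was pasted here; the full attached logs may contain later lines.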