Comment 8 for bug 1856064

Yang Liu (yliu12) wrote :

Just to be clear, the logs Peng Peng attached in comment #7 are for the verification failure.

Email thread pasted:

Hi Dan,

Please put your comment in the ticket. What is your suggestion for the next step? Should we reopen it for more investigation?

Thanks,
Peng

From: Voiculeasa, Dan
Sent: Monday, February 24, 2020 12:32 PM
To: Peng, Peng
Cc: Liu, Yang (YOW)
Subject: Re: LP-1856064 is reproduced

Hello,

Yes, the fix for the issue identified while investigating LP-1856064 is in that load.
It correctly detects OSDs that are stuck peering and are not false positives caused by a host-lock operation.

I am not sure whether the issue at hand is related to lock-unlock. It seems an OSD is in a wrong state.

var/log/bash.log:2020-02-21T15:16:12.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0
var/log/bash.log:2020-02-21T15:17:11.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-1
var/log/bash.log:2020-02-21T15:25:49.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0

log/bash.log:2020-02-21T15:19:10.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1
log/bash.log:2020-02-21T15:27:39.000 controller-0 -sh: info HISTORY: PID=3318225 UID=42425 system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0
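
To line the lock/unlock operations up against the Ceph events further down, the HISTORY lines above can be reduced to a short timeline. This is only a minimal sketch: the regular expression is inferred from the bash.log excerpts quoted here, and the function name lock_unlock_timeline is hypothetical, not part of any StarlingX tooling.

```python
import re

# Assumed line shape, inferred from the bash.log excerpts quoted above:
#   <path>:<timestamp> <host> -sh: info HISTORY: ... host-lock|host-unlock <target>
HISTORY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\s+\S+\s+-sh: info HISTORY:"
    r".*\s(?P<action>host-lock|host-unlock)\s+(?P<target>\S+)\s*$"
)


def lock_unlock_timeline(lines):
    """Return (timestamp, action, target) tuples sorted by timestamp.

    Lines that do not look like a host-lock/host-unlock HISTORY entry
    are ignored. ISO-style timestamps sort correctly as plain strings.
    """
    events = []
    for line in lines:
        match = HISTORY_RE.search(line.rstrip("\n"))
        if match:
            events.append((match["ts"], match["action"], match["target"]))
    return sorted(events)
```

Fed the five lines above, this would give the lock and unlock operations in chronological order, which makes it easier to see which ones precede the stuck-peering warnings at 15:39.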

# Successful restart on osd.1 of controller-0

2020-02-21 15:39:02.413 /etc/init.d/ceph osd.1 WARN: Detected stuck peering for 202 seconds
2020-02-21 15:39:02.427 /etc/init.d/ceph-init-wrapper osd.1 INFO: Restarting OSD stuck peering
2020-02-21 15:39:02.947 /etc/init.d/ceph osd.1 INFO: Stopping process
2020-02-21 15:39:04.012 /etc/init.d/ceph osd.1 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:04.151 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:39:04.569 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:39:05.473 /etc/init.d/ceph-init-wrapper - INFO: Ceph START command received
2020-02-21 15:39:05.474 /etc/init.d/ceph-init-wrapper - INFO: Grab service locks
2020-02-21 15:39:05.477 /etc/init.d/ceph-init-wrapper - INFO: Lock service status
2020-02-21 15:43:40.340 /etc/init.d/ceph osd.1 WARN: /var/lib/ceph/osd/ceph-1/sysvinit file is missing
2020-02-21 15:43:40.346 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:40.599 /etc/init.d/ceph mon.controller-0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:43:43.227 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:43:43.661 /etc/init.d/ceph osd.1 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
2020-02-21 15:44:01.448 /etc/init.d/ceph mgr.controller-0 WARN: /var/lib/ceph/mgr/ceph-controller-0/sysvinit file is missing
2020-02-21 15:45:02.095 /etc/init.d/ceph mon.controller-0 INFO: Process is OPERATIONAL
2020-02-21 15:45:02.857 /etc/init.d/ceph osd.1 INFO: Process is OPERATIONAL

# osd.0 of controller-1 seems

2020-02-21 15:35:03.145 /etc/init.d/ceph osd.0 WARN: Process went down!
2020-02-21 15:35:36.774 /etc/init.d/ceph osd.0 WARN: Process went up, flapping status or busy process?
2020-02-21 15:35:38.401 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:00.150 /etc/init.d/ceph osd.0 WARN: Detected stuck peering for 204 seconds
2020-02-21 15:39:00.162 /etc/init.d/ceph-init-wrapper osd.0 INFO: Restarting OSD stuck peering
2020-02-21 15:39:00.682 /etc/init.d/ceph osd.0 INFO: Stopping process
2020-02-21 15:39:01.760 /etc/init.d/ceph osd.0 INFO: Process stopped, setting state to STOPPED
2020-02-21 15:39:01.901 /etc/init.d/ceph mgr.controller-1 WARN: /var/lib/ceph/mgr/ceph-controller-1/sysvinit file is missing
2020-02-21 15:39:02.328 /etc/init.d/ceph osd.0 INFO: Process STARTED successfully, waiting for it to become OPERATIONAL
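
The difference between the two excerpts (osd.1 reaches OPERATIONAL after the restart, while the osd.0 excerpt ends at "waiting for it to become OPERATIONAL") can be checked mechanically. The following is a rough sketch only: the line format is inferred from the excerpts above, the function name unrecovered_osds is hypothetical, and each host's log should be processed separately because OSD names are only meaningful per log file.

```python
import re

# Assumed line shape, inferred from the Ceph init-script excerpts quoted above:
#   <date> <time> <script path> <daemon> <LEVEL>: <message>
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+\S+\s+"
    r"(?P<daemon>\S+)\s+(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def unrecovered_osds(lines):
    """Return {osd: restart timestamp} for OSDs that were restarted for
    stuck peering and have no later "Process is OPERATIONAL" line."""
    pending = {}
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match or not match["daemon"].startswith("osd."):
            continue  # skip mon/mgr entries and lines in another format
        if "Restarting OSD stuck peering" in match["msg"]:
            pending[match["daemon"]] = match["ts"]
        elif "Process is OPERATIONAL" in match["msg"]:
            pending.pop(match["daemon"], None)
    return pending
```

An OSD reported by this check is not necessarily broken: the excerpt may simply end before the OPERATIONAL line was logged, so the full log should be consulted before drawing a conclusion.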

On Fri, 2020-02-21 at 21:11 +0200, Peng, Peng wrote:
Hi Dan,

LP-1856064 seems to be reproduced on:
Lab: WCP_63_66
Load: 2020-02-19_20-00-00
Log attached

Can you help to check whether the fix is in the load?

Thanks,

Peng Peng
Tel: 613-963-1420
Skype ID: pengp1978