ceph-osd fails to start if the filesystem is recovered at mount time

Bug #1372081 reported by Dennis Dmitriev
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
Fuel for OpenStack  Invalid    Low         MOS Ceph
6.0.x               Won't Fix  Low         MOS Ceph
7.0.x               Won't Fix  Low         MOS Ceph
8.0.x               Won't Fix  Low         MOS Ceph
Mitaka              Invalid    Low         MOS Ceph

Bug Description

I faced this issue while working with the CI test "thead_3" on CentOS, specifically the "ceph_ha_restart" test.

When a node that hosts Ceph is rebooted or reset and the filesystem used by Ceph is damaged, ceph-osd fails to start because the filesystem is still being checked for errors while it is mounted.

Manually restarting the 'ceph' service brings ceph-osd back up.

It looks like the filesystem is still mounted read-only ('ro') when ceph-osd starts.
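This can be confirmed by inspecting the mount options in /proc/mounts. A minimal sketch, using the mount point from the reproduction steps in this report (adjust the path for other OSDs):

```shell
# Print the first mount option ("ro" or "rw") of the OSD filesystem.
# /proc/mounts fields: device mountpoint fstype options dump pass.
awk '$2 == "/var/lib/ceph/osd/ceph-1" { split($4, opts, ","); print opts[1] }' /proc/mounts
```

While the filesystem is still read-only this prints "ro"; once recovery finishes and it is writable it prints "rw".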

It is hard to reproduce because the race window between ceph-osd starting and the filesystem check is very short.

It can be investigated by putting the Ceph filesystem into the 'ro' state and restarting the ceph service:

# service ceph stop

# mount -o remount,ro /dev/vda4    # /dev/vda4 is the Ceph partition
or
# xfs_freeze -f /var/lib/ceph/osd/ceph-1    # Ceph mount point for /dev/vda4; undo with 'xfs_freeze -u'

# service ceph start

# service ceph status
=== mon.node-1 ===
mon.node-1: running {"version":"0.80.4"}
=== osd.1 ===
osd.1: not running.
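One possible workaround, sketched below under the assumption that waiting for the filesystem to become writable closes the race: poll /proc/mounts and only start ceph once the OSD mount is read-write. The mount point is the one from this report; this is not an official Fuel/Ceph fix.

```shell
# Hypothetical workaround: delay starting ceph until the OSD filesystem
# is mounted read-write (i.e. journal recovery has finished).
OSD_MOUNT=/var/lib/ceph/osd/ceph-1
for i in $(seq 1 10); do
    # Exit status 0 when the mount exists and its options start with "rw".
    if awk -v m="$OSD_MOUNT" '$2 == m && $4 ~ /^rw/ { found = 1 } END { exit !found }' /proc/mounts; then
        service ceph start
        break
    fi
    sleep 1
done
```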

In the diagnostic snapshot for 'node-1', ceph-osd was started at 2014-09-20 15:41:20, and the filesystem check started at 2014-09-20 15:41:21 (16:41:21 in kernel.log):
================
2014-09-20T16:41:21.037209+01:00 notice: XFS (vda4): Mounting Filesystem
2014-09-20T16:41:21.066841+01:00 notice: XFS (vda4): Starting recovery (logdev: internal)
2014-09-20T16:41:21.070858+01:00 notice: XFS (vda4): Ending recovery (logdev: internal)
================

Cluster configuration:
CentOS/HA, nova-network/flat, 3 controller+ceph; 2 compute+ceph; 1 ceph.

{"build_id": "2014-09-17_21-40-34", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "11", "auth_required": true, "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["mirantis"], "release": "5.1", "release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-17_21-40-34", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "11", "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}}}, "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}

Tags: area-mos ceph
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :
tags: added: ceph
Changed in fuel:
milestone: 5.1 → 6.0
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
importance: Undecided → Medium
status: New → Triaged
Changed in fuel:
importance: Medium → Low
Changed in fuel:
milestone: 6.0 → 6.1
no longer affects: fuel/6.1.x
Changed in fuel:
status: Triaged → Won't Fix
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 6.1 → 8.0
status: Won't Fix → Triaged
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

We no longer fix Low bugs in 8.0; closing as Won't Fix.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

As of MOS 9.0 we support CentOS 7 and Ubuntu 14.04. The init systems of those distros (systemd and upstart, respectively) restart failed services. Therefore I'm marking this bug as Invalid.
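For reference, the systemd restart behavior mentioned above comes from unit settings such as the following. This is an illustrative fragment with a hypothetical drop-in path, not the actual unit shipped with MOS 9.0:

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/ceph-osd@.service.d/restart.conf
[Service]
# Restart the OSD if it exits with a non-zero status (e.g. because the
# filesystem was still read-only when it first started).
Restart=on-failure
RestartSec=10s
```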
