[library] ceph osd: invalid (someone else's?) journal
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | Medium | Dmitry Borodaenko | 5.1
Bug Description
I have not been able to replicate this.
I have 3 OSD servers with 4 OSD disks and 1 Journal disk each; deployed with Fuel 5.0.
# ceph osd dump | fgrep out
osd.11 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 :/0 exists,new e68c819d-
# ps -ef | fgrep ceph
root 3438 1 0 Jun02 ? 00:01:37 /usr/bin/ceph-mon -i node-1 --pid-file /var/run/
root 3791 1 0 Jun02 ? 00:08:26 /usr/bin/ceph-osd -i 0 --pid-file /var/run/
root 4026 1 0 Jun02 ? 00:08:25 /usr/bin/ceph-osd -i 3 --pid-file /var/run/
root 4322 1 0 Jun02 ? 00:08:15 /usr/bin/ceph-osd -i 10 --pid-file /var/run/
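Only three ceph-osd daemons (ids 0, 3 and 10) are running; the daemon for osd.11 is the one that fails. Restarting it by hand (assuming the stock sysvinit ceph script) reproduces the crash recorded in its log:
# service ceph start osd.11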
# tail -5 /var/log/
2014-06-03 21:23:44.949786 7f6b9d1e57a0 0 filestore(
2014-06-03 21:23:44.958455 7f6b9d1e57a0 1 journal _open /var/lib/
2014-06-03 21:23:44.958563 7f6b9d1e57a0 -1 journal FileJournal::open: ondisk fsid f69f1af8-
2014-06-03 21:23:44.958632 7f6b9d1e57a0 -1 filestore(
2014-06-03 21:23:44.960387 7f6b9d1e57a0 -1 ** ERROR: error converting store /var/lib/
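The fsid mismatch is the telling symptom: a FileStore journal header records the fsid of the OSD that initialized it, and ceph-osd refuses to start when that header does not match its own fsid. The fsid the OSD expects can be read from its data directory (standard layout assumed, osd.11 shown here):
# cat /var/lib/ceph/osd/ceph-11/fsid
If another OSD has since re-initialized the same journal partition, the header carries that other OSD's fsid instead, which is exactly the "invalid (someone else's?) journal" condition in the summary.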
# ls -l /var/lib/
lrwxrwxrwx 1 root root 9 May 30 20:37 /var/lib/
lrwxrwxrwx 1 root root 9 May 30 20:42 /var/lib/
lrwxrwxrwx 1 root root 9 May 30 20:37 /var/lib/
lrwxrwxrwx 1 root root 9 May 30 20:37 /var/lib/
Two of the OSDs are pointed at the same journal partition.
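A quick way to confirm this on any node (assuming the standard /var/lib/ceph/osd/ceph-*/journal symlink layout) is to print the link targets and look for repeats:
# readlink /var/lib/ceph/osd/ceph-*/journal | sort | uniq -d
Any partition this pipeline prints is linked as the journal of more than one OSD.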
Changed in fuel:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Dmitry Borodaenko (dborodaenko)
milestone: none → 5.1
summary:
- ceph osd: invalid (someone else's?) journal
+ [library] ceph osd: invalid (someone else's?) journal
Redeployed my cluster and it happened again; one of the OSD nodes has the same journal partition linked to two of the OSD disks:
[root@node-1 ~]# ls -l /var/lib/ceph/osd/ceph-*/journal
lrwxrwxrwx 1 root root 9 Jun 4 16:34 /var/lib/ceph/osd/ceph-11/journal -> /dev/sda4
lrwxrwxrwx 1 root root 9 Jun 4 16:30 /var/lib/ceph/osd/ceph-2/journal -> /dev/sda4
lrwxrwxrwx 1 root root 9 Jun 4 16:30 /var/lib/ceph/osd/ceph-5/journal -> /dev/sda5
lrwxrwxrwx 1 root root 9 Jun 4 16:31 /var/lib/ceph/osd/ceph-10/journal -> /dev/sda7
From the puppet logs:
(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-1:/dev/sde4:/dev/sda4 returned 1 instead of one of [0]
(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-1:/dev/sde4:/dev/sda4 node-1:/dev/sdb4:/dev/sda5 returned 1 instead of one of [0]
(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd activate]/returns) change from notrun to 0 failed: ceph-deploy osd activate node-1:/dev/sdc4:/dev/sda4 node-1:/dev/sdd4:/dev/sda5 node-1:/dev/sde4:/dev/sda6 node-1:/dev/sdb4:/dev/sda7 returned 1 instead of one of [0]
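Note how the data-to-journal pairing shifts between runs: /dev/sde4 is paired with /dev/sda4 in the first prepare, but with /dev/sda6 in the later activate. If one attempt initializes a journal and a retry hands the same partition to a different OSD, two OSDs end up pointing at one journal. As an illustration only (a sketch, not Fuel's actual manifest code), a deterministic one-to-one pairing would look like:

#!/bin/bash
# Sketch: pin each OSD data partition to a fixed journal partition so
# that retries always reuse the same pairing and no journal device is
# ever handed to two OSDs. Device names are taken from the log above.
DATA=(/dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4)
JOURNALS=(/dev/sda4 /dev/sda5 /dev/sda6 /dev/sda7)
for i in "${!DATA[@]}"; do
    ceph-deploy osd prepare "node-1:${DATA[$i]}:${JOURNALS[$i]}"
done

Each journal partition appears in exactly one node:data:journal triplet, so a rerun cannot reassign it.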