[library] ceph osd: invalid (someone else's?) journal
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | High | Oleksiy Molchanov |
Bug Description
Detailed description:
Environment: MOS 8, MU2
# fuel plugins
id | name | version | package_version
---|---|---|---
1 | elasticsearch_
2 | influxdb_grafana | 0.9.0 | 4.0.0
3 | lma_collector | 0.9.0 | 4.0.0
4 | lma_infrastruct
5 | static_ntp_routing | 4.0.6 | 4.0.0
6 | zabbix-database | 4.0.1 | 4.0.0
7 | zabbix_monitoring | 2.6.25 | 4.0.0
8 | ivb5_plugin_vw | 1.0.27 | 4.0.0
# ceph -v
ceph version 0.94.5 (9764da52395923
360 osds, 18 hosts.
In the ceph-osd logs we see the error "invalid (someone else's?) journal".
We had 20 OSDs per host and 4 journal disks with 5 partitions each (20 journal partitions, one per OSD).
After one OSD failed, we were not able to start it again until we recreated its journal.
We rechecked the whole environment and discovered 7 problematic hosts where some OSDs share the same journal partition and are still working.
So, during deployment the journal assignment went wrong: all partitions were created properly, but they were not assigned correctly during ceph deployment, as the listing below shows (a quick duplicate count follows it):
# ceph-disk list | grep journal
/dev/sda4 ceph data, active, cluster ceph, osd.469, journal /dev/sdu3
/dev/sdb3 ceph data, active, cluster ceph, osd.265, journal /dev/sdu4
/dev/sdc3 ceph data, active, cluster ceph, osd.483, journal /dev/sdu4
/dev/sdd3 ceph data, active, cluster ceph, osd.476, journal /dev/sdu5
/dev/sde3 ceph data, active, cluster ceph, osd.399, journal /dev/sdu6
/dev/sdf3 ceph data, active, cluster ceph, osd.498, journal /dev/sdu3
/dev/sdg3 ceph data, active, cluster ceph, osd.428, journal /dev/sdv3
/dev/sdh3 ceph data, active, cluster ceph, osd.508, journal /dev/sdu4
/dev/sdi3 ceph data, active, cluster ceph, osd.504, journal /dev/sdu5
/dev/sdj3 ceph data, active, cluster ceph, osd.183, journal /dev/sdv7
/dev/sdk3 ceph data, active, cluster ceph, osd.453, journal /dev/sdv6
/dev/sdl3 ceph data, active, cluster ceph, osd.396, journal /dev/sdv7
/dev/sdm3 ceph data, active, cluster ceph, osd.349, journal /dev/sdw3
/dev/sdn3 ceph data, active, cluster ceph, osd.419, journal /dev/sdw4
/dev/sdo3 ceph data, active, cluster ceph, osd.206, journal /dev/sdw7
/dev/sdp3 ceph data, active, cluster ceph, osd.436, journal /dev/sdw5
/dev/sdq3 ceph data, active, cluster ceph, osd.401, journal /dev/sdw6
/dev/sdr3 ceph data, active, cluster ceph, osd.463, journal /dev/sdw7
/dev/sds3 ceph data, active, cluster ceph, osd.445, journal /dev/sdx3
/dev/sdt3 ceph data, active, cluster ceph, osd.490, journal /dev/sdx4
/dev/sdu3 ceph journal, for /dev/sdf3
/dev/sdu4 ceph journal, for /dev/sdh3
/dev/sdu5 ceph journal, for /dev/sdi3
/dev/sdu6 ceph journal, for /dev/sde3
/dev/sdu7 ceph journal
/dev/sdv3 ceph journal, for /dev/sdg3
/dev/sdv4 ceph journal
/dev/sdv5 ceph journal
/dev/sdv6 ceph journal, for /dev/sdk3
/dev/sdv7 ceph journal, for /dev/sdl3
/dev/sdw3 ceph journal, for /dev/sdm3
/dev/sdw4 ceph journal, for /dev/sdn3
/dev/sdw5 ceph journal, for /dev/sdp3
/dev/sdw6 ceph journal, for /dev/sdq3
/dev/sdw7 ceph journal, for /dev/sdr3
/dev/sdx3 ceph journal, for /dev/sds3
/dev/sdx4 ceph journal, for /dev/sdt3
/dev/sdx5 ceph journal
/dev/sdx6 ceph journal
/dev/sdx7 ceph journal
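A quick way to count the duplicates from this listing (a minimal sketch; it assumes the ceph-disk output format shown above, where the journal device is the last field of each "ceph data" line, so the field position is not guaranteed across Ceph versions):
# ceph-disk list | grep 'journal /dev/' | awk '{print $NF}' | sort | uniq -c | sort -rn
Any journal partition with a count greater than 1 is shared by more than one OSD; on this host /dev/sdu4 is used by three data partitions, and /dev/sdu3, /dev/sdu5, /dev/sdv7 and /dev/sdw7 by two each.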
# ls -l /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:27 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:29 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:32 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:40 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:41 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:42 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:42 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:43 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:43 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:43 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:43 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:44 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:44 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:44 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:45 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:45 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:45 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:46 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:46 /var/lib/
lrwxrwxrwx 1 root root 58 May 19 22:46 /var/lib/
# ls -l /var/lib/
2 /dev/disk/
1 /dev/disk/
1 /dev/disk/
1 /dev/disk/
1 /dev/disk/
1 /dev/disk/
1 /dev/disk/
2 /dev/disk/
3 /dev/disk/
1 /dev/disk/
1 /dev/disk/
2 /dev/disk/
1 /dev/disk/
2 /dev/disk/
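The truncated counts above look like the same kind of tally taken from the OSD journal symlinks themselves; a minimal sketch of how to produce it, assuming the standard ceph-disk layout where /var/lib/ceph/osd/ceph-<id>/journal is a symlink into /dev/disk/by-partuuid/:
# for d in /var/lib/ceph/osd/ceph-*; do readlink "$d"/journal; done | sort | uniq -c | sort -rn
Again, any target with a count above 1 means two or more OSDs point at the same journal partition.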
Steps to reproduce:
I can only guess:
Try to deploy a ceph-osd node with more than 2 OSDs, stop the deployment when only one OSD has started, then try to deploy the others.
Workaround:
We manually recreated the journals for the half of the OSDs that used duplicated/shared journal partitions.
Once you know which partition you will use for a specific OSD's journal (a fleshed-out sketch follows the commands below):
# ceph osd set noout
# cd /var/lib/
# echo "bbaddcd9-
# cp -rp journal_uuid journal_uuid_orig
# rm journal
# ln -s /dev/disk/
# stop ceph-osd id=247
# ceph-osd --mkjournal -i 247
# start ceph-osd id=247
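For reference, a fleshed-out sketch of the same workaround. The OSD id (247) comes from the commands above; the data-directory path assumes the standard ceph-disk layout, and <new-journal-partuuid> and its by-partuuid device are placeholders for the partition you chose. Note that this sketch backs up journal_uuid before overwriting it and clears the noout flag at the end:
# ceph osd set noout
# cd /var/lib/ceph/osd/ceph-247
# cp -rp journal_uuid journal_uuid_orig
# echo "<new-journal-partuuid>" > journal_uuid
# rm journal
# ln -s /dev/disk/by-partuuid/<new-journal-partuuid> journal
# stop ceph-osd id=247
# ceph-osd --mkjournal -i 247
# start ceph-osd id=247
# ceph osd unset noout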
tags: added: area-ceph
Changed in fuel:
assignee: nobody → MOS Ceph (mos-ceph)
status: New → Confirmed
Changed in fuel:
assignee: MOS Ceph (mos-ceph) → Oleksiy Molchanov (omolchanov)
It seems to be a very specific case; I was not able to reproduce it on a small environment. I am marking this as Invalid. Please reopen as soon as it is reproduced again and you have a diagnostic snapshot.