New units on 3.7 TB disks created with only 10 GiB block files

Bug #1885516 reported by Diko Parvanov
This bug affects 2 people
Affects                            Status        Importance  Assigned to     Milestone
Ceph OSD Charm                     Triaged       Low         Unassigned
OpenStack Charms Deployment Guide  Fix Released  Medium      Peter Matulis

Bug Description

After adding 18 new nodes with 3 OSDs each, backed by 3.7 TB disks, the newly created OSDs ended up with only 10 GiB usable block files. Cloud: bionic/queens. Charm revision: commit 578770acecf5b572296357bf6685c030b2b9325e (origin/stable/19.10).

Steps we executed for this procedure:
- Make sure the cluster is healthy
- Prevent data movement:
-- ceph osd set nobackfill
-- ceph osd set norebalance
- Now add all the new OSD units:
-- juju add-unit ceph-osd -n <as needed> --to <machine ids>
- Wait for all PGs to peer
- Allow data to rebalance
-- ceph osd unset nobackfill
-- ceph osd unset norebalance
- Wait for HEALTH_OK (see the status-check sketch after this list)
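
A minimal sketch of the status checks implied by the two "wait" steps above (standard ceph CLI; adjust to however you monitor the cluster):

ceph -s          # overall cluster state, including peering progress
ceph pg stat     # quick summary of PG states while waiting for peering
ceph health      # should report HEALTH_OK before declaring the expansion done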

Filesystem Size Used Avail Use% Mounted on
[...]
/dev/bcache2 3.7T 5.4G 3.7T 1% /srv/ceph/bcache-sdb
/dev/bcache1 3.7T 5.4G 3.7T 1% /srv/ceph/bcache-sdc
/dev/bcache0 3.7T 5.4G 3.7T 1% /srv/ceph/bcache-sdd

# ls -lh /srv/ceph/bcache-sdc/
total 1.7G
-rw-r--r-- 1 root root 525 Jun 24 06:26 activate.monmap
-rw-r--r-- 1 ceph ceph 3 Jun 24 06:26 active
-rw-r--r-- 1 ceph ceph 10G Jun 29 06:20 block
-rw-r--r-- 1 ceph ceph 2 Jun 24 06:26 bluefs
-rw-r--r-- 1 ceph ceph 37 Jun 24 06:25 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Jun 24 06:25 fsid
-rw------- 1 ceph ceph 58 Jun 24 06:26 keyring
-rw-r--r-- 1 ceph ceph 8 Jun 24 06:26 kv_backend
-rw-r--r-- 1 ceph ceph 21 Jun 24 06:25 magic
-rw-r--r-- 1 ceph ceph 4 Jun 24 06:26 mkfs_done
-rw-r--r-- 1 ceph ceph 6 Jun 24 06:26 ready
-rw-r--r-- 1 ceph ceph 2 Jun 24 06:26 require_osd_release
-rw-r--r-- 1 ceph ceph 0 Jun 24 06:28 systemd
-rw-r--r-- 1 ceph ceph 10 Jun 24 06:25 type

ceph osd tree (same for all 54 new OSDs):

XXX ssd 0.00980 osd.XXX up 1.00000 1.00000

ceph osd df (same for all 54 new OSDs):

XXX ssd 0.00980 1.00000 10GiB 1.40GiB 8.60GiB 13.98 4.28 0

Log for initial OSD creation:

2020-06-24 06:26:08.505939 7efd9d202fc0 0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-24 06:26:08.505951 7efd9d202fc0 0 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable), process ceph-osd, pid 29081
2020-06-24 06:26:08.507913 7efd9d202fc0 1 bluestore(/srv/ceph/bcache-sdc) mkfs path /srv/ceph/bcache-sdc
2020-06-24 06:26:08.507944 7efd9d202fc0 -1 bluestore(/srv/ceph/bcache-sdc/block) _read_bdev_label failed to open /srv/ceph/bcache-sdc/block: (2) No such file or directory
2020-06-24 06:26:08.507966 7efd9d202fc0 -1 bluestore(/srv/ceph/bcache-sdc/block) _read_bdev_label failed to open /srv/ceph/bcache-sdc/block: (2) No such file or directory
2020-06-24 06:26:08.508054 7efd9d202fc0 1 bluestore(/srv/ceph/bcache-sdc) _setup_block_symlink_or_file resized block file to 10GiB
2020-06-24 06:26:08.508079 7efd9d202fc0 1 bdev create path /srv/ceph/bcache-sdc/block type kernel
2020-06-24 06:26:08.508089 7efd9d202fc0 1 bdev(0x56079aaa8b40 /srv/ceph/bcache-sdc/block) open path /srv/ceph/bcache-sdc/block
2020-06-24 06:26:08.508294 7efd9d202fc0 1 bdev(0x56079aaa8b40 /srv/ceph/bcache-sdc/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) non-rotational
2020-06-24 06:26:08.508990 7efd9d202fc0 1 bluestore(/srv/ceph/bcache-sdc) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2020-06-24 06:26:08.509078 7efd9d202fc0 1 bdev create path /srv/ceph/bcache-sdc/block type kernel
2020-06-24 06:26:08.509085 7efd9d202fc0 1 bdev(0x56079aaa9200 /srv/ceph/bcache-sdc/block) open path /srv/ceph/bcache-sdc/block
2020-06-24 06:26:08.509255 7efd9d202fc0 1 bdev(0x56079aaa9200 /srv/ceph/bcache-sdc/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) non-rotational
2020-06-24 06:26:08.509266 7efd9d202fc0 1 bluefs add_block_device bdev 1 path /srv/ceph/bcache-sdc/block size 10GiB
2020-06-24 06:26:08.509269 7efd9d202fc0 1 bluefs add_block_extent bdev 1 0x120000000~40000000
2020-06-24 06:26:08.509297 7efd9d202fc0 1 bluefs mkfs osd_uuid f45082ae-9b7d-41e2-88ea-54a2b908bdf7
2020-06-24 06:26:08.509305 7efd9d202fc0 1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0x280000000
2020-06-24 06:26:08.509375 7efd9d202fc0 1 bluefs mkfs uuid a81b1629-0e86-4119-b419-35b9acbf6cf8
2020-06-24 06:26:08.509995 7efd9d202fc0 1 fbmap_alloc 0x56079adae600 shutdown
2020-06-24 06:26:08.510015 7efd9d202fc0 1 bluefs mount
2020-06-24 06:26:08.510055 7efd9d202fc0 1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0x280000000

Revision history for this message
Diko Parvanov (dparv) wrote :

The workaround for this issue is to:

- stop the ceph-osd@X services on the nodes
- unmount the ceph volumes
- comment out the ceph bcache devices in /etc/fstab
- run-action zap-disk
- run-action add-disk

The new OSDs are then created with LVM and with the proper size.
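
A minimal sketch of that workaround for a single OSD (osd.X on unit ceph-osd/N, backed by /dev/bcache1 mounted at /srv/ceph/bcache-sdc; X and N are placeholders, and the zap-disk/add-disk parameter names are assumptions, so verify them with "juju actions ceph-osd" on your charm revision):

sudo systemctl stop ceph-osd@X        # on the node: stop the OSD
sudo umount /srv/ceph/bcache-sdc      # release the mount
# then comment out the matching bcache entry in /etc/fstab by hand

juju run-action ceph-osd/N zap-disk devices=/dev/bcache1 i-really-mean-it=true
juju run-action ceph-osd/N add-disk osd-devices=/dev/bcache1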

description: updated
Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Note that we're using directory-based OSDs here. Also, the disk format is unchanged from the default, i.e. using BlueStore.

juju config ceph-osd osd-devices
/srv/ceph/bcache-sdb /srv/ceph/bcache-sdc /srv/ceph/bcache-sdd

juju config ceph-osd bluestore
true
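
To confirm the symptom on a given unit (a quick sketch; the unit name is a placeholder), compare the size of the mounted filesystem with the size of the block file the OSD actually got:

juju ssh ceph-osd/N -- df -h /srv/ceph/bcache-sdc
juju ssh ceph-osd/N -- ls -lh /srv/ceph/bcache-sdc/block    # shows the 10 GiB file-backed block device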

Revision history for this message
Chris Sanders (chris.sanders) wrote :

I've subscribed ~field-high; this seems to be a critical issue if it's reproducible.

Revision history for this message
James Page (james-page) wrote :

"Note we're using directory based OSDs here."

Why? Directory-based OSDs are not really supported in any meaningful way.

Revision history for this message
James Page (james-page) wrote :

Hmm, well, they might have worked at Queens, but that was really for demo/test only.

Revision history for this message
Andrea Ieri (aieri) wrote :

I believe this is what happened: this cloud used to be xenial-ocata, using filestore. When it was upgraded to bionic-queens it gained the ability to use BlueStore, and the charm started defaulting to bluestore=true. We then expanded the cloud without modifying the charm config, which led to ceph-osd trying to create a BlueStore OSD on an XFS-formatted drive that was really meant to host a filestore OSD.

While I agree the charm pretty much did what it was told to, it would be much more user-friendly if the unit went into a blocked state when it detects the combination of bluestore=true and the existence of at least one of the osd-devices paths.
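
Until the charm grows such a check, a rough manual pre-check along those lines could look like this (a sketch only; ceph-osd/N is a placeholder for a unit to inspect, and the paths come from the osd-devices config shown above):

if [ "$(juju config ceph-osd bluestore)" = "true" ]; then
    for path in $(juju config ceph-osd osd-devices); do
        # an existing directory here means a directory-based (filestore-era) OSD path
        if juju ssh ceph-osd/N -- test -d "$path"; then
            echo "WARNING: $path is a directory on ceph-osd/N; bluestore=true will create a small file-backed OSD"
        fi
    done
fi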

Revision history for this message
Billy Olsen (billy-olsen) wrote :

The charm started defaulting to bluestore=true in the 18.08 release. The blocked state is a good idea if it prevents users from ending up with an incompatible configuration. It is probably also worth calling out in the upgrade steps in the docs, to make it clearer that this is something to be concerned about when upgrading charms.

Given that the charm was doing what it was told to do, though there are improvements that can be made here, I'm going to remove field-high at this point and send it over for a docs update at the very least.

Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Low
assignee: nobody → Peter Matulis (petermatulis)
Revision history for this message
Peter Matulis (petermatulis) wrote :

The BlueStore-enabled default should have kicked in at Ceph Luminous, which first appeared in xenial-queens. I'll add a known issue for upgrades to the deploy guide.
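
For clouds that were originally deployed with filestore, one operational way to avoid the trap while the docs note is pending (a sketch, not official guidance) is to pin the option back before adding any units:

juju config ceph-osd bluestore=false    # keep new OSDs on filestore until the devices are redeployed as block devices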

Changed in charm-deployment-guide:
assignee: nobody → Peter Matulis (petermatulis)
Changed in charm-ceph-osd:
assignee: Peter Matulis (petermatulis) → nobody
Changed in charm-deployment-guide:
importance: Undecided → High
status: New → Triaged
Changed in charm-ceph-osd:
importance: Low → Undecided
importance: Undecided → Low
Changed in charm-deployment-guide:
importance: High → Medium
Changed in charm-deployment-guide:
status: Triaged → In Progress
Revision history for this message
Peter Matulis (petermatulis) wrote :
Changed in charm-deployment-guide:
status: In Progress → Fix Released