Expected-osd-count set to 0 results in "too few PGs per OSD"
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Ceph Monitor Charm | Won't Fix | Undecided | Unassigned | |
Bug Description
I'm deploying my cloud with expected-osd-count set to 0, which is supposed to result in a valid crushmap.
bundle:
https:/
ubuntu@
cluster:
id: b3ade16c-
health: HEALTH_WARN
too few PGs per OSD (4 < min 30)
services:
mon: 3 daemons, quorum juju-8d6642-
mgr: juju-8d6642-
osd: 140 osds: 135 up, 135 in
data:
pools: 1 pools, 200 pgs
objects: 0 objects, 0 bytes
usage: 272 GB used, 491 TB / 491 TB avail
pgs: 200 active+clean
-------
ubuntu@
application: nvme-ceph-mon
charm: ceph-mon
settings:
auth-supported:
default: cephx
description: |
Which authentication flavour to use.
.
Valid options are "cephx" and "none". If "none" is specified,
keys will still be created and deployed so that it can be
enabled later.
source: default
type: string
value: cephx
ceph-cluster-network:
description: |
The IP address and netmask of the cluster (back-side) network (e.g.,
192.
.
If multiple networks are to be used, a space-delimited list of a.b.c.d/x
can be provided.
source: unset
type: string
ceph-public-network:
description: |
The IP address and netmask of the public (front-side) network (e.g.,
192.
.
If multiple networks are to be used, a space-delimited list of a.b.c.d/x
can be provided.
source: unset
type: string
config-flags:
description: |
User provided Ceph configuration. Supports a string representation of
a python dictionary where each top-level key represents a section in
the ceph.conf template. You may only use sections supported in the
template.
.
WARNING: this is not the recommended way to configure the underlying
services that this charm installs and is used at the user's own risk.
This option is mainly provided as a stop-gap for users that either
want to test the effect of modifying some config or who have found
a critical bug in the way the charm has configured their services
and need it fixed immediately. We ask that whenever this is used,
that the user consider opening a bug on this charm at
http://
config was needed so that we may consider it for inclusion as a
natively supported config in the charm.
source: user
type: string
value: '{''global'': {''mon max pg per osd'': 100000}}'
customize-failure-domain:
default: false
description: |
Setting this to true will tell Ceph to replicate across Juju's
Availability Zone instead of specifically by host.
source: user
type: boolean
value: true
default-rbd-features:
description: |
Restrict the rbd features used to the specified level. If set, this will
inform clients that they should set the config value `rbd default
features`, for example:
.
rbd default features = 1
.
This needs to be set to 1 when deploying a cloud with the nova-lxd
hypervisor.
source: unset
type: int
expected-osd-count:
default: 0
description: |
Number of OSDs expected to be deployed in the cluster. This value is used
for calculating the number of placement groups on pool creation. The
number of placement groups for new pools are based on the actual number
of OSDs in the cluster or the expected-osd-count, whichever is greater.
A value of 0 will cause the charm to only consider the actual number of
OSDs in the cluster.
source: default
type: int
value: 0
harden:
description: |
Apply system hardening. Supports a space-delimited list of modules
to run. Supported modules currently include os, ssh, apache and mysql.
source: unset
type: string
key:
description: |
Key ID to import to the apt keyring to support use with arbitrary source
configuration from outside of Launchpad archives or PPAs.
source: unset
type: string
loglevel:
default: 1
description: Mon and OSD debug level. Max is 20.
source: default
type: int
value: 1
monitor-count:
default: 3
description: |
Number of ceph-mon units to wait for before attempting to bootstrap the
monitor cluster. For production clusters the default value of 3 ceph-mon
units is normally a good choice.
.
For test and development environments you can enable single-unit
deployment by setting this to 1.
.
NOTE: To establish quorum and enable partition tolerance an odd number of
ceph-mon units is required.
source: default
type: int
value: 3
monitor-hosts:
description: |
A space-separated list of ceph mon hosts to use. This field is only used
to migrate an existing cluster to a juju-managed solution and should
otherwise be left unset.
source: unset
type: string
monitor-secret:
description: |
The Ceph secret key used by Ceph monitors. This value will become the
mon.key. To generate a suitable value use:
.
.
If left empty, a secret key will be generated.
.
NOTE: Changing this configuration after deployment is not supported and
new service units will not be able to join the cluster.
source: user
type: string
value: AQA8UzZbA1MwMhA
nagios_context:
default: juju
description: |
Used by the nrpe-external-master subordinate charm.
A string that will be prepended to instance name to set the hostname
in nagios. So for instance the hostname would be something like:
.
.
If you're running multiple environments with the same services in them
this allows you to differentiate between them.
source: default
type: string
value: juju
nagios_degraded_thresh:
default: 1
description: Threshold for degraded ratio (0.1 = 10%)
source: default
type: float
value: 1
nagios_ignore_nodeep_scrub:
default: false
description: Whether to ignore the nodeep-scrub flag
source: default
type: boolean
value: false
nagios_misplaced_thresh:
default: 10
description: Threshold for misplaced ratio (0.1 = 10%)
source: default
type: float
value: 10
nagios_recovery_rate:
default: "1"
description: Recovery rate below which we consider recovery to be stalled
source: default
type: string
value: "1"
nagios_servicegroups:
default: ""
description: |
A comma-separated list of nagios servicegroups. If left empty, the
nagios_context will be used as the servicegroup.
source: default
type: string
value: ""
no-bootstrap:
default: false
description: |
Causes the charm to not do any of the initial bootstrapping of the
Ceph monitor cluster. This is only intended to be used when migrating
from the ceph all-in-one charm to a ceph-mon / ceph-osd deployment.
Refer to the Charm Deployment guide at https:/
for more information.
source: default
type: boolean
value: false
pgs-per-osd:
default: 100
description: |
The number of placement groups per OSD to target. It is important to
properly size the number of placement groups per OSD as too many
or too few placement groups per OSD may cause resource constraints and
performance degradation. This value comes from the recommendation of
the Ceph placement group calculator (http://
recommended values are:
.
100 - If the cluster OSD count is not expected to increase in the
foreseeable future.
200 - If the cluster OSD count is expected to increase (up to 2x) in the
foreseeable future.
300 - If the cluster OSD count is expected to increase between 2x and 3x
in the foreseeable future.
source: default
type: int
value: 100
prefer-ipv6:
default: false
description: |
If True enables IPv6 support. The charm will expect network interfaces
to be configured with an IPv6 address. If set to False (default) IPv4
is expected.
.
NOTE: these charms do not currently support IPv6 privacy extension. In
order for this charm to function correctly, the privacy extension must be
disabled and a non-temporary address must be configured/available on
your network interface.
source: default
type: boolean
value: false
source:
description: |
Optional configuration to support use of additional sources such as:
.
- ppa:myteam/ppa
- cloud:xenial-
- http://
.
The last option should be used in conjunction with the key configuration
option.
source: user
type: string
value: cloud:xenial-queens
sysctl:
default: '{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max:
2097152 }'
description: "YAML-formatted associative array of sysctl key/value pairs to be
set\
to a high value to avoid problems with large numbers (>20)\nof OSDs recovering.
very large clusters should set those values even\nhigher (e.g. max for kernel.pid_max
is 4194303).\n"
source: default
type: string
value: '{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max:
2097152 }'
use-direct-io:
default: true
description: Configure use of direct IO for OSD journals.
source: default
type: boolean
value: true
use-syslog:
default: false
description: |
If set to True, supporting services will log to syslog.
source: default
type: boolean
value: false
-------
ubuntu@
application: nvme-ceph-osd
charm: ceph-osd
settings:
aa-profile-mode:
default: disable
description: |
Enable apparmor profile. Valid settings: 'complain', 'enforce' or
'disable'.
.
NOTE: changing the value of this option is disruptive to a running Ceph
cluster as all ceph-osd processes must be restarted as part of changing
the apparmor profile enforcement mode. Always test in pre-production
before enabling AppArmor on a live cluster.
source: user
type: string
value: complain
autotune:
default: false
description: |
Enabling this option will attempt to tune your network card sysctls and
hard drive settings. This changes hard drive read ahead settings and
max_sectors_kb on each block device, and makes appropriate sysctl
changes. Enabling this option should
generally be safe.
source: user
type: boolean
value: true
availability_zone:
description: |
Custom availability zone to provide to Ceph for the OSD placement
source: unset
type: string
bluestore:
default: false
description: |
Use experimental bluestore storage format for OSD devices; only supported
in Ceph Jewel (10.2.0) or later.
.
Note that despite bluestore being the default for Ceph Luminous, if this
option is False, OSDs will still use filestore.
source: user
type: boolean
value: true
bluestore-block-db-size:
default: 0
description: |
Size of a partition or file to use for BlueStore metadata
or RocksDB SSTs. A default value is not set as it is calculated
by ceph-disk if not specified.
source: default
type: int
value: 0
bluestore-block-wal-size:
default: 0
description: |
Size of a partition or file to use for BlueStore WAL (RocksDB WAL)
A default value is not set as it is calculated by ceph-disk if
not specified.
source: default
type: int
value: 0
bluestore-db:
description: |
Path to a BlueStore WAL db block device or file
source: user
type: string
value: /dev/nvme10n1
bluestore-wal:
description: |
Path to a BlueStore WAL block device or file.
source: user
type: string
value: /dev/nvme10n1
ceph-cluster-network:
description: |
The IP address and netmask of the cluster (back-side) network (e.g.,
192.
.
If multiple networks are to be used, a space-delimited list of a.b.c.d/x
can be provided.
source: unset
type: string
ceph-public-network:
description: |
The IP address and netmask of the public (front-side) network (e.g.,
192.
.
If multiple networks are to be used, a space-delimited list of a.b.c.d/x
can be provided.
source: unset
type: string
config-flags:
description: |
User provided Ceph configuration. Supports a string representation of
a python dictionary where each top-level key represents a section in
the ceph.conf template. You may only use sections supported in the
template.
.
WARNING: this is not the recommended way to configure the underlying
services that this charm installs and is used at the user's own risk.
This option is mainly provided as a stop-gap for users that either
want to test the effect of modifying some config or who have found
a critical bug in the way the charm has configured their services
and need it fixed immediately. We ask that whenever this is used,
that the user consider opening a bug on this charm at
http://
config was needed so that we may consider it for inclusion as a
natively supported config in the charm.
source: unset
type: string
crush-initial-weight:
description: |
The initial crush weight for newly added osds into crushmap. Use this
option only if you wish to set the weight for newly added OSDs in order
to gradually increase the weight over time. Be very aware that setting
this overrides the default setting, which can lead to imbalance in the
cluster, especially if there are OSDs of different sizes in use. By
default, the initial crush weight for the newly added osd is set to its
volume size in TB. Leave this option unset to use the default provided
by Ceph itself. This option only affects NEW OSDs, not existing ones.
source: unset
type: float
customize-failure-domain:
default: false
description: |
Setting this to true will tell Ceph to replicate across Juju's
Availability Zone instead of specifically by host.
source: user
type: boolean
value: true
ephemeral-unmount:
description: |
Cloud instances provide ephemeral storage which is normally mounted
on /mnt.
.
Setting this option to the path of the ephemeral mountpoint will force
an unmount of the corresponding device so that it can be used as an OSD
storage device. This is useful for testing purposes (cloud deployment
is not a typical use case).
source: unset
type: string
harden:
description: |
Apply system hardening. Supports a space-delimited list of modules
to run. Supported modules currently include os, ssh, apache and mysql.
source: unset
type: string
ignore-device-errors:
default: false
description: |
By default, the charm will raise errors if a whitelisted device is found,
but for some reason the charm is unable to initialize the device for use
by Ceph.
.
Setting this option to 'True' will result in the charm classifying such
problems as warnings only and will not result in a hook error.
source: default
type: boolean
value: false
key:
description: |
Key ID to import to the apt keyring to support use with arbitrary source
configuration from outside of Launchpad archives or PPAs.
source: unset
type: string
loglevel:
default: 1
description: OSD debug level. Max is 20.
source: default
type: int
value: 1
max-sectors-kb:
default: 1.048576e+06
description: |
This parameter will adjust every block device in your server to allow
greater IO operation sizes. If you have a RAID card with cache on it
consider tuning this much higher than the 1MB default. 1MB is a safe
default for spinning HDDs that don't have much cache.
source: default
type: int
value: 1.048576e+06
nagios_context:
default: juju
description: |
Used by the nrpe-external-master subordinate charm.
A string that will be prepended to instance name to set the hostname
in nagios. So for instance the hostname would be something like:
.
.
If you're running multiple environments with the same services in them
this allows you to differentiate between them.
source: default
type: string
value: juju
nagios_servicegroups:
default: ""
description: |
A comma-separated list of nagios servicegroups.
If left empty, the nagios_context will be used as the servicegroup
source: default
type: string
value: ""
osd-devices:
default: /dev/vdb
description: |
The devices to format and set up as OSD volumes.
.
These devices are the range of devices that will be checked for and
used across all service units, in addition to any volumes attached
via the --storage flag during deployment.
.
For ceph >= 0.56.6 these can also be directories instead of devices - the
charm assumes anything not starting with /dev is a directory instead.
source: user
type: string
value: /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1
/dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1
osd-encrypt:
default: false
description: |
By default, the charm will not encrypt Ceph OSD devices; however, by
setting osd-encrypt to True, Ceph's dmcrypt support will be used to
encrypt OSD devices.
.
Specifying this option on a running Ceph OSD node will have no effect
until new disks are added, at which point new disks will be encrypted.
source: user
type: boolean
value: true
osd-encrypt-keymanager:
default: ceph
description: |
Keymanager to use for storage of dm-crypt keys used for OSD devices;
by default 'ceph' itself will be used for storage of keys, making use
of the key/value storage provided by the ceph-mon cluster.
.
Alternatively 'vault' may be used for storage of dm-crypt keys. Both
approaches ensure that keys are never written to the local filesystem.
This also requires a relation to the vault charm.
source: user
type: string
value: vault
osd-format:
default: xfs
description: |
Format of filesystem to use for OSD devices; supported formats include:
.
xfs (Default >= 0.48.3)
ext4 (Only option < 0.48.3)
btrfs (experimental and not recommended)
.
Only supported with ceph >= 0.48.3.
source: default
type: string
value: xfs
osd-journal:
description: |
The device to use as a shared journal drive for all OSDs. By default
a journal partition will be created on each OSD volume device for use by
that OSD.
.
Only supported with ceph >= 0.48.3.
source: unset
type: string
osd-journal-size:
default: 1024
description: |
Ceph OSD journal size. The journal size should be at least twice the
product of the expected drive speed multiplied by filestore max sync
interval. However, the most common practice is to partition the journal
drive (often an SSD), and mount it such that Ceph uses the entire
partition for the journal.
.
Only supported with ceph >= 0.48.3.
source: default
type: int
value: 1024
osd-max-backfills:
description: |
The maximum number of backfills allowed to or from a single OSD.
.
Setting this option on a running Ceph OSD node will not affect running
OSD devices, but will add the setting to ceph.conf for the next restart.
source: unset
type: int
osd-recovery-max-active:
description: |
The number of active recovery requests per OSD at one time. More requests
will accelerate recovery, but the requests place an increased load on the
cluster.
.
Setting this option on a running Ceph OSD node will not affect running
OSD devices, but will add the setting to ceph.conf for the next restart.
source: unset
type: int
prefer-ipv6:
default: false
description: |
If True enables IPv6 support. The charm will expect network interfaces
to be configured with an IPv6 address. If set to False (default) IPv4
is expected.
.
NOTE: these charms do not currently support IPv6 privacy extension. In
order for this charm to function correctly, the privacy extension must be
disabled and a non-temporary address must be configured/available on
your network interface.
source: default
type: boolean
value: false
source:
description: |
Optional configuration to support use of additional sources such as:
.
- ppa:myteam/ppa
- cloud:xenial-
- http://
.
The last option should be used in conjunction with the key configuration
option.
source: user
type: string
value: cloud:xenial-queens
sysctl:
default: '{ kernel.pid_max : 2097152, vm.max_map_count : 524288, kernel.threads-max:
2097152, vm.vfs_
description: |
YAML-formatted associative array of sysctl key/value pairs to be set
persistently. By default we set pid_max, max_map_count and
threads-max to a high value to avoid problems with large numbers (>20)
of OSDs recovering. Very large clusters should set those values even
higher (e.g. max for kernel.pid_max is 4194303).
source: user
type: string
value: |
net.
net.
net.
net.
net.
net.
net.
net.
net.
net.
net.
net.
use-direct-io:
default: true
description: Configure use of direct IO for OSD journals.
source: default
type: boolean
value: true
use-syslog:
default: false
description: |
If set to True, supporting services will log to syslog.
source: default
type: boolean
value: false
"A value of 0 will cause the charm to only consider the actual number of OSDs in the cluster."
If you have that many OSDs you really do need to set expected-osd-count; the PG calculation happens early in the deployment, based on the number of "in" OSDs at that point in time, which will be inaccurate. Then, as the remaining OSDs join, the PGs spread out and the number per OSD drops well below the target of 100/200/300.
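
For illustration, here is a minimal sketch (not the charm's actual code) of that arithmetic, using the figures from the `ceph -s` output above; the pool size of 3 replicas is an assumption, as is rounding behaviour:

```python
# Rough sketch of why the cluster above warns "too few PGs per OSD (4 < min 30)".
pool_pgs = 200   # PGs in the single pool reported by `ceph -s`
replicas = 3     # assumed pool size (replica count); not shown in the output
osds_in = 135    # "in" OSDs reported by `ceph -s`

pgs_per_osd = pool_pgs * replicas / osds_in
print(round(pgs_per_osd, 1))   # ~4.4, well below the minimum of 30

# If expected-osd-count were set to the planned 140 OSDs with pgs-per-osd=100,
# the charm would instead size new pools for roughly this many PGs,
# keeping the per-OSD count near the 100 target:
target_pool_pgs = 140 * 100 / replicas
print(round(target_pool_pgs))  # ~4667 (the charm applies its own rounding)
```

On this bundle the likely remedy would be along the lines of `juju config nvme-ceph-mon expected-osd-count=140` before the pools are created, rather than leaving the option at 0.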