ceph-osd fails to start after machine restart due to incorrect ceph.conf

Bug #2049770 reported by Rafał Krzewski
This bug affects 1 person
Affects: Ceph OSD Charm
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm setting up a Charmed Kubernetes cluster on 3 bare metal machines managed by MAAS. Due to the limited number of physical machines, I am running a number of Juju units in LXD containers.

The overlay file looks as follows:

applications:
  ceph-mon:
    charm: ceph-mon
    channel: quincy/stable
    revision: 195
    num_units: 3
    to:
    - lxd:0
    - lxd:1
    - lxd:2
  ceph-osd:
    charm: ceph-osd
    channel: quincy/stable
    revision: 576
    num_units: 3
    to:
    - "0"
    - "1"
    - "2"
    options:
      osd-devices: /dev/nvme0n1
  ceph-fs:
    charm: ceph-fs
    channel: quincy/stable
    revision: 60
    num_units: 3
    to:
    - lxd:0
    - lxd:1
    - lxd:2
  ceph-csi:
    charm: ceph-csi
    channel: stable
    revision: 37
    options:
      namespace: kube-system
      cephfs-enable: true
relations:
- [ceph-osd:mon, ceph-mon:osd]
- [ceph-fs:ceph-mds, ceph-mon:mds]
- [ceph-mon:client, ceph-csi:ceph-client]
- [kubernetes-control-plane:juju-info, ceph-csi:kubernetes]

The only workload that is running unconfined on the machines is kubernetes-control-plane. Everything else is running either in LXD or in Kubernetes.

Deployment works fine, all units start up and Ceph StorageClasses are available in Kubernetes.

Trouble begins after restarting a node: the ceph-osd unit on that node does not come up, and Juju reports the following status message: "No block devices detected using current configuration".

It turns out that the ceph-osd systemd service is not running:

root@stagnum3:/home/ubuntu# systemctl status ceph-osd@0
× ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2024-01-15 21:36:32 UTC; 2 days ago
    Process: 7258 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 0 (code=exited, status=0/SUCCESS)
    Process: 7262 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 0 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
   Main PID: 7262 (code=exited, status=1/FAILURE)
        CPU: 105ms

Jan 15 21:36:32 stagnum3 systemd[1]: ceph-osd@0.service: Scheduled restart job, restart counter is at 4.
Jan 15 21:36:32 stagnum3 systemd[1]: Stopped Ceph object storage daemon osd.0.
Jan 15 21:36:32 stagnum3 systemd[1]: ceph-osd@0.service: Start request repeated too quickly.
Jan 15 21:36:32 stagnum3 systemd[1]: ceph-osd@0.service: Failed with result 'exit-code'.
Jan 15 21:36:32 stagnum3 systemd[1]: Failed to start Ceph object storage daemon osd.0.

journalctl shows the following:

Jan 15 21:36:11 stagnum3 systemd[1]: Started Ceph object storage daemon osd.0.
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: 2024-01-15T21:36:11.682+0000 7f3717f67800 -1 auth: unable to find a keyring on /etc/ceph/ceph.osd.0.keyring: (2) No such file or directory
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: 2024-01-15T21:36:11.682+0000 7f3717f67800 -1 auth: unable to find a keyring on /etc/ceph/ceph.osd.0.keyring: (2) No such file or directory
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: 2024-01-15T21:36:11.682+0000 7f3717f67800 -1 AuthRegistry(0x56435af66138) no keyring found at /etc/ceph/ceph.osd.0.keyring, disabling cephx
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: 2024-01-15T21:36:11.682+0000 7f3717f67800 -1 auth: unable to find a keyring on /etc/ceph/ceph.osd.0.keyring: (2) No such file or directory
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: 2024-01-15T21:36:11.682+0000 7f3717f67800 -1 AuthRegistry(0x7ffdfa515c20) no keyring found at /etc/ceph/ceph.osd.0.keyring, disabling cephx
Jan 15 21:36:11 stagnum3 ceph-osd[6585]: failed to fetch mon config (--no-mon-config to skip)

Indeed, /etc/ceph/ceph.osd.0.keyring does not exist. The keyring is at /var/lib/ceph/osd/ceph-0/keyring, and that location should be configured in /etc/ceph/ceph.conf.

ceph.conf has the following contents:

[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
keyring = /etc/ceph/$cluster.$name.keyring
mon host = 192.168.3.21 192.168.3.45 192.168.3.56
log to syslog = true
err to syslog = true
clog to syslog = true
mon cluster log to syslog = true
debug mon = 1/5
debug osd = 1/5

[client]
log file = /var/log/ceph.log

Notice that the file contains neither the fsid setting nor the public addr and cluster addr settings.
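
For clarity, the path in the journalctl errors above follows directly from Ceph's metavariable expansion of that keyring line (a worked example using only names that appear in this report):

# $cluster defaults to "ceph" and $name is <daemon type>.<id>, i.e. "osd.0",
# so the global keyring line resolves to:
#   /etc/ceph/$cluster.$name.keyring  ->  /etc/ceph/ceph.osd.0.keyring   (missing)
# while the keyring actually present on the host is in the OSD data directory:
ls -l /var/lib/ceph/osd/ceph-0/keyring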

In one of the earlier iterations of the cluster I had a similar situation: two nodes had an incorrect configuration, but the third one (I can't remember whether it was the leader node for the ceph-osd or ceph-mon Juju application) had a correct configuration containing fsid and addr settings, as well as an [osd] section with keyring = /var/lib/ceph/osd/$cluster-$id/keyring. I was able to recover the cluster by copying the file to the other nodes, substituting the addr settings, and restarting the ceph-osd service using systemctl. The files were overwritten with incorrect contents shortly afterwards, presumably by the Juju agent.
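
For reference, a minimal sketch of what those missing pieces might look like, based only on what is described above; the fsid and addr values are placeholders, not the real values from this cluster:

[global]
fsid = <cluster-fsid>
public addr = <host address in 192.168.3.0/24>
cluster addr = <host address in 192.168.3.0/24>

[osd]
keyring = /var/lib/ceph/osd/$cluster-$id/keyring

After restoring a correct file, the failed unit may also need systemd's start-rate limit (visible in the status output above) cleared with systemctl reset-failed ceph-osd@0 before a manual systemctl start ceph-osd@0 will succeed.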

If there is something I can do to help fix this, please let me know. I can tear down and reinstall the cluster if needed; I definitely can't hand the cluster over to its users until this is resolved.

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

As you noted, the ceph.conf looks incomplete. Naturally the ceph-osd charm should manage that -- so something must have gone wrong there.

Would you be able to provide juju logs from the ceph-{osd,mon} units, and ideally sosreports for those as well?

TIA

Changed in charm-ceph-osd:
status: New → Incomplete
Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

juju status
...
ceph-osd/0 active idle 0 192.168.3.46 Unit is ready (1 OSD)
ceph-osd/1* active idle 1 192.168.3.57 Unit is ready (1 OSD)
ceph-osd/2 blocked idle 2 192.168.3.30 No block devices detected using current configuration
...

debug-log --replay --include ceph-osd/2
unit-ceph-osd-1: 17:22:02 INFO unit.ceph-osd/2.juju-log Updating status.
unit-ceph-osd-1: 17:22:02 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)

I scrolled back and can't see any other messages. Logs from the working unit look the same:

debug-log --replay --include ceph-osd/1
unit-ceph-osd-1: 17:27:59 INFO unit.ceph-osd/1.juju-log Updating status.
unit-ceph-osd-1: 17:28:00 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)

ceph-mon/1 is the leader

juju debug-log --replay --include ceph-mon/1
unit-ceph-mon-1: 17:27:29 WARNING unit.ceph-mon/1.juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
unit-ceph-mon-1: 17:27:29 INFO unit.ceph-mon/1.juju-log Updating status
unit-ceph-mon-1: 17:27:29 INFO unit.ceph-mon/1.juju-log Status updated
unit-ceph-mon-1: 17:27:30 INFO unit.ceph-mon/1.juju-log Updating status
unit-ceph-mon-1: 17:27:30 INFO unit.ceph-mon/1.juju-log Status updated
unit-ceph-mon-1: 17:27:30 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

Scrolling back does not reveal any other messages.

I think this is because it's been a few days since I rebooted machine 2, and the juju logs only seem to go back less than 24 hours.

I'll try rebooting another machine and see if any other messages show up and report back.

I've never used sosreport before. I see it's available on Ubuntu. Do you recommend any specific options I should use? Should I attach the results to the bug, or upload them somewhere else (where?) and post links? Do you need sosreports from one unit of each application, or from all of them?

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Maybe the juju unit log files have been rotated -- if so, the older log files would be available in /var/log/juju on the units.

Let's take a look at those first; we can check later whether we actually need the sosreports.
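
For completeness, a sketch of how those could be gathered, assuming standard log paths and the sos package from the Ubuntu archive (plugin names may differ between sos versions):

# rotated unit logs live on the machine hosting the unit
juju ssh ceph-osd/2 -- ls -l /var/log/juju/
juju scp ceph-osd/2:/var/log/juju/unit-ceph-osd-2.log .

# non-interactive sos report limited to the juju and ceph plugins
juju ssh ceph-osd/2 -- sudo sos report --batch -o juju,ceph_osd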

Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

I did an experiment: I installed the cluster step by step using the CLI instead of a bundle, to catch the point at which /etc/ceph/ceph.conf gets clobbered:

juju add-machine -n 3

juju deploy ceph-mon --channel quincy/stable -n 3 --to lxd:0,lxd:1,lxd:2

juju deploy ceph-osd --channel quincy/stable -n 3 --to 0,1,2\
  --config osd-devices=/dev/nvme0n1

juju integrate ceph-mon:osd ceph-osd:mon

# at this point /etc/ceph/ceph.conf shows up on machines 0..2 with correct contents

juju deploy ceph-fs --channel quincy/stable -n 3 --to lxd:0,lxd:1,lxd:2

juju integrate ceph-mon:mds ceph-fs:ceph-mds

juju deploy easyrsa --channel 1.28/stable -n 3 --to lxd:0,lxd:1,lxd:2

juju deploy etcd --channel 1.28/stable -n 3 --to lxd:0,lxd:1,lxd:2

juju integrate easyrsa:client etcd:certificates

juju deploy kubernetes-control-plane --channel 1.28/stable -n 3 --to 0,1,2 \
  --config extra_sans="127.0.0.1 192.168.3.5 k8s.stagnum.caltha.eu"\
  --config loadbalancer-ips=192.168.3.5\
  --config service-cidr=10.152.180.0/22\
  --config register-with-taints=""\
  --config proxy-extra-config="{
          mode: ipvs,
          ipvs: {
            strictARP: true
          }
        }"\
  --config sysctl="{
          net.bridge.bridge-nf-call-iptables: 0,
          net.ipv4.conf.all.forwarding: 1,
          net.ipv4.conf.all.rp_filter: 1,
          net.ipv4.neigh.default.gc_thresh1: 128,
          net.ipv4.neigh.default.gc_thresh2: 28672,
          net.ipv4.neigh.default.gc_thresh3: 32768,
          net.ipv6.neigh.default.gc_thresh1: 128,
          net.ipv6.neigh.default.gc_thresh2: 28672,
          net.ipv6.neigh.default.gc_thresh3: 32768,
          fs.inotify.max_user_instances: 8192,
          fs.inotify.max_user_watches: 1048576,
          kernel.panic: 10,
          kernel.panic_on_oops: 1,
          vm.overcommit_memory: 1
        }"\
  --config allow-privileged=true

juju integrate easyrsa:client kubernetes-control-plane:certificates

juju integrate etcd:db kubernetes-control-plane:etcd

juju deploy containerd --channel 1.28/stable

juju integrate kubernetes-control-plane:container-runtime containerd:containerd

juju deploy calico --channel 1.28/stable\
  --config cidr=92.168.64.0/20

juju integrate etcd:db calico:etcd

juju integrate kubernetes-control-plane:cni calico:cni

juju deploy kubeapi-load-balancer --channel 1.28/stable -n 3 --to lxd:0,lxd:1,lxd:2\
  --config extra_sans="127.0.0.1 192.168.3.5 k8s.stagnum.caltha.eu"

juju deploy keepalived --channel stable\
  --config vip_hostname=k8s.stagnum.caltha.eu\
  --config virtual_ip=192.168.3.5

juju integrate easyrsa:client kubeapi-load-balancer:certificates

juju integrate kubeapi-load-balancer:juju-info keepalived:juju-info

juju integrate kubernetes-control-plane:loadbalancer-internal kubeapi-load-balancer:lb-consumers

juju integrate kubernetes-control-plane:loadbalancer-external kubeapi-load-balancer:lb-consumers

juju deploy ceph-csi --channel stable\
  --config namespace=kube-system\
  --config cephfs-enable=true

juju integrate ceph-csi:kubernetes kubernetes-control-plane:juju-info

juju integrate ceph-csi:ceph-client ceph-mon:client

# as soon as ceph-csi starts, /etc/ceph/ceph.conf on machines 0...


Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Just to confirm your hypothesis: indeed, ceph-csi writes to /etc/ceph/ceph.conf, so you will need to separate them somehow:

https://github.com/charmed-kubernetes/ceph-csi-operator/blob/52dc3d10048c46a1f903238dfd253f1eab3e10b2/src/charm.py#L189

Right, as you say, ceph-osd needs to be unconfined for raw disk access.
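
For anyone who wants to catch the writer in the act on a live machine, a hedged sketch using auditd (assumes the auditd package is installed; the key name is arbitrary):

sudo auditctl -w /etc/ceph/ceph.conf -p wa -k cephconf
# ...wait for the file to change, then see which process wrote it:
sudo ausearch -k cephconf --interpret | tail -n 40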

Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

After redeploying the cluster with kubernetes-control-plane units in LXD containers, I've run into another problem:

  Warning FailedMount 10m kubelet MountVolume.MountDevice failed for volume "pvc-235b7bf7-b4b7-443d-a810-b1acc12eed45" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 22) occurred while running rbd args: [--id ceph-csi -m 192.168.3.45,192.168.3.46,192.168.3.60 --keyfile=***stripped*** map xfs-pool/csi-vol-79150a03-d4df-45f6-a339-d919a0184236 --device-type krbd --options noudev], rbd error output: rbd: mapping succeeded but /dev/rbd0 is not accessible, is host /dev mounted?
rbd: map failed: (22) Invalid argument

I can see the /dev/rbd0 device on machine 2, but not in the 2/lxd/5 container where kubelet is running.

I've tried setting ceph-csi cephfs-mounter=ceph-fuse and now I see the following message:

Warning FailedMount 64s (x5 over 9m13s) kubelet MountVolume.MountDevice failed for volume "pvc-235b7bf7-b4b7-443d-a810-b1acc12eed45" : rpc error: code = Internal desc = exit status 1

I don't know whether changing the setting on an already deployed cluster simply doesn't take effect, or whether I have run into yet another limitation. I've looked at the logs of snap.kubelet.daemon.service on 2/lxd/5 via systemctl, but they do not show any more details about the error.

The next thing I'm going to try is destroying the model and redeploying with cephfs-mounter=ceph-fuse from the get-go. Do you have any other suggestions?
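
If it helps, the overlay equivalent of that option would presumably just sit next to the existing ceph-csi options (a sketch mirroring the overlay at the top of this report):

  ceph-csi:
    charm: ceph-csi
    channel: stable
    options:
      namespace: kube-system
      cephfs-enable: true
      cephfs-mounter: ceph-fuse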

Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

Oh, is cephfs-mounter used with RBD devices at all?

If not, is there any way to make the host's /dev/rbd* devices available in the LXD container?

I have only 3 bare metal machines to build this cluster so I need to co-locate ceph-osd and kubernetes-control-plane units somehow.
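
Purely as an experiment (not a supported setup), LXD can pass a host block device into a container; a hedged sketch, with a hypothetical container name that you would look up via lxc list on the host:

lxc config device add juju-xxxxxx-2-lxd-5 rbd0 unix-block source=/dev/rbd0 path=/dev/rbd0

Note that each new RBD map creates a new device node (/dev/rbd1, ...), so a static rule like this at best confirms where the limitation lies.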

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Instead of an LXD container, you can consider a KVM instance. It will run isolated as a virtual machine, which avoids some of the privilege restrictions placed on LXD containers for security reasons.

For placement, instead of --to lxd:machine#, you simply use --to kvm:machine# and you get a KVM instance instead.
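
As a concrete example, reusing the channel and placement from the earlier CLI walkthrough (config options omitted), that would presumably look like:

juju deploy kubernetes-control-plane --channel 1.28/stable -n 3 --to kvm:0,kvm:1,kvm:2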

Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

Since running ceph-osd in KVM is a non-starter, I guess I should run kubernetes-control-plane in KVM then. But will it use host resources effectively? Note that the actual Kubernetes workloads would also run inside the KVM instances in this setup.

Maybe the ceph-csi charm could be modified to store its Ceph client configuration at a location other than /etc/ceph/ceph.conf? That would solve the problem I'm facing, and I don't think I'm the only person who would like to set up Charmed Kubernetes on a small number of bare metal machines.

Revision history for this message
Rafał Krzewski (rafal-krzewski) wrote :

I'm trying to deploy kubernetes-control-plane to KVM, but it's turning out to be difficult.

I've used the following placement directive for kubernetes-control-plane in the bundle definition:

    to:
    - kvm:0
    - kvm:1
    - kvm:2
    constraints: cores=60 mem=240G

It failed to deploy, with the following message displayed for the machine:

no obvious space for container "1/lxd/0", host machine has spaces: "stagnum", "undefined"

MAAS shows the following spaces:

stagnum untagged MAAS-provided fabric-0 192.168.3.0/24 16%
No space untagged No DHCP fabric-1 192.168.122.0/24 100%

I gather that "undefined" is the "No space" entry created for 192.168.122.0/24, likely coming from KVM/QEMU.

I tried changing the constraints to "cores=60 mem=240G spaces=stagnum" but now I'm getting the following deployment error:

matching subnets to zones: cannot use space "alpha" as deployment target: no subnets

$ juju spaces
Name Space ID Subnets
alpha 0
stagnum 1 192.168.3.0/24
undefined 2 192.168.122.0/24
$ juju show-space alpha
space:
  id: "0"
  name: alpha
  subnets: []
applications:
- calico
- ceph-csi
- ceph-fs
- ceph-mon
- ceph-osd
- containerd
- easyrsa
- etcd
- keepalived
- kubeapi-load-balancer
- kubernetes-control-plane
machine-count: 0

Where did the "alpha" space come from? And why is kubernetes-control-plane assigned to it despite the spaces=stagnum constraint? I didn't have to touch anything space-related until now...
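
For reference, a hedged sketch of two ways to pin applications to the stagnum space; whether either one resolves the kvm placement error above is untested:

# model-wide constraint
juju set-model-constraints spaces=stagnum

# or re-bind a single application's endpoints to the space
juju bind kubernetes-control-plane stagnum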

I went back and redeployed the cluster with unconfined kubernetes-control-plane. Apparently all applications are assigned to the "alpha" space with this setup as well:

$ juju spaces
Name Space ID Subnets
alpha 0
stagnum 1 192.168.3.0/24
undefined 2 192.168.122.0/24
$ juju show-space alpha
space:
  id: "0"
  name: alpha
  subnets: []
applications:
- calico
- ceph-fs
- ceph-mon
- ceph-osd
- containerd
- easyrsa
- etcd
- keepalived
- kubeapi-load-balancer
- kubernetes-control-plane
machine-count: 0
$ juju show-space stagnum
space:
  id: "1"
  name: stagnum
  subnets:
  - cidr: 192.168.3.0/24
    provider-id: "1"
    vlan-tag: 0
applications: []
machine-count: 18

Despite that, all machines in the model are getting IPs in the 192.168.3.0/24 block:

Machine State Address Inst id Base AZ Message
0 started 192.168.3.51 stagnum1 ubuntu@22.04 default Deployed
0/lxd/0 started 192.168.3.49 juju-02d0c2-0-lxd-0 ubuntu@22.04 default Container started
0/lxd/1 started 192.168.3.64 juju-02d0c2-0-lxd-1 ubuntu@22.04 default Container started
0/lxd/2 started 192.168.3.27 juju-02d0c2-0-lxd-2 ubuntu@22.04 default Container started
0/lxd/3 started 192.168.3.39 juju-02d0c2-0-lxd-3 ubuntu@22.04 default Container started
0/lxd/4 started 192.168.3.37 juju-02d0c2-0-lxd-4 ubuntu@22.04 default Container started

For the next test, I've destroyed the model, recreated it and ran `juju mod...


Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Ceph OSD Charm because there has been no activity for 60 days.]

Changed in charm-ceph-osd:
status: Incomplete → Expired