Relation between nova-compute and ceph-mon

Bug #1811867 reported by Eric Kessels on 2019-01-15
This bug affects 2 people
Affects                              Importance   Assigned to
OpenStack Charms Deployment Guide    High         Peter Matulis
OpenStack nova-compute charm         Undecided    Unassigned

Bug Description

Hi,

I have an OpenStack cluster running the Queens release, deployed with MAAS and Juju (2.4.7).

nova-compute charm = rev 291, series = bionic + xenial
ceph-mon charm = rev 32, series = xenial
cinder-ceph charm = rev 238

My compute hosts run 16.04, all fine and happy.

I want to add a nova-compute node running 18.04. On the nova side everything is fine: the node is added to the OpenStack cluster and cinder-ceph is running OK.

However, the relation between ceph-mon and nova-compute is not completing. The error I get is:

storage-backend relation's interface, ceph, is related awaiting the following data from the relationship: auth, key.

The secrets (Ceph keys) are not placed on the nova-compute host, and the secrets are not added to virsh.

1. What could be the problem?
2. Is this configuration possible (a mix of 18.04 and 16.04 nova-compute hosts)?

Juju log on the ceph-mon:

2019-01-15 16:08:40 INFO juju.worker.uniter.operation runhook.go:135 ran "client-relation-changed" hook
2019-01-15 16:08:40 DEBUG juju.worker.uniter.operation executor.go:90 committing operation "run relation-changed (172; nova-compute-bionic/1) hook"
2019-01-15 16:08:40 DEBUG juju.machinelock machinelock.go:180 machine lock released for uniter (run relation-changed (172; nova-compute-bionic/1) hook)
2019-01-15 16:08:40 DEBUG juju.worker.uniter.operation executor.go:79 lock released
2019-01-15 16:08:40 DEBUG juju.worker.uniter resolver.go:123 no operations in progress; waiting for changes
2019-01-15 16:08:40 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] idle:
2019-01-15 16:11:36 DEBUG juju.worker.uniter.remotestate watcher.go:510 update status timer triggered
2019-01-15 16:11:36 DEBUG juju.worker.uniter resolver.go:123 no operations in progress; waiting for changes
2019-01-15 16:11:36 DEBUG juju.worker.uniter.operation executor.go:59 running operation run update-status hook
2019-01-15 16:11:36 DEBUG juju.machinelock machinelock.go:156 acquire machine lock for uniter (run update-status hook)
2019-01-15 16:11:36 DEBUG juju.machinelock machinelock.go:166 machine lock acquired for uniter (run update-status hook)
2019-01-15 16:11:36 DEBUG juju.worker.uniter.operation executor.go:90 preparing operation "run update-status hook"
2019-01-15 16:11:36 DEBUG juju.worker.uniter.operation executor.go:90 executing operation "run update-status hook"
2019-01-15 16:11:37 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:11:37 DEBUG juju-log Hardening function 'update_status'
2019-01-15 16:11:37 DEBUG worker.uniter.jujuc server.go:181 running hook tool "config-get"
2019-01-15 16:11:37 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:11:37 DEBUG juju-log No hardening applied to 'update_status'
2019-01-15 16:11:37 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:11:37 INFO juju-log Updating status.
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "application-version-set"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-ids"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-list"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-ids"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-list"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:11:38 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"

Juju log on the nova-compute:

2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:20 DEBUG juju-log ceph:172: adding section 'DEFAULT'
2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:20 DEBUG juju-log ceph:172: 1 section(s) found
2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:20 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-ids"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "unit-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-ids"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "network-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "relation-get"
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 DEBUG juju-log ceph:172: Generating template context for cloud-credentials
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 DEBUG juju-log ceph:172: Generating template context for ceph
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: Missing required data: auth key
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 DEBUG juju-log ceph:172: Generating template context for ceph
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: Missing required data: auth key
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: ceph relation incomplete. Peer not ready?
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 DEBUG juju-log ceph:172: Generating template context for ceph
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: Missing required data: auth key
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 DEBUG juju-log ceph:172: Generating template context for ceph
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: Missing required data: auth key
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2019-01-15 16:09:21 INFO juju-log ceph:172: storage-backend relation's interface, ceph, is related awaiting the following data from the relationship: auth, key.
2019-01-15 16:09:21 DEBUG worker.uniter.jujuc server.go:181 running hook tool "status-set"
2019-01-15 16:09:22 DEBUG worker.uniter.jujuc server.go:181 running hook tool "application-version-set"
2019-01-15 16:09:23 INFO juju.worker.uniter.operation runhook.go:135 ran "ceph-relation-changed" hook
2019-01-15 16:09:23 DEBUG juju.worker.uniter.operation executor.go:90 committing operation "run relation-changed (172; ceph-mon/1) hook"
2019-01-15 16:09:23 DEBUG juju.machinelock machinelock.go:180 machine lock released for uniter (run relation-changed (172; ceph-mon/1) hook)
2019-01-15 16:09:23 DEBUG juju.worker.uniter.operation executor.go:79 lock released
2019-01-15 16:09:23 DEBUG juju.worker.uniter resolver.go:123 no operations in progress; waiting for changes
2019-01-15 16:09:23 DEBUG juju.worker.uniter agent.go:17 [AGENT-STATUS] idle:
2019-01-15 16:09:23 DEBUG juju.worker.uniter resolver.go:123 no operations in progress; waiting for changes

Please advise.

Eric

Drew Freiberger (afreiberger) wrote :

This is ultimately caused by having a new ceph-mon charm that supports expected-osd-count, while your ceph-osd charm is either not reporting, or not correctly reporting, the number of OSDs bootstrapped. This prevents the ceph-mon charm from being ready to hand out auth tokens.

Most likely, you just need to upgrade the ceph-osd charm(s) in your environment to 18.11 so that ceph-mon un-sticks itself. Or, if you already have 18.11 charms, count your OSDs and set expected-osd-count to that number.

You can iterate over the ceph-osd relations on your mons with "relation-get -r <rid> bootstrapped-osds ceph-osd/X". Each value should match the "(XX OSDs)" output for that unit in juju status. Added together, they must reach expected-osd-count or higher for ceph-mon to properly bootstrap and hand out credentials.
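The check described above can be sketched as follows. The unit names, relation id, and OSD counts below are hypothetical examples, not values taken from this deployment; substitute your own:

```shell
# Hypothetical unit and relation names -- adjust to your deployment.
# On a live model you would query the counts via the mon unit, e.g.:
#   juju run --unit ceph-mon/0 'relation-ids osd'
#   juju run --unit ceph-mon/0 'relation-get -r osd:1 bootstrapped-osds ceph-osd/0'
#
# The per-unit counts must sum to expected-osd-count or higher before
# ceph-mon will hand out auth credentials. With example counts of
# 3, 3 and 4, and expected-osd-count set to 10:
expected_osd_count=10
total=0
for count in 3 3 4; do
  total=$((total + count))
done
if [ "$total" -ge "$expected_osd_count" ]; then
  echo "ceph-mon can bootstrap: $total OSDs reported"
else
  echo "ceph-mon blocked: only $total of $expected_osd_count OSDs reported"
fi
```

If the sum falls short, either more OSDs need to bootstrap (or report correctly after a charm upgrade), or expected-osd-count should be lowered to match the real count.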

tags: added: canonical-bootstack
Ryan Beisner (1chb1n) wrote :

The ceph-mon and ceph-osd charms should be updated to the latest stable charm revisions before taking on a migration or payload upgrade task. We should make the docs explicit about that.

FWIW, that is the case for all migrations, payload upgrades, and series upgrades: we expect all charms to be on the latest stable rev.

Do we need to add further clarity to the procedure, regarding osd count?

https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ceph-migration.html

Ryan Beisner (1chb1n) wrote :

Regarding question 2: Is this configuration possible (a mix of 18.04 and 16.04 nova-compute hosts)?

It is not recommended. That condition will naturally exist while a cloud is being upgraded across series from Xenial to Bionic, but it shouldn't be an intended or prolonged state.

On the whole, a mixed-series OpenStack Charms deployment is not advisable. We should also update the docs to be explicit about this.

https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-series-upgrade.html

Changed in charm-deployment-guide:
importance: Undecided → High
Changed in charm-deployment-guide:
status: New → Triaged
Changed in charm-nova-compute:
status: New → Triaged
Changed in charm-deployment-guide:
assignee: nobody → Peter Matulis (petermatulis)
status: Triaged → In Progress

Reviewed: https://review.opendev.org/693424
Committed: https://git.openstack.org/cgit/openstack/charm-deployment-guide/commit/?id=c74ce0fbdf81996331d7bddd259e15492ce7ee1d
Submitter: Zuul
Branch: master

commit c74ce0fbdf81996331d7bddd259e15492ce7ee1d
Author: Peter Matulis <email address hidden>
Date: Tue Nov 5 16:31:19 2019 -0500

    No mixing of charm releases nor series versions

    I realise now that 'Charm upgrades' should be
    on its own page as they can be done independently
    of an OpenStack upgrade. A future PR may address
    this.

    Closes-Bug: #1811867

    Change-Id: I6c9520ba2b81a358a66bdb0be92410c90dfaccf2

Changed in charm-deployment-guide:
status: In Progress → Fix Released