cinder-volume fails to start with enabled_backends = cinder-ceph

Bug #1719742 reported by Jason Hobbs
This bug affects 1 person
Affects                          Status        Importance  Assigned to    Milestone
Ceph Monitor Charm               Fix Released  High        Frode Nordahl
Ceph OSD Charm                   Fix Released  High        Frode Nordahl
Ceph RADOS Gateway Charm         Fix Released  High        Frode Nordahl
OpenStack Ceph Charm (Retired)   Fix Released  High        Frode Nordahl
OpenStack Cinder Charm           Fix Released  High        Liam Young

Bug Description

maas version: 2.3.0~alpha3-6250-g58f83f3-0ubuntu1~16.04.1
juju version: 2.2.4
openstack: cs:bundle/openstack-base as of September 22nd.

My cinder unit is ending up blocked with message "Services not running that should be: cinder-volume".

In cinder-volume.log, this error repeats:

2017-09-22 05:24:29.242 49385 ERROR cinder.cmd.volume [-] Configuration for cinder-volume does not specify "enabled_backends". Using DEFAULT section to configure drivers is not supported since Ocata.
2017-09-22 05:24:29.242 49385 ERROR cinder.cmd.volume [-] No volume service(s) started successfully, terminating.

cinder.conf has "enabled_backends = cinder-ceph" in the DEFAULT section.
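Since Ocata, each name listed in enabled_backends has to have its own section in cinder.conf holding the driver settings. A minimal sketch for checking a rendered file against that rule follows. The function name and the sample config text are illustrative assumptions, not taken from the crashdump.

```python
import configparser


def check_enabled_backends(conf_text):
    """Return (backend names, names that have no matching section)."""
    # interpolation=None keeps literal '%' characters in values intact;
    # strict=False tolerates duplicate options in a rendered file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.defaults().get("enabled_backends", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    missing = [name for name in names if not parser.has_section(name)]
    return names, missing


# Illustrative config only; the real file contains many more options.
SAMPLE_CONF = """
[DEFAULT]
enabled_backends = cinder-ceph

[cinder-ceph]
volume_backend_name = cinder-ceph
"""

if __name__ == "__main__":
    print(check_enabled_backends(SAMPLE_CONF))
```

An empty first list corresponds to the error in the log above; a non-empty second list would mean a backend is named but not configured.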

I've attached a crashdump of the deployment.

Jason Hobbs (jason-hobbs) wrote :
James Page (james-page)
affects: cinder (Juju Charms Collection) → charm-cinder
Liam Young (gnuoy)
Changed in charm-cinder:
assignee: nobody → Liam Young (gnuoy)
status: New → In Progress
Liam Young (gnuoy) wrote :

I haven't been able to reproduce this so far. How often are you seeing the bug?

The only thing that looks suspicious to me so far is that both cinder and cinder-ceph manage the cinder-volume service. Normally charms avoid this by sending a restart trigger down the relation so one charm can, in effect, request a service restart on another charm.

I think the error message from cinder-volume.log is probably a red herring. This error is normal while the services are coming up. Once cinder-ceph has sent its relation data to cinder, the config file is re-rendered with the correct backend set. The crash dump Jason provided shows the correctly rendered config.

It looks like cinder-volume is not being restarted after cinder.conf is updated with the data from cinder-ceph, but I can't see any code path that would lead to that behaviour.
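The expected behaviour described here is the usual "restart when the rendered file changes" pattern. A minimal sketch of that pattern follows, assuming a hypothetical render_and_restart helper and a caller-supplied restart callback; it is not the charm's actual implementation.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is absent."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def render_and_restart(path, new_content, restart):
    """Write new_content to path and call restart() only if the file changed.

    Hypothetical helper: returns True when a restart was requested.
    """
    before = file_digest(path)
    with open(path, "w") as handle:
        handle.write(new_content)
    after = file_digest(path)
    if before != after:
        restart()
        return True
    return False


if __name__ == "__main__":
    conf = os.path.join(tempfile.mkdtemp(), "cinder.conf")
    # First render creates the file, so a restart is requested.
    print(render_and_restart(conf, "enabled_backends = cinder-ceph\n",
                             lambda: print("restart cinder-volume")))
```

In this pattern a restart is requested exactly once per content change, so a single lost request leaves the service running (or stopped) with stale state until the file changes again.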

Jason Hobbs (jason-hobbs) wrote :

Liam, we've seen it twice between the 22nd and now, out of about 70 test runs. We saw it again last night.

Here's the crashdump from last night:

https://10.245.162.101/artifacts/b0ae1ab7-dec5-4773-abde-18ff0b098e76/deploy_prepared_bundle_679/juju-crashdump-084f241d-c929-40df-9e25-e5e077cc78e0.tar.xz

Liam Young (gnuoy)
Changed in charm-cinder:
importance: Undecided → High
Liam Young (gnuoy) wrote :

I have reproduced this. I'm also seeing it happen about 1 in 35 times.

This may be a complete coincidence, but it seems to be happening at the same time of day, around 5 in the morning.

Liam Young (gnuoy) wrote :

My current theory, and I appreciate it seems a little crazy, is that the correct cinder.conf is being rendered just as systemd does its final restart retry on the service, and the charm's restart request, which was triggered by the config render, is being lost.

The cinder-volume service fails to start in Ocata+ if enabled_backends is not set. This causes the service to flap as systemd restarts the failed service before eventually giving up. I think that setting enabled_backends to the package default of 'lvm' should stop the service flapping and thus fix the bug. I'm going to give the following patch a try:

diff --git a/hooks/cinder_contexts.py b/hooks/cinder_contexts.py
index 5893a37..d0e6080 100644
--- a/hooks/cinder_contexts.py
+++ b/hooks/cinder_contexts.py
@@ -137,12 +137,12 @@ class StorageBackendContext(OSContextGenerator):
                 backends.append('CEPH')
             if enable_lvm():
                 backends.append('LVM')
-        if len(backends) > 0:
-            return {
-                'active_backends': backends,
-                'backends': ",".join(backends)}
-        else:
-            return {}
+        # Use the package default backend to stop the service flapping.
+        if not backends:
+            backends = ['LVM']
+        return {
+            'active_backends': backends,
+            'backends': ",".join(backends)}

 class LoggingConfigContext(OSContextGenerator):

A charm with this patch is available at: cs:~gnuoy/cinder-0
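The effect of the patch can be sketched as a standalone function. This is a simplified illustration, not the charm code: select_backends and its parameters are hypothetical stand-ins for the relation and config checks the real context performs, and collected stands for any backends gathered earlier in the method (not shown in the diff).

```python
def select_backends(ceph_related, lvm_enabled, collected=()):
    """Mirror the patched logic: never return an empty backend list."""
    backends = list(collected)
    if ceph_related:
        backends.append('CEPH')
    if lvm_enabled:
        backends.append('LVM')
    # Fall back to the package default so cinder-volume can start
    # before any storage backend relation data has arrived.
    if not backends:
        backends = ['LVM']
    return {
        'active_backends': backends,
        'backends': ",".join(backends)}


if __name__ == "__main__":
    # No relations yet: the fallback keeps enabled_backends populated.
    print(select_backends(ceph_related=False, lvm_enabled=False))
```

Before the patch, the first case returned an empty dict, leaving enabled_backends unset until relation data arrived.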

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/512204

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-cinder (master)

Reviewed: https://review.openstack.org/512204
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=f9654dce7d2b36e702f705da5e93899e78675fdd
Submitter: Zuul
Branch: master

commit f9654dce7d2b36e702f705da5e93899e78675fdd
Author: Liam Young <email address hidden>
Date: Mon Oct 16 07:56:52 2017 +0000

    Add LVM as default backend

    This change set the enable-backends cinder config option to the
    package default of 'lvm'. The reason is to stop the service flapping
    and fix the bug which is causing cinder-volume service to be down
    in some deployments.

    Change-Id: I29b9bccf75019b73f60c1160b8b6c7cb11d3ad28
    Closes-Bug: 1719742

Changed in charm-cinder:
status: In Progress → Fix Committed
Jason Hobbs (jason-hobbs) wrote :

Will this be backported to stable?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-ceph-osd (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/514263

Frode Nordahl (fnordahl)
Changed in charm-ceph-radosgw:
status: New → In Progress
Changed in charm-ceph-osd:
status: New → In Progress
Changed in charm-ceph-mon:
status: New → In Progress
Changed in charm-ceph:
status: New → In Progress
Changed in charm-ceph-radosgw:
assignee: nobody → Frode Nordahl (fnordahl)
Changed in charm-ceph-osd:
assignee: nobody → Frode Nordahl (fnordahl)
Changed in charm-ceph-mon:
assignee: nobody → Frode Nordahl (fnordahl)
Changed in charm-ceph:
assignee: nobody → Frode Nordahl (fnordahl)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/514467
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=72effeb362740c812f4a595f28064427a4d88288
Submitter: Zuul
Branch: master

commit 72effeb362740c812f4a595f28064427a4d88288
Author: Frode Nordahl <email address hidden>
Date: Mon Oct 23 23:44:56 2017 +0200

    Update functional test model to use cinder-ceph subordinate

    Change-Id: I8d7ffe6b06c08e56a6dc9a3a9bc20db01506c8f2
    Related-Bug: #1719742

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/514263
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=8ba8bce657cb2392ff2fc048e2dd2d935a269167
Submitter: Zuul
Branch: master

commit 8ba8bce657cb2392ff2fc048e2dd2d935a269167
Author: Frode Nordahl <email address hidden>
Date: Mon Oct 23 13:36:56 2017 +0200

    Update functional test model to use cinder-ceph subordinate

    Change-Id: I2d441da31e8e3b6570bf237661bf22c294d8ee73
    Related-Bug: #1719742

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph-radosgw (master)

Reviewed: https://review.openstack.org/514469
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-radosgw/commit/?id=ae9fa429476007737594e7e970b40866b19d2c54
Submitter: Zuul
Branch: master

commit ae9fa429476007737594e7e970b40866b19d2c54
Author: Frode Nordahl <email address hidden>
Date: Mon Oct 23 23:50:44 2017 +0200

    Update functional test model to use cinder-ceph subordinate

    Change-Id: I82054066df7440a9396b1e193d1f1059e567a769
    Related-Bug: #1719742

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph (master)

Reviewed: https://review.openstack.org/514466
Committed: https://git.openstack.org/cgit/openstack/charm-ceph/commit/?id=fc14a5f51dbc7f791e066113168aef80713bb49b
Submitter: Zuul
Branch: master

commit fc14a5f51dbc7f791e066113168aef80713bb49b
Author: Frode Nordahl <email address hidden>
Date: Mon Oct 23 23:38:31 2017 +0200

    Update functional test model to use cinder-ceph subordinate

    Change-Id: I521fd8d8da4daec509a7d370a5b21611cdf332cd
    Related-Bug: #1719742

Frode Nordahl (fnordahl)
Changed in charm-ceph:
status: In Progress → Fix Committed
Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Changed in charm-ceph-radosgw:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-ceph-radosgw:
importance: Undecided → High
Changed in charm-ceph-osd:
importance: Undecided → High
Changed in charm-ceph-mon:
importance: Undecided → High
Changed in charm-ceph:
importance: Undecided → High
Changed in charm-ceph:
milestone: none → 17.11
Changed in charm-ceph-mon:
milestone: none → 17.11
Changed in charm-ceph-osd:
milestone: none → 17.11
Changed in charm-ceph-radosgw:
milestone: none → 17.11
Changed in charm-cinder:
milestone: none → 17.11
James Page (james-page)
Changed in charm-cinder:
status: Fix Committed → Fix Released
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released
Changed in charm-ceph:
status: Fix Committed → Fix Released
Changed in charm-ceph-radosgw:
status: Fix Committed → Fix Released