Services not running that should be: gnocchi-metricd, apache2

Bug #1794878 reported by Ashley Lai
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
Ceph Monitor Charm  Fix Released  High        James Page
Ceph OSD Charm      Fix Released  High        James Page
Gnocchi Charm       Invalid       High        Unassigned

Bug Description

xenial-queens

      gnocchi/0:
        workload-status:
          current: blocked
          message: 'Services not running that should be: gnocchi-metricd, apache2;
            Ports which should be open, but are not: 8041'

Error from gnocchi/0 log:

2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed 2018-09-27 12:32:27,625 [132440] CRITICAL root: Traceback (most recent call last):
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "/usr/bin/gnocchi-upgrade", line 10, in <module>
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed sys.exit(upgrade())
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "/usr/lib/python3/dist-packages/gnocchi/cli/manage.py", line 73, in upgrade
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed i.upgrade(conf.sacks_number)
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "/usr/lib/python3/dist-packages/gnocchi/incoming/__init__.py", line 68, in upgrade
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed self.set_storage_settings(num_sacks)
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "/usr/lib/python3/dist-packages/gnocchi/incoming/ceph.py", line 71, in set_storage_settings
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed json.dumps({self.CFG_SACKS: num_sacks}).encode())
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "rados.pyx", line 498, in rados.requires.wrapper.validate_func (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rados3/pyrex/rados.c:4922)
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed File "rados.pyx", line 2626, in rados.Ioctx.write_full (/build/ceph-B2ToPL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rados3/pyrex/rados.c:34023)
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed rados.TimedOut: [errno 110] Ioctx.write_full(b'gnocchi'): failed to write b'gnocchi-config'
2018-09-27 12:32:27 DEBUG storage-ceph-relation-changed
2018-09-27 12:32:27 ERROR juju-log storage-ceph:91: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-gnocchi-0/.venv/lib/python3.5/site-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-gnocchi-0/.venv/lib/python3.5/site-packages/charms/reactive/bus.py", line 382, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-gnocchi-0/.venv/lib/python3.5/site-packages/charms/reactive/bus.py", line 358, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-gnocchi-0/.venv/lib/python3.5/site-packages/charms/reactive/bus.py", line 180, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-gnocchi-0/charm/reactive/gnocchi_handlers.py", line 67, in init_db
    charm_class.db_sync()
  File "/var/lib/juju/agents/unit-gnocchi-0/.venv/lib/python3.5/site-packages/charms_openstack/charm/core.py", line 810, in db_sync
    subprocess.check_call(self.sync_cmd)
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gnocchi-upgrade', '--log-file=/var/log/gnocchi/gnocchi-upgrade.log']' returned non-zero exit status 1

Revision history for this message
James Page (james-page) wrote :

Looking at the juju status output, none of the ceph-osd units have actually started any OSDs; they are all blocked with

'Incomplete relation: vault'

which would indicate that vault is not initialised and unsealed, and that the ceph-osd units have not retrieved credentials to access vault for encryption keys.

gnocchi-upgrade won't work in this situation as the ceph cluster has zero storage.
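
For anyone triaging a similar deployment, the check described above can be scripted. This is a minimal sketch, assuming the output of `juju status --format json` has already been parsed into a dict; the helper name `blocked_units` and the trimmed sample data are hypothetical and only illustrate the shape of the check.

```python
import json


def blocked_units(status, application):
    """Return {unit_name: message} for units whose workload status is 'blocked'."""
    units = status.get("applications", {}).get(application, {}).get("units", {})
    blocked = {}
    for name, unit in units.items():
        workload = unit.get("workload-status", {})
        if workload.get("current") == "blocked":
            blocked[name] = workload.get("message", "")
    return blocked


# Trimmed, illustrative sample in the shape of `juju status --format json`.
sample = json.loads("""
{
  "applications": {
    "ceph-osd": {
      "units": {
        "ceph-osd/0": {"workload-status": {"current": "blocked",
                                           "message": "Incomplete relation: vault"}},
        "ceph-osd/1": {"workload-status": {"current": "active",
                                           "message": "Unit is ready"}}
      }
    }
  }
}
""")

for unit, message in sorted(blocked_units(sample, "ceph-osd").items()):
    print("{}: {}".format(unit, message))
```

If every ceph-osd unit shows up in the result, the cluster has no running OSDs yet and writes from clients such as gnocchi-upgrade can be expected to time out.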

Revision history for this message
Ashley Lai (alai) wrote :

Thanks for looking, James. We will set up vault.

Changed in charm-gnocchi:
status: New → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

All we're doing is deploying the bundle, and the gnocchi units are going into an error state. If gnocchi units need to wait on vault being unlocked, that is something that needs to be handled in the charm: it could go into a blocked state, but it shouldn't go into an error state.
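
The behaviour being asked for could look roughly like the following. This is a sketch only, not the charm's actual code: `db_sync_or_block` is a hypothetical wrapper, and `status_set` is a caller-supplied callable standing in for whatever status helper the charm uses.

```python
import subprocess


def db_sync_or_block(sync_cmd, status_set, run=subprocess.check_call):
    """Run the sync command; on failure report 'blocked' instead of raising.

    `status_set(state, message)` is a hypothetical callable supplied by the
    caller. `run` is injectable so the logic can be exercised without
    executing the real command.
    """
    try:
        run(sync_cmd)
    except subprocess.CalledProcessError as exc:
        status_set(
            "blocked",
            "Storage backend not ready: {} exited with status {}".format(
                sync_cmd[0], exc.returncode
            ),
        )
        return False
    return True


if __name__ == "__main__":
    def failing_run(cmd):
        # Simulate the failure seen in the traceback (non-zero exit status 1).
        raise subprocess.CalledProcessError(1, cmd)

    ok = db_sync_or_block(
        ["gnocchi-upgrade", "--log-file=/var/log/gnocchi/gnocchi-upgrade.log"],
        status_set=lambda state, message: print(state, "-", message),
        run=failing_run,
    )
    print("synced:", ok)
```

Returning a boolean lets the handler retry on a later hook invocation rather than leaving the unit in an error state.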

Changed in charm-gnocchi:
status: Incomplete → New
Revision history for this message
James Page (james-page) wrote :

This issue occurs due to the delayed OSD bootstrap on the ceph-osd units; the ceph-mon charm makes some naive assumptions about when OSDs might be up based on units appearing on a relation, but it has no idea whether the units have any running OSDs.

We need to evolve the ceph-mon/ceph-osd relation so that ceph-osd presents the number of bootstrapped OSDs to ceph-mon, so the ceph-mon units can assess when it's 'safe' to give out keys and create pools within the cluster.
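
The readiness check proposed here can be sketched as below. It is an illustration under stated assumptions rather than the charm implementation: the relation key name `bootstrapped-osds`, the function names, and the plain-dict view of relation data are all assumptions made for the example.

```python
def bootstrapped_osd_total(relation_data, key="bootstrapped-osds"):
    """Sum the OSD counts advertised by each ceph-osd unit.

    `relation_data` maps unit name -> dict of relation settings. The key
    name is an assumption for this sketch. Units that have not published
    a count yet, or published something unparsable, contribute zero.
    """
    total = 0
    for settings in relation_data.values():
        try:
            total += int(settings.get(key) or 0)
        except (TypeError, ValueError):
            continue
    return total


def cluster_ready(relation_data, expected_osd_count):
    """True once enough OSDs have been bootstrapped to serve clients."""
    return bootstrapped_osd_total(relation_data) >= expected_osd_count


# Units have joined the relation, but none has bootstrapped an OSD yet
# (for example because vault is still sealed).
waiting = {"ceph-osd/0": {}, "ceph-osd/1": {}, "ceph-osd/2": {}}
# Later, each unit reports one bootstrapped OSD.
booted = {name: {"bootstrapped-osds": "1"} for name in waiting}

print(cluster_ready(waiting, 3), cluster_ready(booted, 3))
```

With a check of this kind, units merely appearing on the relation is no longer enough for ceph-mon to hand out keys or create pools.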

Changed in charm-gnocchi:
status: New → Triaged
importance: Undecided → High
Changed in charm-ceph-mon:
status: New → Triaged
Changed in charm-ceph-osd:
status: New → Triaged
Changed in charm-ceph-mon:
importance: Undecided → High
Changed in charm-ceph-osd:
importance: Undecided → High
Changed in charm-ceph-mon:
assignee: nobody → James Page (james-page)
Changed in charm-ceph-osd:
assignee: nobody → James Page (james-page)
Revision history for this message
James Page (james-page) wrote :

I've pushed a prototype of the fix I'd like to make for this to:

  cs:~james-page/ceph-osd and cs:~james-page/ceph-mon

Any chance you can re-run your failed deployment with this prototype?

Changed in charm-ceph-mon:
status: Triaged → In Progress
Changed in charm-ceph-osd:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.openstack.org/610904

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/610910

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/610904
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=63f9ac2c7cd0db8f212bba6d278af2b0316b7760
Submitter: Zuul
Branch: master

commit 63f9ac2c7cd0db8f212bba6d278af2b0316b7760
Author: James Page <email address hidden>
Date: Tue Oct 16 09:33:05 2018 +0100

    Notify MON cluster of number of bootstrapped OSD's

    To allow the ceph-mon charm to better assess when the Ceph cluster
    is in a usable state, provide the number of OSD devices that were
    bootstrapped into the Ceph cluster over the relation to ceph-mon.

    This is used by the ceph-mon charm in conjunction with the
    'expected-osd-count' configuration option to delay pool creation
    and issue of keys for clients until the expected number of OSD's
    have been bootstrapped into the cluster.

    Change-Id: I1370524f0f31120e3cb7305c5bc509a6494c5586
    Closes-Bug: 1794878

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/610910
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=ed99bd2b5708d0494a9139e6805a03940b41eefb
Submitter: Zuul
Branch: master

commit ed99bd2b5708d0494a9139e6805a03940b41eefb
Author: James Page <email address hidden>
Date: Tue Oct 16 09:44:48 2018 +0100

    Guard cluster operations until sufficient OSD's booted

    Ensure that broker requests are not processed and that client
    access keys are not issued until the expected number of OSD's
    have been bootstrapped into the cluster.

    This depends on presentation of the number of bootstrapped
    OSD's from the ceph-osd charm (see Depends-On).

    For upgraders, keys will have already been issued so there
    should be no impact on existing access to the Ceph cluster;
    The ceph-osd units will present the required relation data
    post upgrade at which point the charm will mark the cluster
    as ready for service and continue to process any pending
    requests.

    Change-Id: Id67e13c176fc8fd4953ba7c2cf7e33252810940c
    Depends-On: I1370524f0f31120e3cb7305c5bc509a6494c5586
    Closes-Bug: 1794878

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Revision history for this message
Ashley Lai (alai) wrote :

@James - The prototype worked. Thanks for the fix.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/18.08)

Fix proposed to branch: stable/18.08
Review: https://review.openstack.org/611619

Revision history for this message
Ashley Lai (alai) wrote :

We still get the failure when pointing to cs:ceph-mon and cs:ceph-osd. When will this fix be merged?

Revision history for this message
Ravi (raviponnaiah) wrote :

This bug is affecting me as well.
hook failed: "storage-ceph-relation-changed"
Services not running that should be: gnocchi-metricd, apache2; Ports which should be open, but are not: 8041

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/18.08)

Fix proposed to branch: stable/18.08
Review: https://review.openstack.org/613345

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (stable/18.08)

Reviewed: https://review.openstack.org/611619
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=331a59db336a06e1cc0ad4047204acb2f781a261
Submitter: Zuul
Branch: stable/18.08

commit 331a59db336a06e1cc0ad4047204acb2f781a261
Author: James Page <email address hidden>
Date: Tue Oct 16 09:33:05 2018 +0100

    Notify MON cluster of number of bootstrapped OSD's

    To allow the ceph-mon charm to better assess when the Ceph cluster
    is in a usable state, provide the number of OSD devices that were
    bootstrapped into the Ceph cluster over the relation to ceph-mon.

    This is used by the ceph-mon charm in conjunction with the
    'expected-osd-count' configuration option to delay pool creation
    and issue of keys for clients until the expected number of OSD's
    have been bootstrapped into the cluster.

    Change-Id: I1370524f0f31120e3cb7305c5bc509a6494c5586
    Closes-Bug: 1794878
    (cherry picked from commit 63f9ac2c7cd0db8f212bba6d278af2b0316b7760)

James Page (james-page)
Changed in charm-gnocchi:
status: Triaged → Invalid
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
milestone: none → 18.11
Changed in charm-ceph-mon:
milestone: none → 18.11
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/18.08)

Reviewed: https://review.openstack.org/613345
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=bae179d5815c884725e92824220e63480718c0b7
Submitter: Zuul
Branch: stable/18.08

commit bae179d5815c884725e92824220e63480718c0b7
Author: James Page <email address hidden>
Date: Tue Oct 16 09:44:48 2018 +0100

    Guard cluster operations until sufficient OSD's booted

    Ensure that broker requests are not processed and that client
    access keys are not issued until the expected number of OSD's
    have been bootstrapped into the cluster.

    This depends on presentation of the number of bootstrapped
    OSD's from the ceph-osd charm (see Depends-On).

    For upgraders, keys will have already been issued so there
    should be no impact on existing access to the Ceph cluster;
    The ceph-osd units will present the required relation data
    post upgrade at which point the charm will mark the cluster
    as ready for service and continue to process any pending
    requests.

    Change-Id: Id67e13c176fc8fd4953ba7c2cf7e33252810940c
    Closes-Bug: 1794878
    (cherry picked from commit ed99bd2b5708d0494a9139e6805a03940b41eefb)

David Ames (thedac)
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released