Ceph + Gnocchi fails non-containerized deployment

Bug #1751359 reported by Tim Rozet
This bug affects 1 person
Affects: tripleo
Status: Invalid
Importance: Medium
Assigned to: Tim Rozet

Bug Description

The deployment fails because Gnocchi's db sync (gnocchi-upgrade), which runs in step 3, requires the Ceph pool to exist, but the pool is not created until step 4:

step: overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0
Error: /Stage[main]/Gnocchi::Db::Sync/Exec[gnocchi-db-sync]: Failed to call refresh: gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --sacks-number 128 returned 1 instead of one of [0]

# pool is created in step 4:
https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/ceph/mon.pp
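To make the ordering problem concrete, here is a minimal sketch that models deployment tasks as a hypothetical `Task` record (step number, resources it creates, resources it requires) and flags any task whose prerequisite is only produced in a later step. The task names and the `pool:metrics` label are illustrative assumptions, not TripleO or puppet-tripleo identifiers, and the check conservatively requires the producer to run in a strictly earlier step.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One deployment action pinned to a TripleO step (hypothetical model)."""
    name: str
    step: int
    creates: set = field(default_factory=set)   # resources this task produces
    requires: set = field(default_factory=set)  # resources that must already exist


def find_ordering_violations(tasks):
    """Return (consumer, resource, producer_step) for every requirement whose
    producer runs in the same or a later step, or is missing entirely."""
    # Earliest step at which each resource becomes available.
    available_at = {}
    for task in tasks:
        for resource in task.creates:
            available_at[resource] = min(task.step, available_at.get(resource, task.step))

    violations = []
    for task in tasks:
        for resource in sorted(task.requires):
            producer_step = available_at.get(resource)
            # Assumption: tasks within one step are not guaranteed to be
            # ordered, so the producer must run in a strictly earlier step.
            if producer_step is None or producer_step >= task.step:
                violations.append((task.name, resource, producer_step))
    return violations


# The situation reported in this bug: consumer in step 3, producer in step 4.
non_containerized = [
    Task("gnocchi-db-sync", step=3, requires={"pool:metrics"}),
    Task("ceph mon pool creation", step=4, creates={"pool:metrics"}),
]
print(find_ordering_violations(non_containerized))
# [('gnocchi-db-sync', 'pool:metrics', 4)]
```

Moving the producer to step 2 makes the same check return an empty list.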

[root@overcloud-controller-0 ~]# gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --sacks-number 128
No handlers could be found for logger "oslo_config.cfg"
2018-02-23 19:40:06,291 [103816] INFO gnocchi.service: Gnocchi version 4.0.5
2018-02-23 19:40:06,587 [103816] INFO gnocchi.cli: Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x402ff50>
2018-02-23 19:40:06,653 [103816] INFO gnocchi.storage.common.ceph: Ceph storage backend use 'cradox' python library
2018-02-23 19:40:06,675 [103816] CRITICAL root: Traceback (most recent call last):
  File "/bin/gnocchi-upgrade", line 10, in <module>
    sys.exit(upgrade())
  File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 66, in upgrade
    s = storage.get_driver(conf)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 163, in get_driver
    conf.storage, incoming, coord)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/ceph.py", line 48, in __init__
    self.rados, self.ioctx = ceph.create_rados_connection(conf)
  File "/usr/lib/python2.7/site-packages/gnocchi/storage/common/ceph.py", line 72, in create_rados_connection
    ioctx = conn.open_ioctx(conf.ceph_pool)
  File "cradox.pyx", line 444, in cradox.requires.wrapper.validate_func (cradox.c:4719)
  File "cradox.pyx", line 1091, in cradox.Rados.open_ioctx (cradox.c:13860)
ObjectNotFound: error opening pool 'metrics'

After creating the pool manually, the command succeeds:
[root@overcloud-controller-0 ~]# ceph osd pool ls
rbd
[root@overcloud-controller-0 ~]# ceph osd pool create metrics 64 64
pool 'metrics' created
[root@overcloud-controller-0 ~]# gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --sacks-number 128
No handlers could be found for logger "oslo_config.cfg"
2018-02-23 19:45:38,818 [106814] INFO gnocchi.service: Gnocchi version 4.0.5
2018-02-23 19:45:39,105 [106814] INFO gnocchi.cli: Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x58ecf50>
2018-02-23 19:45:39,171 [106814] INFO gnocchi.storage.common.ceph: Ceph storage backend use 'cradox' python library
2018-02-23 19:45:39,191 [106814] INFO gnocchi.cli: Upgrading storage <gnocchi.storage.ceph.CephStorage object at 0x72ca790>
[root@overcloud-controller-0 ~]#
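The manual workaround above can be made idempotent by only creating pools that `ceph osd pool ls` does not already list. This is a minimal sketch, not the fix proposed in the review: the helper names are hypothetical, the 64/64 placement group counts are copied from the session above rather than tuned for any cluster, and actually running the commands (for example with `subprocess.run(argv, check=True)`) is left to the caller.

```python
import shlex


def missing_pools(pool_ls_output, required):
    """Return the required pool names absent from `ceph osd pool ls` output,
    which prints one pool name per line."""
    existing = {line.strip() for line in pool_ls_output.splitlines() if line.strip()}
    return [name for name in required if name not in existing]


def create_commands(pool_ls_output, required, pg_num=64, pgp_num=64):
    """Build `ceph osd pool create <pool> <pg_num> <pgp_num>` argv lists only
    for pools that do not exist yet, so re-running the step is harmless."""
    return [
        ["ceph", "osd", "pool", "create", name, str(pg_num), str(pgp_num)]
        for name in missing_pools(pool_ls_output, required)
    ]


if __name__ == "__main__":
    # "rbd" is the only pool listed before the workaround in the session above.
    for argv in create_commands("rbd\n", ["metrics"]):
        print(shlex.join(argv))  # ceph osd pool create metrics 64 64
```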

Tim Rozet (trozet)
Changed in tripleo:
assignee: nobody → Tim Rozet (trozet)
importance: Undecided → Medium
milestone: none → queens-rc1
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/547671

John Fulton (jfulton-org) wrote:

The root cause is that, starting in Pike, this command:

 gnocchi-upgrade --config-file /etc/gnocchi/gnocchi.conf --sacks-number 128

now requires the pool to already exist.

You won't have this issue in Pike if you deploy with containerized Ceph as described in the documentation:

 https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html

If you follow the Pike sections of that document, the metrics pool is created during step 2 and you won't have this issue.

So the proposed fix targets code that is deprecated. Giulio added a few comments to the review:

 https://review.openstack.org/#/c/547671/

I think the easiest resolution for this bug is to switch to the new method of Ceph deployment in TripleO.

Also, this issue shouldn't be confused with this one:

 https://bugs.launchpad.net/tripleo/+bug/1749544

It has a similar error message, but there the pools were not created in step 2 because Luminous refuses to create pools whose placement group (PG) counts would exceed its per-OSD limit:

 https://ceph.com/community/new-luminous-pg-overdose-protection
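The pool-creation failure in that other bug comes from a PG budget check. As a simplified model (an assumption; the exact check and its defaults may differ between Ceph releases), Luminous sums `pg_num * size` over all pools, including the one being created, and refuses the new pool when the total exceeds `mon_max_pg_per_osd` multiplied by the number of in OSDs. The sketch uses an assumed default of 200 for `mon_max_pg_per_osd` and hypothetical pool sizes.

```python
def projected_pg_instances(existing_pools, new_pg_num, new_size):
    """Sum of pg_num * replica size over all pools, including the new one.

    existing_pools is a list of (pg_num, size) pairs.
    """
    total = sum(pg_num * size for pg_num, size in existing_pools)
    return total + new_pg_num * new_size


def pool_create_allowed(existing_pools, new_pg_num, new_size, num_in_osds,
                        mon_max_pg_per_osd=200):
    """Simplified model of PG overdose protection: allow the pool only if the
    projected PG instances fit within mon_max_pg_per_osd * num_in_osds.
    The default of 200 is an assumption, not read from a live cluster."""
    limit = mon_max_pg_per_osd * num_in_osds
    return projected_pg_instances(existing_pools, new_pg_num, new_size) <= limit


existing = [(64, 3), (64, 3)]  # hypothetical (pg_num, size) pairs
print(pool_create_allowed(existing, 128, 3, num_in_osds=3))  # False: 768 > 600
print(pool_create_allowed(existing, 64, 3, num_in_osds=3))   # True: 576 <= 600
```

On a small test cluster with few OSDs, a handful of pools at modest `pg_num` values can exhaust this budget, which is why the pools in that bug were never created.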

Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Changed in tripleo:
status: In Progress → Invalid
OpenStack Infra (hudson-openstack) wrote: Change abandoned on puppet-tripleo (master)

Change abandoned by "James Slagle <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/puppet-tripleo/+/547671
Reason: Abandoning this patch per the TripleO Patch Abandonment guidelines
(https://specs.openstack.org/openstack/tripleo-specs/specs/policy/patch-abandonment.html).
If you wish to have this restored and cannot do so yourself, please reach out
via #tripleo on OFTC or the OpenStack Dev mailing list.
