missing keyring, Couldn't init storage provider

Bug #1607961 reported by Andreas Hasenack
Affects                                Status        Importance  Assigned to    Milestone
Autopilot Log Analyser                 Fix Released  Undecided   Ursula Junque
ceph (Juju Charms Collection)          Fix Released  High        James Page
ceph-mon (Juju Charms Collection)      Fix Released  High        James Page
ceph-radosgw (Juju Charms Collection)  Invalid       High        Unassigned

Bug Description

We hit a new radosgw failure that simply restarting the service does not fix.

This is one of the first runs with the current stable charms from the store, so it may be that we are relating the services incorrectly for these 16.07 charms, but here it goes.

Used ceph charms:
ceph-mon active false cs:trusty/ceph-mon-2
ceph-osd active false cs:trusty/ceph-osd-236
ceph-radosgw blocked false cs:trusty/ceph-radosgw-242

radosgw fails to start with this error:
2016-07-29 19:10:00.405114 7f13bba87900 -1 auth: unable to find a keyring on /etc/ceph/keyring.rados.gateway: (2) No such file or directory
2016-07-29 19:10:00.405152 7f13bba87900 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-07-29 19:10:00.405154 7f13bba87900 0 librados: client.radosgw.gateway initialization error (2) No such file or directory
2016-07-29 19:10:00.405951 7f13bba87900 -1 Couldn't init storage provider (RADOS)

And indeed, there is no /etc/ceph/keyring.rados.gateway.

Logs from the deployment attached.

Tags: landscape

Changed in autopilot-log-analyser:
assignee: nobody → Andreas Hasenack (ahasenack)
status: New → In Progress
tags: removed: kanban-cross-team
Ursula Junque (ursinha)
Changed in autopilot-log-analyser:
assignee: Andreas Hasenack (ahasenack) → Ursula Junque (ursinha)
Revision history for this message
James Page (james-page) wrote :

Reproduced locally with latest stable charm version; digging into this underlying problem ASAP.

Changed in ceph-radosgw (Juju Charms Collection):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
James Page (james-page) wrote :

2016-08-01 15:17:48 INFO worker.uniter.jujuc server.go:174 running hook tool "juju-log" ["-l" "DEBUG" "Processing op='create-pool'"]
2016-08-01 15:17:48 DEBUG juju-log radosgw:3: Processing op='create-pool'
2016-08-01 15:17:49 INFO worker.uniter.jujuc server.go:174 running hook tool "juju-log" ["-l" "ERROR" "math domain error"]
2016-08-01 15:17:49 ERROR juju-log radosgw:3: math domain error
2016-08-01 15:17:50 INFO worker.uniter.jujuc server.go:174 running hook tool "juju-log" ["-l" "ERROR" "Unexpected error occurred while processing requests: {u'api-version': 1, u'request-id': u'f9eaad6a-57fa-11e6-822a-00163e572e04', u'ops': [{u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets', u'weight': 20, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.root', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.control', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.gc', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.index', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.extra', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.log', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.intent-log', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.usage', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.email', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.swift', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.uid', u'weight': 0.1, u'replicas': 3}]}"]
2016-08-01 15:17:50 ERROR juju-log radosgw:3: Unexpected error occurred while processing requests: {u'api-version': 1, u'request-id': u'f9eaad6a-57fa-11e6-822a-00163e572e04', u'ops': [{u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets', u'weight': 20, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.root', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.control', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.gc', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.index', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.extra', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.log', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.inten...

Revision history for this message
James Page (james-page) wrote :

And from the reporter's deployment:

2016-07-29 18:13:37 DEBUG juju-log radosgw:72: Processing op='create-pool'
2016-07-29 18:13:37 ERROR juju-log radosgw:72: math domain error
2016-07-29 18:13:37 ERROR juju-log radosgw:72: Unexpected error occurred while processing requests: {u'api-version': 1, u'request-id': u'0d0cbfe0-55b8-11e6-ba6a-00163ee1b019', u'ops': [{u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets', u'weight': 20, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.root', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.control', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.gc', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.index', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.rgw.buckets.extra', u'weight': 1.0, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.log', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.intent-log', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.usage', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.email', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.swift', u'weight': 0.1, u'replicas': 3}, {u'op': u'create-pool', u'pg_num': None, u'name': u'default.users.uid', u'weight': 0.1, u'replicas': 3}]}

Revision history for this message
James Page (james-page) wrote :

unit-ceph-mon-0: 2016-08-01 16:03:34 INFO worker.uniter.jujuc server.go:174 running hook tool "juju-log" ["-l" "DEBUG" "Processing op='create-pool'"]
unit-ceph-mon-0: 2016-08-01 16:03:34 DEBUG unit.ceph-mon/0.juju-log server.go:270 radosgw:5: Processing op='create-pool'
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 Traceback (most recent call last):
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/radosgw-relation-changed", line 709, in <module>
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 hooks.execute(sys.argv)
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/charmhelpers/core/hookenv.py", line 715, in execute
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 self._hooks[hook_name]()
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/radosgw-relation-changed", line 564, in radosgw_relation
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 rsp = process_requests(settings['broker_req'])
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/ceph_broker.py", line 92, in decode_inner
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 return json.dumps(f(json.loads(req)))
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/ceph_broker.py", line 109, in process_requests
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 resp = process_requests_v1(reqs['ops'])
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/ceph_broker.py", line 301, in process_requests_v1
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 ret = handle_replicated_pool(request=req, service=svc)
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/ceph_broker.py", line 203, in handle_replicated_pool
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 name=pool_name, **kwargs)
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 File "/var/lib/juju/agents/unit-ceph-mon-0/charm/hooks/charmhelpers/contrib/storage/linux/ceph.py", line 295, in __init__
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit.ceph-mon/0.radosgw-relation-changed logger.go:40 self.pg_num = self.get_pgs(self.replicas, percent_data)
unit-ceph-mon-0: 2016-08-01 16:03:34 INFO unit...

Revision history for this message
James Page (james-page) wrote :

Code block (from the PG calculation in charmhelpers/contrib/storage/linux/ceph.py, per the traceback above):

        # The CRUSH algorithm has a slight optimization for placement groups
        # with powers of 2 so find the nearest power of 2. If the nearest
        # power of 2 is more than 25% below the original value, the next
        # highest value is used. To do this, find the nearest power of 2 such
        # that 2^n <= num_pg, check to see if its within the 25% tolerance.
        exponent = math.floor(math.log(num_pg, 2))
        nearest = 2 ** exponent
        if (num_pg - nearest) > (num_pg * 0.25):
            # Choose the next highest power of 2 since the nearest is more
            # than 25% below the original value.
            return int(nearest * 2)
        else:
            return int(nearest)
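
For reference, the "math domain error" comes from math.log(): it is undefined for zero, so once num_pg reaches 0 the rounding above raises ValueError. A minimal standalone reproduction, wrapping the snippet above in a throwaway function (sketch only, not the charm code itself):

    import math

    def nearest_power_of_2(num_pg):
        # Same rounding logic as the snippet above; math.log(0, 2) is
        # undefined, so num_pg == 0 raises "ValueError: math domain error".
        exponent = math.floor(math.log(num_pg, 2))
        nearest = 2 ** exponent
        if (num_pg - nearest) > (num_pg * 0.25):
            return int(nearest * 2)
        return int(nearest)

    print(nearest_power_of_2(20))  # 16 -- fine
    print(nearest_power_of_2(0))   # ValueError: math domain error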

Revision history for this message
James Page (james-page) wrote :

A bit more debug:

0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
20.0
{'percent_data': 0.2, 'osd_list': [0, 1, 2], 'self': <charmhelpers.contrib.storage.linux.ceph.ReplicatedPool object at 0x7fc6c6d04190>, 'osd_count': 3, 'pool_size': 3, 'target_pgs_per_osd': 100, 'num_pg': 20.0, 'expected': 0}
0.0
{'percent_data': 0.001, 'osd_list': [0, 1, 2], 'self': <charmhelpers.contrib.storage.linux.ceph.ReplicatedPool object at 0x7fc6c6d04190>, 'osd_count': 3, 'pool_size': 3, 'target_pgs_per_osd': 100, 'num_pg': 0.0, 'expected': 0}

Ursula Junque (ursinha)
Changed in autopilot-log-analyser:
status: In Progress → Fix Released
Revision history for this message
James Page (james-page) wrote :

The problem occurs in deployments with a low number of OSDs (say 3): several pools in the ceph-radosgw pool setup are assigned 0.1% of the available storage, so the pg_num calculation based on the number of OSDs in the cluster works out to 0. This results in the error seen (math domain error).

I think we should impose a minimum pg_num to ensure that this does not happen.
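
Conceptually the guard amounts to clamping num_pg before the power-of-2 rounding, something like the sketch below (illustrative only; the names are hypothetical and it assumes the reconstructed formula above, while the actual fix landed as a charm-helpers resync, see the commits further down):

    import math

    MINIMUM_PGS = 2  # minimum suggested by the Ceph PG calculator

    def get_pgs_sketch(osd_count, percent_data, pool_size, target_pgs_per_osd=100):
        # Never let a pool end up with zero placement groups.
        num_pg = (target_pgs_per_osd * osd_count * percent_data) // pool_size
        num_pg = max(num_pg, MINIMUM_PGS)
        exponent = math.floor(math.log(num_pg, 2))
        nearest = 2 ** exponent
        if (num_pg - nearest) > (num_pg * 0.25):
            nearest *= 2
        return int(nearest)

    print(get_pgs_sketch(3, 0.001, 3))  # 2 instead of a math domain error
    print(get_pgs_sketch(3, 0.2, 3))    # 16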

James Page (james-page)
Changed in ceph-radosgw (Juju Charms Collection):
status: Confirmed → Invalid
Changed in ceph-mon (Juju Charms Collection):
importance: Undecided → High
Changed in ceph (Juju Charms Collection):
importance: Undecided → High
Changed in ceph-mon (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in ceph (Juju Charms Collection):
status: New → In Progress
Changed in ceph-mon (Juju Charms Collection):
milestone: none → 16.10
Changed in ceph (Juju Charms Collection):
assignee: nobody → James Page (james-page)
Changed in ceph-mon (Juju Charms Collection):
status: New → In Progress
Changed in ceph (Juju Charms Collection):
milestone: none → 16.10
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph (master)

Reviewed: https://review.openstack.org/349916
Committed: https://git.openstack.org/cgit/openstack/charm-ceph/commit/?id=9b37f57e45d5b311fa345e1116c41349379855d8
Submitter: Jenkins
Branch: master

commit 9b37f57e45d5b311fa345e1116c41349379855d8
Author: James Page <email address hidden>
Date: Tue Aug 2 11:02:36 2016 +0100

    Resync charm helpers for PG calc fixes

    When the Ceph RADOS Gateway is used with small OSD configurations
    (3 OSDs), the light pools consume 0.1% of storage by default; this
    results in PG counts below or equal to 0; enforce a minimum PG count
    of 2 as specified in the official Ceph PG calculator.

    Change-Id: I44748807540ddde1ac76493bd85bb0f1c708e16c
    Closes-Bug: 1607961

Changed in ceph (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in ceph-mon (Juju Charms Collection):
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/349913
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=9d73455515f200239926950eddec813e4b6b617d
Submitter: Jenkins
Branch: master

commit 9d73455515f200239926950eddec813e4b6b617d
Author: James Page <email address hidden>
Date: Tue Aug 2 10:59:04 2016 +0100

    Resync charm helpers for PG calc fixes

    When the Ceph RADOS Gateway is used with small OSD configurations
    (3 OSDs), the light pools consume 0.1% of storage by default; this
    results in PG counts below or equal to 0; enforce a minimum PG count
    of 2 as specified in the official Ceph PG calculator.

    Change-Id: I33df3ade8ce85f1928f44d402b8a2f445c479efb
    Closes-Bug: 1607961

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/16.07)

Fix proposed to branch: stable/16.07
Review: https://review.openstack.org/350077

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph (stable/16.07)

Fix proposed to branch: stable/16.07
Review: https://review.openstack.org/350078

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/16.07)

Reviewed: https://review.openstack.org/350077
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=38381f7fd6447b8dd8f539fe92f5f043caa5d60d
Submitter: Jenkins
Branch: stable/16.07

commit 38381f7fd6447b8dd8f539fe92f5f043caa5d60d
Author: James Page <email address hidden>
Date: Tue Aug 2 10:59:04 2016 +0100

    Resync charm helpers for PG calc fixes

    When the Ceph RADOS Gateway is used with small OSD configurations
    (3 OSDs), the light pools consume 0.1% of storage by default; this
    results in PG counts below or equal to 0; enforce a minimum PG count
    of 2 as specified in the official Ceph PG calculator.

    Change-Id: I33df3ade8ce85f1928f44d402b8a2f445c479efb
    Closes-Bug: 1607961
    (cherry picked from commit 9d73455515f200239926950eddec813e4b6b617d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph (stable/16.07)

Reviewed: https://review.openstack.org/350078
Committed: https://git.openstack.org/cgit/openstack/charm-ceph/commit/?id=b7a29f806a986e49f693a79e407536414abaa84f
Submitter: Jenkins
Branch: stable/16.07

commit b7a29f806a986e49f693a79e407536414abaa84f
Author: James Page <email address hidden>
Date: Tue Aug 2 11:02:36 2016 +0100

    Resync charm helpers for PG calc fixes

    When the Ceph RADOS Gateway is used with small OSD configurations
    (3 OSDs), the light pools consume 0.1% of storage by default; this
    results in PG counts below or equal to 0; enforce a minimum PG count
    of 2 as specified in the official Ceph PG calculator.

    Change-Id: I44748807540ddde1ac76493bd85bb0f1c708e16c
    Closes-Bug: 1607961
    (cherry picked from commit 9b37f57e45d5b311fa345e1116c41349379855d8)

James Page (james-page)
Changed in ceph-mon (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in ceph (Juju Charms Collection):
status: Fix Committed → Fix Released