ceph-mon (re)tries to process broker requests after they have been completed on each relation hook

Bug #1773910 reported by Chris MacNaughton on 2018-05-29
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack ceph-mon charm
Low
Pedro Guimarães

Bug Description

When the ceph-mon charm processes requests, there is no tracking of completed requests, so broker requests are retried on eaach hook execution. Completed requests should be tracked in Juju's KV store or Ceph's KV store (preferred, as cluster leader will process requests but leadership can move around)

Alvaro Uria (aluria) on 2018-06-05
tags: added: canonical-bootstack
Alvaro Uria (aluria) wrote :

Is there a workaround for this, apart from skipping mon_relation() exec?

@hooks.hook('mon-relation-departed',
            'mon-relation-changed',
            'leader-settings-changed',
            'bootstrap-source-relation-departed')
def mon_relation():
    # XXX(aluria): lp#1773910
    log('Bug lp#1773910 - skip mon_relation() exec')
    return

Alvaro Uria (aluria) wrote :

Sample:
2018-06-05 10:53:21 DEBUG juju-log Processing request 18ecae9a-68a0-11e8-83b7-00163e1f050e
2018-06-05 10:53:27 DEBUG juju-log Processing request 56290312-68a0-11e8-9d9d-7cfe90aac490
2018-06-05 10:53:31 DEBUG juju-log Processing request 56c638da-68a0-11e8-ab14-7cfe90a82500

root@juju-958f87-1-lxd-9:~# awk '/Processing request/ {print $7}' /var/log/juju/unit-ceph-mon-3.log | tail -100 | sort | uniq -c | sort -n
      6 4bdd1fce-68a0-11e8-8c25-7cfe90aedfd0
      6 5113e6c6-68a0-11e8-9528-7cfe90aedc50
      6 53087b36-68a0-11e8-9a86-7cfe90aedf30
      7 01100c44-68a1-11e8-9488-90e2bafb4d2c
      7 4e1ae12c-68a0-11e8-a678-7cfe90aedc10
      7 581b1e12-68a0-11e8-a81d-7cfe90bb0990
      7 e6ffd3ee-689f-11e8-8fe2-00163e206594
      7 e865437c-689f-11e8-bebf-00163e5e6455
      7 eb92c45c-689f-11e8-a92f-00163e449e77
      8 17afef42-68a0-11e8-81a5-00163e0956f5
      8 18ecae9a-68a0-11e8-83b7-00163e1f050e
      8 1a94f270-68a0-11e8-8281-00163e12c00d
      8 56290312-68a0-11e8-9d9d-7cfe90aac490
      8 56c638da-68a0-11e8-ab14-7cfe90a82500

Alvaro Uria (aluria) wrote :

The above happens on cs:xenial/ceph-mon-24

Alvaro Uria (aluria) wrote :

Request is "create-pool", but it is skipped because pools already exist (this upgrade to ceph-mon was part of an operation to deprecate ceph applications, which have already been removed).
https://pastebin.canonical.com/p/kWpRMVHcJ4/

A change is in progress at https://review.openstack.org/#/c/574734/to resolve this by tracking which broker requests have been processed. Once that is done, we can re-add broker processing to more hooks as they won't be re-processed.

Changed in charm-ceph-mon:
assignee: nobody → Pedro Guimarães (pguimaraes)

Change abandoned by Chris MacNaughton (<email address hidden>) on branch: master
Review: https://review.openstack.org/574738
Reason: superseded

Drew Freiberger (afreiberger) wrote :

Alvaro,

What you're describing is how the charm works. My question is, what is the "bug". What's the failure state? Sure, it keeps processing those create-pools which hopefully are just going through as no-ops. Is there danger to a cloud if this event keeps being triggered? Is it taking processor cycles or causing hook errors?

Drew Freiberger (afreiberger) wrote :

Sorry, I guess #7 should be addressed to Chris.

James Page (james-page) on 2018-07-17
Changed in charm-ceph-mon:
status: New → In Progress
importance: Undecided → Low

On a large enough cloud, it has gotten into a state where processing broker requests takes longer than the leadership change timeouts, which causes leadership to change, which causes the charm to re-process all broker requests, which takes longer ... ad infinitum. A partial change has been made reducing that happening, but is an incomplete solution. It is certainly burning CPU cycles and can begin to take a very long time as it has to process key requests for _every_ related unit, be that clients, OSDs, or the various other relations

Pedro Guimarães (pguimaraes) wrote :

This issue is related to Ceph Broker. AFAIK, the broker was adopted as a way of access for ceph clients without actually calling ceph directly. For that, ceph clients register their intent (ex.: create pools) on relation data and that information is collected and processed by Ceph MON's process_request call. The ceph broker provides a simpler API for standard calls: create pools, delete them, etc.

The point is, once a broker request is collected and processed by MON, it is kept in the relation data, until the same client changes its value for another one.

The main point of concern is that anything related to the ceph charm stack will trigger MON to look into multiple relations and execute same broker_req over and over again. If you, for example, add a new Ceph OSD, ceph-osd-relation-joined hook will be triggered on Ceph MON, which calls for function: https://github.com/openstack/charm-ceph-mon/blob/master/hooks/ceph_hooks.py#L451

If we look closer into this function, there is a call for notify_radosgw and notify_client, which will eventually look for a broker request.

This scenario happens on mon-relation-*, leader-settings-*, osd-relation-*, client-relation-* and update-status hooks every time. If it is a new broker request, it must be processed, however, if it is a broker request already treated, it will try to execute it again and ceph will return a fail (ex.: create same pools over and over again).

Pedro Guimarães (pguimaraes) wrote :

I proposed to have clients taking out their broker requests once they're done with.
Fix for Ceph Radosgw on: https://review.openstack.org/#/c/582172/
Will extend this solution for other ceph clients

Pedro Guimarães (pguimaraes) wrote :

This fix: https://review.openstack.org/584506 gets most of the process_request retrials filtered out. However, when a new unit is added, it will send a request for pool creation and that very first request is never filtered.

Example:
juju add-unit glance

This triggers ceph-mon, which verifies in its relation data if there is any broker_req available. It will execute this broker_req as none was treated before in this relation. However, the pools for glance (as a service) already exist, and it eventually gets blocked by Ceph itself:

2018-08-09 15:14:41 DEBUG juju-log client:42: Processing request ed2bcad3-9be6-11e8-aa87-fa163e3d6c70
2018-08-09 15:14:41 INFO juju-log client:42: Processing 1 ceph broker requests
2018-08-09 15:14:41 DEBUG juju-log client:42: Processing op='create-pool'
2018-08-09 15:14:41 DEBUG client-relation-changed obtained 'cephx.groups.images'
2018-08-09 15:14:42 DEBUG client-relation-changed set cephx.groups.images
2018-08-09 15:14:42 DEBUG juju-log client:42: Pool 'glance' already exists - skipping create

However, a series of other process_request that should be triggered will get filtered out. All these requests were treated before and will get blocked then:

2018-08-09 15:24:19 INFO juju-log osd:30: req_already_sent: {"api-version": 1, "ops": [], "request-id": "12e6ea8b-9bc5-11e8-8da0-fa163ebfef9d"}
2018-08-09 15:24:28 INFO juju-log osd:30: req_already_sent: {"api-version": 1, "ops": [{"op": "create-pool", "name": "glance", "replicas": 3, "pg_num": null, "weight": 5, "group": "images", "group-namespace": null}], "request-id": "2d5bb743-9bc5-11e8-8468-fa163e134cd7"}
2018-08-09 15:24:31 INFO juju-log osd:30: req_already_sent: {"api-version": 1, "ops": [{"op": "create-pool", "name": "glance", "replicas": 3, "pg_num": null, "weight": 5, "group": "images", "group-namespace": null}], "request-id": "ed2bcad3-9be6-11e8-aa87-fa163e3d6c70"}
2018-08-09 15:24:40 INFO juju-log osd:30: req_already_sent: {"api-version": 1, "request-id": "202ed45b-9bc5-11e8-9321-fa163e21a406", "ops": [{"group": "volumes", "name": "cinder-ceph", "weight": 40, "replicas": 3, "pg_num": null, "group-namespace": null, "op": "create-pool"}]}
2018-08-09 15:24:43 INFO juju-log osd:30: req_already_sent: {"api-version": 1, "request-id": "53618277-9bc7-11e8-bff1-fa163e96bf6b", "ops": [{"group": "volumes", "name": "cinder-ceph", "weight": 40, "replicas": 3, "pg_num": null, "group-namespace": null, "op": "create-pool"}]}

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers