New Ceph deployment immediately goes to HEALTH_WARN - AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim

Bug #1929262 reported by Nobuto Murata
This bug affects 10 people
Affects                         Status         Importance  Assigned to        Milestone
Ceph Monitor Charm              Fix Committed  Medium      Chris MacNaughton
Prometheus Ceph Exporter Charm  Invalid        Medium      Unassigned
prometheus-ceph-exporter snap   Fix Released   Undecided   Unassigned

Bug Description

$ juju deploy --series focal -n3 ceph-mon

$ juju run --unit ceph-mon/leader -- sudo ceph health detail
HEALTH_WARN mons are allowing insecure global_id reclaim; OSD count 0 < osd_pool_default_size 3
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.ip-172-31-14-108 has auth_allow_insecure_global_id_reclaim set to true
    mon.ip-172-31-20-25 has auth_allow_insecure_global_id_reclaim set to true
    mon.ip-172-31-33-59 has auth_allow_insecure_global_id_reclaim set to true
[WRN] TOO_FEW_OSDS: OSD count 0 < osd_pool_default_size 3

So the Ceph deployment never reaches HEALTH_OK. I suppose this was introduced by the upstream change for CVE-2021-20288, whose SRU was completed on May 20th, 2021:
https://docs.ceph.com/en/latest/security/CVE-2021-20288/
https://ubuntu.com/security/CVE-2021-20288
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1921349

We need to clear the warning properly, otherwise any sort of monitoring doesn't work. We cannot unconditionally set auth_allow_insecure_global_id_reclaim to false for existing deployments, since that would kick out clients that have not yet been updated.
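
A minimal sketch of the kind of check the charm (or an operator) could perform before flipping the flag, assuming commands are run through the ceph-mon leader unit as elsewhere in this report; the grep pattern is only an illustration:

$ juju run --unit ceph-mon/leader -- sudo ceph health detail \
    | grep 'AUTH_INSECURE_GLOBAL_ID_RECLAIM:'   # any clients still using insecure reclaim?
$ juju run --unit ceph-mon/leader -- \
    sudo ceph config set mon auth_allow_insecure_global_id_reclaim false   # safe only once the grep finds nothing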

Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-high. It's hitting field deployments. It's not a critical blocker, but the charm needs to react to the upstream change appropriately.
https://docs.ceph.com/en/latest/security/CVE-2021-20288/

Workaround: it's not technically a workaround, but for new deployments with up-to-date packages, the following command can be run to set the value recommended by upstream:
$ juju run --unit ceph-mon/leader -- \
    sudo ceph config set mon auth_allow_insecure_global_id_reclaim false
https://docs.ceph.com/en/latest/security/CVE-2021-20288/#recommendations
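
To confirm the setting took effect, something like the following should print "false" afterwards (a sketch; `ceph config get` is available on the Ceph releases involved here):

$ juju run --unit ceph-mon/leader -- \
    sudo ceph config get mon auth_allow_insecure_global_id_reclaim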

As a side note, it would have been nice for the SRU verification steps to check `ceph health`; then we would have noticed this earlier.

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

On the topic of SRU verification, this is a known and expected state; Ceph going into warning because of the change landing isn't surprising. As noted, the Ceph charms should be improved to handle this in a sane way, but it's a risky change because making it on an existing cluster could drop all out-of-date clients.

The correct workaround is also noted on the Ceph CVE bug reference linked above, but here again for easy access:

https://docs.ceph.com/en/latest/security/CVE-2021-20288/#recommendations

    1. Users should upgrade to a patched version of Ceph at their earliest convenience.

    2. Users should upgrade any unpatched clients at their earliest convenience. By default, these clients can be easily identified by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.

    3. If all clients cannot be upgraded immediately, the health alerts can be temporarily muted with:

    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w         # 1 week
    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w # 1 week

    4. After all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present, the cluster should be set to prevent insecure global_id reclaim with:

    ceph config set mon auth_allow_insecure_global_id_reclaim false
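
For a Juju-managed cluster like the one in this report, the same mutes can be wrapped through the leader unit; a sketch, reusing the upstream 1w example duration:

$ juju run --unit ceph-mon/leader -- \
    sudo ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w
$ juju run --unit ceph-mon/leader -- \
    sudo ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w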

Revision history for this message
Drew Freiberger (afreiberger) wrote :

After updating and rebooting an entire cloud, I see that there's an issue with client.prometheus-ceph-exporter: it uses an old client library that still relies on insecure global_id reclaim.

Adding prometheus-ceph-exporter charm to this bug.

Ultimately, if that client is the only one alerting in 'ceph health detail', as in:
$ sudo ceph health detail
HEALTH_WARN client is using insecure global_id reclaim; mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id reclaim
    client.prometheus-ceph-exporter at v1:10.0.0.123:0/2623829913 is using insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.juju-e75ae8-5-lxd-0 has auth_allow_insecure_global_id_reclaim set to true
    mon.juju-e75ae8-4-lxd-0 has auth_allow_insecure_global_id_reclaim set to true
    mon.juju-e75ae8-3-lxd-0 has auth_allow_insecure_global_id_reclaim set to true

I suggest adding a relation between prometheus:target and ceph-mon:prometheus to start capturing Ceph stats from ceph-mon directly, then removing the prometheus-ceph-exporter application, and then running the command:

    ceph config set mon auth_allow_insecure_global_id_reclaim false

lp#1912557 is tracking the need to add Grafana dashboards compatible with ceph-mon's built-in exporter to display the new data.
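
A sketch of that sequence in Juju terms, assuming the default application names used in this report; the relation endpoint names should be double-checked with `juju status --relations` first:

$ juju add-relation prometheus:target ceph-mon:prometheus   # scrape ceph-mon's built-in exporter instead
$ juju remove-application prometheus-ceph-exporter          # retire the outdated exporter client
$ juju run --unit ceph-mon/leader -- \
    sudo ceph config set mon auth_allow_insecure_global_id_reclaim false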

Edin S (exsdev)
Changed in charm-prometheus-ceph-exporter:
importance: Undecided → Medium
Revision history for this message
Nobuto Murata (nobuto) wrote :

> client.prometheus-ceph-exporter at v1:10.0.0.123:0/2623829913 is using insecure global_id reclaim

Can we rebuild the prometheus-ceph-exporter snap to pull up-to-date librados properly as a short-term solution here?

Revision history for this message
Nobuto Murata (nobuto) wrote :

> > client.prometheus-ceph-exporter at v1:10.0.0.123:0/2623829913 is using insecure global_id reclaim
>
> Can we rebuild the prometheus-ceph-exporter snap to pull up-to-date librados properly as a short-term solution here?

The following patch is a band-aid, and I've confirmed it works with focal-ussuri (Octopus).

https://github.com/nobuto-m/snap-prometheus-ceph-exporter/compare/stable/3.0.0-nautilus...3.0.0-nautilus/edge
$ git diff stable/3.0.0-nautilus
diff --git a/snapcraft.yaml b/snapcraft.yaml
index 5316954..0dc2ca4 100644
--- a/snapcraft.yaml
+++ b/snapcraft.yaml
@@ -5,9 +5,16 @@ summary: Prometheus Ceph Exporter
 description: |
   Exporter that exposes information gathered from Ceph for use by the Prometheus monitoring system
 confinement: strict
+package-repositories:
+  - type: apt
+    components: [main]
+    suites: [bionic-updates/ussuri]
+    key-id: 391A9AA2147192839E9DB0315EDB1B62EC4926EA
+    url: http://ubuntu-cloud.archive.canonical.com/ubuntu
 parts:
   ceph-exporter:
     plugin: go
+    go-channel: 1.13/stable
     source: https://github.com/digitalocean/ceph_exporter.git
     go-importpath: github.com/digitalocean/ceph_exporter
     source-tag: 3.0.0-nautilus

Just for the record, using the "nautilus" branch of the upstream (digitalocean) exporter doesn't work out of the box. It requires another hack to make it work with the existing scheme used by our charm. Also, it's not compatible from an exporter output point of view, so it actually breaks some graphs set up by the charm...
https://github.com/nobuto-m/snap-prometheus-ceph-exporter/compare/stable/3.0.0-nautilus...nautilus/edge

Upgrading to core20 instead of cloud-archive:ussuri didn't work either.
https://snapcraft.io/docs/go-plugin#heading--core20
"The go plugin in core20 exclusively required the use of go.mod"
So the build fails without cherry-picking the upstream go.mod changes on top of the 3.0.0-nautilus tag.
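
For reference, building and installing the patched snap locally would look roughly like this (a sketch: the branch name comes from the compare URL above, and the generated snap file name is an assumption):

$ git clone -b 3.0.0-nautilus/edge https://github.com/nobuto-m/snap-prometheus-ceph-exporter.git
$ cd snap-prometheus-ceph-exporter
$ snapcraft                                                          # build with the patched snapcraft.yaml
$ sudo snap install --dangerous ./prometheus-ceph-exporter_*.snap    # locally built snap is unsigned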

Revision history for this message
Nobuto Murata (nobuto) wrote :

FWIW, this is the snap binary built with Go 1.13 and cloud:bionic-ussuri as above.

Revision history for this message
Nobuto Murata (nobuto) wrote :
Changed in charm-prometheus-ceph-exporter:
status: New → Invalid
Revision history for this message
Xav Paice (xavpaice) wrote :

The prometheus-ceph-exporter snap has been updated on the candidate channel with a newly built snap using v3 and the patch mentioned above.

Note that the stable snap channel still has v2, for older deployments.
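
On an affected unit, picking up the rebuilt snap should just be a channel refresh (a sketch, assuming the snap is installed under the same name as the charm):

$ sudo snap refresh prometheus-ceph-exporter --channel=candidate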

Changed in snap-prometheus-ceph-exporter:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: New → In Progress
Changed in charm-ceph-mon:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
James Page (james-page)
Changed in charm-ceph-mon:
importance: Undecided → Medium
milestone: none → 22.04
Revision history for this message
Vern Hart (vern) wrote :

Has this charm fix been abandoned?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/805305
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/a1d0518c80fd4d745bd979e973e82795e04384c0
Submitter: "Zuul (22348)"
Branch: master

commit a1d0518c80fd4d745bd979e973e82795e04384c0
Author: Chris MacNaughton <email address hidden>
Date: Thu Aug 19 16:05:34 2021 -0500

    Disable insecure global-id reclamation

    Closes-Bug: #1929262
    Change-Id: Id9f4cfdd70bab0090b66cbc8aeb258936cbf909e

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Revision history for this message
Nobuto Murata (nobuto) wrote :

Can we please backport the fix down to Yoga/Quincy?
