Manila overwrites existing Ceph users

Bug #1904015 reported by Babel Jahson
Affects: OpenStack Shared File Systems Service (Manila)
  Status: Fix Released | Importance: High | Assigned to: Goutham Pacha Ravi | Milestone: wallaby-rc1
Affects: ceph (Ubuntu)
  Status: Invalid | Importance: Undecided | Assigned to: Unassigned

Bug Description

Description
===========

I'm currently testing manila with CephFS and stumbled upon a behavior
where manila can overwrite existing Ceph users.
In my testing setup, Glance, Nova, Cinder and Manila share the same Ceph
cluster, but each service has its own user.
When a share is created and access is allowed on that share for a service user (cinder/nova/glance), Manila overwrites the existing user, removing its access to the pools in order to set permissions for the share.
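
To make the effect concrete, the change to the user's capabilities amounts to something like the following (an illustration only; the cap strings, pool names and path are invented for the example, not taken from a real cluster):

    # Illustration only: cap strings, pool names and path are invented.
    caps_before = {
        "mon": "allow r",
        "osd": "allow rwx pool=volumes, allow rwx pool=images",
    }
    # After "manila access-allow Share1 cephx cindertest", the pre-existing
    # caps are replaced wholesale by share-scoped ones:
    caps_after = {
        "mon": "allow r",
        "mds": "allow rw path=/volumes/_nogroup/<share-uuid>",
        "osd": "allow rw pool=cephfs_data",
    }
    # Any service that relied on caps_before (here Cinder) loses its pool
    # access.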

Steps to reproduce
==================

* Have a running OpenStack deployment with Cinder/Glance/Nova/Manila all configured against one Ceph cluster, using different pools.
* Create a share and allow access to it with one of the users used by the OpenStack services (Cinder/Nova/Glance..):

manila create --share-type cephfstype --name Share1 cephfs 25
manila access-allow Share1 cephx cindertest

Expected result
===============

Manila should be prevented from creating or modifying users that are already used by other OpenStack services.

Actual result
=============

The command succeeds, but the user in question is used by Ceph and OpenStack to provide pool access for running services. Restricting it to a single share breaks every resource that was using it.

Environment
===========

I'm currently running OpenStack Rocky, with Ceph Nautilus.

Logs & Configs
==============
An example of how the user changes in the Ceph cluster config: http://paste.openstack.org/show/799959/

Jahson

CVE References: CVE-2020-27781

Vida Haririan (vhariria)
Changed in manila:
importance: Undecided → Medium
Giulio Fidente (gfidente) wrote:

I am looking at whether we can implement some safety measures in the Ceph cluster; for example, it would be nice if we could configure a cephx account so that it is allowed to create entries but edit/delete only those it created in the first place.

From the ceph docs [1] it doesn't look like the authorization (capabilities) for a user can be configured this way; if a cephx user has admin rights, it can mess with any existing user.

I will dig more and see what, if anything, is possible.

1. https://docs.ceph.com/en/latest/rados/operations/user-management/#authorization-capabilities

Goutham Pacha Ravi (gouthamr) wrote:

Hi,

Thank you for raising this issue. I am switching the visibility of this bug and marking it as a security bug for the moment while we reproduce it and discuss the impact on Ceph and OpenStack infrastructure. Please feel free to add any further information; however, I request that you refrain from adding subscribers that you do not trust.

Thanks,
Goutham

information type: Public → Private
Babel Jahson (jbabel) wrote:

Hello,

Okay, maybe I'll add one or two colleagues who work with me, but that should be all.
If you need me to do some testing or anything else, feel free to ask; I'll be glad to help if I can.

Jahson

Goutham Pacha Ravi (gouthamr) wrote:

Hello,

Thanks Jahson.

I was able to reproduce this bug and break my OpenStack infrastructure.
Specifically, manila's access control allows three kinds of damage to existing
users:

1) "manila access-allow" *can* reset the "caps" (capabilities) of pre-existing
   ceph users, thereby breaking any ceph user's non-manila workloads
2) "manila access-deny" can delete ceph users, even users that were not created
   by "manila access-allow"
3) "manila access-list" can leak privileged/infrastructure users' access-keys:
   in cases where pre-existing ceph capabilities are maintained, leaking ceph
   keys through manila would mean users can wreak havoc on the ceph cluster.

Let's break it down a bit more:

a) The CephFS driver in manila interacts with ceph via the python-cephfs
   package, which is derived from a python module in the ceph repository [1].
   When you allow access for a ceph client user, this volume client code
   checks whether the client user exists and creates one if it does not.
   If the ceph client user exists, it checks whether there are any
   pre-existing "mds" caps so it can carefully craft an update [2]. If
   there are no mds caps, the pre-existing caps are ignored and overwritten.
   This is a problem because a ceph client user may have no mds caps but
   still have osd/mgr/mon caps. So, if we fix the check in [2], manila will
   no longer break other users' pre-existing capabilities (see the sketch
   right after this list).
b) On removing access from a share (cephfs subvolume), if a ceph client user
   has access to no other cephfs subvolume, the ceph-side code deletes the
   user. This is again a huge problem: ceph client users may have non-manila
   workloads, and deleting the user will break those workloads.
c) There's a question of whether manila (or specifically the cephfs driver)
   should be modifying users that it doesn't create at all.
   It's possible that that's desired - you may have pre-created cephfs client
   users and would like them to use manila shares.
   However, currently, the only user that cannot be targeted for
   authentication is the manila service user itself [3].
   Should we have a "denylist" of users that are not allowed to be used for
   auth with manila? There are two places where this could go:
   - We can add a mutable configuration option in manila specifying a
     list/regex of client users that the driver will refuse to use for auth.
     This gives an OpenStack administrator the power to protect their infra
     users (sketched below, after the next paragraph).
   - We can make ceph handle a denylist - I don't know how we would achieve
     this piece; we can consult with ceph developers. This may be more
     dynamic, since we could add or remove users from such a denylist
     without having to touch configuration.
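
To make (a) concrete, here is a minimal sketch of the flawed update logic. This is not the actual ceph_volume_client code, only the shape of the problem:

    # Sketch only; not the actual ceph_volume_client code.
    def merge(old, new):
        # Join two cap strings, tolerating a missing old value.
        return (old + ", " + new) if old else new

    def build_updated_caps(existing_caps, share_path, data_pool):
        want_mds = "allow rw path={}".format(share_path)
        want_osd = "allow rw pool={}".format(data_pool)
        if "mds" in existing_caps:
            # Pre-existing mds caps exist: merge carefully, keep the rest.
            return {
                "mds": merge(existing_caps["mds"], want_mds),
                "osd": merge(existing_caps.get("osd"), want_osd),
                "mon": existing_caps.get("mon", "allow r"),
            }
        # No mds caps: a user that has only mon/osd/mgr caps falls through
        # here and everything it had is silently replaced (the bug).
        return {"mds": want_mds, "osd": want_osd, "mon": "allow r"}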

Introducing a configurable denylist *somewhere* may prevent all three problems
highlighted. Can anyone think of any drawbacks to this approach?
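
As a sketch of the first option (the driver-side configuration denylist), where the option name "cephfs_auth_id_denylist" and its regex semantics are assumptions made for illustration, not an existing manila option:

    import re

    # Hypothetical option value; the name and semantics are assumed.
    CEPHFS_AUTH_ID_DENYLIST = [r"cinder.*", r"glance", r"nova"]

    def check_auth_id_allowed(auth_id):
        # Reject access rules that target a protected cephx user.
        for pattern in CEPHFS_AUTH_ID_DENYLIST:
            if re.fullmatch(pattern, auth_id):
                raise ValueError("cephx ID %r is reserved for "
                                 "infrastructure use" % auth_id)

    check_auth_id_allowed("mycephxuser")    # passes
    # check_auth_id_allowed("cindertest")  # would raise ValueError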

[1] https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py
[2] https://github.com/ceph/ceph/blob/a0e1a8f17372b361db68cc4994120e729d8e484a/src/pybind/ceph_volume_client.py#L1115-L1116
[3] https://opendev.org/openstack/manila/src/co...


Changed in manila:
status: New → Confirmed
importance: Medium → High
Goutham Pacha Ravi (gouthamr) wrote:

An update regarding this bug.

We have held several rounds of brainstorming sessions with CephFS engineers over the past couple of weeks. It became apparent in the earliest discussions that a manila-side resolution of maintaining a "denylist" of ceph users isn't helpful in the long run. It would merely let OpenStack administrators set aside some unusable user names, but it leaves the security hole unplugged for non-OpenStack consumers.

The current solution we're working on is in the ceph_volume_client library, not in manila. When this fix lands, only users created by manila can be manipulated by manila. This disallows pre-existing users from consuming CephFS shares via manila. The consequence is that you may have to make up a new cephx user name to interact with manila if your "manila access-allow" command fails. This is a bit of a workaround, and we'll document the behavior loud and clear. We'll also add an asynchronous user message in manila to improve the user experience and let users discover this if they don't read the documentation.
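
To illustrate the intended behavior (a sketch only; the real ceph_volume_client change tracks its own users differently and differs in detail):

    # Sketch: the volume client refuses to touch auth IDs it did not
    # create itself. Tracking created IDs in a plain set is an assumption
    # made for this illustration.
    class VolumeClientSketch:
        def __init__(self):
            self._created_auth_ids = set()

        def authorize(self, share_path, auth_id):
            exists = self._exists_in_ceph(auth_id)
            if exists and auth_id not in self._created_auth_ids:
                # Pre-existing users can no longer be used via manila.
                raise RuntimeError("auth ID %s exists and was not "
                                   "created by manila" % auth_id)
            self._created_auth_ids.add(auth_id)
            # ... create the user and grant share-scoped caps ...

        def _exists_in_ceph(self, auth_id):
            # Placeholder: would query the cluster for client.<auth_id>.
            return False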

A CVE has been reserved for this vulnerability in the ceph_volume_client: CVE-2020-27781 "ceph: vulnerability in RHCS"

This issue is still under embargo. We expect the embargo to end in mid-December 2020. At the embargo end date, we'll have patches submitted against ceph (https://github.com/ceph/ceph) to fix this vulnerability.

It's, however, entirely possible that we won't keep to these timelines. Please bear with me as we work through this via various teams/channels.

Thanks!
Goutham

Babel Jahson (jbabel) wrote:

Hi,

I apologize for the delay in my response, but these have been some busy weeks for me.
Thanks for the update, that's great news! I didn't expect it to be taken care of so quickly.
It's nice to see the path you've taken to resolve this. I naively believed a manila-side correction would be enough when I submitted this issue.
Everything seems clear to me and on its way to being fixed nicely.

Thank you for all the work on this.

Jahson

Goutham Pacha Ravi (gouthamr) wrote:

Hello all,

Thank you for your patience with this issue. This morning, we finished our embargo period on this bug. MITRE will be notified about the patch submissions to the Ceph project - at which point the CVE page [1] will be available publicly. These are the associated patch links:

Ceph Octopus: https://github.com/ceph/ceph/commit/1b8a634fdcd94dfb3ba650793fb1b6d09af65e05
Ceph Nautilus: https://github.com/ceph/ceph/commit/7e3e4e73783a98bb07ab399438eb3aab41a6fc8b
Ceph Luminous: https://github.com/ceph/ceph/commit/956ceb853a58f6b6847b31fac34f2f0228a70579

You will see these fixes show up in releases of Ceph Octopus and Ceph Nautilus. The patch to Luminous has been provided as a courtesy; the ceph community no longer produces updates for that release. Please see the Ceph release guide for more information on the Ceph release train [2].

I'm now converting this bug to "Public". Since no changes to OpenStack Manila code are necessary, you will see me publish a security note to the mailing lists with details about this vulnerability and recommendations.

The OpenStack Security Note is under review here: https://review.opendev.org/767417

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-27781
[2] https://docs.ceph.com/en/latest/releases/general/

information type: Private → Public
Changed in manila:
status: Confirmed → Fix Released
milestone: none → wallaby-rc1
assignee: nobody → Goutham Pacha Ravi (gouthamr)
OpenStack Infra (hudson-openstack) wrote: Related fix merged to manila (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/manila/+/773419
Committed: https://opendev.org/openstack/manila/commit/8f969689efe27f4adf1546a99d3cdfa71266671d
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 8f969689efe27f4adf1546a99d3cdfa71266671d
Author: Goutham Pacha Ravi <email address hidden>
Date: Mon Jan 25 23:44:32 2021 -0800

    [Native CephFS] Add messages for async ACL ops

    Access rules added to CephFS shares can fail
    at the driver, or by the ceph volume client library.
    Since the share manager can supply rule changes to
    the driver in batches, the driver has to gracefully
    handle individual rule failures.

    Further some of the causes of the access rule
    failures can be remedied by end users, therefore
    asynchronous user messages would be a good vehicle
    to register user faults that can be examined and
    corrected.

    Related-Bug: #1904015
    [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-27781

    Change-Id: I3882fe5b1ad4a6cc71c13ea70fd6aea10430c42e
    Signed-off-by: Goutham Pacha Ravi <email address hidden>
    (cherry picked from commit da3ab2cf4512716fa47a16315e98e610fbaed829)
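
The graceful per-rule handling described in the commit message boils down to a loop of this shape (a sketch; the helper names are invented and this is not manila's actual driver interface):

    # Sketch: apply each access rule independently so that one bad rule
    # does not abort the whole batch; failures are reported (e.g. as
    # asynchronous user messages) instead of raised.
    def apply_access_rules(rules, apply_rule, report_failure):
        failed = []
        for rule in rules:
            try:
                apply_rule(rule)
            except Exception as exc:
                report_failure(rule, exc)
                failed.append(rule)
        return failed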

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote: Related fix proposed to manila (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/manila/+/792770

OpenStack Infra (hudson-openstack) wrote: Related fix merged to manila (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/manila/+/792770
Committed: https://opendev.org/openstack/manila/commit/3b31aae991f080e7b4eefd4d651feb06dfd7abb0
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 3b31aae991f080e7b4eefd4d651feb06dfd7abb0
Author: Goutham Pacha Ravi <email address hidden>
Date: Mon Jan 25 23:44:32 2021 -0800

    [Native CephFS] Add messages for async ACL ops

    Access rules added to CephFS shares can fail
    at the driver, or by the ceph volume client library.
    Since the share manager can supply rule changes to
    the driver in batches, the driver has to gracefully
    handle individual rule failures.

    Further some of the causes of the access rule
    failures can be remedied by end users, therefore
    asynchronous user messages would be a good vehicle
    to register user faults that can be examined and
    corrected.

    Related-Bug: #1904015
    [1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-27781

    Change-Id: I3882fe5b1ad4a6cc71c13ea70fd6aea10430c42e
    Signed-off-by: Goutham Pacha Ravi <email address hidden>
    (cherry picked from commit da3ab2cf4512716fa47a16315e98e610fbaed829)
    (cherry picked from commit 8f969689efe27f4adf1546a99d3cdfa71266671d)

tags: added: in-stable-ussuri
James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Invalid