Ubuntu is missing /dev/infiniband/rdma_cm group ownership udev rule

Bug #256216 reported by Roland Dreier on 2008-08-08
Affects: udev (Ubuntu)
Importance: Undecided
Assigned to: Scott James Remnant (Canonical)

Bug Description

The Debian version of udev ships with the rule

    KERNEL=="rdma_cm", GROUP="rdma"

in /etc/udev/rules.d/020_permissions.rules. This means that loading the rdma_ucm module results in a device file like

    crw-rw---- 1 root rdma 10, 60 2008-08-08 14:39 /dev/infiniband/rdma_cm

That is, the rdma_cm file is owned by group rdma, so ordinary users can be given permission to use the RDMA CM by adding them to the group "rdma". (The rdma_ucm kernel driver is designed so that this file is safe for ordinary users to access.)

Ubuntu does not include this group ownership rule, due to a different policy about what rules are shipped as part of the udev package; the policy is that the group ownership rule should be in the package that uses the file, in this case librdmacm1.

This means that on Ubuntu systems, the device files end up as

    crw-rw---- 1 root root 10, 59 2008-08-08 14:33 rdma_cm

and only root can use the RDMA CM.
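
For anyone checking their own system, the ownership and mode shown above can be inspected programmatically. The following is a minimal Python sketch and is not part of the proposed debdiff; the helper names (describe_mode, group_can_open, report) are made up for illustration, and the only path assumed is the /dev/infiniband/rdma_cm node discussed in this report.

```python
import grp
import os
import stat


def describe_mode(mode: int) -> str:
    """Render a raw st_mode the way `ls -l` does, e.g. 'crw-rw----'."""
    return stat.filemode(mode)


def group_can_open(mode: int) -> bool:
    """True if the owning group has both read and write permission."""
    wanted = stat.S_IRGRP | stat.S_IWGRP
    return (mode & wanted) == wanted


def report(path: str = "/dev/infiniband/rdma_cm") -> str:
    """Summarise mode and owning group of the node (hypothetical helper)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path}: not present (is rdma_ucm loaded?)"
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no name in the group database; show the number.
        group = str(st.st_gid)
    return f"{path}: {describe_mode(st.st_mode)} group={group}"


if __name__ == "__main__":
    print(report())
```

With the Debian rule in place the reported group should be rdma; without it, root.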

Roland Dreier (roland.dreier) wrote :

Here is a debdiff that bumps the version to -ubuntu1 and adds a librdmacm1.udev file with the required udev rule. It would be great if this could be integrated into the Ubuntu package.

Roland Dreier (roland.dreier) wrote :

Is there any possibility of this getting reviewed and sponsored for Intrepid?

While this is technically correct, by policy, I would hesitate before applying this to Intrepid.

Too often, upstreams shirk the entire permissions problem by just telling distributions to "create another group and put users into it". It's neither a scalable nor even desirable solution, because it isn't a solution - it's just a workaround.

Instead we should ask more fundamental questions.

What is RDMA? What kind of user would need access to these device nodes? How will they use them? Are they connected to some kind of physical hardware attached to the machine, or a pluggable device that any user who inserts it at their seat would expect to be able to use?

Do users ever use these devices directly, or do they run programs that access them by talking a special protocol? Do users even run these programs at all, or are they daemons that manage the device, and present a user-space interface of their own?

If a device is present on the system, should any user be able to access it? Or is it a privilege only for certain users, or even the system administrator?

Roland Dreier (roland.dreier) wrote :

Fair questions, although it probably would have been better to look at this before the other group "rdma" changes went into libibverbs as part of hardy (see bug #225788). And I wish we could have had this discussion two months ago, rather than two weeks before the Intrepid release.

Anyway, I'll try to answer the questions:

 - RDMA stands for "remote direct memory access," and it is a type of high performance networking implemented by InfiniBand and some 10 GbE adapters. Part of RDMA is "kernel bypass," which gives userspace processes direct access to hardware registers to reduce latency and CPU overhead in performing RDMA operations. http://en.wikipedia.org/wiki/RDMA has a more complete overview.

 - Users that are running high-performance jobs would need access to these device nodes; it makes sense to me that administrators would not necessarily want to allow all users to have direct access to do things that might interfere with other jobs on a high-performance network.

 - The device nodes in this particular bug are actually virtual devices that are used for connection setup; the actual direct-access nodes have permissions covered by the udev rules in the libibverbs1 package. In any case, RDMA hardware is generally a PCI Express or PCI-X card (basically a high-end NIC), although some systems have hardware directly on a system bus (AMD HyperTransport, or IBM System p GX bus). The hardware is only pluggable via something like PCI hot-swap, which is generally a high-end server feature. It's definitely not something that a user on a multi-seat system is going to plug into a USB port.

 - Not sure what it would mean for users to use the devices directly -- obviously device access is through software (rather than poking solder balls with a wire or something like that). For the rdma_cm node that this bug is about, a typical user will link their application to librdmacm and use the library to establish RDMA connections. Users will then run their application directly (or possibly through a job submission queue for large shared clusters).

 - As I said before, the rdma_cm device nodes should be usable by non-administrator users, but the administrator probably wants the ability to restrict access to only certain users.

Let me ask one fundamental question of my own: if upstreams are shirking responsibility by suggesting that standard group permissions be used by administrators to set policy, what do you feel is a better way for upstreams to provide this mechanism?

Roland Dreier (roland.dreier) wrote :

How do we make progress on this? As it stands only root (or suid apps) can use librdmacm with Intrepid. (Intrepid is the first release to include librdmacm)

We need to think about it.

Since this is an issue of security and permissions, it's far better to ship requiring root access or administrator intervention than it is to ship with too light permissions.

This will be addressed in 8.10

Roland Dreier (roland.dreier) wrote :

Seems like time is running out to address this in 8.10?

This isn't really a security issue -- the rdma_cm device node is designed to be safe for unprivileged users to access, and Debian has been shipping udev rules that give group "rdma" access for quite a while with no reported security issues. And given that no users are in group "rdma" by default anyway, administrator intervention is required for this to make a difference even with the patch applied.

I thought the objection was to having an "rdma" group, and I'd like to make progress on some more acceptable alternative mechanism, but I need some hint as to what that would be.

On Fri, 2008-10-24 at 02:03 +0000, Roland Dreier wrote:

> Seems like time is running out to address this in 8.10?
>
Sorry, I mean 9.04

Scott
--
Scott James Remnant
<email address hidden>

Roland Dreier (roland.dreier) wrote :

Is now an appropriate time to address this for Jaunty?

On Sun, 2008-11-16 at 05:17 +0000, Roland Dreier wrote:

> Is now an appropriate time to address this for Jaunty?
>
We can have the discussion.

This is very low down my priority stack right now - with a resync with
the upstream udev rules much higher on it.

Scott
--
Scott James Remnant
<email address hidden>

Roland Dreier (roland.dreier) wrote :

Yes, let's have the discussion please. I understand that this isn't a high priority for you, but taking a little time early in the release cycle seems to be the only way to get this resolved for Jaunty. I don't see any way to make progress unless you give a hint as to what you feel is a better solution than my proposed patch. We should also take into account the fact that the libibverbs package has shipped group rdma rules for other device nodes for at least two Ubuntu releases now.

Past history is irrelevant.

We are attempting to avoid "groups for device node access" wherever possible.

Roland Dreier (roland.dreier) wrote :

 > Past history is irrelevant.

I think backwards compatibility and simplicity count for something. The only point about libibverbs is that the group rdma permissions are already applied to /dev/infiniband/uverbsX, and the class of users/applications that uses those nodes is the same as the one that would use /dev/infiniband/rdma_cm. So given that any systems where libibverbs is in use have already configured the rdma group, it seems we should at least consider whether we want to break that setup and/or introduce a different mechanism to do the same thing.

 > We are attempting to avoid "groups for device node access" wherever possible.

I think I've gotten that message. But could you give a hint as to what alternative mechanism you are using? It's very frustrating to get replies to all my comments except for when I ask what you want me to do.

On Mon, 2008-11-17 at 18:23 +0000, Roland Dreier wrote:

> > We are attempting to avoid "groups for device node access" wherever
> possible.
>
> I think I've gotten that message. But could you give a hint as to what
> alternative mechanism you are using? It's very frustrating to get
> replies to all my comments except for when I ask what you want me to do.
>
Access to system devices is provided through the HAL or DeviceKit
interface. Permission to access is managed through the PolicyKit layer,
where the D-Bus system bus service providing the device access
negotiates privilege with the application requesting it.

Scott
--
Scott James Remnant
<email address hidden>

Roland Dreier (roland.dreier) wrote :

D-Bus/PolicyKit seems very much overengineered and too complex for this issue, and it doesn't fit the model of RDMA very well anyway, since the whole point of RDMA is that unprivileged userspace applications use RDMA hardware directly without the overhead of a system call into the kernel, let alone a D-Bus method call to another process.

Anyway I don't think it makes sense to try and implement what you're talking about just for Ubuntu, so I suggest you close this bug as "won't fix" and I'll just point people at fixed packages in my PPA.

On Wed, 2008-11-19 at 16:22 +0000, Roland Dreier wrote:

> D-Bus/PolicyKit seems very much overengineered and too complex for this
> issue, and it doesn't fit the model of RDMA very well anyway, since the
> whole point of RDMA is that unprivileged userspace applications use RDMA
> hardware directly without the overhead of a system call into the kernel,
> let alone a D-Bus method call to another process.
>
I don't agree.

Adding a PolicyKit authorization to use RDMA devices is not practically
any harder than adding a group; in fact, maintenance-wise it's
substantially easier.

HAL may then be used to apply an ACL to the devices automatically if you
want raw library access.

Scott
--
Scott James Remnant
<email address hidden>

Roland Dreier (roland.dreier) wrote :

 > Adding a PolicyKit authorization to use RDMA devices is not practically
 > any harder than adding a group; in fact, maintenance-wise it's
 > substantially easier.

I have to admit I have no idea how to do that. But anyway, what's the point? No one wants to change their app to talk to PolicyKit to request access to a device node just to run on Ubuntu.

 > HAL may then be used to apply an ACL to the devices automatically if you
 > want raw library access.

How would that work? On a system with the rdma_ucm module loaded (so /dev/infiniband/rdma_cm exists), I see nothing promising-looking in lshal to tell hal about. And what's the advantage to using hal to change group permissions on a file when udev can create it with the correct permissions?

A very common use case would be a multi-user cluster, where MPI jobs spawn processes on many nodes using ssh, with possibly multiple different users' processes running on a node at once. For example, Open MPI (already in the Ubuntu archive) is typically used with RDMA in this way. And MPI is hard enough to configure without trying to explain to users that they get permission denied errors because their ConsoleKit can't connect to the session bus or something like that.

Luca Falavigna (dktrkranz) wrote :

I'm unsubscribing u-u-s for now waiting for a debdiff against current Jaunty version.
Could you please coordinate with Scott to have this uploaded? Thanks!

Just winding back this discussion briefly:

> D-Bus/PolicyKit seems very much overengineered and too complex for this issue, and it doesn't fit the model of
> RDMA very well anyway, since the whole point of RDMA is that unprivileged userspace applications use RDMA
> hardware directly without the overhead of a system call into the kernel, let alone a D-Bus method call to another
> process.
>
I missed a key part of this paragraph before. You say that the whole point is that unprivileged userspace applications can use RDMA directly?

If that's the case, should these devices not simply have -rw-rw-rw permissions (like /dev/net/tun, /dev/fuse, etc.) so that all userspace applications can use them?

Roland Dreier (roland.dreier) wrote :

 > I missed a key part of this paragraph before. You say that the whole point is that
 > unprivileged userspace applications can use RDMA directly?

Yes, non-suid executables run by normal users should be able to use RDMA directly in a safe fashion.

 > If that's the case, should these devices not simply have -rw-rw-rw permissions (like
 > /dev/net/tun, /dev/fuse, etc.) so that all userspace applications can use them?

Having 0666 permissions would not necessarily be a bad idea, but the consensus among other distributions is to limit RDMA access to an "rdma" group so that administrators have some control over who gets direct hardware access (even though in theory it is safe for anyone, there is the possibility of untrusted users consuming network bandwidth at least). Also, RDMA often requires increasing the amount of locked memory allowed in /etc/security/limits.conf, and doing that by group "rdma" is convenient as well.

Given that you seem to have moved fuse from 0660 to 0666 between Intrepid and Jaunty, I guess it would be consistent to have the same permission for rdma access. Is there some reason that you keep the "fuse" group around and make /dev/fuse owned by it, or is that just a leftover from the old udev rules?
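
The two per-user settings mentioned above (membership of group "rdma" and the locked-memory limit) can be checked from an ordinary process. This is a rough Python sketch under stated assumptions: format_limit and in_group are hypothetical helper names, and the limits.conf entry an administrator might use, along the lines of "@rdma soft memlock unlimited", is an illustrative value that does not come from this report.

```python
import grp
import os
import resource


def format_limit(value: int) -> str:
    """Render an rlimit value; getrlimit reports RLIMIT_MEMLOCK in bytes."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def in_group(name: str, gids) -> bool:
    """True if the named group exists and its gid is among `gids`."""
    try:
        return grp.getgrnam(name).gr_gid in gids
    except KeyError:
        # The group is not defined on this system at all.
        return False


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("memlock soft:", format_limit(soft), "hard:", format_limit(hard))
    my_gids = os.getgroups() + [os.getegid()]
    print("in group rdma:", in_group("rdma", my_gids))
```

Note that limits.conf expresses memlock in KB while getrlimit returns bytes, which is why the sketch divides by 1024 before printing.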

On Fri, 2009-01-23 at 05:46 +0000, Roland Dreier wrote:

> > I missed a key part of this paragraph before. You say that the whole point is that
> > unprivileged userspace applications can use RDMA directly?
>
> Yes, non-suid executables run by normal users should be able to use RDMA
> directly in a safe fashion.
>
> > If that's the case, should these devices not simply have -rw-rw-rw permissions (like
> > /dev/net/tun, /dev/fuse, etc.) so that all userspace applications can use them?
>
> Having 0666 permissions would not necessarily be a bad idea, but the
> consensus among other distributions is to limit RDMA access to an "rdma"
> group so that administrators have some control over who gets direct
> hardware access
>
Any rule we add will be in upstream udev; so all the distributions would
end up with it anyway. Upstream udev strongly discourages groups for
device access that users are placed in.

> (even though in theory it is safe for anyone, there is
> the possibility of untrusted users consuming network bandwidth at
> least).
>
It's pretty easy to consume network bandwidth from userspace, you open
lots of sockets to somewhere and start reading or writing ;-)

Likewise it's pretty trivial to consume memory.

> Also, RDMA often requires increasing the amount of locked
> memory allowed in /etc/security/limits.conf, and doing that by group
> "rdma" is convenient as well.
>
So it sounds like there are other limits in place anyway on what people
can do with RDMA? Sounds safe.

> Given that you seem to have moved fuse from 0660 to 0666 between
> Intrepid and Jaunty, I guess it would be consistent to have the same
> permission for rdma access. Is there some reason that you keep the
> "fuse" group around and make /dev/fuse owned by it, or is that just a
> leftover from the old udev rules?
>
The group is leftover from before.

Scott
--
Scott James Remnant
<email address hidden>

We seem to be at consensus here, so I'll have the following rule merged upstream:

  KERNEL=="rdma_cm", MODE="0666"

Changed in librdmacm:
assignee: nobody → scott

Following rules present in jaunty:

KERNEL=="umad[0-9]*", NAME="infiniband/%k"
KERNEL=="issm[0-9]*", NAME="infiniband/%k"
KERNEL=="ucm[0-9]*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uverbs[0-9]*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uat", NAME="infiniband/%k", MODE="0666"
KERNEL=="ucma", NAME="infiniband/%k", MODE="0666"
KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"
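
To see what the rules listed above imply for a given kernel device name, here is a simplified Python model. It is only a sketch: it handles just the KERNEL match and the NAME and MODE assignments, approximates udev's glob matching with fnmatch, and stops at the first matching rule, which real udev does not do. parse_rule and resolve are hypothetical helper names.

```python
import fnmatch
import re

# The rules quoted in this report, verbatim.
RULES = """\
KERNEL=="umad[0-9]*", NAME="infiniband/%k"
KERNEL=="issm[0-9]*", NAME="infiniband/%k"
KERNEL=="ucm[0-9]*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uverbs[0-9]*", NAME="infiniband/%k", MODE="0666"
KERNEL=="uat", NAME="infiniband/%k", MODE="0666"
KERNEL=="ucma", NAME="infiniband/%k", MODE="0666"
KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"
"""

# key, operator (== is a match, = is an assignment), quoted value
PAIR = re.compile(r'(\w+)\s*(==|=)\s*"([^"]*)"')


def parse_rule(line):
    """Split one rule line into (match keys, assignment keys)."""
    matches, assigns = {}, {}
    for key, op, value in PAIR.findall(line):
        (matches if op == "==" else assigns)[key] = value
    return matches, assigns


def resolve(kernel_name, rules=RULES):
    """Return (device path, mode or None) for the first matching rule."""
    for line in rules.splitlines():
        matches, assigns = parse_rule(line)
        pattern = matches.get("KERNEL")
        if pattern and fnmatch.fnmatchcase(kernel_name, pattern):
            # %k expands to the kernel name of the device.
            name = assigns.get("NAME", kernel_name).replace("%k", kernel_name)
            return "/dev/" + name, assigns.get("MODE")
    return None


if __name__ == "__main__":
    for dev in ("rdma_cm", "uverbs0", "umad0"):
        print(dev, "->", resolve(dev))
```

A mode of None in this model means the rule sets no MODE, so the node would keep whatever default applies.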

Changed in udev:
status: New → Fix Released