Probable DOS in linuxbridge

Bug #1732294 reported by Sarah Newman on 2017-11-14
This bug affects 1 person
Affects                       Importance   Assigned to
OpenStack Security Advisory   Undecided    Unassigned
OpenStack Security Notes      Undecided    Unassigned
neutron                       Critical     Brian Haley

Bug Description

We experienced a DoS yesterday on a system (not OpenStack-based) that would have been mitigated if the ebtables MAC address whitelist had been applied in the nat PREROUTING chain rather than the filter FORWARD chain.

At least with kernel version 4.9, when source MAC addresses cycle rapidly, the Linux bridge appears to get bogged down learning new MAC addresses unless learning is explicitly turned off with brctl setageing <bridge> 0.

We deployed a workaround to our own infrastructure but I believe https://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/drivers/linuxbridge/agent/arp_protect.py#n158 means that openstack has the same vulnerability.

It should be possible to move all logic related to checking the input to the ebtables nat PREROUTING chain using the ebtables_nat module.
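
As a rough illustration of the proposed move, the sketch below builds the same source-MAC whitelist rule for both placements. It only assembles ebtables argument lists and does not execute them. The helper name mac_whitelist_rule, the interface tap0, and the MAC address are hypothetical, and the single-rule form is a simplification, not the rule set that arp_protect.py actually installs.

```python
def mac_whitelist_rule(table, chain, iface, allowed_mac):
    """Build an ebtables command (as an argv list) that drops frames
    arriving on `iface` whose source MAC is not `allowed_mac`.

    Hypothetical helper for illustration; it does not run anything.
    """
    return [
        "ebtables", "-t", table, "-A", chain,
        "-i", iface, "-s", "!", allowed_mac, "-j", "DROP",
    ]


# Placeholder values, not taken from the report.
IFACE = "tap0"
MAC = "fa:16:3e:00:00:01"

# Placement the report describes as vulnerable.
old_rule = mac_whitelist_rule("filter", "FORWARD", IFACE, MAC)
# Placement the report proposes as a mitigation.
new_rule = mac_whitelist_rule("nat", "PREROUTING", IFACE, MAC)

print(" ".join(old_rule))
print(" ".join(new_rule))
```

Only the table and chain differ between the two commands; the match itself is unchanged.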

To duplicate, in a VM on a host with bridged networking and mac spoofing protection in place, install dsniff and run:

macof -i <ethernet device> -s <valid local IP> -d <valid remote IP> -n 50000000 &> /dev/null

Observe on the host that ksoftirqd usage goes to near 100% on one core, that 'perf top' will show br_fdb_update as taking significant resources, and that 'brctl showmacs <bridge>' will probably hang.
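
One further way to observe the effect, sketched under assumptions: if a forwarding-database listing can be captured (for example from iproute2's "bridge fdb show br <bridge>", which this report did not use and which may be slow during a flood), counting learned entries per port shows which port the addresses arrive on. The sample text is illustrative, not captured output, and count_learned_entries is a hypothetical helper.

```python
from collections import Counter


def count_learned_entries(fdb_text):
    """Count non-permanent forwarding entries per port.

    Assumes each line looks like '<mac> dev <port> ...' and that static
    entries carry the word 'permanent'.
    """
    counts = Counter()
    for line in fdb_text.splitlines():
        fields = line.split()
        if "dev" not in fields or "permanent" in fields:
            continue
        idx = fields.index("dev")
        if idx + 1 < len(fields):
            counts[fields[idx + 1]] += 1
    return counts


# Illustrative sample, not real output from the affected host.
SAMPLE = """\
fa:16:3e:00:00:01 dev tap0 master br0
0e:5a:11:22:33:44 dev tap0 master br0
52:54:00:aa:bb:cc dev eth0 master br0 permanent
"""

counts = count_learned_entries(SAMPLE)
for port, n in counts.most_common():
    print(port, n)
```

A port whose count keeps climbing while the others stay flat is a likely source of the cycling addresses.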

Sarah Newman (srn-f) on 2017-11-14
description: updated

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated

It sounds like a kernel bug... Neutron-coresec, could you check whether the proposed mitigation could be implemented?

Sarah, it would help to know which kernel version you are running.

On 11/14/2017 05:01 PM, Tristan Cacqueray wrote:
> It sounds like a kernel bug... Neutron-coresec, could you check whether
> the proposed mitigation could be implemented?
>
> Sarah, it would help to know which kernel version you are running.
>

It was 4.9.39 under Xen, but I'm able to reproduce with 3.18.25.

Regardless of whether filtering is enabled, unless I missed something in newer kernels, there appears to be no way to limit either the size of the mac
address cache or rate limit how often the bulk of br_fdb_update runs. You're right that probably something needs to change in the kernel and I'm not
sure of the best place to direct that conversation. Do you think that would be <email address hidden> or one of the network related mailing lists?

For the kernel, I think the fastest solution to implement would be a hard limit controlled by sysctl (maybe defaulting to 1024, the same as ipv4/ipv6
gc_thresh3) and for the mac address count to be incremented in br_fdb_update and decremented in fdb_delete.

--Sarah
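
The hard limit suggested above can be modelled in a few lines. This is a toy Python model under stated assumptions, not kernel code: the class BoundedFdb and its methods are hypothetical, update() stands in for br_fdb_update, delete() stands in for fdb_delete, and the default of 1024 is the value floated in the comment.

```python
class BoundedFdb:
    """Toy model of a MAC learning table with a hard entry limit."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self.entries = {}  # mac -> port
        self.count = 0     # incremented on learn, decremented on delete

    def update(self, mac, port):
        """Learn or refresh an entry; refuse new entries at the limit."""
        if mac in self.entries:
            self.entries[mac] = port  # refresh, count unchanged
            return True
        if self.count >= self.max_entries:
            return False              # table full, do not learn
        self.entries[mac] = port
        self.count += 1
        return True

    def delete(self, mac):
        """Remove an entry and release its slot."""
        if mac not in self.entries:
            return False
        del self.entries[mac]
        self.count -= 1
        return True


fdb = BoundedFdb(max_entries=4)
learned = [fdb.update("02:00:00:00:00:%02x" % i, "tap0") for i in range(6)]
print(learned, fdb.count)
```

With a limit in place, a flood of new source addresses stops consuming work once the table is full, while already-learned addresses are still refreshed.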

Contacting <email address hidden> with as much detail as you can sounds like a good thing to do; see https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html .

IIUC, the load happens when an instance simply tries to spoof many new MAC addresses? If so, that doesn't seem like something we need to keep under embargo, and it would probably be easier to fix this issue in public.

Sarah Newman (srn-f) wrote :

On 11/14/2017 07:11 PM, Tristan Cacqueray wrote:
> Contacting <email address hidden> with as much detail as you can sounds
> like a good thing to do; see
> https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html .
>
> IIUC, the load happens when an instance simply tries to spoof many new
> MAC addresses? If so, that doesn't seem like something we need to keep
> under embargo, and it would probably be easier to fix this issue in
> public.
>

MAC spoofing is obvious and easy, but maybe so obvious that it's gone out of style to try it. I have to assume so since I don't think anybody has
bothered to fix it.

Give me a chance to duplicate with net-next and contact kernel.org? They will probably want to make it public ASAP since there's an easy mitigation
and there's also prior discussion from 2013 https://www.keypressure.com/blog/linux-bridge-port-security/ , but I don't have a feel for these things.

--Sarah

Sure, it's better to be safe than sorry.

For the record we do not have (yet) a hard limit on the acceptable duration of report embargoes. In the event we open a bug report, it will have to go through an embargo-exception first, as explained here: https://security.openstack.org/vmt-process.html#embargo-exceptions

Also it's worth noting that vulnerability reporters retain final control over the disclosure of their findings. If for some reason they are uncomfortable with our process, their choice of disclosure terms prevails.

Thanks.

Sarah Newman (srn-f) wrote :

On 11/14/2017 08:52 PM, Tristan Cacqueray wrote:
> Sure, it's better to be safe than sorry.
>
> For the record we do not have (yet) a hard limit on the acceptable
> duration of report embargoes. In the event we open a bug report, it will
> have to go through an embargo-exception first, as explained here:
> https://security.openstack.org/vmt-process.html#embargo-exceptions
>
> Also it's worth noting that vulnerability reporters retain final control
> over the disclosure of their findings. If for some reason they are
> uncomfortable with our process, their choice of disclosure terms
> prevails.
>
> Thanks.
>

Do what you will regarding any embargo.

I posted publicly to netdev at the request of <email address hidden>. Hopefully a fix will get in within the next few weeks.

Jeremy Stanley (fungi) wrote :

Thanks for the heads up! It's our policy to go ahead and end embargoes once an issue is publicly disclosed, so we'll move forward triaging this as class C2 "A vulnerability, but not in OpenStack supported code, e.g., in a dependency" per our report taxonomy: https://security.openstack.org/vmt-process.html#incident-report-taxonomy

Adding a new OSSN task in case the security note editors want to publish something about this prior to or once the kernel fix is available.

description: updated
information type: Private Security → Public
tags: added: security
Changed in ossa:
status: New → Won't Fix

Sarah Newman (srn-f) wrote :

I'd recommend deploying the mitigation of moving the mac filtering rules to the ebtables nat PREROUTING chain rather than relying on a kernel fix.

On 11/15/2017 12:35 PM, Jeremy Stanley wrote:
> Thanks for the heads up! It's our policy to go ahead and end embargoes
> once an issue is publicly disclosed, so we'll move forward triaging this
> as class C2 "A vulnerability, but not in OpenStack supported code, e.g.,
> in a dependency" per our report taxonomy:
> https://security.openstack.org/vmt-process.html#incident-report-taxonomy
>
> Adding a new OSSN task in case the security note editors want to publish
> something about this prior to or once the kernel fix is available.
>

Changed in neutron:
importance: Undecided → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/520249

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
status: New → In Progress

Brian Haley (brian-haley) wrote :

Sarah - is it possible for you to test my proposed change? It's basically doing what you said - move the rules to the nat table PREROUTING chain. Thanks.

On 11/16/2017 07:35 AM, Brian Haley wrote:
> Sarah - is it possible for you to test my proposed change? It's
> basically doing what you said - move the rules to the nat table
> PREROUTING chain. Thanks.
>

We have very similar code and deployed a similar change to our own systems, but as I said we're not bona fide openstack users. I can't easily test it.

It looks to me like the change as written may cause issues if the update is made to a system actively running VMs. I don't know if that's supposed to
be a supported use case or not.

--Sarah

Miguel Lavalle (minsel) on 2018-01-26
Changed in neutron:
milestone: none → queens-rc1

Reviewed: https://review.openstack.org/520249
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=08108c41992a13c6959b717cccfe2b929e55d2eb
Submitter: Zuul
Branch: master

commit 08108c41992a13c6959b717cccfe2b929e55d2eb
Author: Brian Haley <email address hidden>
Date: Wed Nov 15 19:24:22 2017 -0500

    Move Linuxbridge ARP spoofing to nat table PREROUTING chain

    It was found that adding ebtables rules to the filter table
    FORWARD chain could be vulnerable to a DoS attack. Moving
    to the nat table PREROUTING chain should mitigate this as
    it is consulted prior to allowing the frame in.

    In order to make this work with upgrades, had to make the code
    detect and remove any old rules that might still exist in
    the filter table. That can be removed after a cycle.

    Added some unit tests in addition to the existing functional
    tests.

    Change-Id: I87852b21db4404c58c83789cc267812030ac7d5f
    Closes-bug: #1732294
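
The upgrade step in the commit message (detect and remove old rules left in the filter table) might look roughly like the sketch below. It is not the code from the change under review. The chain-name prefixes, the helper legacy_cleanup_commands, and the sample names are assumptions for illustration, and the function only builds ebtables argument lists without running them.

```python
# Assumed prefixes for chains created by the agent; not confirmed by this report.
LEGACY_PREFIXES = ("neutronARP-", "neutronMAC-")


def legacy_cleanup_commands(forward_jumps, filter_chains):
    """Build commands that remove leftover rules from the filter table.

    forward_jumps: (iface, chain) pairs found in the filter FORWARD chain.
    filter_chains: names of user-defined chains in the filter table.
    Jumps are deleted first, because ebtables refuses to delete a chain
    that is still referenced.
    """
    cmds = []
    for iface, chain in forward_jumps:
        if chain.startswith(LEGACY_PREFIXES):
            cmds.append(["ebtables", "-t", "filter", "-D", "FORWARD",
                         "-i", iface, "-j", chain])
    for chain in filter_chains:
        if chain.startswith(LEGACY_PREFIXES):
            cmds.append(["ebtables", "-t", "filter", "-X", chain])
    return cmds


# Hypothetical state found on a host that still has old rules.
jumps = [("tap0", "neutronARP-tap0"), ("tap1", "custom-chain")]
chains = ["neutronARP-tap0", "custom-chain"]

cleanup = legacy_cleanup_commands(jumps, chains)
for argv in cleanup:
    print(" ".join(argv))
```

Chains that do not match the assumed prefixes are left alone, so unrelated operator rules in the filter table are not touched.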

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 12.0.0.0rc1 release candidate.
