fwaas: firewall rules not applied on L3 agents reboot in case of neutron-fwaas outage

Bug #1669482 reported by Bertrand Lallau
This bug affects 1 person
Affects                      Status     Importance  Assigned to  Milestone
OpenStack Security Advisory  Won't Fix  Undecided   Unassigned
neutron                      Won't Fix  Undecided   Unassigned

Bug Description

On an L3 agent reboot (fwaas v1) or an L2/L3 agent reboot (fwaas v2),
the networking stack is flushed by the Linux system (network namespaces, iptables, ...),
hence Neutron needs to resynchronize the networking configuration.
Therefore, on restart, agents send the 'sync_routers' RPC call to retrieve all the networking stacks (one per tenant).
On response they configure: network namespace, interfaces, routing table, etc.

With the "fwaas" extension enabled, iptables rules need to be applied too.
Sadly, this is not always the case due to the following bug:
https://bugs.launchpad.net/neutron/+bug/1659760

To summarize the previous bug, the fwaas implementation has a general RPC usage issue:
=> "fanout" is always used instead of "call".
For all CRUD methods on FWaaS v1 and v2 resources (Firewall, FirewallPolicy, FirewallRule, Firewallgroup, ...), an AMQP fanout cast is sent to all L3 (L2) agents, and all agents respond back to the neutron server.

Simple example using 40 L3 agent nodes:
Scenario: a user just updates the 'name' of a firewall rule
=> 40 RPC messages are sent to agents (with or without routers, with or without an associated firewall) and the neutron server receives 40 responses back.
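
The arithmetic above can be sketched in a few lines. This is a toy model, not neutron-fwaas code: `Agent`, `fanout_update` and `targeted_update` are hypothetical names, the figure of 3 firewall-hosting agents is arbitrary, and the model assumes every agent that receives a message answers the server exactly once.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for an L3 agent node."""
    host: str
    has_firewall: bool  # True if a router with a firewall is hosted here


def fanout_update(agents):
    """Fanout cast: every agent receives the update and replies once."""
    sent = len(agents)
    replies = len(agents)
    return sent, replies


def targeted_update(agents):
    """Assumed alternative: notify only the agents that host a firewall."""
    targets = [a for a in agents if a.has_firewall]
    return len(targets), len(targets)


# 40 L3 agents, of which only 3 (an arbitrary number) host a firewall.
agents = [Agent(f"l3-{i}", has_firewall=(i < 3)) for i in range(40)]

print(fanout_update(agents))    # (40, 40)
print(targeted_update(agents))  # (3, 3)
```

With fanout, the server-side load grows with the total number of agents rather than with the number of agents actually concerned by the change.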

This leads to a "flooded" neutron server process (or RPC workers).
=> RPC timeouts appear and the following second bug is triggered:
https://bugs.launchpad.net/neutron/+bug/1618244

The neutron-server fwaas worker is out of order :(
If an L3 (L2) agent reboots during this neutron-server "fwaas" outage,
the agent gets an RPC timeout on the get_tenants_with_firewalls, get_firewalls_for_tenant, get_projects_with_firewall_groups and get_firewall_groups_for_project calls.
=> all networking stacks are set up (namespace, interfaces, IPs, routing, NAT, ...), but no "fwaas" iptables rules are applied (the ACCEPT iptables policy is set by default).
=> all network traffic is authorized

Much worse, even once the neutron-server "fwaas" worker is ready again, the "fwaas" iptables rules are not applied, and they never will be.
There is no exception in the logs; all seems fine, but the iptables rules are not set.
The only ways to recover are to send update HTTP requests on the fwaas resources or to restart the agents.
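
One way to picture the missing recovery path is an agent that remembers a failed firewall sync and retries it later. The sketch below is an illustration under stated assumptions, not the neutron-fwaas implementation: `FirewallSyncer`, `fake_fetch` and the callables passed in are hypothetical, and Python's built-in `TimeoutError` stands in for the RPC timeout.

```python
class FirewallSyncer:
    """Hypothetical agent-side helper that retries a failed firewall sync."""

    def __init__(self, fetch_firewalls, apply_rules):
        self._fetch = fetch_firewalls   # callable standing in for the RPC call
        self._apply = apply_rules       # callable standing in for iptables setup
        self.needs_resync = True        # nothing is applied yet after a restart

    def sync_once(self):
        """Try one sync; keep the resync flag set if the server timed out."""
        if not self.needs_resync:
            return True
        try:
            firewalls = self._fetch()
        except TimeoutError:
            # Do not swallow the failure: remember that rules are missing.
            self.needs_resync = True
            return False
        self._apply(firewalls)
        self.needs_resync = False
        return True


# Simulate a server that times out twice, then recovers.
responses = [TimeoutError(), TimeoutError(), ["fw-1"]]
applied = []


def fake_fetch():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


syncer = FirewallSyncer(fake_fetch, applied.extend)
results = [syncer.sync_once() for _ in range(3)]
print(results, applied)  # [False, False, True] ['fw-1']
```

In a real agent the retry would presumably be driven by a periodic task; that wiring is left out here and is an assumption.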

The user is not protected by the firewall.

* Step-by-step reproduction:
1. simulate a neutron fwaas outage (on the neutron-server side)
   - populate many messages in the q-firewall-plugin queue
   - or unbind the q-firewall-plugin queue from neutron-server
2. reboot an L3 agent hosting a router with an associated firewall
3. => RPC timeouts appear in the L3 agent logs (get_tenants_with_firewalls, get_firewalls_for_tenant)
4. networking stacks are recreated (interfaces, IP, iptables NAT, ...)
   but without fwaas iptables rules
5. hence traffic to/from the VM is allowed :(

6. end the neutron fwaas outage
   - purge messages from the q-firewall-plugin queue
   - or restart neutron-server (if the q-firewall-plugin queue was unbound in step 1)
7. no more RPC timeouts appear in the L3 agent logs
8. but the fwaas iptables rules are still not set

* Version: all neutron-fwaas versions impacted (v1 and v2)

Tags: fwaas
Revision history for this message
Bertrand Lallau (bertrand-lallau) wrote :
description: updated
Revision history for this message
Bertrand Lallau (bertrand-lallau) wrote :
Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Bertrand Lallau (bertrand-lallau) wrote :

Another major security fix would be to change when the fwaas iptables rules are applied.
Currently the order is as follows:
1. the networking stacks are created (interfaces, ip, iptables NAT, ...)
2. fwaas iptables rules are applied

This is performed in the following code:

neutron/agent/l3/agent.py

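    def _process_added_router(self, router):
        # Step 1: build the router's networking stack (namespace, interfaces, NAT, ...)
        self._router_added(router['id'], router)
        ri = self.router_info[router['id']]
        ri.router = router
        ri.process()
        registry.notify(resources.ROUTER, events.AFTER_CREATE, self, router=ri)
        # Step 2: L3 agent extensions (fwaas among them) only run once the stack exists
        self.l3_ext_manager.add_router(self.context, router)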

From a security perspective it is mandatory to a

Revision history for this message
Bertrand Lallau (bertrand-lallau) wrote :

apply iptables before creating networking stacks.
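
The proposed reordering can be sketched as follows. This is a toy illustration and not a patch to `_process_added_router`: `process_added_router`, `failing_firewall` and `noop` are hypothetical, and `TimeoutError` stands in for the RPC timeout. The only point is that the networking stack is not built when the firewall step fails.

```python
def process_added_router(router_id, setup_firewall, setup_network_stack):
    """Apply firewall rules first; build the networking stack only on success."""
    steps = []
    try:
        setup_firewall(router_id)
        steps.append("firewall")
    except TimeoutError:
        # Firewall rules could not be fetched: leave the router unplugged.
        steps.append("firewall-failed")
        return steps
    setup_network_stack(router_id)
    steps.append("network")
    return steps


def failing_firewall(router_id):
    raise TimeoutError("fwaas RPC timed out")


def noop(router_id):
    return None


print(process_added_router("r1", noop, noop))              # ['firewall', 'network']
print(process_added_router("r1", failing_firewall, noop))  # ['firewall-failed']
```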

Revision history for this message
Kevin Benton (kevinbenton) wrote :

fwaas is an experimental service plugin for Neutron. I'm not sure how we should rate the severity for something like that.

Let me add in someone more familiar with the repo to review the proposed patch.

Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

this problem might be severe for those using it.
while it has always been experimental, i've heard of a production deployment with fwaas v1.
(i don't know which implementation they are using though.
midonet implementation doesn't have these issues.)

Changed in neutron:
status: New → Confirmed
Revision history for this message
Kevin Benton (kevinbenton) wrote :

@YAMAMOTO, can you confirm the fix is okay? It seems orthogonal to timeout handling and I'm not familiar enough with it to ensure that it catches the correct path.

Revision history for this message
YAMAMOTO Takashi (yamamoto) wrote :

i looked at the patch in comment #2 but i'm not sure if i understand it correctly.
i have a few questions.
- is it a fix for the "Much worse," part of the bug description?
- is it tested? i couldn't figure out how process_services_sync is called these days.

i guess the intention is ok, but it doesn't seem like a complete fix.
probably it's safer to block all traffic in case fwaas communication failed.
besides, as Bertrand pointed out in the other comment, the order of applying the rules seems inappropriate regardless of timeouts.
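
The "block all traffic" idea can be sketched as a fail-closed default. This is a minimal sketch under assumptions, not what neutron-fwaas does: `plan_filtering` is a hypothetical helper, the rule strings are illustrative placeholders rather than the chains the driver generates, and the normal case is assumed to be explicit rules followed by a default deny.

```python
def plan_filtering(fetch_rules, fail_closed=True):
    """Return (default_policy, rules) for one router.

    fetch_rules is a callable standing in for the fwaas RPC call; it may
    raise TimeoutError when the server does not answer.
    """
    try:
        rules = list(fetch_rules())
    except TimeoutError:
        # fail_closed=False mirrors the behaviour described in this report
        # (ACCEPT with no rules); fail_closed=True blocks everything instead.
        return ("DROP" if fail_closed else "ACCEPT"), []
    # Assumed normal case: explicit rules, everything else denied.
    return "DROP", rules


def timed_out():
    raise TimeoutError("fwaas RPC timed out")


def two_rules():
    return ["allow tcp/22", "allow tcp/443"]


print(plan_filtering(two_rules))                     # ('DROP', ['allow tcp/22', 'allow tcp/443'])
print(plan_filtering(timed_out))                     # ('DROP', [])
print(plan_filtering(timed_out, fail_closed=False))  # ('ACCEPT', [])
```

Failing closed trades availability for safety: tenants lose connectivity during a fwaas outage instead of losing their firewall.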

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Bertrand, any progress on the patch you proposed? Assuming neutron-fwaas cannot be DoS'd remotely, it doesn't seem necessary to keep this private. I suggest we subscribe ossg to discuss an embargo exception as defined here: https://security.openstack.org/vmt-process.html#embargo-exceptions

Revision history for this message
Jeremy Stanley (fungi) wrote :

In keeping with recent OpenStack vulnerability management policy changes, no report should remain under private embargo for more than 90 days. Because this report predates the change in policy, the deadline for public disclosure is being set to 90 days from today. If the report is not resolved within the next 90 days, it will revert to our public workflow as of 2020-05-27. Please see http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012721.html for further details.

description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

It doesn't look like this report has seen any activity since my update two months ago, so consider this a friendly reminder:

The embargo for this report is due to expire one month from today, on May 27, and will be switched public on or shortly after that day if it is not already resolved sooner.

Thanks!

Jeremy Stanley (fungi)
description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

The embargo for this report has expired and is now lifted, so it's acceptable to discuss further in public.

description: updated
information type: Private Security → Public Security
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Neutron-fwaas project is now deprecated and is not maintained anymore. Due to that I'm moving this bug to "Won't fix" now.

Changed in neutron:
status: Confirmed → Won't Fix
Jeremy Stanley (fungi)
Changed in ossa:
status: Incomplete → Won't Fix