Attaching or detaching an interface to a router causes all VPNaaS daemons to be restarted.

Bug #1393589 reported by David Clarke
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Low
Numan Siddique

Bug Description

'sync' in services/vpn/device_drivers/ipsec.py is called any time an interface is attached or detached from a router. This occurs whether or not the edited router hosts a VPNaaS instance.

'sync' loops through the results of 'get_vpn_services_on_host' and stops/starts all IPsec daemons on the network node that hosts the router being edited, regardless of if they're on the router being edited, or even the same tenant.

An authorized user can trivially loop through the attach/detach API calls, causing the IPsec daemons for every tenant to be continuously restarted.

Tags: vpnaas
Revision history for this message
Paul Michali (pcm) wrote :

Yeah, I've noticed that currently, when a change is made, the reference implementation does a sync, which applies to ALL services on the host, and doesn't limit it to the corresponding router.

I was thinking of filing a bug myself, but wanted to check with Nachi Ueno to see if there was any impact to filtering the call based on the router affected. I sent him an email last week, but haven't heard back yet. Maybe he can chime in on this bug.

Revision history for this message
James Dempsey (jamespd) wrote :

It seems that BaseSwanProcess.update() gets called during this sync process, which calls BaseSwanProcess.enable() for every VPN that is admin_state_up. The problem is that BaseSwanProcess.enable() isn't idempotent. If a process is already running, it gets restarted. neutron/services/vpn/device_drivers/ipsec.py:243 is the line of code that issues the restart command.

Changed in neutron:
importance: Undecided → Low
tags: added: vpnaas
Changed in neutron:
status: New → Confirmed
Changed in neutron:
assignee: nobody → Numan Siddique (numansiddique)
Revision history for this message
Paul Michali (pcm) wrote :

I talked to Nachi Ueno, and he agrees that the get_vpn_services_on_host() should be updated to only obtain the services for the router being updated and not all routers on the host.

Note: This applies to both the reference and Cisco VPN implementations, IIRC.

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Numan Siddique (numansiddique) wrote :

Presently "def sync(self, context, routers)" in services/vpn/device_drivers/ipsec.py does the below
 1. updates all the vpnservices returned by get_vpn_services_on_host()
2. Delete any IPSec processes that are associated with routers, but are not running the VPN service
3. Delete any IPSec processes running VPN that do not have an associated router

For (2) and (3) it needs to know all the vpnservices running on the host.
If get_vpn_services_on_host() is updated to return the vpnservices for the router being updated, then (2) and (3) will be broken.

The solution could be to
a. update the get_vpn_services_on_host() function as suggested by you. This would require modifying the 'sync' function to update or sync only those services attached to the 'routers' being synced.
b. Don't modify get_vpn_services_on_host() function. Instead update only the vpnservices which are attached to the 'routers' . This way (2) and (3) will not be broken.

Please let me know your thoughts and suggestions on this.

Please

Revision history for this message
Paul Michali (pcm) wrote :

Good summary Numan, I hadn't looked into the details of the reference implementation for sync() in a long time.

Tough call on the options. Option (a) has a huge performance improvement, both in the amount of info RPCed for a single service change, and in the processing time. That really has me leaning toward that option.

However, option (a) is more work for sync. For the reference implementation, it means only doing (2) and (3) for the service changed, and changing get_vpn_services_on_host() to allow filtering on router. For the Cisco implementation, it means modifying the marking to only mark the one service as "dirty" and changing get_vpn_services_on_host().

Option (b) will require changing the updating (1) on the reference implementation to only handle the single service, and will have no changes for the Cisco implementation. But, both will still suffer from the same poor performance (not the focus of this bug, though).

It would be good to hear from some of the cores as to whether it is better to do (b) as a short term effort to fix this, and have a separate bug to address performance (as the current way is very poor, IMO), or to do (a) and effectively handle both issues.

Revision history for this message
James Dempsey (jamespd) wrote :

I think it is important to talk to some non-obvious behaviour of sync()...

In the reference implementation, process.update() will restart an IPsec process which is already running. The fact that sync() runs across more routers than are necessary just causes this bug to be reached more often.

Even if sync() only touched routers that have changed, the current behaviour of process.update() would still cause unnecessary process restarts. (because not all router changes require an IPsec process restart.)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/136858

Revision history for this message
Numan Siddique (numansiddique) wrote :

Paul - For the fix I have proposed, I have taken the solution (b). Please let me know your comments.

James - I agree with you. I will look into if restart of IPSec process is really required or not.

Revision history for this message
Paul Michali (pcm) wrote :

OK with (b). Wondering if a bug should be created, though, to address the performance issue with passing all the VPN services for all routers?

Revision history for this message
Sridhar Ramaswamy (srics-r) wrote :

FWIW this bug, and the solution proposed, also affects Brocade Vyatta VPN plugin which is currently in flight,

https://review.openstack.org/#/c/136693/

I agree it make sense to go with (b) for now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-vpnaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/141625

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Numan Siddique (<email address hidden>) on branch: master
Review: https://review.openstack.org/136858
Reason: Abandoning as resubmitted it on neutron_vpnaas repo

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-vpnaas (master)

Reviewed: https://review.openstack.org/141625
Committed: https://git.openstack.org/cgit/openstack/neutron-vpnaas/commit/?id=ae02852296f19c9e960f48ac08adc64da5ba26e1
Submitter: Jenkins
Branch: master

commit ae02852296f19c9e960f48ac08adc64da5ba26e1
Author: Numan Siddique <email address hidden>
Date: Sun Dec 14 12:31:50 2014 +0530

    Do not restart vpn processes for every router update

    Presently all the vpn processes are restarted when any
    router is updated, even if the updated router do not
    host any vpn services. 'device_drivers.ipsec.sync()'
    updates all the vpn services which results in stopping
    and then starting of the vpn processes.
    This patch addresses the issue by
    1. Passing router id information to the vpnservice_updated
    2. (Re)starting the vpn process only if the
          - the vpn service is newly added or
          - the agent has been restarted before the sync or
          - the sync has the router id of the vpn service

    Change-Id: I12825bc071033109bdcb93c89daa32012a8babb6
    Closes-bug: #1393589

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-1 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.