VPNaaS: Active VPN connection goes down after controller shutdown/start

Bug #1500876 reported by Anna Babich
Affects              Status        Importance  Assigned to    Milestone
Mirantis OpenStack   Fix Released  Medium      Elena Ezhova
  7.0.x              Won't Fix     Medium      Elena Ezhova
  8.0.x              Fix Released  Medium      Elena Ezhova

Bug Description

Reproduced on ISO #301 with vpnaas-plugin-1.2-1.2.0-1.noarch.rpm, on both VLAN and VXLAN clusters (3 controllers, 2 computes)
Steps to reproduce:
1. Create a VPN connection between tenant1 and tenant2 and check that it is active
2. Find the controller where one of the routers participating in the VPN connection is scheduled (tenant1's router, for example)
3. Shut down this controller, wait some time, and check that tenant1's router is rescheduled successfully and the VPN connection is restored
4. Start the controller that was shut down and wait until it has completely booted
5. Reschedule tenant1's router back to its original controller (the one that was shut down and restarted), wait some time, and check that tenant1's router is rescheduled successfully and the VPN connection is restored (a scripted version of steps 2 and 5 is sketched after this list)
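
A hedged sketch of how the rescheduling in steps 2 and 5 could be scripted with python-neutronclient; the credentials, endpoint, and the 'original_agent_id' placeholder are assumptions for illustration, not part of the report:

    from neutronclient.v2_0 import client

    # Credentials and endpoint are illustrative only.
    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    router_id = 'ce4c008f-4725-4c5c-ba92-3bf6beb06347'  # tenant1's router

    # Step 2: find the L3 agent (controller) currently hosting the router.
    agents = neutron.list_l3_agent_hosting_routers(router_id)['agents']
    current_agent = agents[0]
    print('router hosted on', current_agent['host'])

    # Step 5: move the router back to the agent that hosted it before the
    # shutdown ('original_agent_id' is a hypothetical placeholder).
    neutron.remove_router_from_l3_agent(current_agent['id'], router_id)
    neutron.add_router_to_l3_agent('original_agent_id',
                                   {'router_id': router_id})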

Actual result: tenant1's router is rescheduled and VMs can ping external hosts, but the VPN connection goes to DOWN state on tenant1's side, with the following error in vpn-agent.log on the controller where tenant1's router was rescheduled back in step 5:

2015-09-29 12:40:34.654 17607 ERROR neutron.agent.linux.utils [req-10b2197d-2325-4305-b976-8f63e881f749 ]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-ce4c008f-4725-4c5c-ba92-3bf6beb06347', 'ipsec', 'pluto', '--ctlbase', '/var/lib/neutron/ipsec/ce4c008f-4725-4c5c-ba92-3bf6beb06347/var/run/pluto/', '--ipsecdir', '/var/lib/neutron/ipsec/ce4c008f-4725-4c5c-ba92-3bf6beb06347/etc', '--use-netkey', '--uniqueids', '--nat_traversal', '--secretsfile', '/var/lib/neutron/ipsec/ce4c008f-4725-4c5c-ba92-3bf6beb06347/etc/ipsec.secrets', '--virtual_private', '%v4:192.168.1.0/24,%v4:172.16.1.0/24']
Exit code: 10
Stdin:
Stdout:
Stderr: adjusting ipsec.d to /var/lib/neutron/ipsec/ce4c008f-4725-4c5c-ba92-3bf6beb06347/etc
pluto: lock file "/var/lib/neutron/ipsec/ce4c008f-4725-4c5c-ba92-3bf6beb06347/var/run/pluto/.pid" already exists

A more detailed trace: http://paste.openstack.org/show/474676/

Tags: area-neutron
Revision history for this message
Anna Babich (ababich) wrote :
Changed in mos:
assignee: nobody → MOS Neutron (mos-neutron)
Revision history for this message
Elena Ezhova (eezhova) wrote :

Pluto processes run in the qrouter namespace (or the snat namespace in the case of DVR). When a controller is shut down, all namespaces are deleted (they are stored in tmpfs), but the pluto .pid and .ctl files remain, as they are stored in /var/lib/neutron/ipsec/<router-id>/var/run/pluto/.
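
For illustration, a quick way to check for the leftover files (the path layout follows the description above; the router UUID is taken from the error log):

    import os

    router_id = 'ce4c008f-4725-4c5c-ba92-3bf6beb06347'
    pluto_dir = '/var/lib/neutron/ipsec/%s/var/run/pluto' % router_id

    # The namespace lived in tmpfs and vanished with the shutdown, but this
    # directory is on persistent storage, so the old control files survive.
    for name in ('.pid', '.ctl'):
        path = os.path.join(pluto_dir, name)
        print(path, 'exists:', os.path.exists(path))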

Then, when the router is rescheduled back to the original controller, the vpn agent attempts to start a pluto process, and pluto fails when it finds that a .pid file already exists. This behavior of pluto is determined by the flags used to open the file [1] and is most probably a defense against accidentally overwriting the .pid file.
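
A minimal sketch of that defense, assuming pluto opens its lock file with O_CREAT | O_EXCL as in [1] (the path below is hypothetical):

    import errno
    import os

    def create_lock_file(path):
        # O_EXCL makes open() fail with EEXIST instead of silently
        # reusing a lock file left over from a previous run.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        os.close(fd)

    lock_dir = '/tmp/pluto-example'
    if not os.path.isdir(lock_dir):
        os.mkdir(lock_dir)
    lock_path = os.path.join(lock_dir, '.pid')

    create_lock_file(lock_path)      # first start: succeeds
    try:
        create_lock_file(lock_path)  # stale file present: open() fails
    except OSError as e:
        assert e.errno == errno.EEXIST
        print('pluto would refuse to start here:', e)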

So, as this is not a pluto bug, the solution might be to add a workaround to VPNaaS that cleans up the .ctl and .pid files on start-up.
Essentially the same approach was used for the libreswan driver [2] (this code is available in Liberty).
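
A rough sketch of such a start-up cleanup, modeled on the libreswan change [2]; the function and path names are illustrative, not the exact neutron-vpnaas code:

    import os

    def cleanup_control_files(config_dir):
        """Remove stale pluto lock/control files before starting pluto."""
        pluto_dir = os.path.join(config_dir, 'var', 'run', 'pluto')
        for name in ('.pid', '.ctl'):
            path = os.path.join(pluto_dir, name)
            try:
                os.remove(path)
            except OSError:
                pass  # file is already gone, nothing to clean up

    # e.g. cleanup_control_files('/var/lib/neutron/ipsec/<router-id>')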

[1] https://github.com/xelerance/Openswan/blob/master/programs/pluto/plutomain.c#L258-L259
[2] https://github.com/openstack/neutron-vpnaas/commit/00b633d284f0f21aa380fa47a270c612ebef0795

Revision history for this message
Alexander Ignatov (aignatov) wrote :

Suggest waiting on the 7.0 backport until it's fixed in master/8.0

Revision history for this message
Alexander Ignatov (aignatov) wrote :

VPNaaS has the lowest priority in MOS, so this won't be fixed in 7.0-updates

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron-vpnaas (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Elena Ezhova <email address hidden>
Review: https://review.fuel-infra.org/13881

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron-vpnaas (openstack-ci/fuel-8.0/liberty)

Reviewed: https://review.fuel-infra.org/13881
Submitter: Pkgs Jenkins <email address hidden>
Branch: openstack-ci/fuel-8.0/liberty

Commit: a022654b4d2a59e1aa55f4712f299b6ff4dfa28f
Author: Elena Ezhova <email address hidden>
Date: Mon Nov 16 12:42:29 2015

Cleanup .ctl/.pid files for both OpenSwan and LibreSwan

Change I5c215d70c348524979b740f882029f74e400e6d7 introduced cleanup
of pluto ctl/pid files on starting and restarting of the pluto
daemon for the LibreSwan driver. But the problem with managing these
files is also common to the OpenSwan driver: the pluto daemon fails
to start if a pid file it tries to create already exists (see the
bug report for details).

This change moves the cleanup functionality to OpenSwanProcess so
that it will be used by both the OpenSwan and LibreSwan drivers.
Also fixed a typo in _cleanup_control_files where the code attempted
to remove a pluto.ctl.ctl file instead of pluto.ctl.

Changed the name of 'libreswan' configuration section to 'pluto'.

DocImpact

Conflicts:
 neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py

Cherry-picked from https://review.openstack.org/#/c/235817
Closes-Bug: #1500876
Change-Id: I717e8fcc1add35b7099c977235e4eff5da9e093b
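
An illustrative reconstruction of the typo mentioned in the commit message (not the verbatim diff); the '.ctl' suffix was appended to a name that already ended in '.ctl':

    pid_path = '/var/lib/neutron/ipsec/<router-id>/var/run/pluto/pluto'

    ctl_file = '%s.ctl' % pid_path  # correct: .../pluto.ctl
    # The buggy version effectively computed '%s.ctl' % ctl_file,
    # producing .../pluto.ctl.ctl, a file that never exists.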

tags: added: area-neutron
removed: neutron
Revision history for this message
Kristina Berezovskaia (kkuznetsova) wrote :

Can't verify this bug on an 8.0 environment because it is impossible to deploy an environment with the VPNaaS plugin before the release. Work on compatibility between MOS 8.0 and VPNaaS will start after the 8.0 release, so only the fact that the code was merged has been verified.
