VPNaaS: Active VPN connection goes down after controller shutdown/start
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Fix Released | Medium | Elena Ezhova |
7.0.x | Won't Fix | Medium | Elena Ezhova |
8.0.x | Fix Released | Medium | Elena Ezhova |
Bug Description
It has been reproduced on ISO #301, vpnaas-
Steps to reproduce:
1. Create VPN connection between tenant1 and tenant2 and check that it's active
2. Find a controller where one of the routers-
3. Shut down this controller, wait a while, and check that tenant1's router is rescheduled successfully and the VPN connection is restored
4. Start the controller that was shut down and wait until it has fully booted
5. Reschedule tenant1's router back to its original controller (the one that was shut down and started), wait a while, and check that tenant1's router is rescheduled successfully and the VPN connection is restored
Actual result: tenant1's router is rescheduled and VMs can ping external hosts, but the VPN connection goes to the DOWN state on tenant1's side, with the following error in vpn-agent.log on the controller to which tenant1's router was rescheduled back in step 5:
2015-09-29 12:40:34.654 17607 ERROR neutron.
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/
'ipsec', 'pluto', '--ctlbase', '/var/lib/
n/ipsec/
ec/ce4c008f-
Exit code: 10
Stdin:
Stdout:
Stderr: adjusting ipsec.d to /var/lib/
pluto: lock file "/var/lib/
A little more detailed trace - http://
Changed in mos:
assignee: nobody → MOS Neutron (mos-neutron)
tags: added: area-neutron removed: neutron
Pluto processes run in the qrouter namespace (or the snat namespace in the case of DVR). When a controller is shut down, all namespaces are deleted (as they are stored in tmpfs), but pluto's .pid and .ctl files remain, as they are stored in /var/lib/neutron/ipsec/<router-id>/var/run/pluto/.
Then, when the router is rescheduled back to the original controller, the VPN agent attempts to start the pluto process, and pluto fails when it finds that a .pid file already exists. This behavior is determined by the flags used to open the file [1] and is most probably a safeguard against accidentally overwriting the .pid file.
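To illustrate the failure mode, here is a minimal Python sketch. It assumes the flags in question amount to an exclusive create (O_CREAT together with O_EXCL); that is an assumption made for illustration, and [1] has the actual call. The helper name `create_lock` and the file name `pluto.pid` are hypothetical.

```python
import os
import tempfile


def create_lock(path):
    """Create a lock file exclusively; return False if it already exists."""
    try:
        # Assumption: exclusive create. With O_CREAT | O_EXCL, open() fails
        # if the file is already present instead of overwriting it.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock:
        lock.write("%d\n" % os.getpid())
    return True


# A throwaway directory stands in for the real pluto run directory.
with tempfile.TemporaryDirectory() as run_dir:
    pid_file = os.path.join(run_dir, "pluto.pid")  # hypothetical file name
    first = create_lock(pid_file)   # fresh directory: lock is created
    second = create_lock(pid_file)  # file left behind: creation is refused
    print(first, second)
```

The second call fails only because the file exists. No check is made for whether the process that wrote it is still alive, which matches the situation after a controller reboot.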
So, as this is not a pluto bug, the solution might be to add a workaround to VPNaaS that cleans up the .ctl and .pid files on start-up.
Essentially the same approach was used for the libreswan driver [2] (this code is available in Liberty).
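A rough sketch of such a clean-up step might look like the following. This is not the code from [2]. The function name `remove_stale_pluto_files` and the file names used in the demonstration are hypothetical, and a real workaround would also need to make sure no pluto process is still running for the router before deleting anything.

```python
import os
import tempfile


def remove_stale_pluto_files(run_dir, suffixes=(".pid", ".ctl")):
    """Delete leftover .pid/.ctl files in run_dir; return the names removed."""
    removed = []
    for name in sorted(os.listdir(run_dir)):
        if name.endswith(suffixes):
            os.remove(os.path.join(run_dir, name))
            removed.append(name)
    return removed


# Demonstration against a throwaway directory standing in for
# /var/lib/neutron/ipsec/<router-id>/var/run/pluto/.
with tempfile.TemporaryDirectory() as run_dir:
    for name in ("pluto.pid", "pluto.ctl", "ipsec.conf"):  # hypothetical names
        open(os.path.join(run_dir, name), "w").close()
    first_pass = remove_stale_pluto_files(run_dir)
    second_pass = remove_stale_pluto_files(run_dir)  # nothing left to remove
    remaining = os.listdir(run_dir)
```

Calling the helper just before the pluto process is started would let the exclusive create succeed again, while files with other suffixes are left alone.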
[1] https://github.com/xelerance/Openswan/blob/master/programs/pluto/plutomain.c#L258-L259
[2] https://github.com/openstack/neutron-vpnaas/commit/00b633d284f0f21aa380fa47a270c612ebef0795