vpnaas problem:ipsec pluto not running centos 8 victoria wallaby
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | High | Bodo Petermann | | |
Bug Description
Hello.
I apologize if I am not explaining this bug the right way.
I am using CentOS 8 and I install OpenStack with kolla-ansible. Whether it is Ussuri, Victoria or Wallaby, when establishing the connection between the 2 networks (with vpnaas), the error message is as follows:
ipsec whack --status" (no "/run/pluto/
The problem appears with Libreswan 4.x, which no longer includes the "--use-netkey" option used by the ipsec pluto command.
This option was present in Libreswan 3.x.
So the command "ipsec pluto....." fails, so there is no "/run/pluto/
tags: | added: vpnaas |
Changed in neutron: | |
importance: | Undecided → High |
status: | New → Triaged |
Franck VEDEL (vedelf) wrote : | #1 |
Jacolex (jacolex) wrote : | #2 |
Hello
It's still not working under xena.
My workaround:
modify in neutron_l3_agent container
/var/lib/
def start_pluto(self):
cmd = ['pluto',
/var/lib/
config setup
#nat_
Ian Kumlien (pomac) wrote : | #3 |
--use-netkey is only available in older libreswan releases; newer versions don't support this switch.
This goes for all distros (Ubuntu, Debian, CentOS * Stream, etc.)
Just remove the line.
Ian Kumlien (pomac) wrote : | #4 |
New error with ipsec.conf:3 nat_traversal, which is obsolete and shouldn't be there.
Given today's problem, I recommend the following:
git diff
diff --git a/neutron_
index 90731f7a4.
--- a/neutron_
+++ b/neutron_
@@ -106,7 +106,6 @@ class LibreSwanProces
def start_pluto(self):
cmd = ['pluto',
- '--use-netkey',
if self.conf.
diff --git a/neutron_
index 450bef517.
--- a/neutron_
+++ b/neutron_
@@ -1,6 +1,5 @@
# Configuration for {{vpnservice.id}}
config setup
- nat_traversal=yes
virtual_
conn %default
keylife=60m
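For an ipsec.conf that has already been rendered with the old template, the same change can be applied by filtering the text. The following is a minimal sketch, assuming the obsolete option sits on a line of its own; `strip_obsolete_options` is a hypothetical helper, not part of neutron-vpnaas.

```python
def strip_obsolete_options(conf_text, obsolete=("nat_traversal",)):
    """Return conf_text without lines that set an option listed in `obsolete`."""
    kept = []
    for line in conf_text.splitlines():
        # The option name is whatever precedes the first "=" on the line.
        key = line.strip().split("=", 1)[0].strip()
        if key in obsolete:
            continue  # drop e.g. "nat_traversal=yes"
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative input modelled on the template shown in the diff above.
sample = """config setup
    nat_traversal=yes
conn %default
    keylife=60m
"""
cleaned = strip_obsolete_options(sample)
print(cleaned)
```

Commented-out lines such as `#nat_traversal=yes` are left alone, because their key does not match exactly.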
Ian Kumlien (pomac) wrote : | #5 |
Reference:
https:/
nat_traversal
OBSOLETE. Support for NAT Traversal is always enabled.
---
log:
2023-04-17 12:47:14.358 2524 ERROR neutron_
Franck VEDEL (vedelf) wrote : Re: [Bug 1938571] vpnaas problem:ipsec pluto not running centos 8 victoria wallaby | #6 |
Hi.
Thanks a lot for this help.
I haven't tried Vpnaas for 18 months. I no longer know where I was. But I absolutely have to find a solution because it worked really well and it was really educational for my students.
Thanks again.
Franck VED
Dép. Réseaux Informatiques & Télécoms
IUT1 - Univ GRENOBLE Alpes
0476824462
Stages, Alternance, Emploi.
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-vpnaas (master) | #7 |
Fix proposed to branch: master
Review: https:/
Changed in neutron: | |
status: | Triaged → In Progress |
Bodo Petermann (bpetermann) wrote : | #8 |
The patch above does not try to maintain compatibility with libreswan 3.x. v4 has been out for 3 years already, so I didn't try a more complicated approach that would also cope with v3.
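For readers who still need both major versions, a version-aware variant could pick the flag at runtime. This is only a sketch of that "more complicated approach", not what the patch does. The sample version strings are an assumption about what `ipsec --version` prints, and `kernel_stack_flag` and `pluto_command` are hypothetical names.

```python
import re


def kernel_stack_flag(version_output):
    """Choose the pluto kernel-stack flag from `ipsec --version`-style text."""
    match = re.search(r"Libreswan\s+v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("no Libreswan version found in %r" % version_output)
    major = int(match.group(1))
    # Per this thread: 3.x accepts --use-netkey, 4.x uses --use-xfrm instead.
    return "--use-netkey" if major < 4 else "--use-xfrm"


def pluto_command(version_output):
    """Build the pluto argv without executing anything."""
    return ["ipsec", "pluto", kernel_stack_flag(version_output), "--uniqueids"]


# Assumed output formats, for illustration only.
print(pluto_command("Linux Libreswan 3.32 (netkey) on 4.18.0"))
print(pluto_command("Linux Libreswan 4.12 (XFRM) on 5.14.0"))
```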
Changed in neutron: | |
assignee: | nobody → Bodo Petermann (bpetermann) |
Franck VEDEL (vedelf) wrote : Re: [Bug 1938571] vpnaas problem:ipsec pluto not running centos 8 victoria wallaby | #9 |
Thanks a lot for this.
I will try the patch as soon as possible.
Franck VEDEL
Dép. Réseaux Informatiques & Télécoms
IUT1 - Univ GRENOBLE Alpes
0476824462
Stages, Alternance, Emploi.
Ian Kumlien (pomac) wrote : | #10 |
So I don't get this... the implementation starts pluto multiple times, but pluto should only be started once and then you should use the .ctl file, which explains why the /lib/run is added to all namespaces (even if it should be /run on newer versions of ipsec).
But starting pluto every time means it fails because the .pid file is already there.
It seems to me like the whole approach needs to be reworked.
Either way, this means that your patch doesn't help much with libreswan 4.12 (standard Rocky Linux 9).
Bodo Petermann (bpetermann) wrote : | #11 |
My understanding is that pluto is called once per VPN service, each time in its own namespace. The wrapper will call something like "ip netns exec <namespace> neutron-
And neutron-
(1) mount --bind /var/lib/
(2) mount --bind /var/lib/
(3) ipsec pluto --use-xfrm --uniqueids
This way the pluto process will not see the /etc or /run of the host anymore, but the bind-mounted directories instead. So each pluto will create its own pid file, not conflicting with other Plutos.
From outside the neutron-
Other commands like ipsec whack will also be run in such a wrapper, so they bind-mount /etc and /run in the same way, so they have access to the per-service ctl file or pid file
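The sequence described in this comment can be sketched as a list of argument vectors. The sketch only builds the lists and executes nothing. The per-service directories are placeholders, because the real paths are truncated in this thread, and `wrapper_steps` is a hypothetical name rather than the wrapper's actual code.

```python
def wrapper_steps(etc_dir, run_dir, cmd):
    """Return the steps the wrapper is described as performing, in order."""
    return [
        # Hide the host's /etc and /run behind the per-service directories.
        ["mount", "--bind", etc_dir, "/etc"],
        ["mount", "--bind", run_dir, "/run"],
        # Then run the actual command, which now sees only those directories.
        list(cmd),
    ]


# Placeholder directories for one VPN service (hypothetical).
start = wrapper_steps("/tmp/vpn-a/etc", "/tmp/vpn-a/run",
                      ["ipsec", "pluto", "--use-xfrm", "--uniqueids"])
status = wrapper_steps("/tmp/vpn-a/etc", "/tmp/vpn-a/run",
                       ["ipsec", "whack", "--status"])
for step in start + status:
    print(" ".join(step))
```

Because both invocations bind-mount the same per-service directories, the whack command finds the ctl file created by that service's pluto and not one belonging to another service.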
Ian Kumlien (pomac) wrote : | #12 |
I find this odd... should there be one big common ipsec pluto or not?
ipsec whack --shutdown kills pluto, so either the code has to be rewritten to use one common server or it has to be changed to run things in network namespaces, properly...
Bodo Petermann (bpetermann) wrote : | #13 |
The current implementation in neutron-vpnaas does use network namespaces to run pluto in: one namespace per router and one pluto per router in that namespace. So if there are multiple routers, there will be multiple plutos, separated by namespaces. The whack commands are executed in a wrapper that runs them in the namespace, with bind-mounted /etc and /run. This way an ipsec whack --shutdown should only shut down the one pluto in the per-router namespace.
If the deployment uses ML2/OVS the namespace will be the one created for the router and pluto will be started by the L3 agent, where vpnaas is loaded as an extension. For the to-be-released implementation for ML2/OVN there's no L3 agent and instead a stand-alone VPN agent will take the responsibility to create the namespace (again: per router id).
Ian Kumlien (pomac) wrote : | #14 |
So this is an incorrect assumption.
There is one /run, so there is one shared ctl - thus *one* pluto
ipsec whack --shutdown kills the daemon via the ctl, thus kills the *one* pluto instance
Which explains the error messages:
https:/
I have let neutron-vpnaas create its own pluto instance and shut it down via ipsec whack --shutdown from another namespace...
Ian Kumlien (pomac) wrote : | #15 |
So I have been through multiple iterations of changes, still testing...
Ian Kumlien (pomac) wrote (last edit ): | #16 |
So, your patch verbatim, I think, results in multiple:
[changing to paste]
Bodo Petermann (bpetermann) wrote : | #17 |
My patch doesn't really change the architecture with the namespace and bind-mount. That was there for a while before. I only tried to adapt to the changes of libreswan's run-dir and the --use-xfrm instead of --use-netkey.
I will still try to reproduce the issue. Maybe the whole idea is not working in some environments. We use strongswan and Ubuntu, but the architecture is similar (1 charon per router, 1 namespace per router, same bind-mount wrapper) and didn't encounter those issues.
What versions do you use (libreswan, operating system, neutron)?
Ian Kumlien (pomac) wrote : | #18 |
libreswan, rocky linux 9, zed - ;)
It looks like the lack of /run in the namespace (i assume it's chrooted) means that there is no /run/pluto/
I have left it running for quite a while, and two pluto processes have spawned on one machine (eventually...) but it seems like ipsec whack can't communicate with them... :/
Basically, if a namespace is ok, you can do ipsec status -- returns 33 if not working and 0 if it's running.
024-01-16 15:48:59.970 34311 INFO neutron.
Command: ['mount', '--bind', '/var/lib/
Command: ['mount', '--bind', '/var/lib/
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/
; Stderr:
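The exit codes mentioned above (0 when pluto is running, 33 when it is not) can be turned into a small health check. This is a minimal sketch; `describe_exit_code` and `check` are hypothetical helpers, and the demo runs a stand-in Python process instead of `ipsec` so that it works on any machine.

```python
import subprocess
import sys

PLUTO_NOT_RUNNING = 33  # exit code reported in this thread


def describe_exit_code(code):
    """Map an `ipsec status` / `ipsec whack --status` exit code to a label."""
    if code == 0:
        return "running"
    if code == PLUTO_NOT_RUNNING:
        return "not running"
    return "unknown (exit code %d)" % code


def check(cmd):
    """Run cmd and classify its exit code."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return describe_exit_code(result.returncode)


# Stand-in for a failing status call: a process that exits with 33.
print(check([sys.executable, "-c", "import sys; sys.exit(33)"]))
```

On a real node the command would have to be run inside the namespace wrapper, for example `check(["ipsec", "status"])`, otherwise it would look at the host's /run.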
Ian Kumlien (pomac) wrote (last edit ): | #19 |
I actually assume something like this could fix it, the question is if the pid file will have to be handled as well though...
neutron_
@@ -60,6 +62,7 @@ class LibreSwanProces
pass
with open('%
pass
+ os.makedirs(
Bodo Petermann (bpetermann) wrote : | #20 |
I see. I was going to ask if you could check the folder /var/lib/
From the logs you posted the "mount --bind" commands didn't return error codes, so I assumed that the bind-mount worked. If it didn't because /var/lib/
The bind-mount uses {config_
os.makedirs(
I will check, if that's what was missing.
Bodo Petermann (bpetermann) wrote : | #21 |
at least {config_
So could you check the {config_
Unfortunately I cannot spend much time and don't have a rocky linux 9 / Kolla environment at hand to quickly try out if it works for me. I will still try to reproduce it, but need to ask for some patience.
Ian Kumlien (pomac) wrote : | #22 |
So .pid and .ctl have been in /run/pluto/ for a while, which is why I was tinkering with the idea of everything wanting access to the same .ctl file...
Ian Kumlien (pomac) wrote : | #23 |
Also, didn't seem to help enough:
ls */run/pluto
40a419e5-
5e873513-
752aa2d2-
Unless, they have to be owned by root as well.. :/
And, I'm really thankful for you providing the information that it SHOULD all be separated, somehow we've managed to have it running without separation in the past - don't ask me how...
Ian Kumlien (pomac) wrote : | #24 |
Also, just to confirm... Many errors along the lines of:
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/
Ian Kumlien (pomac) wrote : | #25 |
Also, to add insult to injury, the strongswan binary is named strongswan ;)
(it's in epel for rocky 9...)
Ian Kumlien (pomac) wrote : | #26 |
multiple issues with strongswan as well..
cp complains about /etc/strongswan.d missing (it's /etc/strongswan
and /etc/strongswan
Will look closer at the code, this might be easier to fix...
Ian Kumlien (pomac) wrote : | #27 |
So renaming the libreswan ipsec to something else, creating a symlink from ipsec to strongswan and applying this patch means that I now have strongswan up and running on one of the nodes.
I will go through all of them and see if I can get this part working at least.
diff --git a/etc/neutron/
index 846ac2d1c.
--- a/etc/neutron/
+++ b/etc/neutron/
@@ -8,11 +8,11 @@
[Filters]
-cp: RegExpFilter, cp, root, cp, -a, .*, .*/strongswan.d
+cp: RegExpFilter, cp, root, cp, -a, .*, .*/strongswan
ip: IpFilter, ip, root
ip_exec: IpNetnsExecFilter, ip, root
ipsec: CommandFilter, ipsec, root
-rm: RegExpFilter, rm, root, rm, -rf, (.*/strongswan.
+rm: RegExpFilter, rm, root, rm, -rf, (.*/strongswan|
rm_file: RegExpFilter, rm, root, rm, -f, .*/ipsec.secrets
strongswan: CommandFilter, strongswan, root
neutron_
diff --git a/neutron_
index 708952a1f.
--- a/neutron_
+++ b/neutron_
@@ -51,7 +51,7 @@ strongswan_opts = [
- '/etc/strongswa
+ '/etc/strongswan'),
]
@@ -150,7 +150,7 @@ class StrongSwanProce
0o600)
- self._get_
+ self._get_
def get_status(self):
return self._execute(
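Why the original `cp` filter rejects a destination ending in `/strongswan` can be shown with a simplified stand-in for a rootwrap RegExpFilter. The sketch assumes each pattern is matched against the argument in the same position and anchored at the end; it is not the oslo.rootwrap code, and the paths are placeholders.

```python
import re


def regexp_filter_matches(patterns, argv):
    """Return True if every argument matches the pattern in the same position."""
    if len(patterns) != len(argv):
        return False
    return all(re.match(pattern + "$", arg)
               for pattern, arg in zip(patterns, argv))


old_filter = ["cp", "-a", ".*", ".*/strongswan.d"]
new_filter = ["cp", "-a", ".*", ".*/strongswan"]

# Hypothetical command copying the distro config area into a per-service dir.
argv = ["cp", "-a", "/etc/strongswan", "/tmp/vpn-a/etc/strongswan"]

print(regexp_filter_matches(old_filter, argv))
print(regexp_filter_matches(new_filter, argv))
```

With the old pattern the destination would need two more characters after `strongswan` (any character followed by `d`), so the command is refused before `cp` ever runs.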
Bodo Petermann (bpetermann) wrote : | #28 |
Well, this bug ticket here is about the centos/libreswan issue that libreswan changed its run directory and cli parameters. Strongswan-related issues should go to a different ticket.
On ubuntu the config directory of strongswan installed by the package is /etc/strongswan.d. If setting the configuration key [strongswan] default_config_area = /etc/strongswan
Ian Kumlien (pomac) wrote : | #29 |
Summary and the changes needed to get strongswan working on rocky9:
https:/
I gave up on libreswan for now...
Bodo Petermann (bpetermann) wrote : | #30 |
I cannot reproduce the issues.
I tried out the neutron-vpnaas patch 895824 in a fresh all-in-one kolla-ansible setup and vpn worked as expected.
- VM with rockylinux 9.3 as the host for the openstack deployment
- installed kolla-ansible in it (all-in-one), with openstack services running in rockylinux-based containers
- openstack zed release
- ML2/OVS (neutron_
- built a custom container image for neutron_l3_agent that includes vpnaas patch 895824
- in the container: Libreswan 4.12 (package version libreswan.x86_64 4.12-1.el9)
without the patch there were errors like
Command: ['ipsec', '_stackmanager', 'start'] Exit code: 1 Stdout: Stderr: cannot load config '/etc/ipsec.conf': /etc/ipsec.conf:3: syntax error [nat_traversal]
cannot load config '/etc/ipsec.conf': /etc/ipsec.conf:3: syntax error [nat_traversal]
with the patch this error is gone and only a few errors appear in the creation phase of a new vpn service:
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: Pluto is not running (no "/run/pluto/
But those stop as soon as the pluto is actually started.
In this setup the vpn service is run inside the neutron-l3-agent container.
As expected the /run/pluto directory stayed empty and the pluto.ctl and pluto.pid files appeared in
/var/lib/
Inside the container I could check ipsec status with
ipsec status --rundir /var/lib/
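A quick way to confirm that a given pluto wrote its control files into the per-service directory, and not into the host's /run/pluto, is to look for both files there. This is a minimal sketch; `pluto_rundir_state` is a hypothetical helper and the demo uses a temporary directory instead of the real (truncated) path.

```python
import os
import tempfile


def pluto_rundir_state(rundir):
    """Report which of pluto's control files exist in rundir."""
    return {name: os.path.exists(os.path.join(rundir, name))
            for name in ("pluto.ctl", "pluto.pid")}


# Demo: a throwaway directory that contains only a pid file.
with tempfile.TemporaryDirectory() as rundir:
    open(os.path.join(rundir, "pluto.pid"), "w").close()
    state = pluto_rundir_state(rundir)
    print(state)
```

If both entries are true for the per-service directory and /run/pluto stays empty, the bind-mount separation described earlier in the thread is in effect.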
Ian Kumlien (pomac) wrote : | #31 |
we never actually got a vpnaas service to be active until we switched...
Our users are happier as well... =)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-vpnaas (master) | #32 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 55558e8f3b5a1d0
Author: Bodo Petermann <email address hidden>
Date: Tue Sep 19 15:58:56 2023 +0200
Support for libreswan 4
With libreswan 4 some command line option changed, the rundir is now
/run/pluto instead of /var/run/pluto, and nat_traversal must not be set
in ipsec.conf.
Adapt the libreswan device driver accordingly.
Users will require libreswan v4.0 or higher, compatibility with v3.x is
not maintained.
Closes-Bug: #1938571
Change-Id: Ib55e3c3f9cfbe3
Changed in neutron: | |
status: | In Progress → Fix Released |
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron-vpnaas 25.0.0.0rc1 | #33 |
This issue was fixed in the openstack/
Hello, excuse my question, I don't know the procedure.
Is there any estimate for when this bug will be corrected? Will there be patches to apply? How will we find out about them?
How long will it take?
Will the fix be included in a future version?
Thanks in advance.