vpnaas problem:ipsec pluto not running centos 8 victoria wallaby

Bug #1938571 reported by Franck VEDEL
This bug affects 3 people
Affects: neutron
Status: In Progress
Importance: High
Assigned to: Bodo Petermann

Bug Description

Hello.
I apologize if I am not explaining the bug the right way.
I am using CentOS 8 and I install OpenStack with kolla-ansible. Whether it is Ussuri, Victoria or Wallaby, when establishing the connection between the 2 networks (with vpnaas), the error message is as follows:
"ipsec whack --status" (no "/run/pluto/pluto.ctl")

The problem appears with Libreswan 4.x, which no longer includes the "--use-netkey" option used by the ipsec pluto command.
This option was present in Libreswan 3.x.
So the command "ipsec pluto....." fails, and "/run/pluto/pluto.ctl" is never created.

Tags: vpnaas

Changed in neutron:
importance: Undecided → High
status: New → Triaged

Franck VEDEL (vedelf) wrote :

Hello, excuse my question, I don't know the procedure.
Is there an estimate for when this bug will be fixed? Will there be patches to apply, and how will we be notified?
How long does it usually take?
Will the fix be included in a future release?

Thanks in advance.

Jacolex (jacolex) wrote :

Hello
It's still not working under Xena.
My workaround:

Modify the following files in the neutron_l3_agent container:
/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py
    def start_pluto(self):
        cmd = ['pluto',
               '--use-netkey',  # delete this line
               '--uniqueids']

/var/lib/kolla/venv/lib/python3.6/site-packages/neutron_vpnaas/services/vpn/device_drivers/template/openswan/ipsec.conf.template
config setup
    #nat_traversal=yes  # comment out this line

Ian Kumlien (pomac) wrote :

--use-netkey is only available in older Libreswan releases; newer versions don't support this switch.

This applies to all distros (Ubuntu, Debian, CentOS * Stream, etc.)

Just remove the line.

Ian Kumlien (pomac) wrote :

New error at ipsec.conf:3: nat_traversal, which is obsolete and shouldn't be there.

Given today's problem, I recommend the following:

git diff
diff --git a/neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py b/neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py
index 90731f7a4..5b5f648b2 100644
--- a/neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py
+++ b/neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py
@@ -106,7 +106,6 @@ class LibreSwanProcess(ipsec.OpenSwanProcess):

     def start_pluto(self):
         cmd = ['pluto',
-               '--use-netkey',
                '--uniqueids']

         if self.conf.ipsec.enable_detailed_logging:
diff --git a/neutron_vpnaas/services/vpn/device_drivers/template/openswan/ipsec.conf.template b/neutron_vpnaas/services/vpn/device_drivers/template/openswan/ipsec.conf.template
index 450bef517..bf06cd95d 100644
--- a/neutron_vpnaas/services/vpn/device_drivers/template/openswan/ipsec.conf.template
+++ b/neutron_vpnaas/services/vpn/device_drivers/template/openswan/ipsec.conf.template
@@ -1,6 +1,5 @@
 # Configuration for {{vpnservice.id}}
 config setup
-    nat_traversal=yes
     virtual_private={{virtual_privates}}
 conn %default
     keylife=60m

Ian Kumlien (pomac) wrote :

Reference:
https://manpages.debian.org/experimental/libreswan/ipsec.conf.5.en.html

nat_traversal

OBSOLETE. Support for NAT Traversal is always enabled.

---

log:
2023-04-17 12:47:14.358 2524 ERROR neutron_vpnaas.services.vpn.device_drivers.ipsec cannot load config '/etc/ipsec.conf': /etc/ipsec.conf:3: syntax error, unexpected STRING [nat_traversal]

Franck VEDEL (vedelf) wrote : Re: [Bug 1938571] vpnaas problem:ipsec pluto not running centos 8 victoria wallaby

Hi.
Thanks a lot for this help.
I haven't tried Vpnaas for 18 months. I no longer know where I was. But I absolutely have to find a solution because it worked really well and it was really educational for my students.
Thanks again.

Franck VED
Dép. Réseaux Informatiques & Télécoms
IUT1 - Univ GRENOBLE Alpes
0476824462
Stages, Alternance, Emploi.


OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-vpnaas (master)

Changed in neutron:
status: Triaged → In Progress

Bodo Petermann (bpetermann) wrote :

The patch above does not try to maintain compatibility with libreswan 3.x. v4 has been out for 3 years already, so I didn't try a more complicated approach to also cope with v3.
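
For readers who still have to support both major versions, a version-dependent choice of flag could look roughly like the sketch below. This is not what the proposed patch does. The helper names `pluto_stack_flag` and `build_pluto_cmd`, and the idea of passing in the major version as an integer, are assumptions made for illustration. The flag names are the ones mentioned in this thread: `--use-netkey` for Libreswan 3.x and `--use-xfrm` for 4.x.

```python
def pluto_stack_flag(libreswan_major: int) -> str:
    """Return the kernel-stack flag for pluto (hypothetical helper)."""
    # Per this thread: 3.x accepts --use-netkey, 4.x expects --use-xfrm.
    return "--use-netkey" if libreswan_major < 4 else "--use-xfrm"


def build_pluto_cmd(libreswan_major: int) -> list:
    """Build a pluto argument list shaped like the one in start_pluto()."""
    return ["pluto", pluto_stack_flag(libreswan_major), "--uniqueids"]
```

How the major version is detected (for example by parsing the output of the installed ipsec tool) is left out on purpose, since that output format is not shown in this report.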

Changed in neutron:
assignee: nobody → Bodo Petermann (bpetermann)

Franck VEDEL (vedelf) wrote : Re: [Bug 1938571] vpnaas problem:ipsec pluto not running centos 8 victoria wallaby

Thanks a lot for this.
I will try the patch as soon as possible.

Franck VEDEL
Dép. Réseaux Informatiques & Télécoms
IUT1 - Univ GRENOBLE Alpes
0476824462
Stages, Alternance, Emploi.


Ian Kumlien (pomac) wrote :

So I don't get this... the implementation starts pluto multiple times, but pluto should only be started once, after which you should use the .ctl file. That would explain why /lib/run is added to all namespaces (even if it should be /run on newer versions of ipsec).

But starting pluto every time means it fails because the .pid file is already there.

It seems to me like the whole approach needs to be reworked.

Either way, this means that your patch doesn't help much with Libreswan 4.12 (standard on Rocky Linux 9).

Bodo Petermann (bpetermann) wrote :

My understanding is that pluto is called once per VPN service, each time in its own namespace. The wrapper will call something like "ip netns exec <namespace> neutron-vpn-netns-wrapper --mount_paths=/etc:/var/lib/neutron/xyz/ipsec/etc,/run:/var/lib/neutron/xyz/ipsec/var/run --cmd=ipsec,pluto,--use-xfrm,--uniqueids".

And neutron-vpn-netns-wrapper will call
(1) mount --bind /var/lib/neutron/xyz/ipsec/etc /etc
(2) mount --bind /var/lib/neutron/xyz/ipsec/var/run /run
(3) ipsec pluto --use-xfrm --uniqueids

This way the pluto process will not see the /etc or /run of the host anymore, but the bind-mounted directories instead. So each pluto will create its own pid file, not conflicting with other Plutos.
From outside the neutron-vpn-netns-wrapper you won't see the pid file in /run, but only in /var/lib/neutron/xyz/ipsec/var/run.

Other commands like ipsec whack will also be run in such a wrapper, so they bind-mount /etc and /run in the same way, so they have access to the per-service ctl file or pid file
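
The wrapper invocation described above can be sketched as a small command builder. This is only an illustration of how the pieces fit together, not the neutron-vpnaas code. The function name `build_wrapper_cmd` and its parameters are hypothetical; the option names `--mount_paths` and `--cmd` and the `/etc` and `/run` targets are taken from the example command in this comment.

```python
def build_wrapper_cmd(namespace: str, config_dir: str, cmd: list) -> list:
    """Build an 'ip netns exec ... neutron-vpn-netns-wrapper' call (hypothetical helper)."""
    # Each entry is "<path seen by the process>:<per-service directory>".
    mount_paths = {
        "/etc": f"{config_dir}/etc",
        "/run": f"{config_dir}/var/run",
    }
    mounts = ",".join(f"{target}:{source}" for target, source in mount_paths.items())
    return [
        "ip", "netns", "exec", namespace,
        "neutron-vpn-netns-wrapper",
        f"--mount_paths={mounts}",
        "--cmd=" + ",".join(cmd),
    ]


if __name__ == "__main__":
    # "xyz" and "<namespace>" are the placeholders used in the comment above.
    print(" ".join(build_wrapper_cmd(
        "<namespace>",
        "/var/lib/neutron/xyz/ipsec",
        ["ipsec", "pluto", "--use-xfrm", "--uniqueids"],
    )))
```

Because every ipsec command for a service goes through the same bind-mounts, the pluto process and a later whack call both resolve `/run/pluto/pluto.ctl` to the same per-service file.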

Ian Kumlien (pomac) wrote :

I find this odd... should there be one big common ipsec pluto or not?

ipsec whack --shutdown kills pluto, so either the code has to be rewritten to use one common server, or it has to be changed to run things in network namespaces properly...

Bodo Petermann (bpetermann) wrote :

The current implementation in neutron-vpnaas does use network namespaces to run pluto in: one namespace per router and one pluto per router in that namespace. So if there are multiple routers, there will be multiple plutos, separated by namespaces. The whack commands will be executed in a wrapper that will run them in the namespace, with bind-mounted /etc and /run. This way an ipsec whack --shutdown should only shut down the one pluto in the per-router namespace.

If the deployment uses ML2/OVS the namespace will be the one created for the router and pluto will be started by the L3 agent, where vpnaas is loaded as an extension. For the to-be-released implementation for ML2/OVN there's no L3 agent and instead a stand-alone VPN agent will take the responsibility to create the namespace (again: per router id).

Ian Kumlien (pomac) wrote :

So this is an incorrect assumption.

There is one /run, so there is one shared ctl - thus *one* pluto

ipsec whack --shutdown kills the daemon via the ctl, thus kills the *one* pluto instance

Which explains the error messages:
https://paste.openstack.org/show/bJQDomD1IbOgUZrUImhQ/

I have let neutron-vpnaas create its own pluto instance and shut it down via ipsec whack --shutdown from another namespace...

Ian Kumlien (pomac) wrote :

So I have been through multiple iterations of changes, still testing...

Ian Kumlien (pomac) wrote (last edit ):

So, your patch verbatim, i think, results in multiple:
[changing to paste]

https://paste.openstack.org/show/bsy72g60PFQizuJsJ7tm/

Bodo Petermann (bpetermann) wrote :

My patch doesn't really change the architecture with the namespace and bind-mount. That was there for a while before. I only tried to adapt to the changes of libreswan's run-dir and the --use-xfrm instead of --use-netkey.

I will still try to reproduce the issue. Maybe the whole idea is not working in some environments. We use strongswan and Ubuntu, but the architecture is similar (1 charon per router, 1 namespace per router, same bind-mount wrapper) and didn't encounter those issues.

What versions do you use (libreswan, operating system, neutron)?

Ian Kumlien (pomac) wrote :

libreswan, rocky linux 9, zed - ;)

It looks like the lack of /run in the namespace (I assume it's chrooted) means that there is no /run/pluto/pluto.ctl file, so there is no communication between the daemon and the command line...

I have left it running for quite a while, and two pluto processes have spawned on one machine (eventually...), but it seems like ipsec whack can't communicate with them... :/

Basically, if a namespace is OK, you can run ipsec status: it returns 0 if pluto is running and 33 if it is not.

2024-01-16 15:48:59.970 34311 INFO neutron.common.config [-] /var/lib/kolla/venv/bin/neutron-vpn-netns-wrapper version 21.2.1.dev19
Command: ['mount', '--bind', '/var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/etc', '/etc'] Exit code: 0 Stdout: Stderr: 2024-01-16 15:48:59.987 34311 INFO neutron_vpnaas.services.vpn.common.netns_wrapper [-] /var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/etc has been bind-mounted in /etc
Command: ['mount', '--bind', '/var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/var/run', '/run'] Exit code: 0 Stdout: Stderr: 2024-01-16 15:49:00.001 34311 INFO neutron_vpnaas.services.vpn.common.netns_wrapper [-] /var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/var/run has been bind-mounted in /run
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/pluto.ctl" failed (111 Connection refused)
; Stderr:
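
The exit codes mentioned above (0 when pluto answers, 33 when it does not) can be turned into a small check. The sketch below is an illustration only: the names `describe_whack_status` and `check_pluto` are hypothetical, and any exit code other than 0 or 33 is simply reported as unknown because this thread does not describe other values.

```python
import subprocess

PLUTO_RUNNING = 0       # exit code reported in this thread when pluto answers
PLUTO_NOT_RUNNING = 33  # exit code seen in the log above


def describe_whack_status(exit_code: int) -> str:
    """Map the exit code of 'ipsec whack --status' to a short label."""
    if exit_code == PLUTO_RUNNING:
        return "running"
    if exit_code == PLUTO_NOT_RUNNING:
        return "not running"
    return f"unknown (exit code {exit_code})"


def check_pluto(cmd=("ipsec", "whack", "--status")) -> str:
    """Run the status command and describe the result.

    This must be executed in the same namespace and with the same
    bind-mounts as pluto itself, otherwise it looks at the wrong /run.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    return describe_whack_status(result.returncode)
```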

Ian Kumlien (pomac) wrote (last edit ):

I actually assume something like this could fix it, the question is if the pid file will have to be handled as well though...

neutron_vpnaas/services/vpn/device_drivers/libreswan_ipsec.py:
@@ -60,6 +62,7 @@ class LibreSwanProcess(ipsec.OpenSwanProcess):
             pass
         with open('%s/etc/resolv.conf' % self.config_dir, 'a'):
             pass
+        os.makedirs('%s/run/pluto' % self.config_dir, exist_ok=True)

Bodo Petermann (bpetermann) wrote :

I see. I was going to ask if you could check the folder /var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/var/run/pluto/. The .ctl file and .pid file should appear there and not in /run. But only if the bind-mount works.

From the logs you posted, the "mount --bind" commands didn't return error codes, so I assumed that the bind-mount worked. If it didn't, because /var/lib/neutron/ipsec/752aa2d2-1172-48ab-8f37-a45411c01fc4/var/run doesn't exist, that would explain why the real /run directory is used instead.

The bind-mount uses {config_dir}/var/run though, so I guess it should be

os.makedirs('%s/var/run/pluto' % self.config_dir, exist_ok=True)

I will check whether that's what was missing.
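
As a stand-alone illustration of that one-liner, the sketch below creates the pluto directory under a per-service config directory. Whether this step is needed at all is still an open question in this thread; the helper name `ensure_pluto_run_dir` is hypothetical, and a temporary directory stands in for the real path under /var/lib/neutron.

```python
import os
import tempfile


def ensure_pluto_run_dir(config_dir: str) -> str:
    """Create {config_dir}/var/run/pluto if it is missing (hypothetical helper)."""
    run_dir = os.path.join(config_dir, "var", "run", "pluto")
    # exist_ok=True makes the call idempotent. The keyword must be spelled
    # exactly like this; an unknown keyword makes os.makedirs raise TypeError.
    os.makedirs(run_dir, exist_ok=True)
    return run_dir


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of /var/lib/neutron.
    with tempfile.TemporaryDirectory() as config_dir:
        print(ensure_pluto_run_dir(config_dir))
```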

Bodo Petermann (bpetermann) wrote :

At least {config_dir}/var/run is already created by BaseSwanProcess.ensure_config_dir. That should allow the bind-mount to work. I'm not sure if one needs to create the pluto directory manually.

So could you check the {config_dir}/var/run folder? There should be a pluto directory in it with the .pid and .ctl files.

Unfortunately I cannot spend much time and don't have a rocky linux 9 / Kolla environment at hand to quickly try out if it works for me. I will still try to reproduce it, but need to ask for some patience.

Ian Kumlien (pomac) wrote :

So, .pid and .ctl have been in /run/pluto/ for a while, which is why I was tinkering with the idea of everything wanting access to the same .ctl file...

Ian Kumlien (pomac) wrote :

Also, didn't seem to help enough:
ls */run/pluto
40a419e5-788a-4e13-ae29-82233f9c0c03/run/pluto:

5e873513-8099-4d60-b293-e6ba45596de4/run/pluto:

752aa2d2-1172-48ab-8f37-a45411c01fc4/run/pluto:

Unless, they have to be owned by root as well.. :/

And, I'm really thankful for you providing the information that it SHOULD all be separated, somehow we've managed to have it running without separation in the past - don't ask me how...

Ian Kumlien (pomac) wrote :

Also, just to confirm... Many errors along the lines of:
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/pluto.ctl" failed (111 Connection refused)
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/pluto.ctl" failed (111 Connection refused)
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/pluto.ctl" failed (111 Connection refused)
Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: is Pluto running? connect() for "/run/pluto/pluto.ctl" failed (111 Connection refused)

Ian Kumlien (pomac) wrote :

Also, to add insult to injury, the strongswan binary is named strongswan ;)
(it's in epel for rocky 9...)

Ian Kumlien (pomac) wrote :

multiple issues with strongswan as well..

cp complains about /etc/strongswan.d missing (it's /etc/strongswan/strongswan.d)

and /etc/strongswan/ipsec.conf missing

Will look closer at the code, this might be easier to fix...

Ian Kumlien (pomac) wrote :

So renaming the libreswan ipsec binary to something else, creating a symlink from ipsec to strongswan, and applying this patch means that I now have strongswan up and running on one of the nodes.

I will go through all of them and see if I can get this part working at least.

diff --git a/etc/neutron/rootwrap.d/vpnaas.filters b/etc/neutron/rootwrap.d/vpnaas.filters
index 846ac2d1c..dc21cc6b1 100644
--- a/etc/neutron/rootwrap.d/vpnaas.filters
+++ b/etc/neutron/rootwrap.d/vpnaas.filters
@@ -8,11 +8,11 @@

 [Filters]

-cp: RegExpFilter, cp, root, cp, -a, .*, .*/strongswan.d
+cp: RegExpFilter, cp, root, cp, -a, .*, .*/strongswan
 ip: IpFilter, ip, root
 ip_exec: IpNetnsExecFilter, ip, root
 ipsec: CommandFilter, ipsec, root
-rm: RegExpFilter, rm, root, rm, -rf, (.*/strongswan.d|.*/ipsec/[0-9a-z-]+)
+rm: RegExpFilter, rm, root, rm, -rf, (.*/strongswan|.*/ipsec/[0-9a-z-]+)
 rm_file: RegExpFilter, rm, root, rm, -f, .*/ipsec.secrets
 strongswan: CommandFilter, strongswan, root
 neutron_netns_wrapper: CommandFilter, neutron-vpn-netns-wrapper, root
diff --git a/neutron_vpnaas/services/vpn/device_drivers/strongswan_ipsec.py b/neutron_vpnaas/services/vpn/device_drivers/strongswan_ipsec.py
index 708952a1f..30cdabed5 100644
--- a/neutron_vpnaas/services/vpn/device_drivers/strongswan_ipsec.py
+++ b/neutron_vpnaas/services/vpn/device_drivers/strongswan_ipsec.py
@@ -51,7 +51,7 @@ strongswan_opts = [
         'default_config_area',
         default=os.path.join(
             TEMPLATE_PATH,
-            '/etc/strongswan.d'),
+            '/etc/strongswan'),
         help=_('The area where default StrongSwan configuration '
                'files are located.'))
 ]
@@ -150,7 +150,7 @@ class StrongSwanProcess(ipsec.BaseSwanProcess):
             self.vpnservice,
             0o600)
         self.copy_and_overwrite(cfg.CONF.strongswan.default_config_area,
-                                self._get_config_filename('strongswan.d'))
+                                self._get_config_filename('strongswan'))

     def get_status(self):
         return self._execute([self.binary, 'status'],
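
A side note on the `default_config_area` default touched by the diff above: because the second argument to `os.path.join` is an absolute path, Python discards the first argument, so the default resolves to the absolute path itself regardless of `TEMPLATE_PATH`. The short sketch below demonstrates this; the value assigned to `TEMPLATE_PATH` is a made-up placeholder, not the real one from neutron-vpnaas.

```python
import os.path

# Placeholder value, for illustration only.
TEMPLATE_PATH = "/hypothetical/template/dir"

# os.path.join drops all earlier components when a later one is absolute.
default_area = os.path.join(TEMPLATE_PATH, "/etc/strongswan.d")
patched_area = os.path.join(TEMPLATE_PATH, "/etc/strongswan")

print(default_area)
print(patched_area)
```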

Bodo Petermann (bpetermann) wrote :

Well, this bug ticket here is about the centos/libreswan issue that libreswan changed its run directory and cli parameters. Strongswan-related issues should go to a different ticket.

On ubuntu the config directory of strongswan installed by the package is /etc/strongswan.d. If setting the configuration key [strongswan] default_config_area = /etc/strongswan/strongswan.d is not enough, please open a new bug.

Ian Kumlien (pomac) wrote :

Summary and the changes needed to get strongswan working on rocky9:
https://bugs.launchpad.net/neutron/+bug/2049624

I gave up on libreswan for now...

Bodo Petermann (bpetermann) wrote :

I cannot reproduce the issues

I tried out the neutron-vpnaas patch 895824 in a fresh all-in-one kolla-ansible setup and vpn worked as expected.

- VM with rockylinux 9.3 as the host for the openstack deployment
- installed kolla-ansible in it (all-in-one), with openstack services running in rockylinux-based containers
- openstack zed release
- ML2/OVS (neutron_plugin_agent: "openvswitch")
- built a custom container image for neutron_l3_agent that includes vpnaas patch 895824
- in the container: Libreswan 4.12 (package version libreswan.x86_64 4.12-1.el9)

without the patch there were errors like

Command: ['ipsec', '_stackmanager', 'start'] Exit code: 1 Stdout: Stderr: cannot load config '/etc/ipsec.conf': /etc/ipsec.conf:3: syntax error [nat_traversal]
cannot load config '/etc/ipsec.conf': /etc/ipsec.conf:3: syntax error [nat_traversal]

with the patch this error is gone and only a few errors appear in the creation phase of a new vpn service:

Command: ['ipsec', 'whack', '--status'] Exit code: 33 Stdout: Stderr: whack: Pluto is not running (no "/run/pluto/pluto.ctl")

But those stop as soon as the pluto is actually started.

In this setup the vpn service is run inside the neutron-l3-agent container.
As expected the /run/pluto directory stayed empty and the pluto.ctl and pluto.pid files appeared in
/var/lib/neutron/kolla/ipsec/f2ea4724-465d-4b8a-a465-8b5a9ed9f8f9/var/run/pluto.

Inside the container I could check ipsec status with

ipsec status --rundir /var/lib/neutron/kolla/ipsec/f2ea4724-465d-4b8a-a465-8b5a9ed9f8f9/var/run/pluto
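
For repeated checks across several VPN services, the command above can be assembled from the state directory and the service id. This is a minimal sketch: the function name `ipsec_status_cmd` and its parameters are hypothetical, and the directory layout is simply the one shown in this comment.

```python
import os


def ipsec_status_cmd(state_root: str, service_id: str) -> list:
    """Build the 'ipsec status --rundir ...' call for one VPN service (hypothetical helper)."""
    # Layout taken from this comment: <state_root>/<service_id>/var/run/pluto
    rundir = os.path.join(state_root, service_id, "var", "run", "pluto")
    return ["ipsec", "status", "--rundir", rundir]


if __name__ == "__main__":
    print(" ".join(ipsec_status_cmd(
        "/var/lib/neutron/kolla/ipsec",
        "f2ea4724-465d-4b8a-a465-8b5a9ed9f8f9",
    )))
```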

Ian Kumlien (pomac) wrote :

we never actually got a vpnaas service to be active until we switched...

Our users are happier as well... =)
