The LimitNPROC line in /lib/systemd/system/openvpn@.service has to be commented out in order to be able to start OpenVPN

Bug #1631104 reported by Frank Kjerstein
This bug affects 6 people
Affects: openvpn (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

This issue manifests on some fresh, fully updated installs of Ubuntu 16.04.1 with clean iptables and the latest version of OpenVPN (installed with Nyr's installer, https://github.com/Nyr/openvpn-install). The bug does not reproduce on every install of Ubuntu 16.04.1 or 14.04.5, but it is fairly widespread.

The only post-install OS modifications on the servers reproducing this bug were: 1) adding a sudo user, 2) switching to public-key login via SSH, 3) removing Apache (sudo service apache2 stop + sudo apt purge --auto-remove apache2*), and then 4) downloading and running Nyr's OpenVPN install script straight away.

Relevant log files:
Oct 1 21:38:17 openvpnserver systemd[1]: Starting OpenVPN connection to server...
Oct 1 21:38:17 openvpnserver systemd[1]: Starting OpenVPN service...
Oct 1 21:38:17 openvpnserver systemd[1]: Started OpenVPN service.
Oct 1 21:38:17 openvpnserver ovpn-server[229]: OpenVPN 2.3.10 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [MH] [IPv6] built on Feb 2 2016
Oct 1 21:38:17 openvpnserver systemd[1]: Failed to start OpenVPN connection to server.

I set up OpenVPN on Ubuntu 14.04 and 16.04 via Nyr's OpenVPN install script quite often. OpenVPN stopped running after OS updates and system reboots a while ago (maybe two months). I couldn't get it started again right away, and since some backup OpenVPN servers on routers covered for it, I didn't get around to troubleshooting and fixing it until now.

I saw this issue on a fresh Ubuntu 16.04.1 running on a Windows Azure dedicated VPS and on CrownCloud, as well as on fresh Ubuntu 14.04.5 and 16.04.1 installs on some LowEndSpirit (LES) NAT servers, but not all of them. For LES servers it appears to be location-dependent: LES Sweden worked (until the location was recently shut down), while Dallas didn't.

On affected servers, commenting out the LimitNPROC line in /lib/systemd/system/openvpn@.service works as a temporary workaround. There are similar user reports spread around the interwebs, but it is clearly not affecting all 16.04.1 and 14.04.5 installations.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openvpn (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

Hi,
thank you for your report and your help to make Ubuntu better.

I quickly tried to set up a VPN in a container to reproduce this, but failed.
I'd need to create a better-matching configuration with two KVM guests and multiple networks to try to reproduce it.

But even before that, I wanted to ask: is this a specific issue with the Nyr installer,
or would you run into the same problem if you followed, e.g., the basic setup guide at https://help.ubuntu.com/16.04/serverguide/openvpn.html ?

The config option you listed, LimitNPROC, limits the number of allowed processes, just like "ulimit -u" would. Does the Nyr OpenVPN installer configure it in a way that spawns many processes?

Possibly the reason it fails only in some environments is that it only triggers once enough clients have logged in to reach the limit.
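
One detail that may matter for the "only some environments" question: per setrlimit(2), RLIMIT_NPROC (what LimitNPROC= and "ulimit -u" set) is checked against all tasks, processes and threads alike, owned by the caller's real UID, not only those started by the service. The Python sketch below is an illustration, not part of OpenVPN or systemd; the function names are hypothetical. It assumes Linux with /proc mounted, counts the tasks visible for the current real UID and prints the current soft limit. Inside a container only that container's processes are visible, so the count is a lower bound.

```python
import os
import resource
from typing import Optional


def real_uid(pid: str) -> Optional[int]:
    """Return the real UID of `pid` from /proc/<pid>/status, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("Uid:"):
                    # Fields are: real, effective, saved set, filesystem UID.
                    return int(line.split()[1])
    except OSError:
        pass  # the process exited or /proc is restricted
    return None


def count_tasks_for_uid(uid: int) -> int:
    """Count visible tasks (processes and their threads) whose real UID is `uid`."""
    total = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit() or real_uid(pid) != uid:
            continue
        try:
            # Each entry in /proc/<pid>/task is one thread of that process.
            total += len(os.listdir(f"/proc/{pid}/task"))
        except OSError:
            continue  # the process exited between the two reads
    return total


def main() -> None:
    uid = os.getuid()
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_tasks_for_uid(uid)
    if soft == resource.RLIM_INFINITY:
        print(f"uid {uid}: {used} visible tasks, no RLIMIT_NPROC soft limit")
    else:
        print(f"uid {uid}: {used} visible tasks, RLIMIT_NPROC soft limit {soft}")


if __name__ == "__main__":
    main()
```

Run as the same user OpenVPN runs as (root here); if the count is already near 10 before OpenVPN starts, its forks would be expected to fail under LimitNPROC=10.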

I have given this limit some thought and checked where it comes from.
It comes from upstream itself; neither Debian nor Ubuntu added it.
It has been in since https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=792907, which means xenial and later.
The upstream commit says "This unit file also tries to reduce the capabilities of the running openvpn process.".

So I'd expect this to be a limit that protects against exploitation, and if a given setup needs more, the admin has to adapt it.

That said, if anything, this sounds like an upstream bug to me. If this can be confirmed as an upstream bug, the best route to getting it fixed in Ubuntu in this case would be to file an upstream bug if you're able to do that. Otherwise, I'm not sure what we can do directly in Ubuntu to fix the problem.

If you do end up filing an upstream bug, please link to it from here. Thanks!

Changed in openvpn (Ubuntu):
status: Confirmed → Incomplete
Nyr (nyr7) wrote :

Thanks for your reply,

This issue isn't specific to my installer. There are reports on other places about it.

By the way, the installer spawns a single process and is pretty barebones all around; I doubt it has anything to do with it.

Frank (or anyone else able to reproduce), could you please try to reproduce this in Debian?

I know the change was upstream, but I've had multiple reports from Ubuntu users and none from Debian; that's why I'm assuming this isn't an issue in Debian, although I don't understand why.

Regards

linex83 (linex83) wrote :

I'm having the very same problem, and a quick search reveals that more people out there are experiencing it. My workaround was not to comment out the line in the original file located in /lib, but to create an overriding drop-in config:

sudo systemctl edit openvpn@server

Then raise the limit:

[Service]
LimitNPROC=100

However, it's just a workaround.
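
A drop-in only affects processes started after it is loaded, so it may be worth confirming that the running daemon actually received the new limit after `sudo systemctl restart openvpn@server`. The kernel reports each process's RLIMIT_NPROC in the "Max processes" row of /proc/<pid>/limits. The Python helper below is a sketch under that assumption; its file and function names are hypothetical.

```python
import os
import resource
import sys
from typing import Optional


def format_rlimit(value: int) -> str:
    """Render a resource-module limit the way /proc/<pid>/limits prints it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def max_processes_limit(pid: int) -> Optional[str]:
    """Return the soft RLIMIT_NPROC of `pid` from its "Max processes" row."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max processes"):
                # Columns: name ("Max processes"), soft limit, hard limit, units.
                return line.split()[2]
    return None


if __name__ == "__main__":
    # Hypothetical usage: sudo python3 nproc_limit.py $(pidof -s openvpn)
    pids = [arg for arg in sys.argv[1:] if arg.isdigit()]
    target = int(pids[0]) if pids else os.getpid()
    print(f"pid {target}: Max processes (soft) = {max_processes_limit(target)}")
    own_soft = resource.getrlimit(resource.RLIMIT_NPROC)[0]
    print(f"this script's own soft limit: {format_rlimit(own_soft)}")
```

Alternatively, `systemctl show openvpn@server -p LimitNPROC` shows the value systemd has configured for the unit, which is useful for checking that the drop-in was picked up at all.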

Nyr (nyr7) wrote :

Confirmation that people experiencing this issue are unable to reproduce it in the same environment when running Debian instead of Ubuntu:

https://github.com/Nyr/openvpn-install/issues/206#issuecomment-256945446

I still don't have a clue on what's going on with that.

Brian Morton (rokclimb15) wrote :

I had the same error, but commenting out or adjusting LimitNPROC didn't fix it. Ultimately I had to make the container privileged and unconfined by AppArmor (AA) to get openvpn to start.

Christian Ehrhardt  (paelzer) wrote :

Thanks, Nyr, for linking the discussion you had.
I see there were later updates there reporting it on Debian as well.
I'm really puzzled what that should be.

Since I couldn't confirm it before when using openvpn the way I usually do, I tried it with the installer this time; I agree that the installer likely isn't the issue, but it might help trigger a repro.

$ lxc launch ubuntu-daily:xenial xenial-test-openvpn
# there is no apache to remove in the image
$ lxc exec xenial-test-openvpn -- wget https://git.io/vpn -O openvpn-install.sh
$ lxc exec xenial-test-openvpn -- chmod +x openvpn-install.sh
$ lxc exec xenial-test-openvpn -- bash -c "~/openvpn-install.sh"
# enter on all questions
$ lxc exec xenial-test-openvpn -- service openvpn start
$ lxc exec xenial-test-openvpn -- service openvpn status
● openvpn.service - OpenVPN service
   Loaded: loaded (/lib/systemd/system/openvpn.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2017-03-20 15:45:55 UTC; 1s ago
  Process: 1320 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 1320 (code=exited, status=0/SUCCESS)

Mar 20 15:45:55 xenial-test-openvpn systemd[1]: Starting OpenVPN service...
Mar 20 15:45:55 xenial-test-openvpn systemd[1]: Started OpenVPN service.

Working just fine.
From here we need to find what has to be different to trigger the issue.
With multiple people seemingly hitting it, there must be something to it, but so far I fail to see how to recreate it.

Simon Déziel (sdeziel) wrote : Re: [Bug 1631104] Re: The LimitNPROC line in /lib/systemd/system/openvpn@.service has to be commented out in order to be able to start OpenVPN

I've run OpenVPN in containers before and it worked fine for me as well,
but I was only launching a few OpenVPN processes.

On 03/20/2017 11:47 AM, Christian Ehrhardt wrote:
> Working just fine.
> From here we need to find what has to be different to trigger the issue.
> With it seems multiple people hitting it there must be something to it, but so far I fail to see how to recreate.

I am not sure but I vaguely recall that rlimits are/were not namespace
aware? Maybe one could try spawning/cloning 11 OpenVPN containers and
check that theory?

Simon
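
To make this theory concrete, here is a toy Python model (not kernel code, and a simplification; all names are hypothetical). It assumes, as suggested above, that tasks are counted per host UID, so unprivileged containers whose root maps to the same host UID share one counter. It also encodes the setrlimit(2) rule that the limit is not enforced for real UID 0 on the host, which would be consistent with the earlier report that a privileged container starts fine. Kernel accounting details differ between versions, so treat this as a model of the theory only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Assumed limit, matching the LimitNPROC=10 discussed in this report.
LIMIT_NPROC = 10


@dataclass(frozen=True)
class Container:
    name: str
    host_uid: int    # host UID that root inside the container maps to (0 = privileged)
    root_tasks: int  # tasks currently running as root inside the container


def tasks_per_host_uid(containers: List[Container]) -> Counter:
    """Sum tasks by the host UID they are charged to."""
    counts: Counter = Counter()
    for c in containers:
        counts[c.host_uid] += c.root_tasks
    return counts


def can_fork(containers: List[Container], name: str, limit: int = LIMIT_NPROC) -> bool:
    """Would one more fork as root inside container `name` pass the NPROC check?"""
    matches = [c for c in containers if c.name == name]
    if not matches:
        raise KeyError(name)
    target = matches[0]
    if target.host_uid == 0:
        # setrlimit(2): RLIMIT_NPROC is not enforced for real UID 0 (host root).
        return True
    # fork(2) fails with EAGAIN once the count is already at or above the limit.
    return tasks_per_host_uid(containers)[target.host_uid] < limit


if __name__ == "__main__":
    shared = [Container(f"vpn{i}", 100000, 1) for i in range(10)]
    print(can_fork(shared, "vpn0"))      # 10 tasks on host UID 100000: False
    print(can_fork(shared[:9], "vpn0"))  # 9 tasks: True
    busy = [Container("solo", 100000, 12)]
    print(can_fork(busy, "solo"))        # one busy container alone: False
```

The last case suggests that a single container could hit the limit on its own, since every process running as container root (init, journald, cron and so on) would count toward the same total.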

Nyr (nyr7) wrote :

Just to clarify: the issue seems to happen on setups where only one process should be running.

Additionally, there are conflicting reports about whether affected users can reproduce the issue on Debian.

Hopefully someone affected will be able to do further troubleshooting, because this is very weird looking from the outside.

Daniel F (zabullet) wrote :

I've got an Ubuntu VPS that shows this behaviour if you're interested in using it to investigate.

zb

Daniel F (zabullet) wrote :

I've investigated this further by debugging through the startup process. OpenVPN appears to behave correctly, and systemd is doing exactly what it is told to do: forks are expected to fail once the limit is reached. LimitNPROC=10 seems like an arbitrary limit; why have it at all, or why 10? FWIW, 30 seems to be roughly the value that allows OpenVPN 2.4.2 on Ubuntu 16.04 to initialise and load.

D.

Diko Parvanov (dparv) wrote :

On bionic the limit is still 10 and causes issues when starting in an LXD container:

daemon() failed or unsupported: Resource temporarily unavailable (errno=11)

On focal the parameter is set to 100, but it still causes this issue.
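
The daemon() failure above, like the openvpn_execve failure reported further down in this thread, ends in errno=11. On Linux, 11 is EAGAIN, and fork(2) documents EAGAIN as the error returned when the RLIMIT_NPROC soft limit is reached. It is also returned for other limits (kernel threads-max, pid_max, or a cgroup pids.max), so it is a hint rather than proof. A small Python sketch, with hypothetical function names, that pulls the errno out of such a log line and flags EAGAIN:

```python
import errno
import os
import re
from typing import Optional

# Matches the "(errno=N)" suffix seen in the OpenVPN log lines in this report.
ERRNO_RE = re.compile(r"\(errno=(\d+)\)")


def logged_errno(line: str) -> Optional[int]:
    """Extract the numeric errno from an OpenVPN log line, if present."""
    match = ERRNO_RE.search(line)
    return int(match.group(1)) if match else None


def is_eagain(line: str) -> bool:
    """True if the line reports EAGAIN (RLIMIT_NPROC is one possible cause)."""
    return logged_errno(line) == errno.EAGAIN


if __name__ == "__main__":
    samples = [
        "daemon() failed or unsupported: Resource temporarily unavailable (errno=11)",
        "openvpn_execve: unable to fork: Resource temporarily unavailable (errno=11)",
    ]
    for sample in samples:
        code = logged_errno(sample)
        text = os.strerror(code) if code is not None else "-"
        print(f"errno={code} ({text}), EAGAIN={is_eagain(sample)}")
```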

Changed in openvpn (Ubuntu):
status: Incomplete → Confirmed
Diko Parvanov (dparv) wrote :

According to https://github.com/Nyr/openvpn-install/blob/master/openvpn-install.sh#L215, the installer should disable LimitNPROC in containers, but that doesn't seem to work.

Paride Legovini (paride) wrote :

Hi,

I quickly tried to launch a Bionic LXD container and set up openvpn using the Nyr installer. I couldn't reproduce the error, but it's not clear to me whether it's only triggered after a number of clients (try to) connect to the server, or whether the failure happens as soon as the service is started.

To begin investigating this issue we first of all need a reproducer, ideally a minimal set of steps beginning with an `lxc launch` command and ending with the command or file showing the error message.

I'm setting this report to Incomplete for the moment as steps to reproduce or further information are needed to move it forward. Please change it back to New after commenting back and we'll look at it again. Thanks!

Changed in openvpn (Ubuntu):
status: Confirmed → Incomplete
Nyr (nyr7) wrote :

> using the Nyr installer. I couldn't reproduce the error

I do not think that you will be able to reproduce it using the latest version of my script as I worked around it by setting "LimitNPROC=infinity" if installing inside a container.

> To begin investigating this issue we first of all need a reproducer

I have not reproduced it, but I have a pretty good idea of why it happens. It is not a problem with containers; it is a problem with shared environments (mainly OpenVZ VPSes) where lots of OpenVPN processes are already running, so it has to do with those environments not being fully isolated and LimitNPROC somehow reaching the limit.

The only possible fix would be to disable LimitNPROC or set it to a very high limit, as far as I know. Anyway this should probably be addressed upstream.

Athos Ribeiro (athos-ribeiro) wrote :

While working on https://bugs.launchpad.net/ubuntu/+source/openvpn/+bug/1934781, I did face a similar issue.

This was triggered when trying to start openvpn with the openvpn-server@.service unit file in an LXC container launched from the ubuntu-daily:${series} images.

The error observed was slightly different though:

  openvpn_execve: unable to fork: Resource temporarily unavailable (errno=11)

The error was observed in bionic, focal, and groovy. The error was __not__ observed in hirsute and impish.

The following reports may be relevant here:

https://github.com/systemd/systemd/issues/6011
https://lists.linuxcontainers.org/pipermail/lxc-users/2018-June/014329.html

To verify the issue was indeed not related to a systemd change, I built the openvpn package version available in hirsute for groovy and installed it in my groovy lxc container. This time, I could __not__ observe the issue.

I am attaching a script, loosely based on the openvpn autopkgtest suite, that can be run from within an LXC container and serves as a reproducer for the issue. The script works for anything newer than focal, but would need changes to run in bionic due to recent changes in easy-rsa.

Athos Ribeiro (athos-ribeiro) wrote :

A git bisect shows that upstream's aec4a3d [1] fixes the issue I am experiencing here. While the issue seems to be fixed by removing the calls to openvpn_execve_check, this is done by using a new networking API, which is possibly not a good fit for an SRU, especially if the original error was only seen in unprivileged containers, as in my case.

Finally, note that this is fixed in hirsute and in impish.

P.S. I am re-uploading the reproducer with a minor fix.

[1] https://github.com/OpenVPN/openvpn/commit/aec4a3d1b6a9e4d9e584b368126da061c15b174b
