The LimitNPROC line in /lib/systemd/system/openvpn@.service has to be commented out in order to be able to start OpenVPN

Bug #1631104 reported by Frank Kjerstein on 2016-10-06
This bug affects 6 people
Affects: openvpn (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

This issue manifests on some fresh installs and updates of Ubuntu 16.04.1 with clean iptables and the latest version of OpenVPN (installed by Nyr's installer - https://github.com/Nyr/openvpn-install). It does not reproduce on all installs of Ubuntu 16.04.1 or 14.04.5, but it is fairly widespread.

The only post-install OS modifications on servers reproducing this bug were: 1) added a sudo user, 2) switched to public-key login via ssh and 3) removed Apache (sudo service apache2 stop + sudo apt purge --auto-remove apache2*). Then 4) went straight to downloading and running Nyr's OpenVPN install script.

Relevant log files:
Oct 1 21:38:17 openvpnserver systemd[1]: Starting OpenVPN connection to server...
Oct 1 21:38:17 openvpnserver systemd[1]: Starting OpenVPN service...
Oct 1 21:38:17 openvpnserver systemd[1]: Started OpenVPN service.
Oct 1 21:38:17 openvpnserver ovpn-server[229]: OpenVPN 2.3.10 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [EPOLL] [PKCS11] [MH] [IPv6] built on Feb 2 2016
Oct 1 21:38:17 openvpnserver systemd[1]: Failed to start OpenVPN connection to server.

I set up OpenVPN on Ubuntu 14.04 and 16.04 via Nyr's OpenVPN install script quite often. OpenVPN stopped running after OS updates and system reboots (a while ago, maybe two months). I couldn't get OpenVPN started again immediately, and I had some OpenVPN backup servers on routers that covered for it, so I didn't get around to troubleshooting and fixing it until now.

I had this issue manifesting on a fresh Ubuntu 16.04.1 running on a Windows Azure dedicated VPS and on CrownCloud, as well as on a fresh Ubuntu 14.04.5 and 16.04.1 on some LowEndSpirit (LES) NAT servers, but not all of them. For LES servers it appears to be location dependent, as LES Sweden worked (until the location was recently shut down), while Dallas didn't.

For affected servers, commenting out the LimitNPROC line in /lib/systemd/system/openvpn@.service works as a temporary workaround. There are similar user reports spread around the interwebs, but it is clearly not affecting all 16.04.1 and 14.04.5 installations.
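For reference, the temporary workaround above could be scripted roughly as follows. This is a minimal sketch, assuming a sed that supports -i with a suffix (GNU or BSD) and that the setting starts at the beginning of its line. The helper name comment_out_limitnproc is hypothetical. Files under /lib belong to the package, so an update may undo the change.

```shell
# Hypothetical helper: comment out every active LimitNPROC= line in a unit file.
# A backup of the original file is kept next to it with a .bak suffix.
comment_out_limitnproc() {
    unit_file="$1"
    sed -i.bak 's/^LimitNPROC=/#LimitNPROC=/' "$unit_file"
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   comment_out_limitnproc /lib/systemd/system/openvpn@.service
#   systemctl daemon-reload
#   systemctl restart openvpn@server
```

The instance name openvpn@server is only an example; it depends on the name of the config file in /etc/openvpn.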

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openvpn (Ubuntu):
status: New → Confirmed

Hi,
thank you for your report and your help to make Ubuntu better.

I quickly tried to set up a VPN in a container but failed.
I'd need to create a better-matching config with two KVM guests and multiple networks to try to reproduce.

But even then I wanted to ask whether this is a specific issue with the Nyr installer.
Or would you run into the same problem if you followed e.g. the basic setup guide at https://help.ubuntu.com/16.04/serverguide/openvpn.html ?

The config option you listed, LimitNPROC, is meant to change the number of allowed processes like "ulimit -u" would. Is the Nyr openvpn installer configuring it in a way that spawns many processes?

Possibly the reason it fails only in some environments is that it only triggers once enough clients have logged in to reach the limit.
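To check which value a unit actually ends up with, the unit text can be inspected. The sketch below assumes that the last uncommented LimitNPROC= assignment wins when a drop-in overrides the packaged unit. The helper name effective_limitnproc is hypothetical; it reads unit file text on stdin, for example from "systemctl cat".

```shell
# Hypothetical helper: print the value of the last uncommented LimitNPROC=
# assignment found in the unit text on stdin. Prints nothing if there is none.
effective_limitnproc() {
    awk -F= '/^[[:space:]]*LimitNPROC=/ { value = $2 }
             END { if (value != "") print value }'
}

# Usage:
#   systemctl cat openvpn@server | effective_limitnproc
# systemd can also report the value it applies:
#   systemctl show openvpn@server --property=LimitNPROC
# and the limit of the current shell, for comparison:
#   ulimit -u
```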

I have given this limit some thought and checked where it comes from.
It is from upstream itself, neither Debian nor Ubuntu added it.
It has been in since https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=792907 which means >=xenial.
The upstream commit says "This unit file also tries to reduce the capabilities of the running openvpn process.".

So I'd expect that this is a limit to protect against exploitation, and if any given setup needs more, the admin has to adapt it.

That said, if anything, this sounds like an upstream bug to me. If this can be confirmed as an upstream bug, the best route to getting it fixed in Ubuntu would be to file an upstream bug, if you're able to do that. Otherwise, I'm not sure what we can do directly in Ubuntu to fix the problem.

If you do end up filing an upstream bug, please link to it from here. Thanks!

Changed in openvpn (Ubuntu):
status: Confirmed → Incomplete
Nyr (nyr-nzone) wrote :

Thanks for your reply,

This issue isn't specific to my installer. There are reports about it in other places.

By the way, the installer spawns a single process and is pretty barebones all around, so I doubt it has anything to do with it.

Frank (or anyone else able to reproduce), could you please try to reproduce this in Debian?

I know the change was upstream, but I've had multiple reports from Ubuntu users and none from Debian. That's why I'm assuming this isn't an issue in Debian, although I don't understand why.

Regards

linex83 (linex83) wrote :

I'm having the very same problem and a quick search reveals that there are more people out there experiencing this. My workaround was not to comment out the line in the original file located in /lib, but to create a superseding config:

sudo systemctl edit openvpn@server

Then increasing the limit:

[Service]
LimitNPROC=100

However, it's just a workaround.
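For scripted setups, the same override can be written without the interactive editor. This is a minimal sketch, assuming the drop-in location that "systemctl edit" uses for this instance (/etc/systemd/system/openvpn@server.service.d/override.conf). The helper name write_limitnproc_dropin is hypothetical, and the value 100 is simply the one from the comment above.

```shell
# Hypothetical helper: write a drop-in that overrides LimitNPROC.
#   $1 = drop-in directory, $2 = new limit
write_limitnproc_dropin() {
    dropin_dir="$1"
    limit="$2"
    mkdir -p "$dropin_dir" &&
        printf '[Service]\nLimitNPROC=%s\n' "$limit" > "$dropin_dir/override.conf"
}

# Usage (as root):
#   write_limitnproc_dropin /etc/systemd/system/openvpn@server.service.d 100
#   systemctl daemon-reload
#   systemctl restart openvpn@server
```

Unlike an edit under /lib, a drop-in under /etc is not replaced when the package is updated.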

Nyr (nyr-nzone) wrote :

Confirmation that people experiencing this issue are unable to reproduce it in the same environment when running Debian instead of Ubuntu:

https://github.com/Nyr/openvpn-install/issues/206#issuecomment-256945446

I still don't have a clue on what's going on with that.

Brian Morton (rokclimb15) wrote :

Had the same error, but commenting out/adjusting LimitNPROC didn't fix the issue. Ultimately I had to make the container privileged and unconfined by AppArmor to get openvpn to start.

Thanks, Nyr, for linking the discussion you had.
I see there were later updates reporting that they see it on Debian as well.
I'm really puzzled about what this could be.

Since I couldn't confirm it before when using openvpn the way I usually do, I tried with the installer. I agree that it likely isn't the issue, but it might help trigger a repro.

$ lxc launch ubuntu-daily:xenial xenial-test-openvpn
# there is no apache to remove in the image
$ lxc exec xenial-test-openvpn -- wget https://git.io/vpn -O openvpn-install.sh
$ lxc exec xenial-test-openvpn -- chmod +x openvpn-install.sh
$ lxc exec xenial-test-openvpn -- bash -c "~/openvpn-install.sh"
# enter on all questions
$ lxc exec xenial-test-openvpn -- service openvpn start
$ lxc exec xenial-test-openvpn -- service openvpn status
● openvpn.service - OpenVPN service
   Loaded: loaded (/lib/systemd/system/openvpn.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2017-03-20 15:45:55 UTC; 1s ago
  Process: 1320 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 1320 (code=exited, status=0/SUCCESS)

Mar 20 15:45:55 xenial-test-openvpn systemd[1]: Starting OpenVPN service...
Mar 20 15:45:55 xenial-test-openvpn systemd[1]: Started OpenVPN service.

Working just fine.
From here we need to find what has to be different to trigger the issue.
With multiple people apparently hitting it, there must be something to it, but so far I fail to see how to recreate it.

I've run OpenVPN in containers before and it worked fine for me as well,
but I was only launching a few OpenVPN processes.

On 03/20/2017 11:47 AM, ChristianEhrhardt wrote:
> Working just fine.
> From here we need to find what has to be different to trigger the issue.
> With it seems multiple people hitting it there must be something to it, but so far I fail to see how to recreate.

I am not sure but I vaguely recall that rlimits are/were not namespace
aware? Maybe one could try spawning/cloning 11 OpenVPN containers and
check that theory?

Simon
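One way to look into the theory above: RLIMIT_NPROC, which LimitNPROC sets, is counted per real user ID rather than per service, so tasks of the same UID elsewhere on the system count against it. The sketch below counts tasks per UID from "ps" output. The helper name count_tasks_for_uid is hypothetical, and the assumption that threads should be included (the -L flag of procps ps) is labeled as such. Inside a container, ps only sees that container's processes, so the number may be lower than what the kernel counts.

```shell
# Hypothetical helper: count lines on stdin whose first field equals the UID.
# Intended input is one UID per task, e.g. from "ps -eLo uid=".
count_tasks_for_uid() {
    awk -v uid="$1" '$1 == uid { n++ } END { print n + 0 }'
}

# Usage: count all tasks (processes and threads) owned by root (UID 0)
#   ps -eLo uid= | count_tasks_for_uid 0
```

If the number for the user that OpenVPN starts as is already near 10 before the service starts, the limit could be hit with a single OpenVPN process.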

Nyr (nyr-nzone) wrote :

Just to clarify: the issue seems to happen on setups where only one process should be running.

Additionally, there are conflicting reports of the affected users being able to reproduce the issue on Debian.

Hopefully someone affected will be able to do further troubleshooting, because this is very weird looking from the outside.

Daniel F (zabullet) wrote :

I've got an Ubuntu VPS that shows this behaviour if you're interested in using it to investigate.

zb

Daniel F (zabullet) wrote :

I've investigated this further by debugging through the startup process. openvpn appears to be behaving correctly and systemd is doing exactly what it is being told to do: forks should fail after the limit. LimitNPROC=10 seems like an arbitrary limit. Why have it at all, or why 10? FWIW, 30 seems to be roughly the number that allows openvpn 2.4.2 on Ubuntu 16.04 to initialise and load.

D.
