ways to speed up overhead of "lxc exec" on remote containers

Bug #1538174 reported by Martin Pitt
Affects: lxd (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

When running autopkgtests in remote containers (necessary for moving armhf autopkgtesting into the cloud), there is quite a noticeable overhead. Running lxc exec with a no-op command locally takes almost no time:

ubuntu@lxd-armhf2:~$ time lxc exec x1 whoami
root

real 0m0.076s
user 0m0.010s
sys 0m0.000s

But running the same thing from a remote instance consistently adds some 2.5 to 3.5 s of overhead:

ubuntu@lxd-controller:~$ lxc launch armhf2:cloud/xenial/armhf armhf2:x1
ubuntu@lxd-controller:~$ time lxc exec --debug armhf2:x1 whoami </dev/null
DBUG[01-26|15:08:20] Posting {"command":["whoami"],"environment":{"HOME":"/root","TERM":"screen","USER":"root"},"interactive":false,"wait-for-websocket":true}
 to https://10.43.42.59:8443/1.0/containers/x1/exec
DBUG[01-26|15:08:21] Raw response: {"type":"async","status":"OK","status_code":100,"operation":"/1.0/operations/d183dcc6-5e94-43eb-9df8-35c963e8f2be","resources":null,"metadata":{"fds":{"0":"c10a5a34c84bfe86bdd4bc35c6a94349874b880ffb62a5709d01f15b97d92410","1":"fa77b9e2909863bb07247bf726223e46d8bf362284bd1cf74221e23cccc94885","2":"58337bcca5403e942a1f77f6acd899b6db9ab980ad8bfbef1cfea9470d52eae1","control":"0d55d71f9f8ebad1b8538f1c6538f86d1b973c1b368a1805af5b471baa9bc5b8"}}}

root
DBUG[01-26|15:08:23] Got error getting next reader websocket: close 1000 , &{%!s(*os.file=&{1 /dev/stdout <nil>})}
DBUG[01-26|15:08:23] Got error getting next reader websocket: close 1000 , &{%!s(*os.file=&{2 /dev/stderr <nil>})}
DBUG[01-26|15:08:23] 1.0/operations/d183dcc6-5e94-43eb-9df8-35c963e8f2be/wait
DBUG[01-26|15:08:24] Raw response: {"type":"sync","status":"Success","status_code":200,"metadata":{"created_at":"2016-01-26T15:08:21.388232Z","updated_at":"2016-01-26T15:08:23.814687Z","status":"Success","status_code":200,"resources":null,"metadata":{"return":0},"may_cancel":false}}

real 0m3.463s
user 0m0.177s
sys 0m0.021s

ubuntu@lxd-controller:~$ lxc remote list
[...]
| armhf2 | https://10.43.42.59:8443 | NO |
[...]

I suppose this is due to the overhead of https://: establishing a new connection, doing the TLS handshake and authentication every time, and so on. OpenSSH has a "connection sharing" feature which avoids the overhead of the initial negotiation and authentication; that is rather complex, so maybe not something we would have in LXD, but are there some tweaks to speed this up? E.g. switching off SSL, or switching off authentication? (I trust these boxes, and they are firewalled rather tightly.)
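
A rough way to confirm where the time goes (untested, and assuming the client certificate lives in the default ~/.config/lxc/ location) would be to time one raw GET against the HTTPS API and compare it with the same request made over the unix socket on the server itself:

    time curl -s -k --cert ~/.config/lxc/client.crt --key ~/.config/lxc/client.key \
        https://10.43.42.59:8443/1.0 >/dev/null
    time sudo curl -s --unix-socket /var/lib/lxd/unix.socket http://unix/1.0 >/dev/null

If the first request alone accounts for most of the 2.5 to 3.5 s, the per-connection TLS setup is the dominant cost rather than the exec machinery.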

Thanks!

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: lxd 0.27-0ubuntu2
ProcVersionSignature: Ubuntu 4.3.0-7.18-generic 4.3.3
Uname: Linux 4.3.0-7-generic x86_64
ApportVersion: 2.19.3-0ubuntu3
Architecture: amd64
CurrentDesktop: Unity
Date: Tue Jan 26 16:08:48 2016
EcryptfsInUse: Yes
SourcePackage: lxd
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Martin Pitt (pitti) wrote :
Changed in lxd (Ubuntu):
importance: Undecided → Wishlist
Revision history for this message
Tycho Andersen (tycho-s) wrote : Re: [Bug 1538174] [NEW] ways to speed up overhead of "lxc exec" on remote containers

On Tue, Jan 26, 2016 at 03:16:22PM -0000, Martin Pitt wrote:
> [...]
> I suppose this is due to the overhead of https://, establishing a new
> connection every time etc. openssh has some "connection sharing" feature
> which avoids the overhead of the initial negotiation and authentication.
> This is rather complex, so maybe not something that we have in LXD, but
> are there some tweaks to speed this up? E. g. switching off SSL, or
> switching off authentication (I trust these boxes, and they are
> firewalled rather tightly).

How about forwarding with socat? Untested:

socat TCP-LISTEN:8443,fork UNIX-CLIENT:/var/lib/lxd/unix.socket

and on the other host:

socat UNIX-CONNECT:/tmp/unix.socket TCP:10.0.0.1:8443 # or whatever your IP is

then,

LXD_DIR=/tmp lxc list

Revision history for this message
Martin Pitt (pitti) wrote :

Tycho, nice idea!

FTR, it's

    sudo socat UNIX-LISTEN:/var/lib/lxd/unix.socket TCP:10.43.42.59:8443

(UNIX-CONNECT expects an existing socket, and that machine doesn't even have lxd installed, just the client). This does work for one operation, but then socat exits, so it needs some tweaking; still, it looks like a promising route.

Thanks!

Revision history for this message
Tycho Andersen (tycho-s) wrote : Re: [Bug 1538174] Re: ways to speed up overhead of "lxc exec" on remote containers

On Tue, Jan 26, 2016 at 04:40:15PM -0000, Martin Pitt wrote:
> Tycho, nice idea!
>
> FTR, it's
>
> sudo socat UNIX-LISTEN:/var/lib/lxd/unix.socket TCP:10.43.42.59:8443
>
> (CONNECT expects an existing socket, and that machine doesn't even have
> lxd installed, just -client). This does work for one operation, then
> socat exits. So this needs some tweaking, but looks like a promising
> route.

Oh, right, derp. Re: your second problem, I think socat has a "fork" option, so it will keep listening; something like

`socat UNIX-LISTEN:/var/lib/lxd/unix.socket,fork ...`

might work.

Tycho

Revision history for this message
Martin Pitt (pitti) wrote :

Indeed it does! I just ran a complete test with this on the server:

   socat TCP-LISTEN:8443,fork UNIX-CLIENT:/var/lib/lxd/unix.socket &   # expose the LXD unix socket on TCP port 8443

and this on the client:

  socat UNIX-LISTEN:/var/lib/lxd/unix.socket,unlink-early,mode=666,fork TCP:10.43.42.59:8443 &   # present the remote LXD as a local unix socket

After two or three runs lxd locks up and needs to be restarted, but that happens with the "real" port and locally too, so that's a separate issue (bug 1531768). So I think this is working.
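
To make the forwards a bit more robust (a sketch only, untested; the unit name and the /usr/bin/socat path are assumptions), the client-side command could be wrapped in a small systemd unit such as /etc/systemd/system/lxd-socket-forward.service:

    [Unit]
    Description=Forward the local LXD unix socket to a remote LXD over TCP

    [Service]
    # restart socat if it ever exits so the socket stays available
    ExecStart=/usr/bin/socat UNIX-LISTEN:/var/lib/lxd/unix.socket,unlink-early,mode=666,fork TCP:10.43.42.59:8443
    Restart=always

    [Install]
    WantedBy=multi-user.target

and similarly for the TCP-LISTEN half on the server.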

Should we close this bug (which so far was really a support request, thanks!), or keep it open for making this easier/more obvious in lxd? Like an "lxc config set core.remote unsafe" option?

Revision history for this message
Stéphane Graber (stgraber) wrote :

Unless we get more justified requests like this one (performance in a safe environment like yours), I'd rather we not make it easy for people to configure a completely unsafe LXD.

I'm a bit worried about people jumping on such an option as an alternative to writing code that talks to our unix socket (for the local use case), because most languages make you jump through a few hoops to get HTTP over a unix socket working. The last thing I want to see is publicly exposed LXDs with an unauthenticated API!

Based on recent support requests on IRC, I've seen about 50% of our users running with LXD exposed to the network on machines with public IPs. I don't know whether they had a firewall in front of it or not, but if not, then I sure am glad that we've been pretty paranoid with our TLS requirements :)

Revision history for this message
Stéphane Graber (stgraber) wrote :

For those not aware, having access to the LXD API is basically equivalent (a straightforward path) to root on the physical host, so it's something which must be very closely guarded. I have no doubt that Martin knows what he's doing, and I'm happy that socat makes it reasonably simple to do what he wants; I'm just very worried about less knowledgeable users who will follow an easy recipe they found somewhere and end up effectively exposing a root shell to the world.

Changed in lxd (Ubuntu):
status: New → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Sounds perfectly reasonable, so indeed, let's close this. Thanks Tycho for the nice idea!

Revision history for this message
Martin Pitt (pitti) wrote :

> having access to the LXD API is basically equivalent (straightforward path) to root on the physical host, so it's something which must be very closely guarded.

FTR, we have an incredibly (painfully) tight firewall there, and the Scalingstack instances are basically throwaway ones -- they run tons of crappy code and thus aren't very reliable (i.e. they need to be rebuilt from time to time) anyway. So I think from that point of view it should be fine.

(ATM my main problem isn't performance yet anyway, but keeping LXD alive for more than two or three runs)
