nova-novncproxy does not handle TCP RST cleanly when using SSL 

Bug #1816727 reported by Colleen Murphy on 2019-02-20
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
melanie witt
Rocky
Medium
Colleen Murphy
Stein
Medium
Colleen Murphy
OpenStack Security Advisory
Undecided
Unassigned

Bug Description

Description
===========

We have nova-novncproxy configured to use SSL:

```
[DEFAULT]
ssl_only=true
cert = /etc/nova/ssl/certs/signing_cert.pem
key = /etc/nova/ssl/private/signing_key.pem
...
[vnc]
enabled = True
server_listen = "0.0.0.0"
server_proxyclient_address = 192.168.237.81
novncproxy_host = 192.168.237.81
novncproxy_port = 5554
novncproxy_base_url = https://<public hostname>:6080/vnc_auto.html
xvpvncproxy_host = 192.168.237.81
```

We also have haproxy acting as a load balancer, but not terminating SSL. We have an haproxy health check configured like this for nova-novncproxy:

```
listen nova-novncproxy
    # irrelevant config...
    server <internal hostname> 192.168.237.84:5554 check check-ssl verify none inter 2000 rise 5 fall 2
```

where 192.168.237.81 is a virtual IP address and 192.168.237.84 is the node's individual IP address.

With that health check enabled, we found the nova-novncproxy process CPU spiking and eventually causing the node to hang. With debug logging enabled, we noticed this in the nova-novncproxy logs:

2019-02-19 15:02:44.148 2880 INFO nova.console.websocketproxy [-] WebSocket server settings:
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Listen on 192.168.237.81:5554
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Flash security policy server
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Web server (no directory listings). Web root: /usr/share/novnc
2019-02-19 15:02:44.150 2880 INFO nova.console.websocketproxy [-] - SSL/TLS support
2019-02-19 15:02:44.151 2880 INFO nova.console.websocketproxy [-] - proxying from 192.168.237.81:5554 to None:None
2019-02-19 15:02:45.015 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
2019-02-19 15:02:45.184 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.377 2889 DEBUG nova.context [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Found 2 cells: 00000000-0000-0000-0000-000000000000(cell0),9f9825dd-868f-41cc-9c8e-e544f1528d6a(cell1) load_cells /usr/lib/python2.7/site-packages/nova/context.py:479
2019-02-19 15:02:45.380 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
es/oslo_concurrency/lockutils.py:273
2019-02-19 15:02:45.382 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.002s inner /usr/lib/python2.7/site-packages
/oslo_concurrency/lockutils.py:285
2019-02-19 15:02:45.393 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
es/oslo_concurrency/lockutils.py:273
2019-02-19 15:02:45.395 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.003s inner /usr/lib/python2.7/site-packages
/oslo_concurrency/lockutils.py:285
2019-02-19 15:02:45.437 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.443 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.451 2889 INFO nova.compute.rpcapi [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Automatically selected compute RPC version 5.0 from minimum service version 35
2019-02-19 15:02:45.452 2889 INFO nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] handler exception: [Errno 104] Connection reset by peer
2019-02-19 15:02:45.452 2889 DEBUG nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] exception vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy Traceback (most recent call last):
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 928, in top_new_client
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy client = self.do_handshake(startsock, address)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 858, in do_handshake
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.RequestHandlerClass(retsock, address, self)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/nova/console/websocketproxy.py", line 311, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy websockify.ProxyRequestHandler.__init__(self, *args, **kwargs)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 113, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy SimpleHTTPRequestHandler.__init__(self, req, addr, server)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/SocketServer.py", line 652, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.handle()
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 579, in handle
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy SimpleHTTPRequestHandler.handle(self)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.handle_one_request()
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/BaseHTTPServer.py", line 310, in handle_one_request
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.raw_requestline = self.rfile.readline(65537)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/socket.py", line 480, in readline
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy data = self._sock.recv(self._rbufsize)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 190, in recv
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy return self._base_recv(buflen, flags, into=False)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 217, in _base_recv
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy read = self.read(nbytes)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 135, in read
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy super(GreenSSLSocket, self).read, *args, **kwargs)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 109, in _call_trampolining
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy return func(*a, **kw)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/ssl.py", line 673, in read
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy v = self._sslobj.read(len)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy error: [Errno 104] Connection reset by peer
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy
2019-02-19 15:02:47.037 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873

(paste: http://paste.openstack.org/show/745451/)

This sequence starting with the "new handler Process" repeats continuously. It seems that the haproxy health checks initiate an SSL connection but then immediately send a TCP RST:

http://git.haproxy.org/?p=haproxy.git;a=commit;h=fd29cc537b8511db6e256529ded625c8e7f856d0

For most services this does not seem to be an issue, but for nova-novncproxy it repeatedly initializes NovaProxyRequestHandler which creates a full nova.compute.rpcapi.ComputeAPI instance which very quickly starts to consume significant CPU and overtake the host.

Note that we tried upgrading to HEAD of websockify and eventlet which did not improve the issue.

Our workaround was to turn off check-ssl in haproxy and use a basic tcp check, but we're concerned that nova-novncproxy remains vulnerable to a DOS attack given how easy it is for haproxy to overload the service. For that reason I'm opening this initially as a security bug, though you could perhaps argue that it's no secret that making un-ratelimited requests at any service will cause high load.

Steps to reproduce
==================

1. Configure nova-novncproxy to use SSL by setting the cert= and key= parameters in [DEFAULT] and turn on debug logging.
2. You can simulate the haproxy SSL health check with this python script:

    import socket, ssl, struct, time
    host = '192.168.237.81'
    port = 5554
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        ssl_sock = ssl.wrap_socket(sock)
        ssl_sock.connect((host, port))
        ssl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
        sock.close()
        time.sleep(2)

Expected result
===============

nova-novncproxy should gracefully handle the RST and not start overutilizing CPU. It should probably hold off on initializing database connections and such until a meaningful request other than an SSL HELLO is received.

Actual result
=============

The nova-novncproxy process quickly jumps to the top of the CPU% metrics of process analyzers like top and htop and if left unattended on a server with few cores will cause the server's overall performance to be degraded.

Environment
===========

We found this on the latest of the stable/rocky branch on SLES:

# cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"

# uname -a
Linux d52-54-77-77-01-01 4.12.14-95.6-default #1 SMP Thu Jan 17 06:04:39 UTC 2019 (6af4ef8) x86_64 x86_64 x86_64 GNU/Linux

# zypper info openstack-nova

Information for package openstack-nova:
---------------------------------------
Repository : Cloud
Name : openstack-nova
Version : 18.1.1~dev47-749.1
Arch : noarch
Vendor : obs://build.suse.de/Devel:Cloud
Support Level : Level 3
Installed Size : 444.7 KiB
Installed : Yes
Status : up-to-date
Source package : openstack-nova-18.1.1~dev47-749.1.src
Summary : OpenStack Compute (Nova)

# zypper info haproxy

Information for package haproxy:
--------------------------------
Repository : Cloud
Name : haproxy
Version : 1.6.11-10.2
Arch : x86_64
Vendor : SUSE LLC <https://www.suse.com/>
Support Level : Level 3
Installed Size : 3.1 MiB
Installed : Yes
Status : up-to-date
Source package : haproxy-1.6.11-10.2.src

Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated
Changed in ossa:
status: New → Incomplete
Jeremy Stanley (fungi) wrote :

I agree, from the description, this seems to fall into that grey area between exploitable denial of service vulnerability and security hardening opportunity. If this turns out to be an obviously incorrect implementation in openstack/nova for which an effective fix can be safely backported to affected stable branches, and assuming it's not already at least as easy for an unauthenticated user to have a similar impact by sending other sorts of traffic to that socket, then I can see making the case for issuing an advisory. That said, I would also support not continuing this in private under embargo, if you and the Nova security reviewers feel the report isn't of an overly-sensitive nature.

melanie witt (melwitt) wrote :

AFAICT, this is behavior in websockify and I'm not yet sure whether we can avoid it.

This is our code for creating the websockify.WebSocketProxy:

https://github.com/openstack/nova/blob/be8af28a4ff111f99da9b34ee36a90a8a25d5bef/nova/cmd/baseproxy.py#L68-L80

where we pass the RequestHandlerClass=websocketproxy.NovaProxyRequestHandler to websockify.WebSocketProxy. Then, in websockify.WebsockifyServer.do_handshake, it will create an instance of the RequestHandlerClass:

https://github.com/novnc/websockify/blob/6e09ec25486f431e2acc463f994c4eb4bebc6ddd/websockify/websockifyserver.py#L538-L622

which is our websocketproxy.NovaProxyRequestHandler. The arguments to the class are the socket object and the address. Is there some way we can tell from the socket object that the message is a TCP RST? If so, maybe we could avoid creating a new ComputeAPI object in that case.

Otherwise, I'm not sure how we could do anything about this in nova vs websockify.

Jeremy Stanley (fungi) wrote :

Given this, are we likely to catch the websockify maintainers off-guard by making this issue public? If not, I'd like to switch it to public and classify it as a C2 or D class report per https://bugs.launchpad.net/bugs/1816727 .

Jeremy Stanley (fungi) wrote :

Er, sorry, I meant per https://security.openstack.org/vmt-process.html#incident-report-taxonomy (that was a rather confusing mis-paste).

melanie witt (melwitt) wrote :

I did some local testing with this using devstack and think I might have a patch that will solve this problem on the nova side. I'm going to post it here in the interim while I test it a bit more.

melanie witt (melwitt) wrote :

The last patch had a bug in it. This is the new patch that seems to work properly. I tested it locally by running the tempest NoVNCConsoleTestJSON tests on it.

Jeremy Stanley (fungi) wrote :

The patch looks simple and likely backportable, and it doesn't seem to me that this is actually exposing a bug in websockify (but it's possible I'm just not seeing the bug you see in their implementation).

This aside, can a determined attacker cause similar resource consumption by just opening a bunch of connections to the socket and not closing them? If so, this is probably a class D (security hardening opportunity) report. Either way, I'm still in favor of continuing work on this bug in public unless there are objections.

Gage Hugo (gagehugo) wrote :

++ class D if this is the case, also ++ on marking as public and getting this fixed.

That fix looks like it won't be too impacting to backport.

melanie witt (melwitt) wrote :

Jeremy, correct that this is not a bug in websockify. Apologies for not being clearer in comment 6. I should have said, "I did some local testing and discovered that this does not appear to be a bug or issue in websockify and that the fix can be handled entirely in nova."

On your second question, I'm not 100% sure. The creation of the ComputeAPI object happens during the handshake flow, so I would think even with the attached patch, one could cause similar resource consumption by just opening a bunch of connections to the socket and not closing them.

Colleen Murphy (krinkle) wrote :

I tested the patch in our environment and it fixes the issue when haproxy is hammering nova-novncproxy, thanks! I don't object to marking this as public.

Colleen Murphy (krinkle) wrote :

Well, correction - it fixes the cpu overconsumption problem, which is the thing I care about and the main security problem. It doesn't fix the tracebacks that appear in the nova-novncproxy debug logs when haproxy disconnects. I think that part is indeed a websockify bug, but not necessarily a security one.

melanie witt (melwitt) wrote :

The tracebacks are a socket.error "error: [Errno 104] Connection reset by peer" coming from websockify. I think we could probably try-except socket.error and log only the exception message, to handle that part. It would still spam the logs, but not with tracebacks.

Jeremy Stanley (fungi) wrote :

Given lack of objections and Melanie's assertion in comment #10 that this doesn't seem to be a vulnerability in websockify itself nor is the condition which is caused by a potential attacker sending TCP/RST packets immediately after an SSL handshake necessarily any more problematic than repeatedly opening SSL connections and dropping them silently without closing, I'm triaging this as a class D (security hardening opportunity) report and switching it to a normal public bug. Please feel free to continue with patch deliberations in normal public code reviews.

description: updated
Changed in ossa:
status: Incomplete → Won't Fix
tags: added: security
information type: Private Security → Public

Fix proposed to branch: master
Review: https://review.openstack.org/644998

Changed in nova:
assignee: nobody → melanie witt (melwitt)
status: New → In Progress
melanie witt (melwitt) on 2019-03-20
Changed in nova:
importance: Undecided → Medium
tags: added: console

Reviewed: https://review.openstack.org/644998
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e4fa061f17353f49615d4850b562699d55a0641b
Submitter: Zuul
Branch: master

commit e4fa061f17353f49615d4850b562699d55a0641b
Author: melanie witt <email address hidden>
Date: Wed Mar 20 19:01:33 2019 +0000

    Move create of ComputeAPI object in websocketproxy

    Currently, we create a compute.rpcapi.ComputeAPI object during
    NovaProxyRequestHandler.__init__ in order to make calls to nova-compute
    for console token authorizations (port validation). This is problematic
    in the event that we receive a TCP RST as it results in constructing a
    ComputeAPI object only to throw it away and a large number of TCP RST
    sent can cause excessive resource consumption.

    This moves the creation of the ComputeAPI object from __init__ to being
    lazily instantiated upon first use by access of a property, thus
    avoiding creation of ComputeAPI objects when we receive TCP RST
    messages.

    Closes-Bug: #1816727

    Change-Id: I3fe5540ea460fb32767b5e681295fdaf89ce17c5

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/649374
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b1b28169af9f835f5b99a4d70d1c0eff2d9fc2f4
Submitter: Zuul
Branch: stable/stein

commit b1b28169af9f835f5b99a4d70d1c0eff2d9fc2f4
Author: melanie witt <email address hidden>
Date: Wed Mar 20 19:01:33 2019 +0000

    Move create of ComputeAPI object in websocketproxy

    Currently, we create a compute.rpcapi.ComputeAPI object during
    NovaProxyRequestHandler.__init__ in order to make calls to nova-compute
    for console token authorizations (port validation). This is problematic
    in the event that we receive a TCP RST as it results in constructing a
    ComputeAPI object only to throw it away and a large number of TCP RST
    sent can cause excessive resource consumption.

    This moves the creation of the ComputeAPI object from __init__ to being
    lazily instantiated upon first use by access of a property, thus
    avoiding creation of ComputeAPI objects when we receive TCP RST
    messages.

    Closes-Bug: #1816727

    Change-Id: I3fe5540ea460fb32767b5e681295fdaf89ce17c5
    (cherry picked from commit e4fa061f17353f49615d4850b562699d55a0641b)

Reviewed: https://review.openstack.org/649375
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7766027c890dc79cc6a5697e18f719e4fd1c2b35
Submitter: Zuul
Branch: stable/rocky

commit 7766027c890dc79cc6a5697e18f719e4fd1c2b35
Author: melanie witt <email address hidden>
Date: Wed Mar 20 19:01:33 2019 +0000

    Move create of ComputeAPI object in websocketproxy

    Currently, we create a compute.rpcapi.ComputeAPI object during
    NovaProxyRequestHandler.__init__ in order to make calls to nova-compute
    for console token authorizations (port validation). This is problematic
    in the event that we receive a TCP RST as it results in constructing a
    ComputeAPI object only to throw it away and a large number of TCP RST
    sent can cause excessive resource consumption.

    This moves the creation of the ComputeAPI object from __init__ to being
    lazily instantiated upon first use by access of a property, thus
    avoiding creation of ComputeAPI objects when we receive TCP RST
    messages.

    Closes-Bug: #1816727

    Change-Id: I3fe5540ea460fb32767b5e681295fdaf89ce17c5
    (cherry picked from commit e4fa061f17353f49615d4850b562699d55a0641b)
    (cherry picked from commit b1b28169af9f835f5b99a4d70d1c0eff2d9fc2f4)

melanie witt (melwitt) on 2019-04-26
no longer affects: nova/queens

This issue was fixed in the openstack/nova 19.0.1 release.

This issue was fixed in the openstack/nova 18.2.1 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers