test_novnc failed with socket recv time

Bug #1695844 reported by Jianghua Wang
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Jianghua Wang

Bug Description

We observed race condition with novnc test. Failed rate is about 1/10.

http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/72/463672/3/check/dsvm-tempest-neutron-network/849c338/run_tests.log

==============================
Failed 1 tests - output below:
==============================

tempest.api.compute.servers.test_novnc.NoVNCConsoleTestJSON.test_novnc[id-c640fdff-8ab4-45a4-a5d8-7e6146cbd0dc]
---------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "tempest/api/compute/servers/test_novnc.py", line 172, in test_novnc
        self._validate_rfb_negotiation()
      File "tempest/api/compute/servers/test_novnc.py", line 84, in _validate_rfb_negotiation
        data = self._websocket.receive_frame()
      File "tempest/common/compute.py", line 282, in receive_frame
        header = self._recv(2)
      File "tempest/common/compute.py", line 273, in _recv
        data_from_socket = self._socket.recv(recv_size)
      File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
        raise TimeoutException()
    fixtures._fixtures.timeout.TimeoutException

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

This issue was observed with XenServer test. But potentially it will impact other hypervisors.

I think the problem caused the code at here: https://github.com/openstack/nova/blob/master/nova/console/websocketproxy.py#L155

It tries to get the http response from the socket. But potentially the first negotiation message has arrived earlier then the negotiation message will be swallowed by the above recv(). Then the client will get stuck on waiting for the negotiation message.

Changed in nova:
assignee: nobody → Jianghua Wang (wjh-fresh)
Changed in nova:
status: New → In Progress
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

@Jianghua Wang: Do you think this is related to bug 1669468 ?

Revision history for this message
Bob Ball (bob-ball) wrote :

Almost certainly, this is the same issue, yes.

The tempest code is reading from the websocket and finding no data because the websocket has already consumed + discarded it as it arrived "early".

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

@Markus It seems libvirt doesn't have internal_access_path, so it won't run into this branch in which websocketproxy swallows the RFP message.
https://github.com/openstack/nova/blob/master/nova/console/websocketproxy.py#L147

But as Bob said maybe it's relative to the tempest problem which is to be fixed with this patchset: https://review.openstack.org/#/c/463259/

But it's also possible to be a different problem as there is a ResourceWarning complaining unclosed socket just as you posted in bug 1669468.

Anyway, it looks worthy to test by adding a sleep before this line,
https://github.com/openstack/tempest/blob/master/tempest/common/compute.py#L319
so that it will always contain the first RFP frame in the upgrade response. You can verify if that will trigger the error reported in 1669468.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/470798
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7c054c55153c44891b8d1b14f93f0e03777fa6c
Submitter: Jenkins
Branch: master

commit a7c054c55153c44891b8d1b14f93f0e03777fa6c
Author: jianghua wang <email address hidden>
Date: Mon Jun 5 03:02:09 2017 +0100

    Fix the race condition with novnc

    The first RFP negotiation message potentially arrived
    earlier, in which case the message will be swallowed
    by the recv() which receive response for the http
    request. This commit is to ensure only the response
    is removed from the socket buffer, so it won't impact
    the following RFP negotiation.

    Change-Id: I100e140acbc2d981f7b98e12b3e9ae02844f41fd
    Closes-Bug: #1695844

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b3

This issue was fixed in the openstack/nova 16.0.0.0b3 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.