Fragile Test: glance.tests.functional.v1.test_api:TestApi.test_unsupported_default_store

Bug #1047593 reported by Brian Waldon
This bug affects 2 people
Affects   Status         Importance   Assigned to      Milestone
Glance    Fix Released   High         John Bresnahan

Bug Description

As seen in http://logs.openstack.org/12566/1/gate/gate-glance-python26/1783/console.html:

16:58:21 ======================================================================
16:58:21 FAIL: We test that a mis-configured default_store causes the API server
16:58:21 ----------------------------------------------------------------------
16:58:21 Traceback (most recent call last):
16:58:21 File "/home/jenkins/workspace/gate-glance-python26/glance/tests/utils.py", line 178, in wrapped
16:58:21 func(*a, **kwargs)
16:58:21 File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/v1/test_api.py", line 1230, in test_unsupported_default_store
16:58:21 **self.__dict__.copy())
16:58:21 File "/home/jenkins/workspace/gate-glance-python26/glance/tests/functional/__init__.py", line 564, in start_server
16:58:21 self.assertTrue(launch_msg is None, launch_msg)
16:58:21 AssertionError: Unexpected server launch status
16:58:21 """Fail the test unless the expression is true."""
16:58:21 >> if not False: raise self.failureException, 'Unexpected server launch status'

Tags: fragile-test
Brian Waldon (bcwaldon) wrote :
Changed in glance:
importance: Low → Medium
Brian Waldon (bcwaldon) wrote :
Changed in glance:
importance: Medium → High
John Bresnahan (jbresnah) wrote :

Is it possible that something else grabbed the same port after the socket opened by get_unused_port() was closed? From the look of the code, this seems to be the most likely explanation.

John Bresnahan (jbresnah) wrote :

I see two ways to solve the 'get unused port' race.

1) We delay the close time of the socket as long as possible (meaning close it right before we exec the program that will use it). The calls to get the ports are here:

 https://github.com/openstack/glance/blob/master/glance/tests/functional/__init__.py#L470

but they are not used until start is called. Further, two calls are right next to each other. On most systems the ephemeral port selection is random and thus a collision is unlikely. However, we may as well eliminate that chance.

2) Never close the socket. Set an environment variable to the FD of that socket before forking the server off; then, when that variable is present, the server creates its socket from that FD. This solution should entirely eliminate the problem.

Any thoughts?
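
To make the race concrete, here is a minimal sketch of the two patterns discussed above. The function names `get_unused_port_racy` and `get_unused_port_and_socket` are hypothetical and are not Glance's actual helpers; the sketch only assumes the usual "bind to port 0 and read back the port" technique.

```python
import socket


def get_unused_port_racy():
    """Bind to port 0, read the assigned port, then close the socket.

    This is the pattern with the race: once the socket is closed, any
    other process may claim the port before the service binds to it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]
    sock.close()  # the race window opens here
    return port


def get_unused_port_and_socket():
    """Bind to port 0 and return the port together with the open socket.

    The caller keeps the socket open (option 2) or closes it as late as
    possible (option 1), so the port stays reserved in the meantime.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    return sock.getsockname()[1], sock
```

While the socket returned by the second helper stays open, another plain `bind()` to the same address and port is expected to fail, which is what closes the window.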

Changed in glance:
assignee: nobody → John Bresnahan (jbresnah)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24866

Changed in glance:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.openstack.org/24866
Committed: http://github.com/openstack/glance/commit/6335fdb1d3efbbb6c3add7a2c31e279fc6e40ff3
Submitter: Jenkins
Branch: master

commit 6335fdb1d3efbbb6c3add7a2c31e279fc6e40ff3
Author: John Bresnahan <email address hidden>
Date: Tue Mar 19 14:27:09 2013 -1000

    Eliminate the race when selecting a port for tests.

    Functional tests currently select ports on which the various services
    will listen by opening a socket in the ephemeral port range, checking
    its port, closing the socket, and then sometime later starting the
    service and instructing it to use that port. There are a few bugs that
    reference fragile tests. In many of these cases it is possible that
    another process claimed the selected port in between the time it was
    first opened and closed and the time that the service actually used it.
    Because this is a known possibility, and because the tests fail
    infrequently, resolving these bugs becomes difficult.

    This patch eliminates the window in which another process could claim
    the port. The socket is opened in the test code but not closed. Then
    the service process (api or registry) is forked off where it inherits
    the open file descriptor. An environment variable is used to tell the
    new wsgi process that it is a test process and should thus get its
    socket from the FD instead of creating a new socket.

    In the case where glance-control is used with the --respawn option this
    solution is not possible. The problem is that the FD is only good once.
    It can be passed from the test code, to glance-control, and to
    glance-{api, registry} but only 1 time. If the service dies and
    glance-control restarts it, the FD will no longer be valid. For this
    case the forked FD code is disabled. However, the socket is still closed
    much later, making the race condition even less likely.

    This fixes bug: 1047593
    It may fix some other "fragile test" bugs as well.

    Change-Id: I27313d144bc7bd2132a604dcc22916c80338abab
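
The approach described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the environment variable name `EXAMPLE_TEST_SOCKET_FD` and the helpers `launch_with_socket` and `get_server_socket` are hypothetical. It uses Python 3 APIs (`pass_fds` is POSIX-only), whereas the original change targeted Python 2.6.

```python
import os
import socket
import subprocess

# Hypothetical variable name; the name used by the actual patch may differ.
FD_ENV_VAR = "EXAMPLE_TEST_SOCKET_FD"


def launch_with_socket(cmd, sock):
    """Test side: start the service while keeping the bound socket open.

    The child inherits the descriptor via pass_fds and learns its number
    from the environment variable.
    """
    env = dict(os.environ)
    env[FD_ENV_VAR] = str(sock.fileno())
    return subprocess.Popen(cmd, env=env, pass_fds=(sock.fileno(),))


def get_server_socket(host, port, environ=os.environ):
    """Service side: reuse the inherited socket if one was handed over."""
    fd_str = environ.get(FD_ENV_VAR)
    if fd_str is not None:
        # fromfd() duplicates the descriptor and wraps it in a socket object.
        sock = socket.fromfd(int(fd_str), socket.AF_INET, socket.SOCK_STREAM)
    else:
        # Normal (non-test) path: create and bind a fresh socket.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, port))
    sock.listen(5)
    return sock
```

Because the port is never released between selection and use, no other process can bind it in the meantime. As the commit message notes, this only works for a single hand-off; a respawned service would find the descriptor no longer valid.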

Changed in glance:
status: In Progress → Fix Committed
Changed in glance:
status: Fix Committed → Triaged
Changed in glance:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in glance:
milestone: none → havana-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in glance:
milestone: havana-1 → 2013.2