gate failure with tempest.api.image.v1.test_images.CreateRegisterImagesTest.test_register_http_image

Bug #1347692 reported by Chris Dent
Affects: Glance
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

In review https://review.openstack.org/#/c/108410/ I get a failure with check-tempest-dsvm-postgres-full.

(note this is not a duplicate of https://bugs.launchpad.net/glance/+bug/1247194 )

Presumably this is a timeout, or something getting ahead of something else. Reporting to be a good boy.

Logs and links to logs below:

2014-07-23 11:02:18.128 | ==============================
2014-07-23 11:02:18.128 | Failed 1 tests - output below:
2014-07-23 11:02:18.128 | ==============================
2014-07-23 11:02:18.128 |
2014-07-23 11:02:18.128 | tempest.api.image.v1.test_images.CreateRegisterImagesTest.test_register_http_image[gate]
2014-07-23 11:02:18.128 | ----------------------------------------------------------------------------------------
2014-07-23 11:02:18.128 |
2014-07-23 11:02:18.128 | Captured traceback:
2014-07-23 11:02:18.128 | ~~~~~~~~~~~~~~~~~~~
2014-07-23 11:02:18.128 | Traceback (most recent call last):
2014-07-23 11:02:18.128 | File "tempest/api/image/v1/test_images.py", line 79, in test_register_http_image
2014-07-23 11:02:18.128 | self.client.wait_for_image_status(image_id, 'active')
2014-07-23 11:02:18.129 | File "tempest/services/image/v1/json/image_client.py", line 295, in wait_for_image_status
2014-07-23 11:02:18.129 | status=status)
2014-07-23 11:02:18.129 | ImageKilledException: Image 0d3e0e66-fbec-4bab-a070-f27f36f843e0 'killed' while waiting for 'active'
2014-07-23 11:02:18.129 |
2014-07-23 11:02:18.129 |
2014-07-23 11:02:18.129 | Captured pythonlogging:
2014-07-23 11:02:18.129 | ~~~~~~~~~~~~~~~~~~~~~~~
2014-07-23 11:02:18.129 | 2014-07-23 10:56:39,064 27125 INFO [tempest.common.rest_client] Request (CreateRegisterImagesTest:test_register_http_image): 200 POST http://127.0.0.1:5000/v2.0/tokens
2014-07-23 11:02:18.129 | 2014-07-23 10:56:39,291 27125 INFO [tempest.common.rest_client] Request (CreateRegisterImagesTest:test_register_http_image): 201 POST http://127.0.0.1:9292/v1/images 0.226s
2014-07-23 11:02:18.129 | 2014-07-23 10:56:39,397 27125 INFO [tempest.common.rest_client] Request (CreateRegisterImagesTest:test_register_http_image): 200 HEAD http://127.0.0.1:9292/v1/images/0d3e0e66-fbec-4bab-a070-f27f36f843e0 0.105s
2014-07-23 11:02:18.129 | 2014-07-23 10:56:41,595 27125 INFO [tempest.common.rest_client] Request (CreateRegisterImagesTest:test_register_http_image): 200 HEAD http://127.0.0.1:9292/v1/images/0d3e0e66-fbec-4bab-a070-f27f36f843e0 0.195s
2014-07-23 11:02:18.129 | 2014-07-23 10:56:42,597 27125 INFO [tempest.services.image.v1.json.image_client] Value transition from "queued" to "killed"in 2 second(s).
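
For context on the failure mode: tempest creates the image, then polls its status until it becomes 'active', and bails out immediately if glance marks the image 'killed'. A minimal sketch of that polling loop (an illustration only, not tempest's actual image_client code; get_status stands in for the HEAD call):

    import time

    class ImageKilledException(Exception):
        pass

    def wait_for_image_status(get_status, image_id, target='active',
                              interval=1, timeout=300):
        start = time.time()
        old_value = None
        while time.time() - start < timeout:
            value = get_status(image_id)  # stands in for HEAD /v1/images/<id>
            if value != old_value:
                print('Value transition from "%s" to "%s" in %d second(s).'
                      % (old_value, value, int(time.time() - start)))
                old_value = value
            if value == target:
                return
            if value == 'killed':
                raise ImageKilledException(
                    "Image %s 'killed' while waiting for '%s'"
                    % (image_id, target))
            time.sleep(interval)
        raise RuntimeError('Timed out waiting for image %s to become %s'
                           % (image_id, target))

So the test failure only tells us that glance moved the image to 'killed'; the reason has to come from the glance logs.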

The glance-api logs from around that time are not particularly helpful:

http://logs.openstack.org/10/108410/2/check/check-tempest-dsvm-postgres-full/4496d7e/logs/screen-g-api.txt.gz#_2014-07-23_10_56_40_704

http://logs.openstack.org/10/108410/2/check/check-tempest-dsvm-postgres-full/4496d7e/logs/screen-g-reg.txt.gz#_2014-07-23_10_56_40_797

Revision history for this message
Matt Riedemann (mriedem) wrote :

Here we go:

http://logs.openstack.org/10/108410/2/check/check-tempest-dsvm-postgres-full/4496d7e/logs/screen-g-api.txt.gz#_2014-07-23_10_56_40_831

DEBUG glance.api.v1.images [-] Copy from external source failed: HTTP URL returned a 500 status code. _upload /opt/stack/new/glance/glance/api/v1/images.py:583

Why is that only logged at debug level?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/109115

Matt Riedemann (mriedem)
Changed in glance:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance (master)

Reviewed: https://review.openstack.org/109115
Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=ff3161a506efba42f09fa403cf875a5a4ec56c0c
Submitter: Jenkins
Branch: master

commit ff3161a506efba42f09fa403cf875a5a4ec56c0c
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 23 13:41:09 2014 -0700

    Log upload failed exception trace rather than debug

    We're seeing race failures in test_register_http_image
    in the gate due to a 500 coming back from the backing
    store when uploading the image in the v1 API, but it's
    logged at debug level so we can't fingerprint on the
    failure in logstash.

    This change bumps the logging of the error from debug
    to exception so we get the trace on the error also.

    If it turns out that we have expected exception cases
    in the future and tracing on those isn't necessary,
    we can filter them out in a separate change.

    Change-Id: I87d1f8ae584b41456eae4f19a0bc60f13f47b5ac
    Related-Bug: #1347692
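
For reference, the change is essentially a log-level bump in glance's v1 copy-from upload path. A simplified sketch of the before/after (hypothetical helper names, not the exact diff):

    import logging

    LOG = logging.getLogger(__name__)

    def _upload_copy_from(image_meta, get_from_source):
        # Stand-in for the copy-from branch of _upload in glance/api/v1/images.py;
        # get_from_source is a placeholder for the fetch that raises BadStoreUri.
        try:
            return get_from_source(image_meta)
        except Exception as e:
            # Before: LOG.debug("Copy from external source failed: %s", e),
            # which logstash cannot fingerprint on.
            # After: log at error level, including the traceback.
            LOG.exception("Copy from external source failed: %s", e)
            raise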

Revision history for this message
Matt Riedemann (mriedem) wrote :

This appears to have dropped off on 7/23, which could be due to this fix in the same code path:

https://github.com/openstack/glance/commit/5148c9648fc959c1d807313176afe3fcf84b89cf

Revision history for this message
Attila Fazekas (afazekas) wrote :

http://logs.openstack.org/53/114553/1/check/check-tempest-dsvm-full/f8127df/logs/screen-s-proxy.txt.gz

proxy-server: ERROR 500 Traceback (most recent call last):#012 File "/opt/stack/new/swift/swift/obj/server.py", line 690, in __call__#012 res = method(req)#012 File "/opt/stack/new/swift/swift/common/utils.py", line 2448, in wrapped#012 return func(*a, **kw)#012 File "/opt/stack/new/swift/swift/common/utils.py", line 1033, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/opt/stack/new/swift/swift/obj/server.py", line 432, in PUT#012 upload_size = writer.write(chunk)#012 File "/opt/stack/new/swift/swift/obj/diskfile.py", line 754, in write#012 self._threadpool.run_in_thread(_write_entire_chunk, chunk)#012 File "/opt/stack/new/swift/swift/common/utils.py", line 2810, in run_in_thread#012 result = func(*args, **kwargs)#012 File "/opt/stack/new/swift/swift/obj/diskfile.py", line 750, in _write_entire_chunk#012 written = os.write(self._fd, chunk)#012OSError: [Errno 28] No space left on

Looks like we can run out of the 6 GB of swift storage.
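
For anyone reproducing this locally, a quick way to confirm the store is close to ENOSPC is to check free space on the swift data directory (the path below is an assumption for a typical devstack setup):

    import shutil

    def swift_free_mb(path='/opt/stack/data/swift'):
        # Reports how much of the loopback-backed swift store is left.
        usage = shutil.disk_usage(path)
        print('total=%dMB used=%dMB free=%dMB'
              % (usage.total // 2**20, usage.used // 2**20,
                 usage.free // 2**20))
        return usage.free // 2**20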

Revision history for this message
Attila Fazekas (afazekas) wrote :

Tempest uses less swift storage now. The storage size can also be increased on the devstack side: https://review.openstack.org/#/c/114764/.

Revision history for this message
Attila Fazekas (afazekas) wrote :

The original issue is not solved; elastic-recheck still finds failures of this test case that do not involve the storage-size issue.
http://logs.openstack.org/13/115013/1/gate/gate-tempest-dsvm-postgres-full/1d67474/console.html

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Hit this at http://logs.openstack.org/41/116741/4/check/check-tempest-dsvm-full/e165c84/; found the following in the glance-api logs:

2014-08-27 04:13:14.268 22185 INFO glance.wsgi.server [-] 127.0.0.1 - - [27/Aug/2014 04:13:14] "HEAD /v1/images/36d64eb9-8f79-4ef9-9dfc-30c7c0be2651 HTTP/1.1" 200 932 0.039862
2014-08-27 04:13:14.393 22185 DEBUG glance.registry.client.v1.client [-] Registry request PUT /images/c42364fe-3420-4532-83b4-3340e51acd67 HTTP 200 request id req-307dd390-a384-4bc4-98ff-e631e3dd7083 do_request /opt/stack/new/glance/glance/registry/client/v1/client.py:122
2014-08-27 04:13:14.394 22185 ERROR glance.api.v1.images [-] Copy from external source 'swift' failed for image: c42364fe-3420-4532-83b4-3340e51acd67
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images Traceback (most recent call last):
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images File "/opt/stack/new/glance/glance/api/v1/images.py", line 583, in _upload
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images dest=store)
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images File "/opt/stack/new/glance/glance/api/v1/images.py", line 453, in _get_from_store
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images context, where, dest=dest)
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images File "/opt/stack/new/glance/glance/store/__init__.py", line 266, in get_from_backend
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images return src_store.get(loc)
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images File "/opt/stack/new/glance/glance/store/http.py", line 123, in get
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images conn, resp, content_length = self._query(location, 'GET')
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images File "/opt/stack/new/glance/glance/store/http.py", line 179, in _query
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images raise exception.BadStoreUri(message=reason)
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images BadStoreUri: HTTP URL returned a 500 status code.
2014-08-27 04:13:14.394 22185 TRACE glance.api.v1.images
2014-08-27 04:13:14.791 22185 DEBUG glance.registry.client.v1.api [-] Adding image metadata... add_image_metadata /opt/stack/new/glance/glance/registry/client/v1/api.py:159
2014-08-27 04:13:14.792 22185 DEBUG glance.common.client [-] Constructed URL: http://0.0.0.0:9191/images _construct_url /opt/stack/new/glance/glance/common/client.py:413

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Okay, this is a bit bonkers. I think this all points to a Horizon bug, of all things. There are a few issues:

1)

Glance's logging is misleading. The test attempts to fetch an image from an external URL (CONF.image.http_image) and register it in glance's swift store, which involves two different stores: http (the source) and swift (the destination). In the case of this error, we're getting a 500 from the first call (fetching the external URL), as seen in the trace:

glance.api.v1.images File "/opt/stack/new/glance/glance/store/http.py", line 179, in _query
glance.api.v1.images raise exception.BadStoreUri(message=reason)
glance.api.v1.images BadStoreUri: HTTP URL returned a 500 status code.

but the logged error is misleading, pointing to an error in swift itself (a sketch after this list illustrates why):

glance.api.v1.images [-] Copy from external source 'swift' failed for image: c42364fe-3420-4532-83b4-3340e51acd67

2)

tempest is getting configured on the devstack slaves with CONF.image.http_image = http://127.0.0.1. I'm not sure whether this is intentional, but it results in the image used in the test actually being a dump of the horizon login page. The tempest default is to pull a cirros image from the internet, so it's logical that we attempt to pull something local instead. Using the horizon login page as an image seems strange, but it works.

3)

In the case of these failures, glance's request to fetch http://127.0.0.1 results in a 500 from horizon. horizon_error.txt contains a traceback for that 500 which points to https://bugs.launchpad.net/horizon/+bug/1345955
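
To make issue 1 concrete, here is a rough sketch (hypothetical names, not glance's actual code) of how a copy-from handler ends up logging the destination store even though the exception came from reading the source:

    import logging

    LOG = logging.getLogger(__name__)

    def copy_from_external_source(image_id, source_uri, dest_store_name,
                                  get_from_source, put_to_dest):
        # get_from_source reads the http source (this is what raises on the
        # 500 from horizon); put_to_dest writes to the swift backing store.
        try:
            data = get_from_source(source_uri)
            put_to_dest(dest_store_name, data)
        except Exception:
            # The message interpolates the *destination* store name, so a
            # failure while fetching the http source gets reported as a
            # 'swift' failure.
            LOG.exception("Copy from external source '%s' failed for image: %s",
                          dest_store_name, image_id)
            raise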

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

The choice of using the horizon login page as the image is indeed intentional: https://review.openstack.org/#/c/32943/. It might be time to revisit that.

Revision history for this message
Mark Washenberger (markwash) wrote :

Thanks for the research, Adam! I'm marking this as a duplicate of the horizon bug, as you have indicated.
