test_volume_upload failed in Tempest

Bug #1233908 reported by Ben Nemec
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
Invalid
Undecided
Unassigned
Glance
Invalid
Undecided
Unassigned
OpenStack Object Storage (swift)
Invalid
Undecided
Unassigned

Bug Description

Revision history for this message
John Griffith (john-griffith) wrote :

Root here seems to be the connection between glance and swift:
http://logs.openstack.org/75/49275/1/check/check-tempest-devstack-vm-postgres-full/8ae87da/logs/screen-g-api.txt.gz#_2013-10-01_22_53_40_415

The sheepdog failure at init I think is unrelated here.

Changed in cinder:
status: New → Invalid
Revision history for this message
Zhi Yan Liu (lzy-dev) wrote :

Can any openstack-infra guy help increase "node_timeout" option for swift proxy service? And answer follow three questions? currently we can just know the object server says it uploaded that request successfully and the proxy errored with a Timeout.

A few questions can help resolve this issue:
1. How many nodes in the swift cluster?
2. How many replica there?
3. why don't the swift logs have timestamps? with some timestamps on the logs it be a lot easier to see if the proxy did hit a timeout. And we'd know what to set "node_timeout" option.

Revision history for this message
Peter Portante (peter-a-portante) wrote :

Hello, looking at the swift logs, it appears there was a PUT operation that started in and around: Oct 1 22:53:12

At Oct 1 22:53:38, the proxy server having finished transmitting all the client data to the object server, timed out waiting for a response from the PUT operation after 10 seconds.

At Oct 1 22:53:42, the object server (apparently unaware that the proxy server gave up) finished the PUT operation and tried to issue a response.

Raising the timeout might help, but would be more curious to find out what was either keeping the object server from running for 10 seconds or more.

Revision history for this message
Erno Kuvaja (jokke) wrote :

Long time inactivity and by the details looks like timeouts. Perhaps Glance could be removed?

Revision history for this message
Mauro S M Rodrigues (maurorodrigues) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1352548 is related.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (4.4 KiB)

Hit something similar here, but this is going from 'saving' to 'killed', which I think is automatic if the transfer times out/fails:

http://logs.openstack.org/27/115027/1/check/check-tempest-dsvm-postgres-full/a0affee/console.html#_2014-08-18_18_18_47_767

2014-08-18 18:18:47.767 | tempest.api.volume.test_volumes_actions.VolumesV1ActionsTest.test_volume_upload[gate,image]
2014-08-18 18:18:47.767 | -------------------------------------------------------------------------------------------
2014-08-18 18:18:47.767 |
2014-08-18 18:18:47.767 | Captured traceback:
2014-08-18 18:18:47.767 | ~~~~~~~~~~~~~~~~~~~
2014-08-18 18:18:47.767 | Traceback (most recent call last):
2014-08-18 18:18:47.768 | File "tempest/test.py", line 128, in wrapper
2014-08-18 18:18:47.768 | return f(self, *func_args, **func_kwargs)
2014-08-18 18:18:47.768 | File "tempest/api/volume/test_volumes_actions.py", line 106, in test_volume_upload
2014-08-18 18:18:47.768 | self.image_client.wait_for_image_status(image_id, 'active')
2014-08-18 18:18:47.768 | File "tempest/services/image/v1/json/image_client.py", line 295, in wait_for_image_status
2014-08-18 18:18:47.768 | status=status)
2014-08-18 18:18:47.768 | ImageKilledException: Image 1da39d96-331f-43a6-8c6f-b9942170c1dd 'killed' while waiting for 'active'

Querying on this:

message:"Value transition from \"saving\" to \"killed\"" AND module:"tempest.services.image.v1.json.image_client" AND tags:"tempest.txt"

There are 73 hits in 7 days, check and gate, all failures:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiVmFsdWUgdHJhbnNpdGlvbiBmcm9tIFxcXCJzYXZpbmdcXFwiIHRvIFxcXCJraWxsZWRcXFwiXCIgQU5EIG1vZHVsZTpcInRlbXBlc3Quc2VydmljZXMuaW1hZ2UudjEuanNvbi5pbWFnZV9jbGllbnRcIiBBTkQgdGFnczpcInRlbXBlc3QudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MDgzODk1MDg5MjMsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

This is in g-api logs for the same image:

http://logs.openstack.org/27/115027/1/check/check-tempest-dsvm-postgres-full/a0affee/logs/screen-g-api.txt.gz?level=INFO#_2014-08-18_18_11_22_563

2014-08-18 18:11:22.563 22562 ERROR glance.store.swift [-] Failed to add object to Swift.
Got error from Swift: put_object('glance', '1da39d96-331f-43a6-8c6f-b9942170c1dd', ...) failure and no ability to reset contents for reupload.
2014-08-18 18:11:22.564 22562 ERROR glance.api.v1.upload_utils [-] Failed to upload image 1da39d96-331f-43a6-8c6f-b9942170c1dd
2014-08-18 18:11:22.564 22562 TRACE glance.api.v1.upload_utils Traceback (most recent call last):
2014-08-18 18:11:22.564 22562 TRACE glance.api.v1.upload_utils File "/opt/stack/new/glance/glance/api/v1/upload_utils.py", line 105, in upload_data_to_store
2014-08-18 18:11:22.564 22562 TRACE glance.api.v1.upload_utils store)
2014-08-18 18:11:22.564 22562 TRACE glance.api.v1.upload_utils File "/opt/stack/new/glance/glance/store/__init__.py", line 340, in store_add_to_backend
2014-08-18 18:11:22.564 22562 TRACE glance.api.v1.upload_utils (location, size, checksum, metadata) = store.add(image_id, data, size)
2014-08-18 18:11:22.564 22...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note that there is a 503 coming back from swift:

2014-08-18 18:11:21.561 22562 INFO swiftclient [-] RESP STATUS: 503 Internal Server Error

Maybe this is related to python-swiftclient bug 1295812 and we just aren't pooling connections to swift as well as we should?

Revision history for this message
Matt Riedemann (mriedem) wrote :

elastic-recheck query here: https://review.openstack.org/#/c/115082/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Maybe this isn't an issue anymore? We're not seeing it in the gate now.

Changed in swift:
status: New → Incomplete
Changed in glance:
status: New → Incomplete
Matt Riedemann (mriedem)
Changed in glance:
status: Incomplete → Invalid
Changed in swift:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.