Swift: No space left on device

Bug #1225664 reported by Christopher Yeoh
This bug affects 10 people
Affects                            Status         Importance   Assigned to   Milestone
Cinder                             Invalid        Undecided    Unassigned
Glance                             Invalid        Undecided    Unassigned
OpenStack Object Storage (swift)   Invalid        Undecided    Unassigned
devstack                           Fix Released   Undecided    Joe Gordon

Bug Description

Intermittent failures with tempest.api.volume.test_volumes_actions.VolumesActionsTestXML:

2013-09-15 12:19:50,370 Glance request id req-e344e633-818b-4e74-a153-dc5495300cbe
2013-09-15 12:19:52,372 Request: HEAD http://127.0.0.1:9292/v1/images/5e4d189e-8de7-428a-923f-1c1090745bf9
2013-09-15 12:19:52,400 Response Status: 200
2013-09-15 12:19:52,400 Glance request id req-c6b0838c-6c4a-44ee-a23b-7a251d6a75db
2013-09-15 12:19:54,403 Request: HEAD http://127.0.0.1:9292/v1/images/5e4d189e-8de7-428a-923f-1c1090745bf9
2013-09-15 12:19:54,428 Response Status: 200
2013-09-15 12:19:54,428 Glance request id req-46580425-9cc0-46fc-8719-86b103604d60
2013-09-15 12:19:56,430 Request: HEAD http://127.0.0.1:9292/v1/images/5e4d189e-8de7-428a-923f-1c1090745bf9
2013-09-15 12:19:56,458 Response Status: 200
2013-09-15 12:19:56,458 Glance request id req-de198fae-8eb6-4217-a757-847e188f187c
2013-09-15 12:19:58,460 Request: HEAD http://127.0.0.1:9292/v1/images/5e4d189e-8de7-428a-923f-1c1090745bf9
2013-09-15 12:19:58,485 Response Status: 200
2013-09-15 12:19:58,485 Glance request id req-5797d453-f722-47ab-a6e5-662694636761
2013-09-15 12:19:59,490 Request: DELETE http://127.0.0.1:9292/v1/images/5e4d189e-8de7-428a-923f-1c1090745bf9
2013-09-15 12:19:59,560 Response Status: 200
2013-09-15 12:19:59,560 Glance request id req-7f160544-13d4-465d-a0bf-bc969a8021b5

Traceback (most recent call last):
  File "tempest/api/volume/test_volumes_actions.py", line 112, in test_volume_upload
    self.image_client.wait_for_image_status(image_id, 'active')
  File "tempest/services/image/v1/json/image_client.py", line 276, in wait_for_image_status
    raise exceptions.TimeoutException(message)
TimeoutException: Request timed out
Details: Time Limit Exceeded! (400s)while waiting for active, but we got killed.

Full logs here: http://logs.openstack.org/23/42523/4/check/gate-tempest-devstack-vm-full/8352d30/testr_results.html.gz
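The Tempest log above shows the client issuing a HEAD on the image roughly every two seconds and giving up after 400 s, because the image never leaves its intermediate state once Glance's upload to Swift fails. The polling pattern can be sketched as follows; this is a hypothetical helper that mimics the behaviour seen in the log, not Tempest's actual `wait_for_image_status` code, and the injectable `clock`/`sleep` parameters are an assumption added so it can be exercised without waiting.

```python
import time
from typing import Callable


class TimeoutException(Exception):
    """Raised when the polled status never reaches the wanted value."""


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    timeout: float = 400.0, interval: float = 2.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Call get_status() every `interval` seconds until it returns `wanted`,
    raising TimeoutException once `timeout` seconds have elapsed."""
    start = clock()
    while True:
        status = get_status()
        if status == wanted:
            return
        if clock() - start >= timeout:
            raise TimeoutException(
                f"Time Limit Exceeded! ({timeout:.0f}s) while waiting for "
                f"{wanted!r}, last status {status!r}")
        sleep(interval)
```

If the backing store rejects the upload, `get_status` keeps returning a non-final value and the loop ends in `TimeoutException`, which matches the failure mode reported here: the timeout is a symptom, not the cause.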

Attila Fazekas (afazekas) wrote :

Glance uses the Swift backend, and you can see the following in the Glance log:

http://logs.openstack.org/23/42523/4/check/gate-tempest-devstack-vm-full/8352d30/logs/screen-g-api.txt.gz#_2013-09-15_11_59_14_678

http://logs.openstack.org/23/42523/4/check/gate-tempest-devstack-vm-full/8352d30/logs/screen-g-api.txt.gz#_2013-09-15_12_13_42_773

At the same time, the Swift proxy returned a 500:

http://logs.openstack.org/23/42523/4/check/gate-tempest-devstack-vm-full/8352d30/logs/syslog.txt.gz
Sep 15 12:13:41 devstack-precise-hpcloud-az2-285663 proxy-server ERROR 500 Traceback (most recent call last):
  File "/opt/stack/new/swift/swift/obj/server.py", line 631, in __call__
    res = method(req)
  File "/opt/stack/new/swift/swift/common/utils.py", line 1896, in wrapped
    return func(*a, **kw)
  File "/opt/stack/new/swift/swift/common/utils.py", line 686, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/opt/stack/new/swift/swift/obj/server.py", line 378, in PUT
    writer.write(chunk)
  File "/opt/stack/new/swift/swift/obj/diskfile.py", line 301, in write
    self.threadpool.run_in_thread(_write_entire_chunk, chunk)
  File "/opt/stack/new/swift/swift/common/utils.py", line 2151, in run_in_thread
    return func(*args, **kwargs)
  File "/opt/stack/new/swift/swift/obj/diskfile.py", line 297, in _write_entire_chunk
    written = os.write(self.fd, chunk)
OSError: [Errno 28] No space left on device
From Object Server re: /v1/AUTH_66c403bb38b94b74851290e6a83f16dd/glance/5e4d189e-8de7-428a-923f-1c1090745bf9 127.0.0.1:6013/sdb1 (txn: tx8c6af44ce8af402ba2aa8-005235a45f)
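In the raw syslog file, this traceback arrives as a single line because the syslog daemon (rsyslog on Ubuntu, by default) escapes control characters as `#` plus three octal digits, so each newline appears as `#012`. A minimal sketch for turning such lines back into readable tracebacks, assuming that escaping scheme; note that a literal `#` followed by three octal digits in the message would also be decoded.

```python
import re

# '#' followed by exactly three octal digits, e.g. "#012" for a newline.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(line: str) -> str:
    """Replace '#ooo' octal escapes with the characters they encode."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


raw = ('proxy-server ERROR 500 Traceback (most recent call last):#012'
       ' File "/opt/stack/new/swift/swift/obj/diskfile.py", line 297, in '
       '_write_entire_chunk#012 written = os.write(self.fd, chunk)#012'
       'OSError: [Errno 28] No space left on device')
print(decode_syslog_escapes(raw))
```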

The Swift storage size should be increased.
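Before resizing, a quick way to confirm that the device is actually running out is to compare its free space with the size of the objects you expect to store, and to recognise errno 28 (`ENOSPC`) when a write fails. A minimal sketch using only the standard library; the helper names are hypothetical, and the temporary directory stands in for the Swift drive mount (for example the `sdb1` device named in the log above).

```python
import errno
import shutil
import tempfile


def has_room_for(path: str, needed_bytes: int) -> bool:
    """True if the filesystem holding `path` reports at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


def is_out_of_space(exc: OSError) -> bool:
    """True for errno 28 (ENOSPC), the error the object server hit."""
    return exc.errno == errno.ENOSPC


# Hypothetical usage: point this at the Swift drive mount instead of tempdir.
one_gib = 1024 ** 3
print(has_room_for(tempfile.gettempdir(), one_gib))
print(is_out_of_space(OSError(errno.ENOSPC, "No space left on device")))
```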

affects: tempest → openstack-ci
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack-gate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/46663

Jeremy Stanley (fungi) wrote : Re: tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure

The last time adjusting SWIFT_LOOPBACK_DISK_SIZE came up and we put that in devstack-gate, there was a post-facto discussion that we really should have increased the default value in devstack instead. Quoting from #openstack-infra...

[...]
> 2013-06-25 00:19:10 <openstackgerrit> A change was merged to
> openstack-infra/devstack-gate: Extends SWIFT loopback device size
> to 2G https://review.openstack.org/34102
> 2013-06-25 00:26:32 <jeblair> fungi, mordred: ^ is that referring to a
> devstack exercise or tempest test?
> 2013-06-25 00:28:46 <jeblair> fungi, mordred: tempest, it looks like.
> did you consider adding that to devstack instead of devstack-gate?
> 2013-06-25 00:40:02 <fungi> jeblair: yes, there was some debate as to
> whether doing that in d-g to accommodate tempest on our slaves vs
> as a default in devstack was a more appropriate solution. i didn't
> know which would be preferred, so i just suggested getting
> something up for review as a conversation starter
> 2013-06-25 00:41:09 <jeblair> fungi: looks like the conversation's over.
> 2013-06-25 00:41:42 <jeblair> my preference is for devstack to have
> sensible defaults, so that people actually stand a chance of being
> able to run tests without using the gate configuration.
> 2013-06-25 00:42:34 <fungi> well, there's nothing stopping devstack
> from upping its default, at which our setting in d-g is redundant.
> but understood. if devstack is intended as a reference platform
> to run tempest on then it makes sense to accommodate
> tempest there instead of in d-g
> 2013-06-25 00:43:29 <jeblair> fungi: yeah, but i'm guessing no one has
> suggested increasing it in devstack, which makes this just more
> permacruft in devstack-gate.
> 2013-06-25 00:47:14 <fungi> well, i suggested both but said i didn't
> know which was preferred, and that was the path the author
> took. i will definitely push to satisfy tempest needs in devstack
> instead of in devstack-gate from now on
> 2013-06-25 00:48:19 <fungi> especially if the goal is to have devstack
> defaults suitable for running tempest out of the box
[...]

Based on that discussion, I think we should probably consider removing the override from devstack-gate entirely and punt this bug to devstack instead.

Changed in openstack-ci:
status: New → Triaged
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.openstack.org/46770

Joe Gordon (jogo) wrote : Re: tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure

logstash query: @message:"Details: Time Limit Exceeded! (400s)while waiting for active, but we got killed." AND @fields.filename:"console.html" AND @fields.build_status:"FAILURE"
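The logstash query above narrows results to console logs of failed builds that contain the exact timeout message. An equivalent filter over records you have already downloaded can be sketched like this; the record layout (`message`, `filename`, `build_status` keys standing in for `@message`, `@fields.filename`, `@fields.build_status`) is an assumption, and a plain substring check only approximates Lucene's phrase matching.

```python
from typing import Iterable, List, Mapping

# The exact message text searched for by the logstash query,
# including the missing space in "(400s)while".
SIGNATURE = ("Details: Time Limit Exceeded! (400s)while waiting for active, "
             "but we got killed.")


def matching_failures(records: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep records that contain the signature, come from console.html and
    belong to a failed build (substring check, not Lucene phrase search)."""
    return [
        r for r in records
        if SIGNATURE in r.get("message", "")
        and r.get("filename") == "console.html"
        and r.get("build_status") == "FAILURE"
    ]


records = [
    {"message": "2013-10-24 20:49:29.506 | " + SIGNATURE,
     "filename": "console.html", "build_status": "FAILURE"},
    {"message": SIGNATURE, "filename": "console.html", "build_status": "SUCCESS"},
    {"message": "unrelated", "filename": "console.html", "build_status": "FAILURE"},
]
print(len(matching_failures(records)))  # 1
```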

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.openstack.org/46770
Committed: http://github.com/openstack-dev/devstack/commit/3418c1caa5c52fd9989e5829fda0848b4a8dfea7
Submitter: Jenkins
Branch: master

commit 3418c1caa5c52fd9989e5829fda0848b4a8dfea7
Author: Attila Fazekas <email address hidden>
Date: Mon Sep 16 18:35:49 2013 +0200

    Increase default swift storage

    Swift storage is used as the Glance image back-end. Tempest has Cinder
    upload a 1 GiB image twice.

    In parallel execution this causes an issue, because the current default
    size is 1_000_000 KiB.

    Increase the default Swift storage size from 1_000_000 KiB to 4_000_000
    KiB when Tempest is enabled.

    Fixing bug 1225664

    Change-Id: Iccd6368e4df71abb5ccfe7d361c64d86e1071d35
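The arithmetic behind this commit is worth spelling out: 1_000_000 KiB is slightly less than 1 GiB, so the old default could not hold even a single 1 GiB image, let alone two uploads running in parallel. A small sketch of the calculation, assuming two concurrent 1 GiB uploads and ignoring replication and filesystem overhead:

```python
KIB = 1024
GIB = 1024 ** 3

image_bytes = 1 * GIB          # one 1 GiB volume-to-image upload
uploads = 2                    # the commit message says this happens twice
old_size = 1_000_000 * KIB     # previous default loopback size
new_size = 4_000_000 * KIB     # size after this commit

needed = uploads * image_bytes
print(f"needed:   {needed / GIB:.2f} GiB")
print(f"old size: {old_size / GIB:.2f} GiB -> fits: {old_size >= needed}")
print(f"new size: {new_size / GIB:.2f} GiB -> fits: {new_size >= needed}")
```

This prints a requirement of 2.00 GiB against roughly 0.95 GiB before the change and 3.81 GiB after it.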

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack-gate (master)

Reviewed: https://review.openstack.org/46663
Committed: http://github.com/openstack-infra/devstack-gate/commit/4155df85cb61b6126368b3b816396d7064f678ec
Submitter: Jenkins
Branch: master

commit 4155df85cb61b6126368b3b816396d7064f678ec
Author: afazekas <email address hidden>
Date: Sun Sep 15 18:22:49 2013 +0200

    Let devstack decide about swift storage

    Based on several discussions, the right swift storage size value should
    be configured by devstack and not by devstack-gate.

    Fixing bug 1225664

    Change-Id: I421d8ee6fc6cbde463592134832d169f91d8a91b

John Griffith (john-griffith) wrote : Re: tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey failure
Clark Boylan (cboylan)
Changed in openstack-ci:
milestone: none → icehouse
Clark Boylan (cboylan)
no longer affects: openstack-ci
John Griffith (john-griffith) wrote :
John Griffith (john-griffith) wrote :

Leaving Cinder as affected for now until we get the root cause fixed.

Steve Baker (steve-stevebaker) wrote :

It doesn't look like Heat is directly affected by this.

no longer affects: heat
Vijendar Komalla (vijendar-komalla) wrote :

It still affects openstack-ci. Take a look at https://review.openstack.org/#/c/52137/

Vijendar Komalla (vijendar-komalla) wrote :

Below is the error message:

2013-10-24 20:49:29.505 | Traceback (most recent call last):
2013-10-24 20:49:29.505 | File "tempest/api/image/v1/test_images.py", line 90, in test_register_http_image
2013-10-24 20:49:29.505 | self.client.wait_for_image_status(image_id, 'active')
2013-10-24 20:49:29.506 | File "tempest/services/image/v1/json/image_client.py", line 276, in wait_for_image_status
2013-10-24 20:49:29.506 | raise exceptions.TimeoutException(message)
2013-10-24 20:49:29.506 | TimeoutException: Request timed out
2013-10-24 20:49:29.506 | Details: Time Limit Exceeded! (400s)while waiting for active, but we got killed.

Clark Boylan (cboylan) wrote :

Vijendar, I have removed openstack-ci from the bug as jgriffith believes the bug is in glance or swift. Can you add more info on why change 52137 indicates this is an openstack-ci bug?

Zhi Yan Liu (lzy-dev) wrote :

From the glance-api log messages, this bug report seems related to https://bugs.launchpad.net/cinder/+bug/1233908 . As I mentioned there, I think we should also increase the "node_timeout" option value for the Swift proxy service.
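`node_timeout` lives in the `[app:proxy-server]` section of the Swift proxy configuration and defaults to 10 seconds. A minimal sketch of raising it programmatically in a generated test config; the sample file contents and the value 60 are hypothetical, and because `configparser` drops comments when it writes, this approach suits throwaway CI configs better than hand-maintained files.

```python
import configparser
import io

# Hypothetical proxy-server.conf excerpt; only node_timeout matters here.
SAMPLE = """\
[app:proxy-server]
use = egg:swift#proxy
node_timeout = 10
"""


def set_node_timeout(conf_text: str, seconds: int) -> str:
    """Return conf_text with [app:proxy-server] node_timeout set to `seconds`."""
    # interpolation=None keeps '%' and similar characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["app:proxy-server"]["node_timeout"] = str(seconds)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(set_node_timeout(SAMPLE, 60))  # 60 is an arbitrary example value
```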

Vijendar Komalla (vijendar-komalla) wrote :

Clark, I am not sure which project is causing this bug, but I am seeing openstack-ci failures consistently. The test below is failing. You are right: this seems to be an issue with Glance!

FAIL: tempest.api.image.v1.test_images.CreateRegisterImagesTest.test_register_http_image[gate]
2013-10-25 16:10:47.294 | tempest.api.image.v1.test_images.CreateRegisterImagesTest.test_register_http_image[gate]
2013-10-25 16:10:47.294 | ----------------------------------------------------------------------
.
.
.
2013-10-25 16:10:47.458 | Traceback (most recent call last):
2013-10-25 16:10:47.458 | File "tempest/api/image/v1/test_images.py", line 90, in test_register_http_image
2013-10-25 16:10:47.458 | self.client.wait_for_image_status(image_id, 'active')
2013-10-25 16:10:47.459 | File "tempest/services/image/v1/json/image_client.py", line 276, in wait_for_image_status
2013-10-25 16:10:47.459 | raise exceptions.TimeoutException(message)
2013-10-25 16:10:47.459 | TimeoutException: Request timed out
2013-10-25 16:10:47.459 | Details: Time Limit Exceeded! (400s)while waiting for active, but we got killed.
2013-10-25 16:10:47.459 |

Clark Boylan (cboylan) wrote :

I have removed openstack-ci again. In this case the CI systems are working properly and correctly reporting failure back to Gerrit. There is a real bug (this bug) in Glance and/or Swift that is affecting tests some percentage of the time. If we can isolate these types of failures to the services being tested, we don't want the bug to also be attached to the CI system, as there isn't much we can do from an infrastructure perspective to fix most of these.

Most failed tests do not reflect a problem with the infrastructure. Those that do should have bugs assigned to them under openstack-ci. Examples of these sorts of failures would be failed git clones, or Jenkins exceptions that prevent tests from running.

no longer affects: openstack-ci
John Griffith (john-griffith) wrote :

The root cause of this appears to be in the Glance → Swift path.

Changed in cinder:
status: New → Invalid
Joe Gordon (jogo) wrote :
Joe Gordon (jogo)
summary: - tempest.api.volume.test_volumes_actions.VolumesActionsTestXML flakey
- failure
+ Swift: No space left on device
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/56116

OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (master)

Reviewed: https://review.openstack.org/56116
Committed: http://github.com/openstack-dev/devstack/commit/66c54249805c9a6e863c81b754f4abae71aa1b2b
Submitter: Jenkins
Branch: master

commit 66c54249805c9a6e863c81b754f4abae71aa1b2b
Author: Joe Gordon <email address hidden>
Date: Tue Nov 12 16:24:14 2013 -0800

    Bump SWIFT_LOOPBACK_DISK_SIZE_DEFAULT over swift max_file_size

    Swift is returning 50x error codes because its disk is too small; set
    the size bigger than max_file_size in an attempt to fix the problem, or
    at least reduce it.

    "we create a 4GB device, but swift thinks it can write 5GB, hence fail"
    --sdague

    This patch based off of Iccd6368e4df71abb5ccfe7d361c64d86e1071d35

    Change-Id: Ib56a98cd74e7edf1fa90facc25c72632d43180f1
    Related-Bug: #1225664
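The point of this follow-up is that Swift's default `max_file_size` (5 GiB plus 2 bytes, as defined in `swift/common/constraints.py`) lets a single object exceed the 4_000_000 KiB device created by change Iccd6368e4df71abb5ccfe7d361c64d86e1071d35. A small sketch of the comparison, plus a hypothetical helper for the smallest whole-KiB device size strictly larger than that limit; it ignores filesystem metadata, so a real device needs extra headroom on top.

```python
KIB = 1024
GIB = 1024 ** 3

# Swift's default max_file_size: 5 GiB + 2 bytes.
MAX_FILE_SIZE = 5 * GIB + 2
# Loopback size after change Iccd6368e4df71abb5ccfe7d361c64d86e1071d35.
LOOPBACK_BYTES = 4_000_000 * KIB


def min_loopback_kib(max_file_size: int) -> int:
    """Smallest whole-KiB device size strictly larger than max_file_size
    (hypothetical helper; ignores filesystem overhead)."""
    return max_file_size // KIB + 1


print(f"device {LOOPBACK_BYTES / GIB:.2f} GiB vs max object {MAX_FILE_SIZE / GIB:.2f} GiB")
print(min_loopback_kib(MAX_FILE_SIZE))  # 5242881
```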

Joe Gordon (jogo)
Changed in devstack:
status: New → Fix Committed
assignee: nobody → Joe Gordon (jogo)
Changed in glance:
status: New → Invalid
Changed in swift:
status: New → Invalid
Dean Troyer (dtroyer)
Changed in devstack:
status: Fix Committed → Fix Released