test_object_upload_in_segments fails with OperationalError database is locked

Bug #1224253 reported by Peter Portante
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

See http://logs.openstack.org/46/46146/2/check/gate-tempest-devstack-vm-full/8209945/console.html

2013-09-12 02:41:15.508 | ======================================================================
2013-09-12 02:41:15.508 | FAIL: tempest.api.object_storage.test_object_services.ObjectTest.test_object_upload_in_segments[gate]
2013-09-12 02:41:15.509 | tempest.api.object_storage.test_object_services.ObjectTest.test_object_upload_in_segments[gate]
2013-09-12 02:41:15.509 | ----------------------------------------------------------------------
2013-09-12 02:41:15.540 | _StringException: Empty attachments:
2013-09-12 02:41:15.540 | stderr
2013-09-12 02:41:15.540 | stdout
2013-09-12 02:41:15.541 |
2013-09-12 02:41:15.541 | pythonlogging:'': {{{
2013-09-12 02:41:15.541 | 2013-09-12 02:31:13,827 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/0
2013-09-12 02:41:15.542 | 2013-09-12 02:31:13,848 Response Status: 201
2013-09-12 02:41:15.542 | 2013-09-12 02:31:13,849 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/1
2013-09-12 02:41:15.542 | 2013-09-12 02:31:13,868 Response Status: 201
2013-09-12 02:41:15.543 | 2013-09-12 02:31:13,868 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/2
2013-09-12 02:41:15.543 | 2013-09-12 02:31:13,903 Response Status: 201
2013-09-12 02:41:15.543 | 2013-09-12 02:31:13,903 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/3
2013-09-12 02:41:15.544 | 2013-09-12 02:31:13,934 Response Status: 201
2013-09-12 02:41:15.544 | 2013-09-12 02:31:13,934 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/4
2013-09-12 02:41:15.544 | 2013-09-12 02:31:13,976 Response Status: 201
2013-09-12 02:41:15.545 | 2013-09-12 02:31:13,976 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/5
2013-09-12 02:41:15.545 | 2013-09-12 02:31:14,000 Response Status: 201
2013-09-12 02:41:15.545 | 2013-09-12 02:31:14,000 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/6
2013-09-12 02:41:15.546 | 2013-09-12 02:31:14,025 Response Status: 201
2013-09-12 02:41:15.546 | 2013-09-12 02:31:14,026 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/7
2013-09-12 02:41:15.546 | 2013-09-12 02:31:14,046 Response Status: 201
2013-09-12 02:41:15.547 | 2013-09-12 02:31:14,046 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/8
2013-09-12 02:41:15.547 | 2013-09-12 02:31:14,059 Response Status: 201
2013-09-12 02:41:15.547 | 2013-09-12 02:31:14,059 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731/9
2013-09-12 02:41:15.548 | 2013-09-12 02:31:14,078 Response Status: 201
2013-09-12 02:41:15.548 | 2013-09-12 02:31:14,078 Request: PUT http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731
2013-09-12 02:41:15.549 | 2013-09-12 02:31:14,093 Response Status: 201
2013-09-12 02:41:15.549 | 2013-09-12 02:31:14,094 Request: POST http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731
2013-09-12 02:41:15.549 | 2013-09-12 02:31:14,112 Response Status: 202
2013-09-12 02:41:15.550 | 2013-09-12 02:31:14,112 Request: HEAD http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731
2013-09-12 02:41:15.550 | 2013-09-12 02:31:14,149 Response Status: 200
2013-09-12 02:41:15.551 | 2013-09-12 02:31:14,150 Request: GET http://127.0.0.1:8080/v1/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477/LObject523949731
2013-09-12 02:41:15.551 | 2013-09-12 02:31:14,165 Response Status: 500
2013-09-12 02:41:15.551 | }}}
2013-09-12 02:41:15.552 |
2013-09-12 02:41:15.552 | Traceback (most recent call last):
2013-09-12 02:41:15.552 | File "tempest/api/object_storage/test_object_services.py", line 292, in test_object_upload_in_segments
2013-09-12 02:41:15.553 | self.container_name, object_name)
2013-09-12 02:41:15.553 | File "tempest/services/object_storage/object_client.py", line 77, in get_object
2013-09-12 02:41:15.553 | resp, body = self.get(url)
2013-09-12 02:41:15.554 | File "tempest/common/rest_client.py", line 263, in get
2013-09-12 02:41:15.554 | return self.request('GET', url, headers)
2013-09-12 02:41:15.554 | File "tempest/common/rest_client.py", line 394, in request
2013-09-12 02:41:15.555 | resp, resp_body)
2013-09-12 02:41:15.555 | File "tempest/common/rest_client.py", line 480, in _error_checker
2013-09-12 02:41:15.555 | raise exceptions.ComputeFault(message)
2013-09-12 02:41:15.556 | ComputeFault: Got compute fault
2013-09-12 02:41:15.556 | Details: <html><h1>Internal Error</h1><p>The server has either erred or is incapable of performing the requested operation.</p></html>

The 500 error encountered by the container server is:

Sep 12 02:31:14 devstack-precise-hpcloud-az3-255091 container-server ERROR __call__ error with GET /sdb1/338/AUTH_871ae2648efc4d798bf91c97e75bbc1b/TestContainer41153477 :
Traceback (most recent call last):
  File "/opt/stack/new/swift/swift/container/server.py", line 493, in __call__
    res = method(req)
  File "/opt/stack/new/swift/swift/common/utils.py", line 1896, in wrapped
    return func(*a, **kw)
  File "/opt/stack/new/swift/swift/common/utils.py", line 686, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/opt/stack/new/swift/swift/container/server.py", line 375, in GET
    if broker.is_deleted():
  File "/opt/stack/new/swift/swift/container/backend.py", line 241, in is_deleted
    with self.get() as conn:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/opt/stack/new/swift/swift/common/db.py", line 325, in get
    self.possibly_quarantine(*sys.exc_info())
  File "/opt/stack/new/swift/swift/common/db.py", line 323, in get
    self.conn = get_db_connection(self.db_file, self.timeout)
  File "/opt/stack/new/swift/swift/common/db.py", line 167, in get_db_connection
    timeout=timeout)
DatabaseConnectionError: DB connection error (/opt/stack/data/swift/1/sdb1/containers/338/9dc/a96bffe352715985b2d2b2fbb366d9dc/a96bffe352715985b2d2b2fbb366d9dc.db, 25):
Traceback (most recent call last):
  File "/opt/stack/new/swift/swift/common/db.py", line 159, in get_db_connection
    cur.execute('PRAGMA synchronous = NORMAL')
OperationalError: database is locked
 (txn: txbee4d89e30fb4e8b8698b-0052312772)
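
For context, sqlite raises "OperationalError: database is locked" when a statement needs a lock that another connection already holds and the busy timeout has run out (or is zero, so it gives up without waiting). A minimal standalone illustration, independent of Swift's code; the scratch path and table name are made up:

    import os
    import sqlite3
    import tempfile

    db_path = os.path.join(tempfile.mkdtemp(), 'demo.db')

    # Connection 1 takes an exclusive lock on the database file and holds it.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute('CREATE TABLE t (x INTEGER)')
    writer.execute('BEGIN EXCLUSIVE')

    # Connection 2 has a zero busy timeout, so instead of waiting for the
    # lock to clear it fails on the spot.
    reader = sqlite3.connect(db_path, timeout=0)
    try:
        reader.execute('SELECT count(*) FROM t')
    except sqlite3.OperationalError as exc:
        print(exc)  # -> database is locked

    writer.execute('ROLLBACK')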

Revision history for this message
Peter Portante (peter-a-portante) wrote :
Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I'm experiencing this issue in a test cluster with swift 1.10.0.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I've managed to isolate the change that introduced the problem:

https://github.com/openstack/swift/commit/0fdad0d9d9e68b00f61171bb2a0dfd840ef5345f

As this change is just for PyPy support, I've reverted it and the problem is gone.

Revision history for this message
Alex Gaynor (alex-gaynor) wrote :
Revision history for this message
Peter Portante (peter-a-portante) wrote :

So could it be that the PyPy change is a red herring, that all it does is change the timing a bit, and that the real problem is the race condition solved by Greg Lange's change here: https://review.openstack.org/57019 ?

Revision history for this message
Peter Portante (peter-a-portante) wrote :

That should be Steven Lang's patch, not Greg Lange's. Sorry.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

It could be!

In fact I spent two days triaging this issue, and several times I had the feeling that the code was "wrong" altogether and that it's just the cursor making the problem more noticeable.

I'll give that patch a go when I have the time, thanks for the pointer!

Revision history for this message
Steven Lang (steven-lang) wrote :

The impact of the change for PyPy support in 0fdad0d9 was that conn.execute() had been wrapped in a green-thread-friendly retry/timeout, but the cursor had not. So a cursor was calling execute directly with no retry/timeout, and when it hit a locked DB it raised an exception immediately.
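
To illustrate the asymmetry, here is a rough sketch of that pattern, not Swift's actual classes; GreenConnection, _retrying, the 25-second TIMEOUT and the /tmp/demo.db path are all made up for the example. The connection-level execute() retries a locked database (Swift would sleep via eventlet so other green threads keep running), while a cursor obtained from the same connection uses the plain sqlite3 execute() and, with the underlying timeout set to 0, gives up immediately.

    import sqlite3
    import time

    TIMEOUT = 25  # seconds, illustrative value only

    def _retrying(call, timeout, *args, **kwargs):
        # Keep retrying "database is locked" until the timeout expires.
        # Swift would use eventlet.sleep() here so other green threads run.
        deadline = time.time() + timeout
        while True:
            try:
                return call(*args, **kwargs)
            except sqlite3.OperationalError as err:
                if 'locked' not in str(err) or time.time() > deadline:
                    raise
                time.sleep(0.05)

    class GreenConnection(sqlite3.Connection):
        # Only the connection-level execute() gets the retry wrapper.
        def execute(self, *args, **kwargs):
            return _retrying(super(GreenConnection, self).execute,
                             TIMEOUT, *args, **kwargs)

    conn = sqlite3.connect('/tmp/demo.db', timeout=0, factory=GreenConnection)

    # Retries while the DB is locked:
    conn.execute('PRAGMA synchronous = NORMAL')

    # Plain sqlite3.Cursor.execute(), no retry; with timeout=0 a locked DB
    # raises OperationalError immediately, which is what the gate hit:
    cur = conn.cursor()
    cur.execute('PRAGMA synchronous = NORMAL')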

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I've tested https://review.openstack.org/#/c/57019/ and it fixes the problem for me!

Revision history for this message
Christian Schwede (cschwede) wrote :

Looks like https://review.openstack.org/#/c/57019/ fixed this -> closing.

Changed in swift:
status: New → Fix Released