container replicator throws a disk I/O error on bad container, never finishes replicating it

Bug #1413409 reported by Caleb Tennis
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Jan 21 03:51:08 localhost container-replicator: ERROR reading db /srv/node/d351/containers/23149/602/5a6db6d7e80b14b1a7aab9a19bdb4602/5a6db6d7e80b14b1a7aab9a19bdb4602.db:
Traceback (most recent call last):
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db_replicator.py", line 435, in _replicate_object
    now - (self.reclaim_age * 2))
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 806, in reclaim
    self._commit_puts()
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 608, in _commit_puts
    self.merge_items(item_list)
  File "/opt/ss/lib/python2.7/site-packages/swift/container/backend.py", line 751, in merge_items
    with self.get() as conn:
  File "/opt/ss/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 360, in get
    self.possibly_quarantine(*sys.exc_info())
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 358, in get
    self.conn = get_db_connection(self.db_file, self.timeout)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 190, in get_db_connection
    timeout=timeout)
DatabaseConnectionError: DB connection error (/srv/node/d351/containers/23149/602/5a6db6d7e80b14b1a7aab9a19bdb4602/5a6db6d7e80b14b1a7aab9a19bdb4602.db, 25):
Traceback (most recent call last):
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 182, in get_db_connection
    cur.execute('PRAGMA synchronous = NORMAL')
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 129, in execute
    self.timeout, self.db_file, lambda: sqlite3.Cursor.execute(
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 67, in _db_timeout
    return call()
  File "/opt/ss/lib/python2.7/site-packages/swift/common/db.py", line 130, in <lambda>
    self, *args, **kwargs))
OperationalError: disk I/O error

[root@store-012 5a6db6d7e80b14b1a7aab9a19bdb4602]# ls -l
total 10260
-rw------- 1 swift swift 5932032 Jan 14 17:30 5a6db6d7e80b14b1a7aab9a19bdb4602.db
-rw-r--r-- 1 swift swift 512 Jan 14 17:31 5a6db6d7e80b14b1a7aab9a19bdb4602.db-journal
-rw-r--r-- 1 swift swift 16008 Jan 15 13:17 5a6db6d7e80b14b1a7aab9a19bdb4602.db.pending

The I/O error has nothing to do with disk I/O: all three files can be read without problems. The issue seems to be corrupt data in the .db-journal file. We moved the journal out of the way, and the replicator then finished without error, cleaning up the contents.
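
For anyone wanting to confirm the same diagnosis without touching the live files, here is a minimal sketch in Python 3 using only the standard library. It copies the .db file alone (leaving the -journal behind) into a temporary directory and runs SQLite's `PRAGMA integrity_check` on the copy. The function name `check_db_without_journal` is made up for this illustration and is not part of Swift. It also assumes the journal holds no transaction that still needs to be rolled back; if it does, the copy may not reflect a consistent state.

```python
import os
import shutil
import sqlite3
import tempfile


def check_db_without_journal(db_path):
    """Run integrity_check on a copy of db_path made without its -journal.

    Hypothetical helper for diagnosis only; the original files are untouched.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        # Copy only the main database file, so SQLite never sees the journal.
        copy_path = os.path.join(tmp_dir, os.path.basename(db_path))
        shutil.copy2(db_path, copy_path)
        conn = sqlite3.connect(copy_path)
        try:
            rows = conn.execute('PRAGMA integrity_check').fetchall()
        finally:
            conn.close()
        # A healthy database yields a single row containing 'ok'.
        return [row[0] for row in rows]
    finally:
        shutil.rmtree(tmp_dir)
```

If this returns `['ok']` while opening the original path still fails, that points at the journal rather than the database file or the disk.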

In my opinion, the real bug here is that the replicator shouldn't emit a traceback. It should either report the failure more gracefully or, perhaps better, quarantine the container.
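
To make the quarantine suggestion concrete, here is a rough standalone sketch in Python 3. It is not Swift's code and does not use Swift's broker classes: `open_or_quarantine`, `QUARANTINE_HINTS`, and the quarantine directory layout are all assumptions made for illustration. The idea is to treat a small set of SQLite error messages as signs of a damaged database, move the whole hash directory aside, and log one line instead of a traceback. Any other error is re-raised.

```python
import logging
import os
import shutil
import sqlite3

# Assumed list of message fragments that indicate a damaged database.
QUARANTINE_HINTS = ('disk I/O error', 'malformed', 'not a database')


def open_or_quarantine(db_path, quarantine_root):
    """Return an open connection, or None after quarantining a bad DB.

    Hypothetical helper: the directory holding db_path is moved under
    quarantine_root when SQLite reports one of QUARANTINE_HINTS.
    """
    conn = None
    try:
        conn = sqlite3.connect(db_path)
        # Force SQLite to actually read the file (and any journal).
        conn.execute('SELECT count(*) FROM sqlite_master').fetchone()
        return conn
    except sqlite3.DatabaseError as err:
        if conn is not None:
            conn.close()
        if not any(hint in str(err) for hint in QUARANTINE_HINTS):
            raise  # e.g. a locked database should not be quarantined
        hash_dir = os.path.dirname(db_path)
        dest = os.path.join(quarantine_root, os.path.basename(hash_dir))
        os.makedirs(quarantine_root, exist_ok=True)
        shutil.move(hash_dir, dest)
        logging.warning('Quarantined %s to %s: %s', hash_dir, dest, err)
        return None
```

Moving the whole directory keeps the .db, .db-journal and .db.pending files together for later inspection, which matters in a case like this one where the journal is the suspect.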
