Using swift 1.7.6.
I noticed the following from the account-auditor in the background.log:
Apr 12 11:03:11 sw-ae1az2-object0007 account-auditor ERROR Could not get account info /srv/node/disk8/accounts/6774054/eda/ceba4d9bfeb1c4c61d01de3bc8a3aeda/ceba4d9bfeb1c4c61d01de3bc8a3aeda.db: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/auditor.py", line 114, in account_audit#012 if not broker.is_deleted():#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1512, in is_deleted#012 self.commit_puts()#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1353, in _commit_puts#012 self.merge_items(item_list)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1690, in merge_items#012 conn.commit()#012 File "/usr/lib/python2.7/contextlib.py", line 35, in __exit_#012 self.gen.throw(type, value, traceback)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 325, in get#012 self.possibly_quarantine(*sys.exc_info())#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 317, in get#012 yield conn#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1649, in merge_items#012 curs = conn.execute(query, (rec['name'],))#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 86, in execute#012 return self._timeout(lambda: sqlite3.Connection.execute(#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 79, in _timeout#012 return call()#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 87, in <lambda>#012 self, *args, **kwargs))#012OperationalError: disk I/O error
The account-server is also generating a similar trace when trying to update the same account database:
Apr 12 10:10:51 sw-ae1az2-object0007 account-server ERROR _call_ error with PUT /disk8/6774054/38028656263528/swift_monitor_latency_test : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 347, in _call#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1501, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 485, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 109, in PUT#012 'yes' and broker.is_deleted():#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1512, in is_deleted#012 self._commit_puts()#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1353, in _commit_puts#012 self.merge_items(item_list)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1690, in merge_items#012 conn.commit()#012 File "/usr/lib/python2.7/contextlib.py", line 35, in __exit_#012 self.gen.throw(type, value, traceback)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 325, in get#012 self.possibly_quarantine(*sys.exc_info())#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 317, in get#012 yield conn#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1649, in merge_items#012 curs = conn.execute(query, (rec['name'],))#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 86, in execute#012 return self._timeout(lambda: sqlite3.Connection.execute(#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 79, in _timeout#012 return call()#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 87, in <lambda>#012 self, *args, **kwargs))#012OperationalError: disk I/O error (txn: txe39256a61ae34e5e9e333f76beb3892d)
The problem was an unrecoverable read error on one of the sectors used by the account db.
I would have expected the account-auditor to detect this failure and quarantine the problem db but it did not.
Yeah, the decision of whether or not a database exception requires quarantine is based on string matching against the exception message, and "disk I/O error" doesn't match any of the strings it checks for.
See swift.common.db.DatabaseBroker.possibly_quarantine() for details.
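To illustrate the kind of message matching described above, here is a minimal standalone sketch. It is not Swift's actual code: the function name quarantine_hint and the QUARANTINE_HINTS table are hypothetical, and the marker strings are assumed examples of SQLite error messages rather than a verified copy of what possibly_quarantine() checks in 1.7.6.

```python
import sqlite3

# Assumed marker strings (illustrative only, not taken from the Swift source).
# Each maps a substring of an SQLite error message to a short hint.
QUARANTINE_HINTS = {
    'database disk image is malformed': 'malformed',
    'file is encrypted or is not a database': 'corrupted',
}


def quarantine_hint(exc):
    """Return a hint if the exception message contains a known marker.

    Returns None when nothing matches, which in this sketch means the
    exception would be re-raised and the database left in place.
    """
    message = str(exc)
    for marker, hint in QUARANTINE_HINTS.items():
        if marker in message:
            return hint
    return None


# A corruption-style message matches one of the markers...
print(quarantine_hint(sqlite3.DatabaseError('database disk image is malformed')))
# ...while the message from the traces above matches none of them.
print(quarantine_hint(sqlite3.OperationalError('disk I/O error')))
```

With a table like this, an exception whose message is only "disk I/O error" falls through every check, so nothing gets quarantined and the error just propagates to the caller, which is consistent with the behaviour reported above.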