Container updater crashes the account server repeatedly

Bug #1861233 reported by Pete Zaitcev
This bug affects 1 person
Affects                           Status  Importance  Assigned to  Milestone
OpenStack Object Storage (swift)  New     Undecided   Unassigned

Bug Description

The story starts with logs having a lot of these:

Jan 12 03:39:55 wb-sdc-controller02 account-server: ERROR __call__ error with PUT /sdd/504/AUTH_c7a7a88d5ae84c74aaf145bc1cdfec16/gnocchi.09d85fbf-aa3a-4746-a06a-f7cd68af9e1f : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 269, in __call__#012 res = getattr(self, req.method)(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1672, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 132, in PUT#012 container_policy_index)#012 File "/usr/lib/python2.7/site-packages/swift/account/backend.py", line 267, in put_container#012 self.put_record(record)#012 File "/usr/lib/python2.7/site-packages/swift/common/db.py", line 574, in put_record#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/node/sdd/accounts/504/0b0/7e2138feb881d8da10fa9f0c1dbee0b0/7e2138feb881d8da10fa9f0c1dbee0b0.db, 0):#012DB doesn't exist
Jan 12 03:39:55 wb-sdc-controller02 account-server: 172.20.164.39 - - [11/Jan/2020:22:09:55 +0000] "PUT /sdd/504/AUTH_c7a7a88d5ae84c74aaf145bc1cdfec16/gnocchi.09d85fbf-aa3a-4746-a06a-f7cd68af9e1f" 500 854 "-" "-" "container-updater 1" 0.0008 "-" 70 0

As the request path shows, the updater is trying to update an account that
belongs to Gnocchi, but Gnocchi's active account uses a different ID.
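To make the path concrete: the account server's request URL has the form /<device>/<partition>/<account>/<container>, so the account the updater is reporting to can be read directly from the log line. This is a minimal sketch that assumes exactly four non-empty segments. `parse_account_request_path` is a hypothetical helper, not a Swift API.

```python
def parse_account_request_path(path):
    """Split '/<device>/<partition>/<account>/<container>' into its parts.

    Hypothetical helper for reading log lines; Swift's servers use their
    own path-splitting utilities, which are not reproduced here.
    """
    parts = path.lstrip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected /device/partition/account/container: %r" % path)
    device, partition, account, container = parts
    return {
        "device": device,
        "partition": int(partition),
        "account": account,
        "container": container,
    }


if __name__ == "__main__":
    req = ("/sdd/504/AUTH_c7a7a88d5ae84c74aaf145bc1cdfec16/"
           "gnocchi.09d85fbf-aa3a-4746-a06a-f7cd68af9e1f")
    info = parse_account_request_path(req)
    # The account the updater targets; compare it with the account
    # Gnocchi is currently configured to use.
    print(info["account"])
```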

I deduce that the operator redeployed Gnocchi in a way that deleted the
old account and created a new one. However, the old container was not
fully cleaned up: it was left orphaned, with an outstanding update.

In this situation, the updater keeps trying to process the orphaned container,
receives a 500 from the account server every time, and so loops over and over.
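The loop can be modeled in a few lines. This is a toy, not Swift's container-updater: `account_put` and `updater_pass` are hypothetical, the account names are made up, and the model assumes a container stays pending until the account server answers with a 2xx status. Under that assumption, a permanent 500 for the orphan means it is retried on every sweep while the live container is reported once.

```python
def account_put(account_dbs, account, container):
    """Toy account server: 204 if the account DB exists, otherwise 500.

    The 500 mirrors the reported behaviour, where the missing DB raises an
    unhandled DatabaseConnectionError inside the PUT handler.
    """
    if account not in account_dbs:
        return 500
    account_dbs[account].add(container)
    return 204


def updater_pass(pending, account_dbs):
    """One toy updater sweep: report each pending container.

    Assumption: a container stays pending unless the account server
    answers with a 2xx status, so it is retried on the next sweep.
    """
    still_pending = []
    for account, container in pending:
        status = account_put(account_dbs, account, container)
        if not 200 <= status < 300:
            still_pending.append((account, container))
    return still_pending


if __name__ == "__main__":
    # Only the new account has a DB; the old one was removed on redeploy.
    account_dbs = {"AUTH_new": set()}
    pending = [("AUTH_old", "gnocchi.orphan"), ("AUTH_new", "gnocchi.live")]
    for sweep in range(1, 4):
        pending = updater_pass(pending, account_dbs)
        print(sweep, pending)
```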

Here's a human-readable traceback:

ERROR __call__ error with PUT /sdb/542/AUTH_47db2ff81f5a4e8a94fbc375385985c1/gnocchi.66948ba8-f4a4-4c6e-8234-765bb36c4f9c :
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 269, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1672, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 132, in PUT
    container_policy_index)
  File "/usr/lib/python2.7/site-packages/swift/account/backend.py", line 267, in put_container
    self.put_record(record)
  File "/usr/lib/python2.7/site-packages/swift/common/db.py", line 574, in put_record
    raise DatabaseConnectionError(self.db_file, "DB doesn't exist")
DatabaseConnectionError: DB connection error (/srv/node/sdb/accounts/542/cec/87a44c8976382111bf69749b2ae26cec/87a44c8976382111bf69749b2ae26cec.db, 0):
DB doesn't exist
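The syslog line at the top is a traceback of the same kind with its newlines escaped: rsyslog writes control characters as `#` followed by three octal digits, so `#012` is a newline. A small decoder can recover the readable form. This is a sketch; the name `unescape_syslog` is hypothetical, and it would also rewrite any literal `#` that happens to be followed by three octal digits in the message.

```python
import re

# '#' plus exactly three octal digits, e.g. '#012' (newline), '#011' (tab).
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def unescape_syslog(line):
    """Replace each '#ooo' octal escape with the character it encodes."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


if __name__ == "__main__":
    sample = ("ERROR __call__ error with PUT /sdd/504/... : "
              "#012Traceback (most recent call last):"
              "#012  File \"/usr/lib/python2.7/site-packages/swift/account/server.py\", line 269, in __call__")
    print(unescape_syslog(sample))
```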

The problem was observed in Swift 2.17.1.

Suggestions:
- verify whether current master is also affected
- suppress the traceback in the account server and return 404 instead of 500
- continue development of the dark data plugin, which can clean up
  orphaned containers and objects when accounts are removed.
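The second suggestion could look roughly like the sketch below. It runs without Swift installed, so `DatabaseConnectionError` here is a local stand-in rather than Swift's class, and `ToyAccountBroker` and `handle_container_put` are hypothetical. Only the "DB doesn't exist" case is mapped to 404 and logged as one short line; any other database error is re-raised, so it would still surface with a traceback.

```python
class DatabaseConnectionError(Exception):
    """Local stand-in for the exception in the traceback (not Swift's class)."""

    def __init__(self, path, msg):
        super().__init__("DB connection error (%s): %s" % (path, msg))
        self.path = path
        self.msg = msg


class ToyAccountBroker:
    """Hypothetical account broker whose DB file may be missing."""

    def __init__(self, db_file, exists):
        self.db_file = db_file
        self.exists = exists
        self.containers = []

    def put_container(self, name):
        if not self.exists:
            raise DatabaseConnectionError(self.db_file, "DB doesn't exist")
        self.containers.append(name)


def handle_container_put(broker, container, log):
    """Record a container in an account; return 404 if the account DB is gone.

    Only the "DB doesn't exist" case is translated; any other
    DatabaseConnectionError is re-raised unchanged.
    """
    try:
        broker.put_container(container)
    except DatabaseConnectionError as err:
        if err.msg != "DB doesn't exist":
            raise
        # Expected for orphaned containers: one short log line, no traceback.
        log.append("404 PUT %s: %s" % (container, err))
        return 404
    return 201  # success status in this toy model
```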

Romain LE DISEZ (rledisez) wrote:

This seems related to https://review.opendev.org/#/c/704435/

Discussion on the best solution is still ongoing. The best idea we have had so far[1] is to re-create the account in a DELETED state so that the reaper can make another pass at cleaning up the objects.
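The idea can be illustrated with a toy model: if the missing account is re-created already marked DELETED, a reaper sweep that only visits DELETED accounts reaches the orphaned containers again. `ToyAccount`, `recreate_as_deleted` and `reaper_pass` are hypothetical, and the `delete_container` callback stands in for the real reaper's object and container deletion, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAccount:
    name: str
    status: str = "ACTIVE"  # "ACTIVE" or "DELETED" in this toy model
    containers: set = field(default_factory=set)


def recreate_as_deleted(accounts, name, orphan_containers):
    """Hypothetical: re-create a missing account already marked DELETED.

    The orphaned containers are recorded in it so a reaper pass can find them.
    An existing account is left untouched.
    """
    if name not in accounts:
        accounts[name] = ToyAccount(name, "DELETED", set(orphan_containers))
    return accounts[name]


def reaper_pass(accounts, delete_container):
    """Toy reaper: for every DELETED account, delete each container it lists."""
    for account in accounts.values():
        if account.status != "DELETED":
            continue
        for container in sorted(account.containers):
            delete_container(account.name, container)
        account.containers.clear()
```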

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-swift/%23openstack-swift.2020-01-27.log.html#t2020-01-27T21:58:26
