Some choices of delay_reaping & reclaim_age values can lead to container databases which are never deleted.

Bug #1514528 reported by Christopher Bartz
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Confirmed
Medium
Unassigned

Bug Description

If the delay_reaping value in the [account-reaper] section of account-server.conf is higher than the reclaim_age value of the [account-replicator] (or the difference is not high, especially less than the interval of [container-updater] in container-server.conf),
then there exist the possiblity that the container databases of a deleted account are never deleted.

This is how it happens:

1. Account-reaper reaps the account, and thus sets the status of all container databases of the account to DELETED. Furthermore, the deleted_timestamp is adjusted, but NOT the reported_deleted_timestamp.
2. Account-replicator runs and removes the account database.
3. Container-Updater runs, but cannot update the containers which are still on the disk, because the respective account database is no longer existent. The reported_deleted_timestamp is still 0 after the run.
4. Container-Replicator cannot remove the container databases, because they have no up-to-date report status.

Step 3 & 4 are repeating forever. The container databases rest on the disk, and the account-server logs an error on step 3, because the account-server cannot find the account database. This error looks like this:

ERROR __call__ error with PUT /disk/807/AUTH_test/container : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 271, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 2693, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/site-packages/swift/common/utils.py", line 1230, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/swift/account/server.py", line 131, in PUT#012 container_policy_index)#012 File "/usr/lib/python2.7/site-packages/swift/account/backend.py", line 268, in put_container#012 self.put_record(record)#012 File "/usr/lib/python2.7/site-packages/swift/common/db.py", line 563, in put_record#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/node/disk/accounts/807/a36/c9f75af529ec2adc4b78f3ac71b9fa36/c9f75af529ec2adc4b78f3ac71b9fa36.db, 0):#012DB doesn't exist

This bug is similar to #1332439. It was observed using openstack-swift 2.5.0 on centos7.

One userside workaround could be to adjust the interval of the container-updater (so that its clear it runs before the account-replicator which removes the account-database), or to set the delay_reaping value much lower than the reclaim_age of the account. At least there should be some documentation about this issue in the official docs. There is no
warning (for example in the configuration reference) of the interdependence of these two values.

summary: - Some choices of delay_reaping & reclaim_age values can lead to
- unreclaimable container databases
+ Some choices of delay_reaping & reclaim_age values can lead to container
+ databases which are never deleted.
description: updated
Revision history for this message
Christopher Bartz (bartz) wrote :

Why is the reported_deleted_timestamp not updated after a reaper run on a container database? In https://github.com/openstack/swift/blob/master/swift/container/server.py inside the method account_update, the
account db gets informed, that the container db has been deleted. Could we not update the container database after this step (via broker.reported)?

Revision history for this message
Alistair Coles (alistair-coles) wrote :
Revision history for this message
Alistair Coles (alistair-coles) wrote :

confirmed on swift 2.15.1 (commit ff23dcdf3aa) - all background daemons run only in once mode

1. created account
2. created containers
3. deleted account with reseller admin token
4. ran account-reaper with default reaper_delay of 0
5. use swift-container-info to verify that containers are deleted
6. ran account-replicator with small reclaim_age (60) -> account dbs are removed
7. ran swift-container-updater -> traceback account-server logs:

Sep 27 16:00:24 localhost account-server: STDERR: 127.0.0.1 - - [27/Sep/2017 15:00:24] "PUT /sdb4/802/AUTH_test/c2 HTTP/1.1" 500 911 0.000695
Sep 27 16:00:24 localhost account-server: ERROR __call__ error with PUT /sdb4/802/AUTH_test/c1 : #012Traceback (most recent call last):#012 File "/home/swift/swift/swift/account/server.py", line 269, in __call__#012 res = getattr(self, req.method)(req)#012 File "/home/swift/swift/swift/common/utils.py", line 1658, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/home/swift/swift/swift/account/server.py", line 132, in PUT#012 container_policy_index)#012 File "/home/swift/swift/swift/account/backend.py", line 267, in put_container#012 self.put_record(record)#012 File "/home/swift/swift/swift/common/db.py", line 572, in put_record#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/4/node/sdb4/accounts/802/178/c8bcccab3ddbfdc34b08e9223f4f5178/c8bcccab3ddbfdc34b08e9223f4f5178.db, 0):#012DB doesn't exist

8. ran container-replicator once with small reclaim_age (60) - container dbs are NOT removed

In contrast, same set of steps but not running account-replicator leaves account dbs in place (as expected) and container-replicator removes container dbs.

Changed in swift:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/507899

Revision history for this message
Alistair Coles (alistair-coles) wrote :

proposed a note in sample config here https://review.openstack.org/507899

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.openstack.org/507899
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=93fc9d2de86f37f62b1d6768600d0551e1b72fb6
Submitter: Jenkins
Branch: master

commit 93fc9d2de86f37f62b1d6768600d0551e1b72fb6
Author: Alistair Coles <email address hidden>
Date: Wed Sep 27 16:35:27 2017 +0100

    Add cautionary note re delay_reaping in account-server.conf-sample

    Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c
    Related-Bug: #1514528

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (feature/deep)

Related fix proposed to branch: feature/deep
Review: https://review.openstack.org/508700

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (feature/deep)
Download full text (7.2 KiB)

Reviewed: https://review.openstack.org/508700
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0c75ddf6fe5a4843fe60836b402f27cb1b83d8c5
Submitter: Zuul
Branch: feature/deep

commit 93fc9d2de86f37f62b1d6768600d0551e1b72fb6
Author: Alistair Coles <email address hidden>
Date: Wed Sep 27 16:35:27 2017 +0100

    Add cautionary note re delay_reaping in account-server.conf-sample

    Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c
    Related-Bug: #1514528

commit c6aea4b3730c937c41815831a7b4d60ff2899fcb
Author: Tim Burke <email address hidden>
Date: Wed Sep 27 19:19:53 2017 +0000

    Fix intermittent failure in test_x_delete_after

    X-Delete-After: 1 is known to be flakey; use 2 instead.

    When the proxy receives an X-Delete-After header, it automatically
    converts it to an X-Delete-At header based on the current time. So far,
    so good. But in normalize_delete_at_timestamp we convert our

        time.time() + int(req.headers['X-Delete-After'])

    to a string representation of an integer and in the process always round
    *down*. As a result, we lose up to a second worth of object validity,
    meaning the object server can (rarely) respond 400, complaining that the
    X-Delete-At is in the past.

    Change-Id: Ib5e5a48f5cbed0eade8ba3bca96b26c82a9f9d84
    Related-Change: I643be9af8f054f33897dd74071027a739eaa2c5c
    Related-Change: I10d3b9fcbefff3c415a92fa284a1ea1eda458581
    Related-Change: Ifdb1920e5266aaa278baa0759fc0bfaa1aff2d0d
    Related-Bug: #1597520
    Closes-Bug: #1699114

commit 5c76b9e691166acc1f7b8483aaa3980ebc70bd3a
Author: Alistair Coles <email address hidden>
Date: Wed Sep 27 14:11:14 2017 +0100

    Add concurrent_gets to proxy.conf man page

    Change-Id: Iab1beff4899d096936c0e5915f3ec32364b3e517
    Closes-Bug: #1559347

commit b4f08b6090057897ac647ba6331a4ec867b8e3b8
Author: Jens Harbott <email address hidden>
Date: Wed Sep 27 09:10:54 2017 +0000

    Fix functest for IPv6 endpoints

    Currently the functional tests fail if the storage_url contains a quoted
    IPv6 address because we try to split on ':'.

    But actually we don't need to split hostname and port only in order to
    combine it back together lateron. Use the standard urlparse() function
    instead and work with the 'netloc' part of the URL which keeps hostname
    and port together.

    Change-Id: I64589e5f2d6fb3cebc6768dc9e4de6264c09cbeb
    Partial-Bug: 1656329

commit 53ab6f2907eff2bb90528010d881f2f87ee02505
Author: Alistair Coles <email address hidden>
Date: Tue Sep 26 11:43:53 2017 +0100

    Assert memcached connection error is logged

    Follow up to [1] - change logger mocking so that we can
    assert the memcached connection error is logged.

    [1] Related-Change: I97c5420b4b4ecc127e9e94e9d0f91fbe92a5f623

    Change-Id: I87cf4245082c5e0f0705c2c14ddfc0b5d5d89c06

commit e501ac7d2be5c11b2ed0005885c84023054ec041
Author: Matthew Oliver <email address hidden>
Date: Thu Sep 3 12:19:05 2015 +1000

    Fix memcached exception out of range stacktrace

    When a memecached server goes offline in the middle of a
    MemcahceRing...

Read more...

tags: added: in-feature-deep
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (feature/s3api)

Related fix proposed to branch: feature/s3api
Review: https://review.openstack.org/512277

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: feature/s3api
Review: https://review.openstack.org/512283

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/s3api)

Change abandoned by Alistair Coles (<email address hidden>) on branch: feature/s3api
Review: https://review.openstack.org/512283
Reason: I was just trying to get sensible topic

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (feature/s3api)
Download full text (23.2 KiB)

Reviewed: https://review.openstack.org/512277
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f94d6567a7e2e8b3ca1168b4a41c42c1ee371af5
Submitter: Zuul
Branch: feature/s3api

commit 24188beb81d39790034fa0902246163a7bf54c91
Author: Samuel Merritt <email address hidden>
Date: Thu Oct 12 16:13:25 2017 -0700

    Remove some leftover threadpool cruft.

    Change-Id: I43a1a428bd96a2e18aac334c03743a9f94f7d3e1

commit 1d67485c0b935719e0c8999eb353dfd84713add6
Author: Samuel Merritt <email address hidden>
Date: Fri Apr 15 12:43:44 2016 -0700

    Move all monkey patching to one function

    Change-Id: I2db2e53c50bcfa17f08a136581cfd7ac4958ada2

commit 407f5394f0f5cb422c06b4e5b2f9fbfdb07782d1
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Oct 12 08:12:38 2017 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

    Change-Id: I628cb09aa78d8e339b4762a3c9ed8aed43941261

commit 45ca39fc68cdb42b382c1638a92cc8d3cec5529a
Author: Clay Gerrard <email address hidden>
Date: Tue Oct 10 11:47:50 2017 -0700

    add mangle_client_paths to example config

    Change-Id: Ic1126fc95e8152025fccf25356c253facce3e3ec

commit 94bac4ab2fe65104d602378e8e49c37b8187a75d
Author: Tim Burke <email address hidden>
Date: Fri May 12 10:55:21 2017 -0400

    domain_remap: stop mangling client-provided paths

    The root_path option for domain_remap seems to serve two purposes:
     - provide the first component (version) for the backend request
     - be an optional leading component for the client request, which
       should be stripped off

    As a result, we have mappings like:

       c.a.example.com/v1/o -> /v1/AUTH_a/c/o

    instead of

       c.a.example.com/v1/o -> /v1/AUTH_a/c/v1/o

    which is rather bizarre. Why on earth did we *ever* start doing this?

    Now, this second behavior is managed by a config option
    (mangle_client_paths) with the default being to disable it.

    Upgrade Consideration
    =====================

    If for some reason you *do* want to drop some parts of the
    client-supplied path, add

       mangle_client_paths = True

    to the [filter:domain_remap] section of your proxy-server.conf. Do this
    before upgrading to avoid any loss of availability.

    UpgradeImpact
    Change-Id: I87944bfbf8b767e1fc36dbc7910305fa1f11eeed

commit a4a5494fd2fe8a43a5d50a21a1951266cc7c4212
Author: Alistair Coles <email address hidden>
Date: Mon Oct 9 11:33:28 2017 +0100

    test account autocreate listing format

    Related-Change: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
    Change-Id: I50c22225bbebff71600bea9158bda1edd18b48b0

commit 8b7f15223cde4c19fd9cbbd97e8ad79a1b4afa8d
Author: Alistair Coles <email address hidden>
Date: Mon Oct 9 10:06:19 2017 +0100

    Add example to container-sync-realms.conf.5 man page

    Related-Change: I0760ce149e6d74f2b3f1badebac3e36da1ab7e77

    Change-Id: I129de42f91d7924c7bcb9952f17fe8a1a10ae219

commit 816331155c624c444ed123bcab412...

tags: added: in-feature-s3api
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.