DB with empty *_info table hangs servers

Bug #1747689 reported by Ondřej Nový
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

We just spotted this situation in production: a container DB file (attached) with an empty container_info table.

This DB completely hangs the container-server and breaks the container-replicator, which logs this error:

Feb 4 22:12:43 sdn-swift-store7 swift-container-replicator: ERROR reading db /data/ssd0-127G/containers/51502/5ea/c92e719d0362b50a48b16517a9f355ea/c92e719d0362b50a48b16517a9f355ea.db: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/common/db_replicator.py", line 478, in _replicate_object#012 now - (self.reclaim_age * 2))#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 838, in reclaim#012 DatabaseBroker._reclaim(self, conn, age_timestamp)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 856, in _reclaim#012 self.db_type).fetchone()[0]#012TypeError: 'NoneType' object has no attribute '__getitem__'

Removing this DB fixes the problem.
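
The last frame of the traceback indexes the result of `fetchone()` directly. A minimal sketch of why that fails on an empty table, using only the standard-library `sqlite3` module: the one-column `container_info` table here is a simplified stand-in, not the real Swift schema, and on Python 3 the message reads "'NoneType' object is not subscriptable" instead of the Python 2.7 wording in the log.

```python
import sqlite3

# Simplified stand-in for the broken DB (assumption: one column, no rows).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE container_info (metadata TEXT)')

# fetchone() returns None when the query yields no row.
row = conn.execute('SELECT metadata FROM container_info').fetchone()

try:
    row[0]  # same pattern as `.fetchone()[0]` in the traceback
    error = None
except TypeError as exc:
    error = exc

print(row, type(error).__name__)  # prints: None TypeError
```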

Samuel Merritt (torgomatic) wrote :

Are there many such databases in the cluster, or is it just this one?

Ondřej Nový (onovy) wrote :

just this one

Ondřej Nový (onovy) wrote :

That container was created at
Sunday 4. February 2018 19:45:33.307 GMT
(according to container_info.created_at from different replica).

At ~21:12 GMT our monitoring detected a hung container server.

So this is not a problem with migrating an old DB format. It was a brand-new container DB (unique name, never existed before).

Ondřej Nový (onovy) wrote :

This container was short-lived. We have a script that creates a randomly named container, puts an object in it, and then removes both the container and the object. We use it to periodically check the Swift cluster.

Attaching that script.
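
The attached script is not reproduced on this page. A rough sketch of the kind of check described above might look like the following. It assumes a `client` object exposing `put_container`, `put_object`, `delete_object` and `delete_container` (method names modeled on python-swiftclient's `Connection`); the function name, container prefix and object name are hypothetical.

```python
import uuid


def check_cluster(client, payload=b'probe'):
    """Create a randomly named container and one object, then delete both.

    `client` is an assumed interface; any exception it raises propagates
    to the caller and signals a failed check.
    """
    container = 'healthcheck-' + uuid.uuid4().hex  # hypothetical naming scheme
    obj = 'probe-object'

    client.put_container(container)
    client.put_object(container, obj, payload)
    # The object must be removed first: a non-empty container cannot be deleted.
    client.delete_object(container, obj)
    client.delete_container(container)
    return container
```

Each run leaves behind a freshly created and quickly deleted container DB, which matches the short lifetime described for the affected container.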

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/541252
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=bfe52a2e3562243b1b488e339a965642d846a30e
Submitter: Zuul
Branch: master

commit bfe52a2e3562243b1b488e339a965642d846a30e
Author: Ondřej Nový <email address hidden>
Date: Tue Feb 6 13:08:42 2018 +0100

    Quarantine DB without *_stat row

    Closes-Bug: #1747689
    Change-Id: Ief6bd0ba6cf675edd8ba939a36fb9d90d3f4447f
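
The commit title suggests treating a DB with no `*_stat` row as corrupt instead of letting the `None` from `fetchone()` propagate. The sketch below illustrates that idea only and is not the merged patch: `MissingStatRow` and `read_metadata` are hypothetical names, the schema is simplified, and how Swift actually quarantines the file is not shown.

```python
import sqlite3


class MissingStatRow(Exception):
    """Hypothetical error telling the caller the DB should be quarantined."""


def read_metadata(conn, db_type):
    # db_type is interpolated into the table name, so accept known values only.
    if db_type not in ('account', 'container'):
        raise ValueError('unexpected db_type: %r' % (db_type,))

    row = conn.execute('SELECT metadata FROM %s_stat' % db_type).fetchone()
    if row is None:
        # No row at all: report it explicitly instead of failing on row[0].
        raise MissingStatRow('no row in %s_stat' % db_type)
    return row[0]
```

A caller such as a replicator loop could catch the dedicated exception, move the file aside, and continue with the next DB rather than stalling on the same one.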

Changed in swift:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/544818

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)

Reviewed: https://review.openstack.org/544818
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=629d6b364f6fb8735732fcb8a5003db7f076d37c
Submitter: Zuul
Branch: feature/deep

commit 57306181f1ffde3da1849c6d27e0b001d3bf5ea3
Author: Gerard Gine <email address hidden>
Date: Tue Feb 13 17:00:22 2018 -0800

    Improved usage of args in .functests

    If we're calling the script with any arguments, --pretty will not
    be passed to ostestr.
    Also redirected cd commands' output to /dev/null in .functests.

    Change-Id: I6e7e391c7e1659b86ab12eae4362b565218917b2

commit 8dc40125ce8da616c713a778029b1b29903a1537
Author: John Dickinson <email address hidden>
Date: Tue Feb 13 14:38:22 2018 -0800

    add keystonemiddleware to extras area

    Change-Id: I805cf220aee4af69bf4984cc148d69520e958073

commit 1dddcb533e16602cca5a057dbda12ca4049ea64e
Author: Tim Burke <email address hidden>
Date: Thu Feb 8 13:45:21 2018 -0800

    probetests: Close our handle to subprocess' stdouts

    Otherwise, you could start running out of file handles when running all
    tests. Previously, I'd regularly exceed 1k fds and then all subsequent
    tests would fail; now, it maxes out around 90.

    Change-Id: Ib53f15b5d2be95c70cce8903ea0ef5c334479837

commit f0f3de34626269052b1d7022d2e3cd3f9f0a377e
Author: malei <email address hidden>
Date: Fri Feb 9 22:43:21 2018 +0800

    fix typos in swift

    Change-Id: I14dd433ed7c7fe789d5d04defbf9eec5bffc301a

commit 54918ad1db574bbc2c0b429836609a7ddb1efb13
Author: baiwenteng <email address hidden>
Date: Fri Feb 9 16:39:32 2018 +0800

    Fix grammar error

    Change-Id: Ic375b6c6ebf3f66860b065785e75b8d47552be28

commit 753715499c96c1602bfec5ed241b81b82dd1f1db
Author: OpenStack Proposal Bot <email address hidden>
Date: Fri Feb 9 07:09:42 2018 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

    Change-Id: I1409c8eaca4fbe1170d2e998b35e7fdacf4b86b4

commit a3d2aaba6470645345b19d52f0dbbfd51a4d5cf4
Author: baiwenteng <email address hidden>
Date: Fri Feb 9 12:20:59 2018 +0800

    Fix typos in swift

    Change-Id: I0982b0046a16fda0a39d9b31402b2e4b3160a5c4

commit 1f4ebbc9900c113fa4447dd51079cc2cb9aa3ceb
Author: Alistair Coles <email address hidden>
Date: Thu Feb 8 12:59:36 2018 +0000

    kill orphans during probe test setup

    orphans processes sometimes cause probe test failures so
    get rid of them before each test.

    Change-Id: I4ba6748d30fbb28371f13aa95387c49bc8223402

commit c9410c7dd482cc1faefdfd9d9c83d225e7d28e8f
Author: Thiago da Silva <email address hidden>
Date: Thu Feb 8 18:18:15 2018 -0500

    Move eventlet patch before call to loadapp

    Ran into an eventlet bug[0] while integration Swift/Barbican
    in TripleO. It is very similar to a previous bug related
    to keystonemiddleware[1]. Suggestion from urllib3[2] is to
    patch eventlet "as early as possible". Traceback[3] shows that
    urllib3 is being imported before the eventlet patch, so moved
    the patch to be...


tags: added: in-feature-deep
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/548052

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/548193

OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/s3api)

Change abandoned by Kota Tsuyuzaki (<email address hidden>) on branch: feature/s3api
Review: https://review.openstack.org/548052
Reason: This is affected by the new s3api functests added in recent. Use https://review.openstack.org/#/c/548193/ instead.

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)

Reviewed: https://review.openstack.org/548193
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=9d43f4e190802cb0cad507a65245e4dd70fda7db
Submitter: Zuul
Branch: feature/s3api

commit b3f1558acd2f5a4df3f039070ca5bbc33935e822
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Tue Feb 13 05:16:27 2018 +0000

    Fix expirer's invalid task object names in unit tests

    Object-expirer's task name should be in format of
    "<timestamp>-<account>/<container>/<obj>". In object-expirer
    implementation, ValueError is catched and handled when expirer's task
    objects have invalid name. But in actual swift cluster, invalid task
    object name is not created because task object is created by
    object-server.
    However, without the ValueError catching, some unit tests fail,
    because the unit tests create invalid task object names.

    This patch fixes invalid task object names in unit tests. The
    ValueError catch is remained for unexpected errors, but in the case
    the task will be skipped.

    This patch will help to refactor expirer's task object parsing.

    Change-Id: I8fab8fd180481ce9e97c945904c5c89eec037110

commit 4b19ac772364778a4b96d7e18834db9a7645f482
Author: Tim Burke <email address hidden>
Date: Thu Feb 1 14:30:19 2018 -0800

    py3: port common/storage_policy.py

    Change-Id: I7030280a8495628df9ed8edcc8abc31f901da72e

commit 25540a415e7e36bb08a01a14ca41e2d7591e62cb
Author: Tim Burke <email address hidden>
Date: Thu Feb 22 11:08:49 2018 -0800

    Tighten up assertions around expirer's concurrency

    In particular, test that each work item is only done *once*.

    Change-Id: I9cc610bffb2aa9a2f2b05f4c49e574ab56d05201
    Related-Change: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

commit 5017864133b5af289f205afaf76ffe4518688b3f
Author: melissaml <ma.lei@99cloud.net>
Date: Mon Feb 26 15:48:31 2018 +0800

    Fix the incorrect reference links

    TrivialFix
    [1] is the installation guide for OpenStack components, obviously,
    we need [1] in the docs.

    [1] https://docs.openstack.org/latest/install/

    Change-Id: I3c6fe7327f5552cc2b8f0f0e42b41f8e989a0a7e

commit 58f5d89066adbd54403ad315ffe1f9b2f05aa583
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Tue Feb 13 03:36:04 2018 +0000

    Remove confusing assertion from expirer's unit test

    In test_expirer.TestObjectExpirer.test_process_based_concurrency,
    an assertion checks that expirer execute tasks in round-robin order
    for target containers. But the assertion depends on task object path,
    because task assignation for each process depends on md5 of task
    object path. The dependency makes the assetion confusing.

    Now, we have test_expirer.TestObjectExpirer.test_round_robin_order which
    is added in [1]. So this patch remove the confusing assertion.

    This patch will help to refactor expirer's task object parsing.
    I will push patch for the refactoring after this patch.

    [1]: https://review.openstack.org/#/c/538171

    Change-Id: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

commit 6060af8db96e23d32f3ecc3d02f7f...

tags: added: in-feature-s3api
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.18.0

This issue was fixed in the openstack/swift 2.18.0 release.
