Container/account disk drive fault results in replication to all remaining drives

Bug #1675500 reported by Pavel Kvasnička
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Importance: High
Assigned to: Pavel Kvasnička

Bug Description

We use separate drives for the account and container servers, one per server. When 2 drives crashed, we noticed that used space on all remaining account and container DB drives doubled.

Space was consumed slowly: every replication cycle consumed another 1/server_count of space. Usage rose from 15% to 30%, where the trend stopped, and then to 45% when the second drive failed. When the drives were replaced a few hours later, everything returned to the normal state.
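A rough back-of-envelope check lines up with the reported numbers, under the assumption (mine, not stated in the report) that each failed drive eventually adds one full extra copy of the DB data spread across the remaining drives:

```python
# Assumption: each failed container/account drive eventually adds one
# full extra copy of the cluster's DB data to the remaining drives.
base = 15                 # % used with all drives healthy
one_failed = base * 2     # one extra full copy  -> 30%
two_failed = base * 3     # two extra full copies -> 45%
print(one_failed, two_failed)
```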

------------------------------------------

How to reproduce with SAIO:

  * configure 2 replicas with 4 servers, or increase the container server count so that the drive count >= replicas + 2,

  * create some containers
for c in {1..50} ; do swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post cont$c ; done

  * configure one (or more) container servers to see their device as unmounted: set mount_check = true in /etc/swift/container-server/2.conf (check_mount fails on a directory that is not a mounted device) and restart the container server(s),

  * run replication repeatedly; every replicator cycle replicates the databases of the unmounted device onto the next servers:
for n in {1..2} ; do for i in 1 3 4 ; do swift-container-replicator /etc/swift/container-server/$i.conf -o ; done ; done

What to expect with 4 devices, 2 replicas and 1 "unmounted" drive: databases from the unmounted drive are replicated onto 2 more devices (of the 4: 1 is unmounted, 1 is the primary location, 2 are handoffs), but it should be only 1 handoff.
With 8 devices you get 6 extra replicas (of the 8: 1 unmounted, 1 primary location, 6 handoffs).
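The arithmetic above generalizes. A small sketch, assuming (per the description) that the cascade eventually puts a copy on every device that is neither a primary nor the unmounted drive:

```python
def handoff_copies(device_count, replica_count, unmounted=1):
    """Return (expected, buggy) handoff copy counts for one DB whose
    primary drive is unmounted."""
    expected = unmounted                    # one handoff per failed drive
    buggy = device_count - replica_count    # cascade fills the rest
    return expected, buggy

print(handoff_copies(4, 2))  # (1, 2)
print(handoff_copies(8, 2))  # (1, 6)
```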

Revision history for this message
Pavel Kvasnička (pavel-kvasnicka) wrote :
Changed in swift:
assignee: nobody → Pavel Kvasnička (pavel-kvasnicka)
status: New → In Progress
description: updated
Tim Burke (1-tim-z)
Changed in swift:
importance: Undecided → High
Revision history for this message
Tim Burke (1-tim-z) wrote :

I was a little worried that I wasn't understanding at first, but looking just at db files made it rather clear: https://gist.github.com/tipabu/37b01c6e08bb1328f5877995771429e3

Revision history for this message
Romain LE DISEZ (rledisez) wrote :

I tried to reproduce with 3 replicas on a 6-device ring. I'm unable to reproduce.

After the first container-replicator run, a new replica of each impacted partition was created on its first handoff device.

Second run of container-replicator didn't create more replicas.

Revision history for this message
Matthew Oliver (matt-0) wrote :

That's strange, because I recreated it on a 2-replica SAIO with 4 and with 8 devices.

Then I increased the replicas to 3 (8 devs), and I still see the same problem: http://paste.openstack.org/show/604778/

It's a lot easier to see if you only create 1 container.

I can see what's happening. The problem is that the new handoff node's replicator will always find the next handoff node in the ring and pass the database to it, because the unmounted primary triggers a get_more_nodes lookup (from where that node stands). So it will always pass the database along until the drive is remounted.

So the real problem here is the same one the object-reconstructor had, where the handoffs just pass off to another handoff, even though they themselves are handoffs.

I'll put together a hack to confirm/fix this behaviour, but I have already followed it through a debugger.
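The cascade described above can be illustrated with a toy model (this is not Swift's actual Ring/replicator code; the device numbering and one-copy-per-cycle behaviour are assumptions based on the description):

```python
def run_cycles(device_count, primaries, unmounted, cycles):
    """Toy model of the buggy behaviour: while a primary stays unmounted,
    each replication cycle hands the DB to one more handoff device."""
    handoffs = [d for d in range(device_count) if d not in primaries]
    holders = {p for p in primaries if p != unmounted}
    for _ in range(cycles):
        # a holder sees the unmounted primary, consumes the handoff
        # iterator, and pushes the DB to the next handoff without a copy
        nxt = next((h for h in handoffs if h not in holders), None)
        if nxt is not None:
            holders.add(nxt)
    return holders

# 8 devices, 2-replica ring, device 1 unmounted: after enough cycles
# every mounted device holds a copy
print(sorted(run_cycles(8, [0, 1], unmounted=1, cycles=10)))
```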

Revision history for this message
Matthew Oliver (matt-0) wrote :

Maybe something like this patch. It always uses the first few handoffs and only skips a node if that node is itself. Further, the scan to find handoff nodes that helps global-region improvements still happens, for those who do want to run write affinity.

The downside is that when the replicator runs on the first handoff, it will pass the database off to another one, so while a drive is down there will always be one extra copy. But one extra is better than _all_ of them.

This is just a first attempt at a possible way forward; no tests have been added.

Revision history for this message
Matthew Oliver (matt-0) wrote :

Oops, that patch has spawning turned off to help with the debugger.

Revision history for this message
Kota Tsuyuzaki (tsuyuzaki-kota) wrote :

@Matt

Thanks for the valuable analysis of this bug; the reasoning makes sense to me. The problem comes from incorrect usage of the more_nodes iterator.

However, your fix still looks like it needs improvement. I wrote a unit test for the case where handoff nodes attempt to push their local replica to ....

With your patch, a handoff node still tries to replicate a 4th replica even when it uses a 3-replica ring, because the replicator doesn't count its local copy as the third replica, right? And then, I think, it causes a race between handoff nodes pushing the replica.

Thinking of the similar case in the object-replicator: the object-replicators on handoff nodes attempt to push their local copies *only* to primaries, and the object-replicators on *primaries* are able to push the local replica to a handoff when a device failure is found while syncing.

My patch for db_replicator attached to this comment is designed to work like the object-replicator, i.e. the db replicator on a handoff node never tries to push the replica to other handoffs, which is the desirable behavior, I think.

Thoughts?
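The policy described in the comment above could be sketched like this (a hypothetical illustration of the target-selection rule, not the actual db_replicator code; all names are made up):

```python
def replication_targets(node, primaries, more_nodes, failed):
    """node: this replicator's device; primaries: ring primaries for the
    partition; more_nodes: iterator over handoffs in ring order (akin to
    Ring.get_more_nodes); failed: primaries whose drives are unmounted."""
    if node not in primaries:
        # a handoff pushes only to healthy primaries, never to another
        # handoff -- so the cascade cannot start
        return [p for p in primaries if p not in failed]
    targets = []
    for p in primaries:
        if p == node:
            continue
        # only a *primary* may substitute a handoff for a failed primary
        targets.append(next(more_nodes) if p in failed else p)
    return targets
```

With this rule, a handoff holding a copy while a primary is down simply waits for the drive to come back instead of spawning further handoff copies.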

Revision history for this message
Pavel Kvasnička (pavel-kvasnicka) wrote :

@Kota

Please note the patch https://review.openstack.org/#/c/448480/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/454174

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/454174
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=bcd0eb70afacae1483e9e53d5a4082536770aed8
Submitter: Jenkins
Branch: master

commit bcd0eb70afacae1483e9e53d5a4082536770aed8
Author: Pavel Kvasnička <email address hidden>
Date: Wed Mar 22 09:59:53 2017 +0100

    Container drive error results double space usage on rest drives

    When drive with container or account database is unmounted
    replicator pushes database to handoff location. But this
    handoff location finds replica with unmounted drive and
    pushes database to the *next* handoff until all handoffs has
    a replica - all container/account servers has replicas of
    all unmounted drives.

    This patch solves:
    - Consumption of iterator on handoff location that results in
      replication to the next and next handoff.
    - StopIteration exception stopped not finished loop over
      available handoffs if no more nodes exists for db replication
      candidency.

    Regression was introduced in 2.4.0 with rsync compression.

    Co-Author: Kota Tsuyuzaki <email address hidden>

    Change-Id: I344f9daaa038c6946be11e1cf8c4ef104a09e68b
    Closes-Bug: 1675500

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/456093

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/456095

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/newton)

Reviewed: https://review.openstack.org/456093
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=80f550f80e3ff7e3ebd266dafe34c154c9caf861
Submitter: Jenkins
Branch: stable/newton

commit 80f550f80e3ff7e3ebd266dafe34c154c9caf861
Author: Pavel Kvasnička <email address hidden>
Date: Wed Mar 22 09:59:53 2017 +0100

    (commit message identical to the master commit above)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/ocata)

Reviewed: https://review.openstack.org/456095
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=1582f8a559ac03be7d3976558e66339d8c3a1811
Submitter: Jenkins
Branch: stable/ocata

commit 1582f8a559ac03be7d3976558e66339d8c3a1811
Author: Pavel Kvasnička <email address hidden>
Date: Wed Mar 22 09:59:53 2017 +0100

    (commit message identical to the master commit above)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.14.0

This issue was fixed in the openstack/swift 2.14.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.2

This issue was fixed in the openstack/swift 2.10.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.1

This issue was fixed in the openstack/swift 2.13.1 release.
