Container DELETEs should only need a node majority to succeed

Bug #665164 reported by gholt on 2010-10-22
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Mike Barton
Milestone: 1.3.0

Bug Description

A container DELETE should only need to succeed on a majority of nodes for the proxy to return success. Currently all the nodes have to succeed. I believe the current behavior is a holdover from an earlier rendition of the container db controls.
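
For illustration, a minimal sketch (not Swift's actual proxy code; the function names and sample status list are made up) of the difference between the current all-nodes rule and the proposed majority rule:

    # Hypothetical comparison of the two policies discussed in this bug;
    # `statuses` holds the HTTP codes the container nodes returned for a DELETE.

    def all_must_succeed(statuses):
        # Current behavior: any non-2xx response fails the whole request.
        return all(200 <= s < 300 for s in statuses)

    def majority_must_succeed(statuses):
        # Proposed behavior: a simple quorum of 2xx responses is enough.
        successes = sum(1 for s in statuses if 200 <= s < 300)
        return successes > len(statuses) // 2

    # Three replicas, one node unavailable (e.g. a 507 from a failed drive):
    statuses = [204, 204, 507]
    print(all_must_succeed(statuses))       # False -> proxy errors out
    print(majority_must_succeed(statuses))  # True  -> proxy could return 204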

gholt (gholt) on 2010-10-22
security vulnerability: yes → no
visibility: private → public
clayg (clay-gerrard) on 2010-11-23
tags: added: proxy
tags: added: low-hanging-fruit
Greg Lange (greglange) on 2010-12-06
Changed in swift:
assignee: nobody → Greg Lange (greglange)
clayg (clay-gerrard) wrote :

from the source:

                    # If even one node doesn't do the delete, we can't be sure
                    # what the outcome will be once everything is in sync; so
                    # we 503.

?

gholt (gholt) wrote :

Yeah, the argument against that code comment (probably mine) was that resurrected containers are okay. IIRC it can only happen when the "no!" node still had an object in it that the other nodes didn't yet know about. If we allow the 204 in that case, the user will think their container was deleted, only to see it reappear with the offending object once things get in sync. In the vast majority of cases where the 3rd node says no, everything will end up meaning yes.

I put this bug in because Mike always pesters me about it, so you might grab his opinion. Generally speaking, we should 503 when we're not certain of the outcome. But at some point, we have to consider things certain enough. I'm fine with this one being on either side of that line.
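
A minimal sketch of the resurrection scenario described above (the object name and replica layout are made up for the example; this is not Swift code):

    # Three container replicas; node C still lists an object whose listing
    # update hasn't reached A and B yet.
    replicas = {
        'A': {'objects': set()},
        'B': {'objects': set()},
        'C': {'objects': {'photo.jpg'}},
    }

    # On a DELETE, A and B are empty and accept; C refuses because it still
    # has an object. If the proxy returned 204 on that 2-of-3 majority, the
    # user would believe the container is gone.
    accepted = [name for name, db in replicas.items() if not db['objects']]
    print(accepted)  # ['A', 'B']

    # Once replication syncs the listings, the object on C propagates and the
    # container "reappears" holding photo.jpg.
    merged = set().union(*(db['objects'] for db in replicas.values()))
    print(sorted(merged))  # ['photo.jpg'] -> resurrected with the stray object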

Greg Lange (greglange) wrote :

I'm backing away from this bug with my hands up in the air.

I like how it currently works.

Mike Barton (redbo) wrote :

I hate how it currently works. If a drive dies or is otherwise unavailable, users can't delete their container until it's fixed. That seems to be against our design goals.

The only downside I can see to returning success on a majority is that we could resurrect a container, in the one-in-a-gazillion chance that the only good copy of a file's container listing is on the machine that's currently unavailable. The horrors?

Andrew Clay Shafer (littleidea) wrote :

Is there a reason not to use something like the tombstone approach used with objects to solve this?

Then remove everything asynchronously once all the nodes have ack'd the delete?

Is that unworkable for some reason?

On Tue, Dec 7, 2010 at 1:59 PM, Mike Barton <email address hidden>wrote:

> I hate how it currently works. If a drive dies or is otherwise
> unavailable, users can't delete their container until it's fixed.
> That seems to be against our design goals.
>
> The only down side I can see to returning success on majority is that
> we could resurrect a container in the one in a gazillion chance that
> the only good copy of a file's container listing is on the machine
> that's currently unavailable. The horrors?

gholt (gholt) wrote :

No, that's not unworkable. The argument against it is that you'd be deleting an object the user never issued a delete for (they did issue a delete on the container, but didn't know the object was there). Currently you can only delete empty containers, so you have to explicitly delete the objects first. That's so a user can't just delete a container with millions of objects in one request, which would force us to run background jobs to clean them up (and to avoid conflicts with a new container with the same name, etc.).

Like Mike said, the one in a... gazillion? chance of container resurrection is probably fine. But for some reason somebody was dead set against resurrections at some point. My memory is horrible, so it was probably me. :) I'm really fine with very rare resurrections. Kinda like a lost+found. Heh.

I'd rather err on resurrecting something the user thought they'd deleted implicitly than err on deleting something they never issued an explicit delete for.

Mike Barton (redbo) wrote :

The container db has a deleted timestamp that gets replicated to peers, and that basically serves the same purpose as a tombstone.
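
A minimal sketch of that idea (the field names and merge helper are illustrative, not Swift's actual schema or replication code):

    # Each replica tracks when the container was created and when it was
    # deleted; replication keeps the newest of each, so a delete "wins"
    # the way an object tombstone would.

    def merge(local, remote):
        # Merge two replicas' metadata by keeping the newest timestamps.
        return {
            'put_timestamp': max(local['put_timestamp'],
                                 remote['put_timestamp']),
            'delete_timestamp': max(local['delete_timestamp'],
                                    remote['delete_timestamp']),
        }

    def is_deleted(info):
        # A container counts as deleted once the delete is newer than the put.
        return info['delete_timestamp'] > info['put_timestamp']

    node_that_deleted = {'put_timestamp': 100.0, 'delete_timestamp': 200.0}
    node_that_missed_it = {'put_timestamp': 100.0, 'delete_timestamp': 0.0}

    print(is_deleted(merge(node_that_deleted, node_that_missed_it)))  # True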

clayg (clay-gerrard) wrote :

I think Redbo makes a good point: it's unfortunate that a single drive failure can keep the system from returning success. I wasn't really thinking about it that way. I'm starting to lean towards "certain enough".

For example, if we only return a 503 when one of the nodes is specifically returning a 409 (container not empty), and allow a 204 if we get a timeout or a 507 from just one node, the one-in-a-gazillion chance that replication leads to a resurrected container almost assuredly trends toward one in a googolplex.
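
A hedged sketch of that policy (hypothetical helper, not the actual proxy change; None stands in for a timed-out node):

    def container_delete_status(statuses):
        # Per-node results for the DELETE: e.g. 204, 409, 507, or None for a timeout.
        if any(s == 409 for s in statuses):
            # A node positively refused because it still has objects: the
            # outcome after replication is genuinely uncertain, so fail hard.
            return 503
        successes = sum(1 for s in statuses if s is not None and 200 <= s < 300)
        if successes > len(statuses) // 2:
            # Majority succeeded and the dissent was only an error or timeout.
            return 204
        return 503

    print(container_delete_status([204, 204, 507]))   # 204
    print(container_delete_status([204, 204, None]))  # 204
    print(container_delete_status([204, 204, 409]))   # 503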

clayg (clay-gerrard) on 2010-12-09
tags: added: container
removed: low-hanging-fruit
Changed in swift:
importance: Undecided → Medium
gholt (gholt) wrote :

I'm pretty sure Mike has fixed this while working on another issue.

Changed in swift:
assignee: Greg Lange (greglange) → Mike Barton (redbo)
status: New → Fix Committed
Thierry Carrez (ttx) on 2011-04-15
Changed in swift:
milestone: none → 1.3.0
status: Fix Committed → Fix Released