Container DELETEs should only need a node majority to succeed

Bug #665164 reported by gholt
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Mike Barton
Milestone: 1.3.0

Bug Description

Container DELETEs should only need a node majority to succeed for the proxy to return success. Currently all the nodes have to succeed. I believe the current behavior is a holdover from an earlier rendition of container db controls.
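
For reference, the majority rule requested here boils down to counting successful node responses against a quorum of the replica count. A minimal sketch of that idea, with names and statuses assumed rather than taken from the actual proxy code:

    # Illustrative sketch only: treat a container DELETE as successful once a
    # majority of the replica nodes acknowledge it, instead of requiring all
    # of them (the current behavior described in this bug).

    def is_success(status):
        """2xx responses count as a successful delete on that node."""
        return 200 <= status < 300

    def delete_succeeded(statuses, replica_count=3):
        """Return True if a majority of the container replicas deleted OK."""
        quorum = replica_count // 2 + 1
        return sum(1 for s in statuses if is_success(s)) >= quorum

    # One failed or unavailable node no longer forces an error on the client:
    assert delete_succeeded([204, 204, 507]) is True
    # But a delete that only one node accepted still fails:
    assert delete_succeeded([204, 507, 507]) is False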

gholt (gholt)
security vulnerability: yes → no
visibility: private → public
clayg (clay-gerrard)
tags: added: proxy
tags: added: low-hanging-fruit
Greg Lange (greglange)
Changed in swift:
assignee: nobody → Greg Lange (greglange)
Revision history for this message
clayg (clay-gerrard) wrote :

from the source:

                    # If even one node doesn't do the delete, we can't be sure
                    # what the outcome will be once everything is in sync; so
                    # we 503.

?

Revision history for this message
gholt (gholt) wrote :

Yeah, the argument against (probably my) code comment was that resurrected containers are okay. IIRC it can only happen when the "no!" node had an object in it still that the other nodes didn't yet know about. If we allow the 204 in such a case, the user will think their container was deleted only to see it reappear with the offending object once things get in sync. In the vast majority of cases where the 3rd node says no, everything will end up meaning yes.

I put this bug in because Mike always pesters me about it, so you might grab his opinion. Generally speaking, we should 503 when we're not certain of the outcome. But at some point, we have to consider things certain enough. I'm fine with this one being on either side of that line.
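
The resurrection scenario described above hinges on how a container database decides whether it is deleted. A rough sketch of that check, with names assumed rather than taken from the real ContainerBroker:

    # Assumed-name sketch: a container counts as deleted only if its delete
    # timestamp is newer than its put timestamp AND it holds no objects.

    def container_is_deleted(put_timestamp, delete_timestamp, object_count):
        return object_count == 0 and delete_timestamp > put_timestamp

    # Nodes A and B accepted the DELETE and consider the container gone:
    assert container_is_deleted(put_timestamp=1.0, delete_timestamp=5.0,
                                object_count=0)

    # Node C still has an object the others haven't heard about, so it says
    # "no!". Once replication pushes that object row to A and B, the count is
    # nonzero everywhere and the container reappears with the offending object.
    assert not container_is_deleted(put_timestamp=1.0, delete_timestamp=5.0,
                                    object_count=1)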

Revision history for this message
Greg Lange (greglange) wrote :

I'm backing away from this bug with my hands up in the air.

I like how it currently works.

Revision history for this message
Mike Barton (redbo) wrote : Re: [Bug 665164] Re: Container DELETEs should only need a node majority to succeed

I hate how it currently works. If a drive dies or is otherwise
unavailable, users can't delete their container until it's fixed.
That seems to be against our design goals.

The only down side I can see to returning success on majority is that
we could resurrect a container in the one in a gazillion chance that
the only good copy of a file's container listing is on the machine
that's currently unavailable. The horrors?

Revision history for this message
Andrew Clay Shafer (littleidea) wrote :

Is there a reason not to use something like the tombstone approach used with
objects to solve this?

Then remove everything asynchronously once all the nodes are ack'd on the
delete?

Is that unworkable for some reason?

Revision history for this message
gholt (gholt) wrote :

No, that's not unworkable. The argument against it is that you'd be deleting an object the user never issued a delete for (they did issue a delete on the container, but didn't know the object was there). Currently, you can only delete empty containers, so you have to explicitly delete objects first. This is so a user can't just delete a container with millions of objects, which would force us to run background jobs to clean them up (and to avoid conflicts with a new container of the same name, etc.).

Like Mike said, the one in a... gazillion? chance of container resurrection is probably fine. But for some reason somebody was dead set against resurrections at some point. My memory is horrible, so it was probably me. :) I'm really fine with very rare resurrections. Kinda like a lost+found. Heh.

I'd rather err on resurrecting something the user thought they'd deleted implicitly than err on deleting something they never issued an explicit delete for.

Revision history for this message
Mike Barton (redbo) wrote :

The container db has a deleted timestamp that gets replicated to
peers, and that basically serves the same purpose as a tombstone.
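
A minimal sketch of how that replicated delete timestamp behaves as a tombstone during reconciliation, with field names assumed rather than taken from the container replicator:

    # Assumed-name sketch: replicas exchange put/delete timestamps and keep the
    # newest of each, so whichever of PUT or DELETE happened last wins once the
    # peers have synced, which is the same effect an object tombstone gives.

    def merge_container_state(local, remote):
        """Merge two replicas' (put_timestamp, delete_timestamp) pairs."""
        return {
            'put_timestamp': max(local['put_timestamp'],
                                 remote['put_timestamp']),
            'delete_timestamp': max(local['delete_timestamp'],
                                    remote['delete_timestamp']),
        }

    lagging_node = {'put_timestamp': 1.0, 'delete_timestamp': 0.0}
    node_that_saw_delete = {'put_timestamp': 1.0, 'delete_timestamp': 5.0}

    merged = merge_container_state(lagging_node, node_that_saw_delete)
    # The delete timestamp propagates, so the lagging replica ends up deleted too.
    assert merged['delete_timestamp'] > merged['put_timestamp']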

Revision history for this message
clayg (clay-gerrard) wrote :

I think Redbo makes a good point: it's unfortunate that a single drive failure can leave the system unable to return success. I wasn't really thinking about it that way. I'm starting to lean towards "certain enough".

For example, if we only return a 503 when one of the nodes specifically returns a 409 (container not empty), and still allow the 204 when the only failure is a timeout or a 507 from a single node, the one-in-a-gazillion chance that replication leads to a resurrected container almost assuredly trends toward one in a googolplex.
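
A sketch of that distinction, with the status handling assumed rather than lifted from the proxy: an explicit 409 refusal counts as a hard failure, while a single timeout or 507 is tolerated as long as the remaining replicas form a majority:

    # Assumed statuses, not proxy code: 409 means a replica thinks the
    # container still has objects, so play it safe and fail; a timeout (None)
    # or 507 just means that replica is unavailable, so accept the delete if
    # the remaining replicas form a majority of successes.

    def classify_delete(statuses, replica_count=3):
        quorum = replica_count // 2 + 1
        if 409 in statuses:
            return 503
        successes = sum(1 for s in statuses
                        if s is not None and 200 <= s < 300)
        return 204 if successes >= quorum else 503

    assert classify_delete([204, 204, 507]) == 204   # one dead drive: succeed
    assert classify_delete([204, 204, None]) == 204  # one timeout: succeed
    assert classify_delete([204, 204, 409]) == 503   # explicit refusal: 503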

clayg (clay-gerrard)
tags: added: container
removed: low-hanging-fruit
Changed in swift:
importance: Undecided → Medium
Revision history for this message
gholt (gholt) wrote :

I'm pretty sure Mike has fixed this while working on another issue.

Changed in swift:
assignee: Greg Lange (greglange) → Mike Barton (redbo)
status: New → Fix Committed
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 1.3.0
status: Fix Committed → Fix Released