Appserver version differences cause 412 errors on PATCH requests

Bug #637323 reported by James Westby on 2010-09-13
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
High
Unassigned

Bug Description

The inclusion of the appserver revno in the etag calculation makes rollouts more
fragile, as intermediate states can cause failures for API clients.

James Westby (james-w) wrote :

Using edge btw.

James Westby (james-w) wrote :

Production seems to not be showing these issues.

Thanks,

James

Leonard Richardson (leonardr) wrote :

James, what objects are you manipulating through the web service? I bet it's bugs. Ursula says she's manipulating bugs. This is the same behavior we were seeing with bug 336866, and bugs are the most complicated objects published through the web service.

Does the first PATCH you ever send fail with a 412, or are you doing more than one PATCH?

Bug 336886 had a number of causes, but they all had to do with code that triggered side effects. You'd make a change to field 'foo', and as a side effect field 'bar' would be changed. But lazr.restful wouldn't hear about the change to 'bar', and when it sent you the 209 response code with the "new" representation it would include the change to "foo" but not to "bar". I'm not sure exactly what you mean when you say the objects are the same, and this is a bug that can be triggered by different HTTP headers even if the JSON documents are the same.

If this behavior is only happening on edge, it's probably caused by a branch that landed recently.

If lp_save() is sometimes succeeding and sometimes failing, I would like to see the actual headers and content of the 209 response code you get when it succeeds.

James Westby (james-w) wrote :

Hi,

This was again apparently caused by the edge appservers having version skew.

lifeless asked me to make this a bug about that being a problem, as it makes it
harder to rollout without causing problems for API clients.

Thanks,

James

summary: - Spurious 412 error from the API on PATCH
+ Appserver version differences cause 412 errors on PATCH requests
affects: launchpad → launchpad-foundations
description: updated
Leonard Richardson (leonardr) wrote :

I see. Right now the ETag change whenever the code revision number changes. We can make it so that the "revision number" is a number we control and only increment when we actually change the web service. That will reduce this problem, but it won't totally eliminate it, and it'll mean that every Launchpad developer has to remember something extra.

We have a plan to freeze old versions of the web service (eg. "beta" and "1.0"; currently backward-compatible changes bleed through to earlier versions because it's easier than making sure they don't). Once we do this, we could associate a revision number with each version of the web service. Since the frozen versions would change very very rarely, you would pretty much never get this error as long as you were using a frozen version. You would get it more often if you were using devel,

But there's no way to totally get rid of this problem, because in the limiting case it is the correct behavior. If you get data from revision n of the web service, and then modify it and send your modification to be processed by revision n+k, the web service _should_ reject your request. It can't be expected to handle that request correctly, because it relied on assumptions that are no longer valid. It doesn't matter whether your two requests are separated by three months, across several Launchpad rollouts, or by a few seconds during a rollout. And of course if you get data from revision n+k and send it in to be processed by revision n, there's absolutely no hope.

On Tue, Sep 14, 2010 at 10:22 AM, Leonard Richardson
<email address hidden> wrote:
> I see. Right now the ETag change whenever the code revision number
> changes. We can make it so that the "revision number" is a number we
> control and only increment when we actually change the web service. That
> will reduce this problem, but it won't totally eliminate it, and it'll
> mean that every Launchpad developer has to remember something extra.

What does including the code revision in the ETag do for us?

> We have a plan to freeze old versions of the web service (eg. "beta" and
> "1.0"; currently backward-compatible changes bleed through to earlier
> versions because it's easier than making sure they don't). Once we do
> this, we could associate a revision number with each version of the web
> service. Since the frozen versions would change very very rarely, you
> would pretty much never get this error as long as you were using a
> frozen version. You would get it more often if you were using devel,

Here's the challenge:
 - rollout 10 times a day, one-two commits a rollout
 - no observed downtime.

> But there's no way to totally get rid of this problem, because in the
> limiting case it is the correct behavior. If you get data from revision
> n of the web service, and then modify it and send your modification to
> be processed by revision n+k, the web service _should_ reject your

While its true that we've *implemented' this as 'gets data and sends a
patch' I very much doubt that most consumer think of what they do as
'patching'. Creating a mailing list is not 'patching the team'. Other
web services manage to evolve without all clients simultaneously
breaking every time they change *anything*.

> request. It can't be expected to handle that request correctly, because
> it relied on assumptions that are no longer valid. It doesn't matter
> whether your two requests are separated by three months, across several
> Launchpad rollouts, or by a few seconds during a rollout. And of course
> if you get data from revision n+k and send it in to be processed by
> revision n, there's absolutely no hope.

Perhaps we need to redesign the web service fundamentals then? I don't
know how deep this assumption goes.

-Rob

James Westby (james-w) wrote :

On Mon, 13 Sep 2010 23:17:56 -0000, Robert Collins <email address hidden> wrote:
> > But there's no way to totally get rid of this problem, because in the
> > limiting case it is the correct behavior. If you get data from revision
> > n of the web service, and then modify it and send your modification to
> > be processed by revision n+k, the web service _should_ reject your
>
> While its true that we've *implemented' this as 'gets data and sends a
> patch' I very much doubt that most consumer think of what they do as
> 'patching'. Creating a mailing list is not 'patching the team'. Other
> web services manage to evolve without all clients simultaneously
> breaking every time they change *anything*.

Creating a mailing list also probably not a PATCH operation in
Launchpad.

Certainly I agree that the way the API works doesn't make it conducive
to handling PATCH failures well:

  * In most cases you don't care about the initial state of the object,
    you only want to set a final state (set a bug to Fix Released say).
    I have at times implemented a "mark_bug_fix_released" function, then
    called it in a loop catching 412 and retrying.

  * The 412 gives you no information about what changed server side, or
    even which object you are acting upon, which isn't great for
    long-running scripts. I think raising an exception with the 3 states
    available would allow for more flexibility in clients.

Thanks,

James

Gary Poster (gary) wrote :

PATCH is generally a reasonable verb for changing the existing characteristics of, say, a team. ETags are a good mechanism for coordinating caching. I assume we agree on these.

If a software version changes, it seems like it is reasonable to say that the ETag changes. Therefore, the devel webservice should continue to have the current behavior. IMO, the danger that this bug describes is a reasonable additional danger to those already present by using devel in your webservice code.

I think that frozen versions of the webservice should not include software versions in the ETag. I believe that will address this bug sufficiently. Leonard is preliminary in favor of this proposal, pending further discussion and documentation on just how "frozen" our frozen versions are.

Having a mailing list be a separate part of the path does seem like the right thing to do, but it also seems peripheral to this bug.

Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → Medium
Changed in launchpad:
importance: Medium → High
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers