placement returns 503 when keystone is down

Bug #1749797 reported by Jim Rollenhagen on 2018-02-15
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Undecided   Unassigned
keystonemiddleware        Undecided   Colleen Murphy

Bug Description

See the logs here: http://logs.openstack.org/50/544750/8/check/ironic-grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-placement-api.txt.gz#_Feb_15_17_58_22_463228

This is during an upgrade while Keystone is down. Placement returns a 503 because it cannot reach keystone.

I'm not sure what the expected behavior should be, but a 503 feels wrong.

Matt Riedemann (mriedem) on 2018-02-15
tags: added: placement upgrade
Changed in nova:
status: New → Confirmed
Chris Dent (cdent) wrote :

Thanks for reporting this, jroll. For a long time I had an item on my to-do list along the lines of "placement weird when keystone down, investigate", but without enough detail to remember what it was when I went back to the old items in the list, so I deleted it. It was a note I had made while I was digging into something else and didn't want to get distracted.

The logs show the response is coming from keystonemiddleware.auth_token, which placement uses pretty much directly as a straight up wsgi middleware, not adapting the response in any way.

So if this is a bug, it's a bug there (and I've added the project). However, it is by design:

* https://github.com/openstack/keystonemiddleware/blob/master/keystonemiddleware/auth_token/__init__.py#L776-L781
* https://docs.openstack.org/keystonemiddleware/latest/middlewarearchitecture.html#specification-overview

It's a potentially weird interpretation of 503 ( https://httpstatuses.com/503 ) but not too far off base.

Since keystone is required to do the auth, I'm not really sure what ought to happen here. A clearer message about _which_ service is unavailable might be useful: "Bad response code while validating token: 503: ServiceUnavailable: Service Unavailable (HTTP 503)"

Any 4xx error is suspect because there's nothing wrong with the client's request, other than bad luck about the time they asked it.

What other alternatives are there?

Ah, gotcha. That makes sense, thanks Chris.

I do think we should make placement's error message better for the client. Currently it is: {"message": "The server is currently unavailable. Please try again at a later time.<br /><br />\n\n\n", "code": "503 Service Unavailable", "title": "Service Unavailable"}. See also the compute logs from the same run: http://logs.openstack.org/50/544750/8/check/ironic-grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-n-cpu.txt.gz#_Feb_15_17_58_22_470073

I'm thinking we can catch webob.exc.HTTPServiceUnavailable from placement's wrapper around the auth_token middleware, and transform the message there. Does that work for you?
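
To make the wrapper idea above concrete, here is a minimal sketch using only the standard library and plain WSGI. It is not placement's or keystonemiddleware's actual code: the class name `ClarifyAuth503`, its `service_name` parameter, and the replacement message text are all hypothetical. It buffers the response from the wrapped middleware and, if the status is 503, swaps in a body that names the identity service.

```python
import json


class ClarifyAuth503(object):
    """Hypothetical WSGI wrapper that rewrites a 503 from the wrapped stack."""

    def __init__(self, app, service_name="Keystone"):
        self.app = app
        self.service_name = service_name  # assumption: name shown to clients

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is known.
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # legacy write() callable, unused here

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        headers = captured["headers"]
        if captured["status"].startswith("503"):
            # Illustrative wording only; not the message used by any project.
            body = json.dumps({
                "title": "Service Unavailable",
                "code": captured["status"],
                "message": "The %s service could not be reached to validate "
                           "the token. Please try again later."
                           % self.service_name,
            }).encode("utf-8")
            replaced = ("content-type", "content-length")
            headers = [(k, v) for k, v in headers if k.lower() not in replaced]
            headers.append(("Content-Type", "application/json"))
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers)
        return [body]
```

One limitation of this sketch: it rewrites every 503 coming out of the wrapped stack, including any raised by the application behind the auth middleware, so the message could blame the identity service wrongly. It also only helps services that have such a wrapper.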

Continued thinking about this, and maybe we should just put a clearer message in the auth_token middleware itself? Then all consumers will get the clear message. Win.

Chris Dent (cdent) wrote :

Yeah, I think the second option is better, because in many situations there won't be any wrapper (placement didn't have one until very recently).

Fix proposed to branch: master
Review: https://review.openstack.org/546108

Changed in keystonemiddleware:
assignee: nobody → Chris Dent (cdent)
status: New → In Progress
Changed in keystonemiddleware:
assignee: Chris Dent (cdent) → Colleen Murphy (krinkle)

Reviewed: https://review.openstack.org/546108
Committed: https://git.openstack.org/cgit/openstack/keystonemiddleware/commit/?id=d3352ff422db6ba6a5e7bd4f7220af0d97efd0ac
Submitter: Zuul
Branch: master

commit d3352ff422db6ba6a5e7bd4f7220af0d97efd0ac
Author: Chris Dent <email address hidden>
Date: Tue Feb 20 10:31:49 2018 +0000

    Identify the keystone service when raising 503

    When the keystonemiddleware is used directly in the WSGI stack of an
    application, the 503 that is raised when the keystone service errors
    or cannot be reached needs to identify that keystone is the service
    that has failed, otherwise it appears to the client that it is the
    service they are trying to access is down, which is misleading.

    This addresses the problem in the most straightforward way possible:
    the exception that causes the 503 is given a message including the
    word "Keystone".

    The call method in BaseAuthTokenTestCase gains an
    expected_body_string kwarg. If not None, the response body (as
    a six.text_type) is compared with the value.

    Change-Id: Idf211e7bc99139744af232f5ea3ecb4be41551ca
    Closes-Bug: #1747655
    Closes-Bug: #1749797
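
The approach described in the commit message can be illustrated with a small standard-library sketch. This is not the keystonemiddleware implementation: `IdentityUnavailable`, `AuthSketch`, the `validate_token` callable, and the message wording are hypothetical stand-ins, and the `call` helper only loosely mirrors the `expected_body_string` kwarg mentioned above. It assumes a substring check, which may differ from the real comparison.

```python
class IdentityUnavailable(Exception):
    """Hypothetical error: the identity service errored or was unreachable."""


class AuthSketch(object):
    """Hypothetical auth middleware that names the failed service in its 503."""

    def __init__(self, app, validate_token):
        self.app = app
        self.validate_token = validate_token  # assumption: raises on failure

    def __call__(self, environ, start_response):
        try:
            self.validate_token(environ.get("HTTP_X_AUTH_TOKEN"))
        except IdentityUnavailable:
            # Illustrative wording; the point is that it says "Keystone".
            body = b"The Keystone service is temporarily unavailable."
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
            return [body]
        return self.app(environ, start_response)


def call(app, expected_status=200, expected_body_string=None):
    """Drive a WSGI app once and optionally check the response body text."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    chunks = app({"REQUEST_METHOD": "GET"}, start_response)
    body = b"".join(chunks).decode("utf-8")
    assert int(seen["status"].split()[0]) == expected_status
    if expected_body_string is not None:
        # Assumption: containment rather than strict equality.
        assert expected_body_string in body
    return body
```

With a validator that raises `IdentityUnavailable`, `call(AuthSketch(app, validator), expected_status=503, expected_body_string="Keystone")` passes, and every service that uses the middleware gets the clearer message without needing its own wrapper.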

Changed in keystonemiddleware:
status: In Progress → Fix Released
Chris Dent (cdent) on 2018-03-09
Changed in nova:
status: Confirmed → Invalid

This issue was fixed in the openstack/keystonemiddleware 5.0.0 release.
