Subrequests from ratelimit for /auth/v1.0 considered as proxy-server errors

Bug #1695273 reported by Filippo Giunchedi
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Undecided
Unassigned

Bug Description

Hi,
I'm upgrading from swift 2.2 to swift 2.10 and starting from swift-proxy, I noticed some 400s for HEAD /auth/v1.0 marked as "RL" that also cause "proxy-server.errors" metric to increment (see also graph in attachment). Specifically requests like these:

Jun 2 13:44:24 ms-fe2005 proxy-server: - - 02/Jun/2017/13/44/24 HEAD /auth/v1.0 HTTP/1.0 400 - Swift - - - - tx7356819ef1ad4d018cd1e-0059316bb8 - 0.0007 RL - 1496411064.224621058 1496411064.225285053 -
Jun 2 13:44:42 ms-fe2005 proxy-server: - - 02/Jun/2017/13/44/42 HEAD /auth/v1.0 HTTP/1.0 400 - Swift - - - - tx625ce88247d0486f8b71b-0059316bca - 0.0007 RL - 1496411082.986550093 1496411082.987246037 -

We're using tempauth and the HEAD seems to be generated also as part of normal client requests (e.g. in response of PUT).

I wanted to know if this behavior is expected? AFAICT those 400s shouldn't account towards proxy-server errors.

Thanks!

The proxy configuration looks like the one below.

[DEFAULT]
bind_port = 80
workers = 16
user = swift
log_statsd_host = localhost
log_statsd_port = 8125
log_statsd_metric_prefix = swift.codfw-prod.ms-fe2005
log_statsd_sample_rate_factor = 1

[pipeline:main]
pipeline = rewrite healthcheck cache container_sync tempurl ratelimit tempauth cors proxy-logging proxy-server

[app:proxy-server]
use = egg:swift#proxy
account_autocreate = true

[filter:tempurl]
use = egg:swift#tempurl
# default includes PUT
methods = GET HEAD

[filter:tempauth]
use = egg:swift#tempauth
token_life = 604800
<accounts>

[filter:ratelimit]
use = egg:swift#ratelimit
# accounts limited to 5/s PUT/DELETE to containers
account_ratelimit = 5
account_whitelist = AUTH_mw
log_sleep_time_seconds = 3
# containers with > 200 objects limited to 30/s PUT/DELETE/POST and listings
container_ratelimit_200 = 30
container_listing_ratelimit_200 = 30

[filter:container_sync]
use = egg:swift#container_sync

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache
memcache_servers = <servers>
memcache_serialization_support = 2
# per worker!
memcache_max_connections = 12

[filter:cors]
paste.filter_factory = wmf.cors:filter_factory

[filter:proxy-logging]
use = egg:swift#proxy_logging
set access_log_facility = LOG_LOCAL1

[filter:rewrite]
# the auth system turns our login and key into an account / token pair.
# the account remains valid forever, but the token times out.
account = AUTH_mw
# the name of the scaler cluster.
thumbhost = imagescaler-rw.discovery.wmnet
# upload doesn't like our User-agent (Python-urllib/2.6), otherwise we could call it using urllib2.urlopen()
user_agent = Mozilla/5.0
# this list is the containers that should be sharded
shard_container_list = <list>
backend_url_format = sitelang
tld = org

paste.filter_factory = wmf.rewrite:filter_factory

Revision history for this message
Filippo Giunchedi (filippo) wrote :
Revision history for this message
Tim Burke (1-tim-z) wrote :

Unfortunately, ratelimit is a bit of a blunt instrument -- it assumes that anything with at least two components to the path will correspond to a Swift path, regardless of whether the first component is a valid api version (like v1) or not (like auth). Unfortunately, this gets more complicated when you need to consider ratelimit's placement with respect to cname_lookup, domain_remap, or swift3 middlewares; as a result I'm not even sure that the current, rather broad behavior covers *enough* cases.

Fortunately, though, the errors are benign and can just be ignored. I think it's pretty safe to ignore *any* 4xx error with a RL source.

See also: https://bugs.launchpad.net/swift/+bug/1669888

(Separately, though, I think tempauth is out of spec returning a 400 on HEAD where a GET would 401... never mind the fact that there's a better code (405) for exactly this situation of using the wrong method on a known path...)

Revision history for this message
Filippo Giunchedi (filippo) wrote :

Thanks Tim for the quick reply!

What tipped me off into looking what was causing this was the proxy-server errors increase after 2.10 upgrade, I'm assuming other users might have had the same either after upgrading or after enabling ratelimit. Is there anything to be done code-wise in proxy-server to explicitly exclude RL requests from being counted towards proxy-server errors metric?

re: tempauth out of spec I've filed https://bugs.launchpad.net/swift/+bug/1695855

thanks!

Revision history for this message
Tim Burke (1-tim-z) wrote :

We probably could... over in the proxy_logging middleware [1] we could try to be a bit more intelligent about when to update stats, in particular when req.environ['swift.source'] is set -- but I worry a bit that doing so may

- paper over issues like https://bugs.launchpad.net/swift/+bug/1669888 so we never properly address them or
- hide real problems (like if ratelimit started making legitimately bad requests)

[1] https://github.com/openstack/swift/blob/2.15.1/swift/common/middleware/proxy_logging.py#L202-L219

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/540092
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=6994a2e392be4096beb49bd33e1a507dd04d491e
Submitter: Zuul
Branch: master

commit 6994a2e392be4096beb49bd33e1a507dd04d491e
Author: Samuel Merritt <email address hidden>
Date: Wed Jan 31 17:05:28 2018 -0800

    ratelimit: ignore requests with invalid API versions

    If you've got things like domain_remap, swift3, or other such
    middlewares in your pipeline, you wind up with requests that aren't of
    the form "/v1/<account>/<container>/<object>". When encountering such
    an oddball request, it's not useful to call get_account_info() on the
    second path component since it's probably not an account.

    This commit makes the ratelimit middleware skip requests that don't
    start with either "/v1" or "/v1.0". The requests will still be
    handled, but they won't be rate-limited.

    Change-Id: I9980cd0e902610ac99d13a502ae955bca2d99df3
    Closes-Bug: 1669888
    Closes-Bug: 1695273

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.20.0

This issue was fixed in the openstack/swift 2.20.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.