Handoff requested log for HEAD on proxy-server

Bug #1297214 reported by Hugo Kou
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: 2.1.0

Bug Description

A bunch of "Handoff requested" log entries for HEAD requests in the proxy-server.
Is this on purpose?

proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: txd53d195092a04b049834f-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx914443cd8eb6440493aca-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx2cb1143a44b24ab19788f-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: txdc509e8f3618462b96c0e-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: txcc9c2eaaf0154aa9a5cd1-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx7b8e19fefaa9466582e60-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: txcd74db04f337405ba8ad4-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx20c83ac2cfc14adc9cd11-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx189851f41b4b485ba05c6-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx20aa5c08db854a0d8acba-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:17:21 PACO1 proxy-server: Handoff requested (1) (txn: tx5143dd47d65b45d6a2416-00533157b1) (client_ip: 192.168.56.21)
proxy.error:Mar 25 03:53:33 PACO1 proxy-server: Handoff requested (1) (txn: txff62a0c5f688481d9b520-005331602d)

root@PACO1:/var/log/swift# grep txff62a0c5f688481d9b520-005331602d *
proxy.error:Mar 25 03:53:33 PACO1 proxy-server: STDOUT: HUGOHUGOHUGOHUGO (txn: txff62a0c5f688481d9b520-005331602d)
proxy.error:Mar 25 03:53:33 PACO1 proxy-server: Handoff requested (1) (txn: txff62a0c5f688481d9b520-005331602d)
proxy.log:Mar 25 03:53:33 PACO1 proxy-server: Calling Swift3 Middleware (txn: txff62a0c5f688481d9b520-005331602d)
proxy.log:Mar 25 03:53:33 PACO1 proxy-server: {'headers': {'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'User-Agent': 'python-requests/2.2.1 CPython/2.7.3 Linux/3.5.0-23-generic', 'Host': 'localhost', 'X-Auth-Token': 'AUTH_tk2bd513aa449c47fa896ff45556d61b3a', 'Content-Type': None}, 'environ': {'SCRIPT_NAME': '', 'REQUEST_METHOD': 'HEAD', 'PATH_INFO': '/v1/AUTH_val/1/ssl-val_val.source', 'SERVER_PROTOCOL': 'HTTP/1.0', 'REMOTE_ADDR': '127.0.0.1', 'HTTP_X_AUTH_TOKEN': 'AUTH_tk2bd513aa449c47fa896ff45556d61b3a', 'HTTP_USER_AGENT': 'python-requests/2.2.1 CPython/2.7.3 Linux/3.5.0-23-generic', 'eventlet.posthooks': [], 'RAW_PATH_INFO': '/v1/AUTH_val/1/ssl-val_val.source', 'REMOTE_PORT': '45359', 'eventlet.input': <eventlet.wsgi.Input object at 0x27eb210>, 'wsgi.url_scheme': 'https', 'SERVER_PORT': '443', 'wsgi.input': <swift.common.utils.InputProxy object at 0x29c6750>, 'HTTP_HOST': 'localhost', 'swift.cache': <swift.common.memcached.MemcacheRing object at 0x27c8810>, 'HTTPS': 'on', 'wsgi.multithread': True, 'HTTP_ACCEPT': '*/*', 'wsgi.version': (1, 0), 'SERVER_NAME': '127.0.0.1', 'GATEWAY_INTERFACE': 'CGI/1.1', 'wsgi.run_once': False, 'wsgi.errors': <swift.common.utils.LoggerFileObject object at 0x234bb90>, 'wsgi.multiprocess': False, 'swift.trans_id': 'txff62a0c5f688481d9b520-005331602d', 'CONTENT_TYPE': None, 'HTTP_ACCEPT_ENCODING': 'gzip, deflate, compress', 'swift.copy_response_hook': <function slo_hook at 0x27eaa28>}}
proxy.log:Mar 25 03:53:33 PACO1 proxy-server: User: val uses token AUTH_tk2bd513aa449c47fa896ff45556d61b3a (trans_id txff62a0c5f688481d9b520-005331602d)
proxy.log:Mar 25 03:53:33 PACO1 proxy-server: User val:val has reseller admin authorizing. (txn: txff62a0c5f688481d9b520-005331602d) (client_ip: 127.0.0.1)
proxy.log:Mar 25 03:53:33 PACO1 proxy-server: 127.0.0.1 127.0.0.1 25/Mar/2014/10/53/33 HEAD /v1/AUTH_val/1/ssl-val_val.source HTTP/1.0 404 - python-requests/2.2.1%20CPython/2.7.3%20Linux/3.5.0-23-generic AUTH_tk2bd513aa449c47fa896ff45556d61b3a - - - txff62a0c5f688481d9b520-005331602d - 0.0142 - - 1395744813.805253029 1395744813.819473028
storage1.log:Mar 25 03:53:33 PACO1 object-server: 127.0.0.1 - - [25/Mar/2014:10:53:33 +0000] "HEAD /d1/827/AUTH_val/1/ssl-val_val.source" 404 - "HEAD https://localhost/v1/AUTH_val/1/ssl-val_val.source" "txff62a0c5f688481d9b520-005331602d" "proxy-server 3499" 0.0002
storage2.log:Mar 25 03:53:33 PACO1 container-server: 127.0.0.1 - - [25/Mar/2014:10:53:33 +0000] "HEAD /d2/774/AUTH_val/1" 204 - "txff62a0c5f688481d9b520-005331602d" "HEAD http://localhost/v1/AUTH_val/1" "proxy-server 3499" 0.0011
storage2.log:Mar 25 03:53:33 PACO1 object-server: 127.0.0.1 - - [25/Mar/2014:10:53:33 +0000] "HEAD /d2/827/AUTH_val/1/ssl-val_val.source" 404 - "HEAD https://localhost/v1/AUTH_val/1/ssl-val_val.source" "txff62a0c5f688481d9b520-005331602d" "proxy-server 3499" 0.0002
storage3.log:Mar 25 03:53:33 PACO1 object-server: 127.0.0.1 - - [25/Mar/2014:10:53:33 +0000] "HEAD /d3/827/AUTH_val/1/ssl-val_val.source" 404 - "HEAD https://localhost/v1/AUTH_val/1/ssl-val_val.source" "txff62a0c5f688481d9b520-005331602d" "proxy-server 3499" 0.0003
storage4.log:Mar 25 03:53:33 PACO1 object-server: 127.0.0.1 - - [25/Mar/2014:10:53:33 +0000] "HEAD /d4/827/AUTH_val/1/ssl-val_val.source" 404 - "HEAD https://localhost/v1/AUTH_val/1/ssl-val_val.source" "txff62a0c5f688481d9b520-005331602d" "proxy-server 3499" 0.0002
root@PACO1:/var/log/swift#

Revision history for this message
Samuel Merritt (torgomatic) wrote :

Yep, this logging is unhelpful.

A long time ago, the proxy would mostly talk to primary nodes, but it would talk to handoffs when a primary was down. This applied to all HTTP verbs. Handoff logging was added so that operators could see how often requests went to handoffs; by watching the handoffs/requests ratio, an operator could get a sense of how well the cluster was doing with availability.

A shorter time ago, the proxy started talking to handoffs on GET/HEAD requests when the primaries gave 404s. This way, even if a write went to handoffs, the object would still be available for reads. By default, the proxy will request things from the 3 primaries and then the first 3 handoffs (but it's configurable). Unfortunately, this means the handoffs/requests ratio isn't particularly useful right now.
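
For reference, the "configurable" knob here is request_node_count in the proxy server's config. A minimal sketch of the stanza, showing the stock default rather than this cluster's actual settings:

    # proxy-server.conf
    [app:proxy-server]
    # Total nodes a GET/HEAD may contact: the primaries plus an equal
    # number of handoffs. Accepts an integer or "<N> * replicas".
    request_node_count = 2 * replicas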

What should happen:

On GET/HEAD requests, the first 3 handoffs shouldn't be logged, as they're normal. For other verbs, all handoffs should be logged. (A short sketch of this policy follows the examples below.)

Some examples with the default config:

GET talks to 3 primaries and 3 handoffs: no handoffs logged

GET talks to 2 primaries (one's down) and 4 handoffs: 1 handoff logged

GET talks to 1 primary (two are down) and 5 handoffs: 2 handoffs logged

GET talks to 0 primaries (all down) and 6 handoffs: 3 handoffs logged, 1 handoff_all logged

PUT talks to 3 primaries and 0 handoffs: no handoffs logged

PUT talks to 2 primaries (one's down) and 1 handoff: 1 handoff logged

PUT talks to 0 primaries (all down) and 3 handoffs: 3 handoffs logged, 1 handoff_all logged
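
The policy above can be captured in a few lines. This is a sketch of the accounting only, not Swift's actual proxy code; the function name and signature are made up for illustration:

    def handoffs_to_log(method, replica_count, primaries_used, handoffs_used):
        """Return (handoff_warnings, log_handoff_all) for one request.

        On GET/HEAD the first replica_count handoffs are routine (the
        proxy tries them whenever the primaries 404), so only handoffs
        beyond that allowance indicate trouble. On writes, any handoff
        means a primary was unreachable.
        """
        expected = replica_count if method in ('GET', 'HEAD') else 0
        extra = max(handoffs_used - expected, 0)
        # handoff_all: not a single primary could be used
        return extra, primaries_used == 0

    # The examples above, restated as assertions:
    assert handoffs_to_log('GET', 3, 3, 3) == (0, False)
    assert handoffs_to_log('GET', 3, 2, 4) == (1, False)
    assert handoffs_to_log('GET', 3, 1, 5) == (2, False)
    assert handoffs_to_log('GET', 3, 0, 6) == (3, True)
    assert handoffs_to_log('PUT', 3, 3, 0) == (0, False)
    assert handoffs_to_log('PUT', 3, 2, 1) == (1, False)
    assert handoffs_to_log('PUT', 3, 0, 3) == (3, True)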

Changed in swift:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/109791
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=4bc600ff42f254849458c77ddef9bdc47f91e4a1
Submitter: Jenkins
Branch: master

commit 4bc600ff42f254849458c77ddef9bdc47f91e4a1
Author: Samuel Merritt <email address hidden>
Date: Fri Jul 25 18:50:19 2014 -0700

    Fix "Handoff requested (N)" log spam

    A long, long time ago, on a GET request, the proxy would go look on 3*
    nodes for the requested thing. If one of the primary nodes was
    error-limited, it'd look on two primaries and a handoff. Since this
    indicated some failure somewhere, the proxy would emit a warning:
    "Handoff requested (1)". If two primaries were down, there'd be a
    second message "Handoff requested (2)", and so on.

    Some StatsD messages were emitted too.

    A somewhat shorter time ago (commit d79a67eb), the proxy started
    looking into handoffs if it got 404s from the primaries. While this
    was a good idea, it resulted in lots of "Handoff requested (N)" log spam;
    you'd see these messages on every single 404. Also, the StatsD
    handoff_count and handoff_all_count metrics shot way up and turned
    into noise.

    This commit restores the original intent (and usefulness) of the log
    messages and StatsD metrics: if the proxy only looks at the normal
    number of handoff nodes, nothing is logged. However, if a primary is
    down, then the message "Handoff requested (1)" will be logged,
    indicating that the proxy looked at one more handoff than it normally
    would, and this happened because a primary node was error-limited.

    Closes-Bug: 1297214

    * or whatever the replica count was

    Change-Id: If1b77c18c880b096e8ab1df3008db40ce313835d
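
Reading the commit together with the policy sketched in the first comment, the warning/metric glue is roughly the following. Names are illustrative and it reuses handoffs_to_log from that sketch; Swift's real StatsD-aware logger does expose increment(), but the stub below only stands in for it:

    class StubLogger(object):
        # stand-in for Swift's LogAdapter with StatsD enabled
        def warning(self, msg):
            print(msg)

        def increment(self, metric):
            print('statsd: %s +1' % metric)

    logger = StubLogger()
    extra, all_handoffs = handoffs_to_log('GET', 3, primaries_used=2,
                                          handoffs_used=4)
    for n in range(1, extra + 1):
        logger.warning('Handoff requested (%d)' % n)  # one per extra handoff
        logger.increment('handoff_count')
    if all_handoffs:
        logger.increment('handoff_all_count')  # no primary was reachable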

Changed in swift:
status: Confirmed → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/111506

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/ec)

Change abandoned by paul luse (<email address hidden>) on branch: feature/ec
Review: https://review.openstack.org/111506
Reason: accident

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/111866

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/111866
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=9ace546b81b7efe606314e24fdb951ffe8e9a253
Submitter: Jenkins
Branch: feature/ec

commit e2255fc1f4d08993f41d3ff2eacc62a3c5f66ecd
Author: David Goetz <email address hidden>
Date: Thu Jul 31 10:19:10 2014 -0700

    fix expirer bug with unicode container listings

    If the container names in the expirer's account are returned as
    unicode strings (as is the case with some json libraries), the
    expirer compared eg u'1' == '1', which is problematic. This patch
    ensures that the unicode is coerced to ascii so the comparison
    is correct.

    Change-Id: I72b322e7513f7da32e8dc75c6bf0e7e016948c88
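
The coercion described amounts to something like this Python 2 era sketch (not the patch's exact code; unicode is the Python 2 type):

    def normalize(name):
        # json listings may return container names as unicode; coerce
        # to a byte string so comparisons with str values behave.
        if isinstance(name, unicode):
            return name.encode('ascii')
        return name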

commit 4286f36a60446e322f23a329416be3ed67db1149
Author: anc <email address hidden>
Date: Mon Mar 10 11:46:58 2014 +0000

    Enable object system metadata on PUTs

    This patch takes a first step towards support
    for object system metadata by enabling headers
    in the x-object-sysmeta- namespace to be
    persisted when objects are PUT. This should be
    useful for other pending patches such as on
    demand migration and server side encryption
    (https://review.openstack.org/#/c/64430/ and
    https://review.openstack.org/#/c/76578/1).

    The x-object-sysmeta- namespace is already
    reserved/protected by the gatekeeper and
    passed through the proxy. This patch modifies
    the object server to persist these headers
    alongside user metadata when an object is
    PUT.

    This patch will preserve existing object
    system metadata and ignore any new system
    metadata when handling object POSTs,
    including POST-as-copy operations. Support
    for modification of object system metadata
    with a POST request requires further work
    as discussed in the blueprint.

    This patch will preserve existing object
    system metadata and update it with new
    system metadata when copying an object.

    A new probe test is added which makes use of
    the BrainSplitter class that has been moved
    from test_container_merge_policy_index.py to
    a new module brain.py.

    blueprint object-system-metadata

    Change-Id: If716bc15730b7322266ebff4ab8dd31e78e4b962
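
As a rough illustration of who sets such headers: the gatekeeper strips x-object-sysmeta- from client requests, so it is proxy pipeline middleware (sitting after gatekeeper) that writes them. The middleware below is hypothetical, not part of the patch:

    class SysmetaStamper(object):
        """Toy WSGI middleware stamping a sysmeta header on object PUTs.

        The object server then persists the header alongside user
        metadata, per the patch described above.
        """
        def __init__(self, app):
            self.app = app

        def __call__(self, env, start_response):
            if env.get('REQUEST_METHOD') == 'PUT':
                env['HTTP_X_OBJECT_SYSMETA_STAMPED_BY'] = 'stamper'
            return self.app(env, start_response)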

commit 8ad37b2217c30b756bb79ead84be7c04f628ed36
Author: Christian Schwede <email address hidden>
Date: Fri Aug 1 13:55:21 2014 +0000

    Fix swift-dispersion-populate bug when using SimpleClient class

    Commit 7fcbbebb changed the internal client a little bit and removed the
    class variable self.attempts in SimpleClient. This is currently still
    needed in swift-dispersion-populate to track the number of retries, thus
    re-adding it again.

    Closes-Bug: 1351323
    Change-Id: I98ba6441607158b34708b2dbb2d1a63788681d63

commit 5a84b1cde102dac4ae5a40a20a364ada1dc547e9
Author: Samuel Merritt <email address hidden>
Date: Thu Jul 31 12:18:47 2014 -0700

    Put proxy-logging in object expirer's sample config

    It's generally better to have logs for something than to not have
    logs. This way, the object expirer (if using the sample config as a
    starting point) will log wh...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.1.0
status: Fix Committed → Fix Released