In multisite config, secondary zone is unable to sync when TLS is enabled

Bug #1966669 reported by Giuseppe Petralia
Affects: Ceph RADOS Gateway Charm
Status: Fix Released
Importance: High
Assigned to: James Page
Milestone: 22.04

Bug Description

Ceph version: 15.2.14-0ubuntu0.20.04.2

When creating a new bucket and inserting an object on the Secondary, the change is replicated to the Primary;
but when inserting an object on the Primary, it is not replicated to the Secondary,
and sync status shows:

Secondary sync status:
# radosgw-admin sync status
realm 9db21932-5a36-4553-bb6b-526e4d704d45 (replicated)
zonegroup b94227a8-4f3b-4829-bc4e-e5325687b9a4 (myzonegroup)
zone 399fe045-b41a-41ea-97df-afefdba58523 (secondary)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is behind on 1 shards
behind shards: [23]
oldest incremental change not applied: 2022-03-25T16:36:48.631357+0000 [23]
data sync source: d0b0d796-4628-44f1-a10c-25e7198dd3af (primary)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
1 shards are recovering
recovering shards: [45]

The Secondary zone's `radosgw-admin sync error list` output is full of permission denied errors, e.g.:
    {
        "shard_id": 10,
        "entries": [
            {
                "id": "1_1648226529.700750_24427332.1",
                "section": "data",
                "name": "test-20220325-12:d0b0d796-4628-44f1-a10c-25e7198dd3af.224896.1:1",
                "timestamp": "2022-03-25T16:42:09.700750Z",
                "info": {
                    "source_zone": "d0b0d796-4628-44f1-a10c-25e7198dd3af",
                    "error_code": 13,
                    "message": "failed to sync bucket instance: (13) Permission denied"
                }
            }
        ]
    },

After increasing the log level on ceph-radosgw, we can see a signature mismatch error in the Primary's logs:

2022-03-24T13:15:56.812+0000 7f92f2ffd700 15 req 301160 0s :get_metadata server signature=AAAA$redacted # two different signatures here
2022-03-24T13:15:56.812+0000 7f92f2ffd700 15 req 301160 0s :get_metadata client signature=BBBB$redacted # two different signatures here
2022-03-24T13:15:56.812+0000 7f92f2ffd700 15 req 301160 0s :get_metadata compare=6
2022-03-24T13:15:56.812+0000 7f92f2ffd700 20 req 301160 0s :get_metadata rgw::auth::s3::LocalEngine denied with reason=-2027
2022-03-24T13:15:56.812+0000 7f92f2ffd700 20 req 301160 0s :get_metadata rgw::auth::s3::AWSAuthStrategy denied with reason=-2027
2022-03-24T13:15:56.812+0000 7f92f2ffd700 5 req 301160 0s :get_metadata Failed the auth strategy, reason=-2027
2022-03-24T13:15:56.812+0000 7f92f2ffd700 10 failed to authorize request

The above request returns an HTTP 403 error.
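The failure mode can be illustrated with a minimal sketch. This is not radosgw's exact canonicalization (the key, date, and URIs below are hypothetical), but with AWS-v2-style signing the signature is an HMAC over a string that includes the request URI, so any proxy that decodes `%3A` back to `:` changes the string the server signs:

```python
import base64
import hashlib
import hmac

SECRET = b"example-secret-key"  # hypothetical key, for illustration only

def sign(string_to_sign: str) -> str:
    """HMAC-SHA1 over the string-to-sign, base64-encoded (v2-style)."""
    digest = hmac.new(SECRET, string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# The URI as the client signed it, and as it looks after proxy decoding.
raw_uri = "/admin/metadata/bucket.instance/test%3Azone.1"
decoded_uri = "/admin/metadata/bucket.instance/test:zone.1"

date = "Thu, 24 Mar 2022 13:15:56 GMT"
client_sig = sign("GET\n\n\n" + date + "\n" + raw_uri)
server_sig = sign("GET\n\n\n" + date + "\n" + decoded_uri)

print(client_sig == server_sig)  # False: the decoded URI yields a different signature
```

This mirrors the `server signature` / `client signature` mismatch in the log above: both sides compute the same HMAC, but over different URI strings.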

We have tried removing the Secondary zone, cleaning the pools, and recreating the zone, but as soon as we create a new bucket on the Primary, a behind shard appears in the Secondary's metadata sync status; and once we create an object on the Primary, a recovering shard appears in the Secondary's data sync status output.

description: updated
Revision history for this message
James Page (james-page) wrote :

I've reproduced this in the lab - this only impacts deployments where TLS is enabled.

When TLS is enabled, Apache2 is used to terminate the secure connection and then proxy it to the radosgw process - something in this data pipeline is causing the client-provided signature to mismatch the server-calculated signature, and authentication fails as a result.

Bypassing Apache2 and terminating the secure connection on haproxy works around the issue but does change the security profile of the deployment.

summary: - In multisite config, secondary zone is unable to sync
+ In multisite config, secondary zone is unable to sync when TLS is
+ enabled
Changed in charm-ceph-radosgw:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → James Page (james-page)
Revision history for this message
James Page (james-page) wrote :

Interestingly, it appears that only the metadata query fails signature matching - so there must be something in this particular code path that's causing the problem.

If metadata is in sync, objects in buckets do continue to sync, even through the Apache2 TLS termination pipeline.

Revision history for this message
James Page (james-page) wrote :

Creating a new bucket:

10.5.0.136:443 10.5.0.136 - - [30/Mar/2022:09:43:09 +0000] "GET /admin/metadata/bucket.instance/test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1?key=test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1&rgwx-zonegroup=c891d8a3-fe5c-44bf-b296-1b59046d2005 HTTP/1.1" 403 395 "-" "-"

A 403 is seen straight away.

Revision history for this message
James Page (james-page) wrote :

From radosgw log:

2022-03-30T09:43:09.925+0000 7f5581f9b700 1 beast: 0x7f5638674810: 127.0.0.1 - - [2022-03-30T09:43:09.925701+0000] "GET /admin/metadata/bucket.instance/test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82:98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1?key=test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1&rgwx-zonegroup=c891d8a3-fe5c-44bf-b296-1b59046d2005 HTTP/1.1" 403 131 - -

Revision history for this message
James Page (james-page) wrote :

This appears to be the only query impacted by the signature verification failure.

My current thought is that mod_proxy is decoding the URI, so the signature calculated by the radosgw backend does not match the client's.

Revision history for this message
James Page (james-page) wrote :

test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1

vs

test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82:98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1

note the ':'
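The difference is just percent-decoding: `%3A` is the encoding of `:`. A quick check of what the proxy's canonicalisation does to the bucket instance key:

```python
from urllib.parse import unquote

# The bucket instance key as it appears in the Apache access log (encoded).
encoded = ("test-4eb37ce7-e4ff-40dc-9fbd-41f0a2fcce82"
           "%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.230998.1")

decoded = unquote(encoded)
print(decoded)  # the %3A becomes ':', matching the radosgw log line
```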

Revision history for this message
James Page (james-page) wrote (last edit ):

When bypassing Apache and mod_proxy, the URI is logged encoded rather than decoded.

2022-03-30T10:44:06.779+0000 7fab34790700 1 beast: 0x7faa435ea6b0: 10.5.0.136 - - [2022-03-30T10:44:06.779996+0000] "GET /admin/metadata/bucket.instance/test-b5af594b-f69d-4f38-9836-eeff065a76b2%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.232157.6?key=test-b5af594b-f69d-4f38-9836-eeff065a76b2%3A98d662cc-86ba-4c88-ba9d-cce8f3751708.232157.6&rgwx-zonegroup=c891d8a3-fe5c-44bf-b296-1b59046d2005 HTTP/1.1" 200 1250 - -

Revision history for this message
Giuseppe Petralia (peppepetra) wrote :

Adding `nocanon` to the ProxyPass directive fixes the issue.

Ref https://httpd.apache.org/docs/2.4/mod/mod_proxy.html
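A minimal sketch of the relevant Apache configuration (the backend address and port are illustrative, not the charm's exact template):

```apache
<VirtualHost *:443>
    # 'nocanon' tells mod_proxy to forward the request URI untouched,
    # so radosgw sees the same percent-encoded path the client signed.
    ProxyPass        / http://127.0.0.1:8080/ nocanon
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>
```

Without `nocanon`, mod_proxy re-canonicalises the URL, decoding `%3A` to `:` and invalidating the signature.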

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (master)
Changed in charm-ceph-radosgw:
status: Confirmed → In Progress
Revision history for this message
James Page (james-page) wrote :

Adding the nocanon stanza to the ProxyPass configuration in Apache seems to do the trick.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-radosgw (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/835827
Committed: https://opendev.org/openstack/charm-ceph-radosgw/commit/7907fa96e93085e114fb42f2dc547963938498fb
Submitter: "Zuul (22348)"
Branch: master

commit 7907fa96e93085e114fb42f2dc547963938498fb
Author: James Page <email address hidden>
Date: Wed Mar 30 13:35:04 2022 +0100

    Resolve issue with mod_proxy decoding

    The Ceph RADOS Gateway uses some unusual URI's for multisite
    replication; ensure that mod_proxy passes the 'raw' URI down
    to the radosgw http endpoint so that client and server side
    signatures continue to match.

    This seems quite Ceph specific so the template is specialised
    into the charm rather than updated in charm-helpers.

    Change-Id: Iede49ba8904500076d53388345e154a3ed18e761
    Closes-Bug: 1966669

Changed in charm-ceph-radosgw:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (stable/pacific)

Fix proposed to branch: stable/pacific
Review: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/835941

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (stable/21.10)

Fix proposed to branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/835942

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-radosgw (stable/octopus)

Fix proposed to branch: stable/octopus
Review: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/836129

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-radosgw (stable/21.10)

Change abandoned by "James Page <email address hidden>" on branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/835942

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-radosgw (stable/pacific)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/835941
Committed: https://opendev.org/openstack/charm-ceph-radosgw/commit/16999f9dda3b22117cc75c93068c39c8f725d703
Submitter: "Zuul (22348)"
Branch: stable/pacific

commit 16999f9dda3b22117cc75c93068c39c8f725d703
Author: James Page <email address hidden>
Date: Wed Mar 30 13:35:04 2022 +0100

    Resolve issue with mod_proxy decoding

    The Ceph RADOS Gateway uses some unusual URI's for multisite
    replication; ensure that mod_proxy passes the 'raw' URI down
    to the radosgw http endpoint so that client and server side
    signatures continue to match.

    This seems quite Ceph specific so the template is specialised
    into the charm rather than updated in charm-helpers.

    Change-Id: Iede49ba8904500076d53388345e154a3ed18e761
    Closes-Bug: 1966669
    (cherry picked from commit 7907fa96e93085e114fb42f2dc547963938498fb)

tags: added: in-stable-pacific
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-radosgw (stable/octopus)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-radosgw/+/836129
Committed: https://opendev.org/openstack/charm-ceph-radosgw/commit/aa9b42371356d4366a0c39c729eca3ce74eb7fe7
Submitter: "Zuul (22348)"
Branch: stable/octopus

commit aa9b42371356d4366a0c39c729eca3ce74eb7fe7
Author: James Page <email address hidden>
Date: Wed Mar 30 13:35:04 2022 +0100

    Resolve issue with mod_proxy decoding

    The Ceph RADOS Gateway uses some unusual URI's for multisite
    replication; ensure that mod_proxy passes the 'raw' URI down
    to the radosgw http endpoint so that client and server side
    signatures continue to match.

    This seems quite Ceph specific so the template is specialised
    into the charm rather than updated in charm-helpers.

    Change-Id: Iede49ba8904500076d53388345e154a3ed18e761
    Closes-Bug: 1966669
    (cherry picked from commit 7907fa96e93085e114fb42f2dc547963938498fb)

tags: added: in-stable-octopus
Changed in charm-ceph-radosgw:
milestone: none → 22.04
Changed in charm-ceph-radosgw:
status: Fix Committed → Fix Released