Resource leak with HTTPBadRequest in StaticLargeObject.get_slo_segments

Bug #1980954 reported by Jeremy Stanley
Affects                           Status        Importance  Assigned to  Milestone
OpenStack Object Storage (swift)  Fix Released  Undecided   Unassigned
OpenStack Security Advisory       Incomplete    Undecided   Unassigned

Bug Description

[Sébastien Mériot with OVH sent the following via encrypted E-mail]

Hello guys,

We are reaching out to you, following the responsible disclosure process, after our
team working on OpenStack found a security issue that could cause a Denial of
Service. A resource is not properly freed in Swift, and under a specific
condition this leads Swift to leak memory and to not properly close file
descriptors. By repeating the exploit, a remote attacker can cause the host to
run out of memory and, most likely, out of file descriptors.

The issue occurs when dealing with Static Large Objects (SLO) on an EC storage
policy. To reproduce the issue:

1. Create a container with an EC storage policy (PCA at OVHcloud)

   $ swift post mycontainer2 --header X-Storage-Policy:PCA

2. Put a simple object (not an SLO) in the container.

   $ swift upload mycontainer2 bigfile --object-name test

3. Delete the object in a loop *with* SLO mode.

   $ for i in {1..10000}; do
         curl -i -X DELETE -H "X-Auth-Token: XXXX" \
             "http://localhost:8888/v1/AUTH_b2b0bf987baa4161b9b83985ac303c9f/mycontainer2/test?multipart-manifest=delete"
     done

4. A 400 error is raised ("Not an SLO manifest") and memory is leaked.

After investigating the issue, the team discovered that a resource was not
properly released:
https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L1535

Indeed, the body of the resp object is never read and so remains open even after
the exception is raised, which causes the memory leak.

Our team suggests adding a `drain_and_close` call before raising the exception
in order to fix the issue:

```python
        if resp.is_success:
            if config_true_value(resp.headers.get('X-Static-Large-Object')):
                try:
                    return json.loads(resp.body)
                except ValueError:
                    raise HTTPServerError('Unable to load SLO manifest')
            else:
                # Drain and close requests opened to object servers
                drain_and_close(resp)
                raise HTTPBadRequest('Not an SLO manifest')
```
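For readers unfamiliar with the helper: `drain_and_close` lives in `swift.common.utils`. The following is a minimal, self-contained sketch (not Swift's actual implementation) of why an unread subrequest body holds resources and how draining releases them; `FakeResponse` is a hypothetical stand-in for a `swob.Response`:

```python
class FakeResponse:
    """Hypothetical stand-in for a swob.Response: the body arrives lazily
    through a generator (app_iter), and the connection to the object
    server is only released when that iterator is exhausted or closed."""

    def __init__(self, chunks):
        self.connection_released = False
        self.app_iter = self._body(chunks)

    def _body(self, chunks):
        try:
            for chunk in chunks:
                yield chunk
        finally:
            # Runs when the generator is exhausted or explicitly closed.
            self.connection_released = True


def drain_and_close(resp):
    """Sketch of the fix: read any unread body, then close the iterator,
    so the backend connection is returned instead of leaking."""
    for _ in resp.app_iter:
        pass
    close = getattr(resp.app_iter, 'close', None)
    if close is not None:
        close()


# Raising HTTPBadRequest without draining leaves the connection held:
leaked = FakeResponse([b'not', b'an', b'slo'])
assert leaked.connection_released is False

# Draining first releases it:
fixed = FakeResponse([b'not', b'an', b'slo'])
drain_and_close(fixed)
assert fixed.connection_released is True
```

In the real proxy the leak is worse than this sketch suggests, because the un-drained iterator pins both memory and the file descriptor of the backend connection until garbage collection eventually closes it.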

The issue has been successfully triggered on Swift 2.25 and 2.29, and current
upstream still appears to be vulnerable, as shown above.

This vulnerability could be rated 7.5 according to the CVSS3 calculator:
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Feel free to contact us if you need any additional information.

Regards,

Sébastien Mériot
Head of CSIRT-OVH
PGP: https://openpgp.circl.lu/pks/lookup?op=get&search=0x43f7c95e4eb57ef1
OVHcloud | Roubaix, FR

Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete
security advisory task has been added while the core security
reviewers for the affected project or projects confirm the bug and
discuss the scope of any vulnerability along with potential
solutions.

Changed in ossa:
status: New → Incomplete
Revision history for this message
clayg (clay-gerrard) wrote :

Hello Sébastien Mériot, thanks for the detailed bug report.

FWIW, I find it's a good deal easier to get these sorts of fixes merged if we can do tests and review using review.opendev.org to collaborate; e.g. https://bugs.launchpad.net/swift/+bug/1572719

The core team is very small and we rely a lot on public contributions.

Sébastien Mériot, do you - or could you say if OVH would - have any objection to marking this bug report public?

Jeremy Stanley, do you know if it would be possible for Sébastien Mériot or a member of the swift core team to create a private/security change with `git review`?

Revision history for this message
Jeremy Stanley (fungi) wrote :

The Gerrit service at review.opendev.org is entirely public and does not provide fine-grained access controls in order to limit visibility of some changes to specific individuals. If the suggestion is to switch this report to public and continue to fix the bug publicly in Gerrit, I don't object, and am happy to reach out to Sébastien Mériot in order to find out whether OVH is comfortable with the suggestion.

The OpenStack vulnerability managers have typically considered denial of service through authenticated resource exhaustion to not need a private embargo, since service administrators can identify any accounts causing these sorts of problems and block further access. Making the report public while a fix is in progress will at most mean some people who find out might attempt to maliciously exploit the bug and hopefully have their accounts blocked by administrators when they do, but by the same token it will accelerate a resolution to the problem and may even reduce the number of conscientious users accidentally triggering it.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Assuming that's indeed what you meant, I've gone ahead and replied to CSIRT-OVH in order to inquire about their comfort level with switching to a public workflow for this report.

Revision history for this message
Jeremy Stanley (fungi) wrote :

I've received word from the fine folks at OVH that they're also okay with working on fixes for this in public, so am switching to our public vulnerability report workflow now.

description: updated
information type: Private Security → Public Security
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/850357

Changed in swift:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/850782

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/850357
Committed: https://opendev.org/openstack/swift/commit/a5c1444faa827dc2e3d85468d42622acea1d435e
Submitter: "Zuul (22348)"
Branch: master

commit a5c1444faa827dc2e3d85468d42622acea1d435e
Author: Romain de Joux <email address hidden>
Date: Tue Jul 19 15:22:31 2022 +0200

    Drain and close response in StaticLargeObject.get_slo_segments

    In get_slo_segments a GET subrequest is processed to get SLO manifest,
    but if the object is not a SLO the response was not drain/closed.

    Closes-Bug: 1980954
    Change-Id: I7862c8ef153416c00c8ca7d6bf2f3556a1776d8c

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
Jeremy Stanley (fungi) wrote :

Will backporting 850357 be sufficient to address this risk on stable branches, or does 850782 need to be included as well?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/850782
Committed: https://opendev.org/openstack/swift/commit/e6ee372744de9855ac3bfe5dea0859c42ae85cc2
Submitter: "Zuul (22348)"
Branch: master

commit e6ee372744de9855ac3bfe5dea0859c42ae85cc2
Author: Tim Burke <email address hidden>
Date: Fri Jul 22 15:04:04 2022 -0700

    slo: Reduce overhead for 'Not an SLO manifest' responses

    When clients issue a ?multipart-manifest=delete request to non-SLOs, we
    try to fetch the manifest then drain and close the response upon seeing
    it wasn't actually an SLO manifest. This could previously cause the extra
    transfer (and discard) of several gigabytes of data.

    Now, add two extra headers to the request:

      * Range: bytes=-1
      * X-Backend-Ignore-Range-If-Metadata-Present: X-Static-Large-Object

    The first limits how much data we'll be discarding, while the second tells
    object servers to ignore the range header if it's an SLO manifest. Note
    that object-servers may still need to return more than one byte to the
    proxy -- an EC policy will require that we get a full fragment's worth
    from each server -- but at least we've got a better cap on our downside.

    Why one byte? Because range requests weren't designed to be able to
    return no data. Why the last byte (as opposed to the first)? Because
    bytes=0-0 will 416 on a zero-byte object, while bytes=-1 will 200.

    Note that the backend header was introduced in Swift 2.24.0 -- if we get
    a response from an older object-server, it may respect the Range header
    even though it's returning an SLO manifest. In that case, retry without
    either header.

    Related-Bug: #1980954
    Co-Authored-By: Romain de Joux <email address hidden>
    Change-Id: If3861e5b9c4f17ab3b82ea16673ddb29d07820a1
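The compatibility caveat in the commit's last paragraph can be sketched as a small decision helper. This is an illustrative paraphrase of the commit's retry rule, not the actual code from the change; `PROBE_HEADERS` and `needs_retry_without_range` are hypothetical names:

```python
PROBE_HEADERS = {
    # Limit how much data might have to be discarded for non-SLOs...
    'Range': 'bytes=-1',
    # ...but tell (new enough) object servers to ignore that Range when
    # the object really is an SLO manifest, so the whole manifest comes back.
    'X-Backend-Ignore-Range-If-Metadata-Present': 'X-Static-Large-Object',
}


def needs_retry_without_range(status, headers):
    """An object server older than Swift 2.24.0 doesn't know the backend
    header, so it may honor Range and return a *partial* SLO manifest
    (206 Partial Content). In that case the manifest must be re-fetched
    without either probe header."""
    is_slo = headers.get('X-Static-Large-Object', '').lower() in (
        'true', '1', 'yes', 'on', 't', 'y')
    return is_slo and status == 206


# New object server: ignores Range for manifests, returns 200 -- no retry.
assert not needs_retry_without_range(200, {'X-Static-Large-Object': 'True'})
# Old object server: honored Range on a manifest -- retry without headers.
assert needs_retry_without_range(206, {'X-Static-Large-Object': 'True'})
# Not a manifest at all: the 206 carries at most one byte -- no retry.
assert not needs_retry_without_range(206, {})
```

The choice of `bytes=-1` over `bytes=0-0` follows the commit's observation about zero-byte objects: a first-byte position of 0 is unsatisfiable against an empty body and yields 416, while the suffix form still yields 200.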

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.30.0

This issue was fixed in the openstack/swift 2.30.0 release.
