LDAP paging leaks memory

Bug #1896125 reported by Lance Bragstad
Affects: OpenStack Identity (keystone)
Status: Fix Released
Importance: Undecided
Assigned to: Lance Bragstad

Bug Description

If you're using page_size [0] and are integrating keystone with an LDAP server that supports paging (like Active Directory), it's possible to see keystone's memory footprint slowly increase over time.

The problem isn't as noticeable with large page sizes (e.g., page_size = 10000), but it's noticeable when you use small page sizes (e.g., page_size = 5).

I hit this issue using Active Directory with 10,000 users. I set my page_size to 5 and listed users continuously for an hour. During that time I noticed keystone's total memory consumption on the host increase from 5% to 14%.

Additionally, the problem is exacerbated using page_size = 1.

I was unable to reproduce this issue with FreeIPA, another LDAP implementation, but FreeIPA doesn't support paging, and keystone automatically disables paging if the LDAP server doesn't support it.

It seems there is a memory leak somewhere in keystone's LDAP paging implementation.

[0] https://docs.openstack.org/keystone/latest/configuration/config-options.html#ldap.page_size
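
For reference, the option lives in the `[ldap]` section of keystone's configuration file; the values below are illustrative (the server URL is hypothetical), not a recommendation:

```ini
[ldap]
url = ldap://ad.example.com
# Small pages make the leak visible quickly; production deployments
# typically use much larger values.
page_size = 5
```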

tags: added: ldap
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.opendev.org/754488

Changed in keystone:
assignee: nobody → Lance Bragstad (lbragstad)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/755734

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/755735

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/755736

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.opendev.org/754488
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=e98d1ac622f74cbbb41872226ab755bf4ff3ed84
Submitter: Zuul
Branch: master

commit e98d1ac622f74cbbb41872226ab755bf4ff3ed84
Author: Lance Bragstad <email address hidden>
Date: Tue Sep 29 10:20:13 2020 -0500

    Implement more robust connection handling for asynchronous LDAP calls

    Keystone's paging implementation contains a memory leak. The issue is
    noticeable if you integrate keystone with an LDAP server that supports
    paging and set `keystone.conf [ldap] page_size` to a low integer
    (e.g., 5).

    Keystone's LDAP backend uses `python-ldap` to interact with LDAP
    servers. For paged requests, it uses `search_ext()`, which is an
    asynchronous API [0]. The server responds with a message ID, which the
    client uses to retrieve all data for the request. In keystone's case,
    the `search_ext()` method is invoked with a page control that tells
    the server to deliver responses in increments according to the page
    size configured with `keystone.conf [ldap] page_size`. So long as the
    client has the original connection used to fetch the message ID, it
    can request the rest of the information associated with the request.

    Keystone's paging implementation loops continuously for paged
    requests. It takes the message ID it gets from `search_ext()` and
    calls `result3()`, asking the server for the data associated with that
    specific message. Keystone continues to do this until the server sends
    an indicator that it has no more data relevant to the query (via a
    cookie). The `search_ext()` and `result3()` methods must use the same
    LDAP connection.
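
    The loop above can be sketched with a small stand-in object.
    `FakeConnection` below only mimics the message-ID/cookie handshake;
    it is not the python-ldap API and not keystone code.

```python
# A minimal stub of the search_ext()/result3() paging protocol described
# above. FakeConnection only mimics the message-ID/cookie handshake; it
# is not the python-ldap API and not keystone code.
class FakeConnection:
    def __init__(self, entries, page_size):
        self._entries = entries
        self._page_size = page_size
        self._pending = {}   # message ID -> page offset
        self._next_msgid = 0

    def search_ext(self, offset=0):
        # Asynchronous: register the request and return a message ID
        # immediately; no data is transferred yet.
        msgid = self._next_msgid
        self._next_msgid += 1
        self._pending[msgid] = offset
        return msgid

    def result3(self, msgid):
        # Only the connection that issued search_ext() knows this message
        # ID, which is why both calls must share one connection.
        offset = self._pending.pop(msgid)
        page = self._entries[offset:offset + self._page_size]
        end = offset + self._page_size
        cookie = end if end < len(self._entries) else None
        return page, cookie


def paged_search(conn):
    """Loop until the server's cookie says there is no more data."""
    results, cookie = [], 0
    while cookie is not None:
        msgid = conn.search_ext(cookie)
        page, cookie = conn.result3(msgid)
        results.extend(page)
    return results
```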

    Given the above information, keystone uses context managers to provide
    connections. This is relevant when deploying connection pools, where
    certain connections are re-used from a pool. Keystone relies on Python
    context managers to handle connections, which is a typical use case
    for them. Context managers allow us to do the following (pseudocode):

      with self.get_connection() as conn:
          response = conn.search_s()
          return format(response)

    The above snippet assumes the `get_connection` method provides a
    connection object and a callable that implements `search_s`. Upon
    exiting the `with` statement, the connection is disconnected, or put
    back into the pool, or whatever the implementation of the context
    manager decides to do. Most connections in the LDAP backend are
    handled in this fashion.
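
    As an illustration, here is a minimal pooled context manager
    (hypothetical; not keystone's actual handler classes). The connection
    goes back into the pool on exit, which is precisely the moment an
    unfinished asynchronous request can strand state on it.

```python
import contextlib


class ConnectionPool:
    """Toy pool: hands out connections via a context manager."""

    def __init__(self):
        self._free = []
        self.created = 0

    @contextlib.contextmanager
    def get_connection(self):
        # Re-use a pooled connection if one is available.
        conn = self._free.pop() if self._free else self._new()
        try:
            yield conn
        finally:
            # On exit, the connection returns to the pool regardless of
            # whether an asynchronous request on it has finished.
            self._free.append(conn)

    def _new(self):
        self.created += 1
        return {"id": self.created}
```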

    Unfortunately, the LDAP driver is somewhat oblivious to paging, its
    control implementation, and the fact that it uses an asynchronous API.
    Instead, the driver leaves it up to the handler objects it uses for
    connections to determine if the request should be controlled via
    paging. This is an anti-pattern since the backend establishes the
    connection for the request but doesn't ensure that connection is
    safely handled for asynchronous APIs.

    This forces the `search_ext()` and `result3()` implementations...


Changed in keystone:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/victoria)

Reviewed: https://review.opendev.org/755734
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=a26a40d441e7f211fd68e0c1b9c779d965c19dea
Submitter: Zuul
Branch: stable/victoria

commit a26a40d441e7f211fd68e0c1b9c779d965c19dea
Author: Lance Bragstad <email address hidden>
Date: Tue Sep 29 10:20:13 2020 -0500

    Implement more robust connection handling for asynchronous LDAP calls

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/ussuri)

Reviewed: https://review.opendev.org/755735
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=35c7406bffdd71cf63c65b8628f4eafa28baaac7
Submitter: Zuul
Branch: stable/ussuri

commit 35c7406bffdd71cf63c65b8628f4eafa28baaac7
Author: Lance Bragstad <email address hidden>
Date: Tue Sep 29 10:20:13 2020 -0500

    Implement more robust connection handling for asynchronous LDAP calls

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/train)

Reviewed: https://review.opendev.org/755736
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=105f95795f661f8106b3f33b87662024e5bf6dcb
Submitter: Zuul
Branch: stable/train

commit 105f95795f661f8106b3f33b87662024e5bf6dcb
Author: Lance Bragstad <email address hidden>
Date: Tue Sep 29 10:20:13 2020 -0500

    Implement more robust connection handling for asynchronous LDAP calls

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystone 19.0.0.0rc1

This issue was fixed in the openstack/keystone 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystone 16.0.2

This issue was fixed in the openstack/keystone 16.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystone 17.0.1

This issue was fixed in the openstack/keystone 17.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystone 18.1.0

This issue was fixed in the openstack/keystone 18.1.0 release.
