"ctypes.CDLL" C functions could release the GIL during the execution call

Bug #1870352 reported by Rodolfo Alonso
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Rodolfo Alonso
Milestone: -

Bug Description

Some Linux IP library functions (create_netns, remove_netns) call C functions loaded through "ctypes.CDLL". Those methods are called inside a privsep context: the function reference and its arguments are passed to a privileged context, which executes the method. "privsep" uses eventlet to implement multitasking. Because "ctypes.CDLL" releases the GIL for the duration of each C call, nothing guarantees that the eventlet executor will give the GIL back to this task once the call returns. As a result, these functions sometimes time out in the CI.
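The difference between the two loaders can be observed directly. The sketch below is an illustration, not neutron code: it wraps the C library twice via dlopen(NULL), once with ctypes.CDLL and once with ctypes.PyDLL, blocks in libc's usleep() and counts how far a pure-Python thread advances meanwhile. It assumes Linux or macOS and a standard CPython build with a GIL; on a free-threaded build both counts would be large. The helper name python_progress_during is hypothetical.

```python
import ctypes
import threading

# dlopen(NULL): exposes symbols already loaded into the process, libc included.
# The same library is wrapped twice; only the GIL handling differs.
LIBC_CDLL = ctypes.CDLL(None)    # releases the GIL around each foreign call
LIBC_PYDLL = ctypes.PyDLL(None)  # keeps the GIL held for the whole call


def python_progress_during(lib, usec=200_000):
    """Count loop iterations a Python thread completes while lib.usleep() blocks."""
    usleep = lib.usleep
    usleep.argtypes = [ctypes.c_uint]
    usleep.restype = ctypes.c_int

    progress = [0]
    started = threading.Event()
    stop = threading.Event()

    def spin():
        started.set()
        while not stop.is_set():
            progress[0] += 1

    worker = threading.Thread(target=spin, daemon=True)
    worker.start()
    started.wait()
    before = progress[0]
    usleep(usec)  # blocks inside C for `usec` microseconds
    after = progress[0]
    stop.set()
    worker.join()
    return after - before


if __name__ == "__main__":
    # Expect a much larger count for CDLL than for PyDLL.
    print("CDLL :", python_progress_during(LIBC_CDLL))
    print("PyDLL:", python_progress_during(LIBC_PYDLL))
```

With CDLL the spinning thread keeps running while the C call sleeps; with PyDLL it is starved for the whole call, which is the behaviour the fix relies on.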

Log: https://81525168d755db537877-a5e4e29d4d6432c5c7202337ef0214bc.ssl.cf1.rackcdn.com/714731/1/gate/neutron-fullstack/8a9753b/testr_results.html

Log snippet: http://paste.openstack.org/show/791531/

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/717017

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/717017
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=306280813f34f8bbe384ae5bea67f0f66e316b61
Submitter: Zuul
Branch: master

commit 306280813f34f8bbe384ae5bea67f0f66e316b61
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 2 13:49:19 2020 +0000

    Replace ctype.CDLL by ctypes.PyDLL in linux.ip_lib

    Some linux.ip_lib functions make use of "ctype.CDLL" methods
    (create_netns, remove_netns). Those methods are called inside a
    "privsep" context; that means the function reference and the
    arguments are passed to a privileged context that will execute
    the method.

    "privsep" library makes use of eventlet to implement multitasking.
    If the method executed returns the GIL, nothing guarantees that
    the "eventlet" executor will return it again to this task. This
    could lead to timeouts during the execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
    Closes-Bug: #1870352

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/721699

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

This problem also affects any other method executed in a namespace context. To avoid it, the method that sets the namespace in Pyroute2 should also use a ctypes.PyDLL shared library instead of ctypes.CDLL.

Related bug in Pyroute2: https://github.com/svinota/pyroute2/issues/702
Related patch: https://github.com/ralonsoh/pyroute2/commit/0b8a9d21f2280b9cec700d0cdee32288f9514220

If this patch is merged, we will need to bump the Pyroute2 version and create the "NetNS" context with a ctypes.PyDLL library (similar to I36ef9bf59e9c93f50464457a5d9a968738844079).

Regards.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/722254

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/721699
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4ef96cafa3a6804d8d40a5ca259dc56700c49581
Submitter: Zuul
Branch: stable/train

commit 4ef96cafa3a6804d8d40a5ca259dc56700c49581
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 2 13:49:19 2020 +0000

    Replace ctype.CDLL by ctypes.PyDLL in linux.ip_lib

    Some linux.ip_lib functions make use of "ctype.CDLL" methods
    (create_netns, remove_netns). Those methods are called inside a
    "privsep" context; that means the function reference and the
    arguments are passed to a privileged context that will execute
    the method.

    "privsep" library makes use of eventlet to implement multitasking.
    If the method executed returns the GIL, nothing guarantees that
    the "eventlet" executor will return it again to this task. This
    could lead to timeouts during the execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
    Closes-Bug: #1870352
    (cherry picked from commit 306280813f34f8bbe384ae5bea67f0f66e316b61)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/722254
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=68e5e1b8fe87b0b4938236f8f8570d92ae044e20
Submitter: Zuul
Branch: master

commit 68e5e1b8fe87b0b4938236f8f8570d92ae044e20
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 23 09:42:44 2020 +0000

    Specify C shared library in Pyroute2 namespace context

    Since [1], it's possible to specify the shared library to be used
    when creating a Pyroute2 namespace context.

    As commented in [2], "privsep" library makes use of eventlet to
    implement multitasking. If the method executed returns the GIL,
    nothing guarantees that the "eventlet" executor will return it
    again to this task. This could lead to timeouts during the
    execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    [1]https://github.com/svinota/pyroute2/issues/702
    [2]https://review.opendev.org/#/c/717017/

    Change-Id: I6c9f9adba8b4433cc96704bb69dd4e0d4b154ebd
    Related-Bug: #1870352

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/751210

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/751215

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/751217

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/751210
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=201c27202c302dc8199b796e0adc41b7b15dd6fb
Submitter: Zuul
Branch: stable/stein

commit 201c27202c302dc8199b796e0adc41b7b15dd6fb
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 2 13:49:19 2020 +0000

    Replace ctype.CDLL by ctypes.PyDLL in linux.ip_lib

    Some linux.ip_lib functions make use of "ctype.CDLL" methods
    (create_netns, remove_netns). Those methods are called inside a
    "privsep" context; that means the function reference and the
    arguments are passed to a privileged context that will execute
    the method.

    "privsep" library makes use of eventlet to implement multitasking.
    If the method executed returns the GIL, nothing guarantees that
    the "eventlet" executor will return it again to this task. This
    could lead to timeouts during the execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
    Closes-Bug: #1870352
    (cherry picked from commit 306280813f34f8bbe384ae5bea67f0f66e316b61)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/751217
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=aacf433edaca043c5164eaa5c45556c140b8b36a
Submitter: Zuul
Branch: stable/queens

commit aacf433edaca043c5164eaa5c45556c140b8b36a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 2 13:49:19 2020 +0000

    Replace ctype.CDLL by ctypes.PyDLL in linux.ip_lib

    Some linux.ip_lib functions make use of "ctype.CDLL" methods
    (create_netns, remove_netns). Those methods are called inside a
    "privsep" context; that means the function reference and the
    arguments are passed to a privileged context that will execute
    the method.

    "privsep" library makes use of eventlet to implement multitasking.
    If the method executed returns the GIL, nothing guarantees that
    the "eventlet" executor will return it again to this task. This
    could lead to timeouts during the execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
    Closes-Bug: #1870352
    (cherry picked from commit 306280813f34f8bbe384ae5bea67f0f66e316b61)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/751215
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=574ba2e98251690f73bfd09cf47c17caa712eb4c
Submitter: Zuul
Branch: stable/rocky

commit 574ba2e98251690f73bfd09cf47c17caa712eb4c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Apr 2 13:49:19 2020 +0000

    Replace ctype.CDLL by ctypes.PyDLL in linux.ip_lib

    Some linux.ip_lib functions make use of "ctype.CDLL" methods
    (create_netns, remove_netns). Those methods are called inside a
    "privsep" context; that means the function reference and the
    arguments are passed to a privileged context that will execute
    the method.

    "privsep" library makes use of eventlet to implement multitasking.
    If the method executed returns the GIL, nothing guarantees that
    the "eventlet" executor will return it again to this task. This
    could lead to timeouts during the execution of those methods.

    From https://docs.python.org/3.6/library/ctypes.html#ctypes.PyDLL:
      "Instances of this class behave like CDLL instances, except that
       the Python GIL is not released during the function call, and
       after the function execution the Python error flag is checked."

    Change-Id: I36ef9bf59e9c93f50464457a5d9a968738844079
    Closes-Bug: #1870352
    (cherry picked from commit 306280813f34f8bbe384ae5bea67f0f66e316b61)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron queens-eol

This issue was fixed in the openstack/neutron queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron rocky-eol

This issue was fixed in the openstack/neutron rocky-eol release.
