race condition in rootwrap daemon leaves processes behind when restarting the openvswitch-agent

Bug #1658977 reported by Ralf Haferkamp
This bug affects 1 person
Affects: oslo.rootwrap
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

There is a race condition in the daemonization code of oslo.rootwrap that causes rootwrap-daemons to stay around when the parent process (e.g. neutron-openvswitch-agent) exits at the "wrong" point in time.

If neutron-openvswitch-agent is shut down while it is just forking the neutron-rootwrap-daemon process, there is a certain chance of it leaving a dangling neutron-rootwrap-daemon process behind. This happens when neutron-openvswitch-agent exits before it was able to set up the Finalize() callback that is supposed to clean up the forked rootwrap-daemon child on shutdown.

PS: In conjunction with https://bugs.launchpad.net/oslo.rootwrap/+bug/1658973 this prevents a successful restart of neutron-openvswitch-agent, because it fails to listen on port 6633.
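
The window described above can be sketched in a few lines. This is only an illustration under assumptions, not the oslo.rootwrap client code: `DaemonClient` and `_stop_child` are hypothetical names, a sleeping Python child stands in for the rootwrap daemon, and `weakref.finalize` from the standard library stands in for the Finalize() callback mentioned in the report.

```python
import subprocess
import sys
import weakref


def _stop_child(process):
    """Terminate the helper process and reap it."""
    process.terminate()
    process.wait()


class DaemonClient:
    """Hypothetical client that owns a long-running helper process."""

    def __init__(self):
        # Step 1: the child starts running as soon as Popen returns.
        self.process = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"])
        # Race window: if the parent is killed at this point, no cleanup
        # callback exists yet and the child keeps running on its own.
        # Step 2: only now is the cleanup callback registered.
        self._finalizer = weakref.finalize(self, _stop_child, self.process)

    def close(self):
        # Calling the finalizer runs the callback at most once.
        self._finalizer()
```

Once step 2 has run, the child is stopped on `close()`, on garbage collection of the client, or at normal interpreter exit. Nothing covers a parent that dies between step 1 and step 2, which is the case this bug is about.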

Ralf Haferkamp (rhafer)
Changed in oslo.rootwrap:
assignee: nobody → Ralf Haferkamp (rhafer)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.rootwrap (master)

Fix proposed to branch: master
Review: https://review.openstack.org/424642

Changed in oslo.rootwrap:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.rootwrap (master)

Change abandoned by Ralf Haferkamp (<email address hidden>) on branch: master
Review: https://review.openstack.org/424642

Ralf Haferkamp (rhafer) wrote :

I am a bit lost about how to fix this. Here are some more details about the race. This is from oslo_rootwrap/daemon.py:

111 try:
112     # allow everybody to connect to the socket
[..]
117     try:
118         # In Python 3 we have to use buffer to push in bytes directly
119         stdout = sys.stdout.buffer
120     except AttributeError:
121         stdout = sys.stdout
122     stdout.write(socket_path.encode('utf-8'))
123     stdout.write(b'\n')
124     stdout.write(bytes(server.authkey))

If the client (e.g. openvswitch-agent) exits before this, that's ok: we will notice because we receive a SIGPIPE. But now we close our communication channel with the client (lines 125-127), and we have none again until the server.serve_forever() call (line 133) is running and the client has connected to it. If the client exits for whatever reason during this time, the rootwrap-daemon won't notice and will not stop. (Note: It's also not possible to have the client kill the server, as it's running as root while the client is likely running as an unprivileged user.)

125     sys.stdin.close()
126     sys.stdout.close()
127     sys.stderr.close()
128     # Gracefully shutdown on INT or TERM signals
129     stop = functools.partial(daemon_stop, server)
130     signal.signal(signal.SIGTERM, stop)
131     signal.signal(signal.SIGINT, stop)
132     LOG.info("Starting rootwrap daemon main loop")
133     server.serve_forever()
134 finally:

Any hints on how to fix this are highly appreciated.
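
The point that a vanished client is only noticed while the daemon still writes to the pipe can be reproduced in isolation. What follows is a minimal sketch, not code from oslo.rootwrap: `peer_is_gone` and `demo` are hypothetical names, and an `os.pipe()` pair stands in for the daemon's stdout. CPython ignores SIGPIPE by default, so in this sketch the failed write shows up as `BrokenPipeError` instead of a signal terminating the process.

```python
import os


def peer_is_gone(write_fd):
    """Return True if a write fails because the reading end is closed."""
    try:
        os.write(write_fd, b"x")
    except BrokenPipeError:
        return True
    return False


def demo():
    """Probe the pipe once with the reader open and once after it closed."""
    read_fd, write_fd = os.pipe()
    while_open = peer_is_gone(write_fd)   # reader still there
    os.close(read_fd)                     # simulate the client exiting
    after_close = peer_is_gone(write_fd)  # the write now fails
    os.close(write_fd)
    return while_open, after_close
```

Once the writer has closed its own end, as the daemon does with stdin, stdout and stderr, there is nothing left to write to, so this kind of check is no longer available until a client connects to the socket.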

Changed in oslo.rootwrap:
assignee: Ralf Haferkamp (rhafer) → nobody
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.rootwrap (master)

Reviewed: https://review.openstack.org/438816
Committed: https://git.openstack.org/cgit/openstack/oslo.rootwrap/commit/?id=6285b63572c893391cb1a9e0c482658938f13329
Submitter: Jenkins
Branch: master

commit 6285b63572c893391cb1a9e0c482658938f13329
Author: IWAMOTO Toshihiro <email address hidden>
Date: Tue Feb 28 15:12:01 2017 +0900

    Allow rootwrap-daemon to timeout and exit

    If the client side abnormally exits, its rootwrap daemon cannot
    receive a shutdown message and will be left forever. Let it timeout
    and exit to save such cases.

    Change-Id: I783717b5fa019371747b98bf92965b6e689603f6
    Related-bug: #1658973
    Related-bug: #1658977
    Related-bug: #1663458
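
The idea in the commit message, letting the daemon give up when no client talks to it for a while, can be illustrated with a plain socket accept loop. This is a sketch under assumptions and not the code of the merged change: `serve_until_idle` and `idle_timeout` are hypothetical names, and the real daemon calls server.serve_forever() on its own server object rather than accepting on a raw socket.

```python
import socket


def serve_until_idle(server_sock, idle_timeout):
    """Accept clients until none connects for idle_timeout seconds.

    Returns the number of connections handled before giving up.
    """
    server_sock.settimeout(idle_timeout)
    handled = 0
    while True:
        try:
            conn, _addr = server_sock.accept()
        except socket.timeout:
            # Nobody connected in time: assume the client is gone and stop.
            return handled
        with conn:
            handled += 1  # a real daemon would process requests here
```

With a loop like this, a daemon whose parent died before connecting would exit by itself after `idle_timeout` seconds instead of staying around forever. The trade-off is that a healthy but quiet client has to be able to start a new daemon when the old one has timed out.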

Adam Spiers (adam.spiers) wrote :

Can this be marked as resolved now?

Ralf Haferkamp (rhafer) wrote :

Yeah, it seems the fix/workaround was released with oslo.rootwrap 5.6.0.

Changed in oslo.rootwrap:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.rootwrap (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/448744

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.rootwrap (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/448745

OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.rootwrap (stable/ocata)

Reviewed: https://review.openstack.org/448744
Committed: https://git.openstack.org/cgit/openstack/oslo.rootwrap/commit/?id=fdacd0e60817db8455b3c2f21b60e8a2130953aa
Submitter: Jenkins
Branch: stable/ocata

commit fdacd0e60817db8455b3c2f21b60e8a2130953aa
Author: IWAMOTO Toshihiro <email address hidden>
Date: Tue Feb 28 15:12:01 2017 +0900

    Allow rootwrap-daemon to timeout and exit

    If the client side abnormally exits, its rootwrap daemon cannot
    receive a shutdown message and will be left forever. Let it timeout
    and exit to save such cases.

    Change-Id: I783717b5fa019371747b98bf92965b6e689603f6
    Related-bug: #1658973
    Related-bug: #1658977
    Related-bug: #1663458
    (cherry picked from commit 6285b63572c893391cb1a9e0c482658938f13329)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.rootwrap (stable/newton)

Reviewed: https://review.openstack.org/448745
Committed: https://git.openstack.org/cgit/openstack/oslo.rootwrap/commit/?id=af8ad2da809f68442da9aacd17a47bca342eb355
Submitter: Jenkins
Branch: stable/newton

commit af8ad2da809f68442da9aacd17a47bca342eb355
Author: IWAMOTO Toshihiro <email address hidden>
Date: Tue Feb 28 15:12:01 2017 +0900

    Allow rootwrap-daemon to timeout and exit

    If the client side abnormally exits, its rootwrap daemon cannot
    receive a shutdown message and will be left forever. Let it timeout
    and exit to save such cases.

    Change-Id: I783717b5fa019371747b98bf92965b6e689603f6
    Related-bug: #1658973
    Related-bug: #1658977
    Related-bug: #1663458
    (cherry picked from commit 6285b63572c893391cb1a9e0c482658938f13329)

tags: added: in-stable-newton