Kill neutron-keepalived-state-change-monitor fails

Bug #1860326 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Slawek Kaplonski
Milestone: none listed

Bug Description

When a graceful shutdown of neutron-keepalived-state-change-monitor with SIGTERM fails, Neutron tries to kill the process with SIGKILL. Because there is no rootwrap rule that allows killing it with -9, the attempt fails with an error like:

2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent [-] Error while deleting router f2613902-6ea2-4f09-9fae-9d5a933c744e: multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Unserializable message: Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 283, in serve_client
    send(msg)
  File "/usr/lib/python3.6/site-packages/oslo_rootwrap/jsonrpc.py", line 128, in send
    s = self.dumps(obj)
  File "/usr/lib/python3.6/site-packages/oslo_rootwrap/jsonrpc.py", line 170, in dumps
    return json.dumps(obj, cls=RpcJSONEncoder).encode('utf-8')
  File "/usr/lib64/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/site-packages/oslo_rootwrap/jsonrpc.py", line 43, in default
    return super(RpcJSONEncoder, self).default(o)
  File "/usr/lib64/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'ValueError' is not JSON serializable

---------------------------------------------------------------------------
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/common/utils.py", line 702, in wait_until_true
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent eventlet.sleep(sleep)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 36, in sleep
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent hub.switch()
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent return self.greenlet.switch()
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent eventlet.timeout.Timeout: 10 seconds
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent During handling of the above exception, another exception occurred:
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py", line 420, in destroy_state_change_monitor
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent timeout=SIGTERM_TIMEOUT)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/common/utils.py", line 707, in wait_until_true
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent raise WaitTimeout(_("Timed out after %d seconds") % timeout)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent neutron.common.utils.WaitTimeout: Timed out after 10 seconds
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent During handling of the above exception, another exception occurred:
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 506, in _safe_router_removed
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent self._router_removed(ri, router_id)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 542, in _router_removed
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent self.router_info[router_id] = ri
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent self.force_reraise()
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent raise value
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 539, in _router_removed
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent ri.delete()
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py", line 478, in delete
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent self.destroy_state_change_monitor(self.process_monitor)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py", line 422, in destroy_state_change_monitor
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent pm.disable(sig=str(int(signal.SIGKILL)))
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent utils.execute(cmd, run_as_root=self.run_as_root)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 122, in execute
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent execute_rootwrap_daemon(cmd, process_input, addl_env))
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent LOG.error("Rootwrap error running command: %s", cmd)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent self.force_reraise()
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent raise value
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent return client.execute(cmd, process_input)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_rootwrap/client.py", line 154, in execute
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent res = self._run_one_command(proxy, cmd, stdin)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_rootwrap/client.py", line 139, in _run_one_command
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent res = proxy.run_one_command(cmd, stdin)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "<string>", line 2, in run_one_command
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent raise convert_to_error(kind, result)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent multiprocessing.managers.RemoteError:
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent ---------------------------------------------------------------------------
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent Unserializable message: Traceback (most recent call last):
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib64/python3.6/multiprocessing/managers.py", line 283, in serve_client
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent send(msg)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_rootwrap/jsonrpc.py", line 128, in send
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent s = self.dumps(obj)
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib/python3.6/site-packages/oslo_rootwrap/jsonrpc.py", line 170, in dumps
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent return json.dumps(obj, cls=RpcJSONEncoder).encode('utf-8')
2020-01-20 10:35:37.337 382811 ERROR neutron.agent.l3.agent File "/usr/lib64/python3.6/json/__init__.py", line 238, in dumps....
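
For readers unfamiliar with the pattern, the SIGTERM-then-SIGKILL fallback described above can be sketched with the standard library alone. This is a minimal illustration, not Neutron's code: `stop_process` and `spawn_stubborn_child` are hypothetical names, and the 10-second default only mirrors the "Timed out after 10 seconds" message in the traceback.

```python
import signal
import subprocess
import sys


def stop_process(proc, term_timeout=10.0):
    """Stop proc with SIGTERM, escalating to SIGKILL if it does not exit.

    Returns the name of the signal that finally stopped the process.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=term_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (hypothetical stand-in process)."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # Block until the child has installed its handler, so the SIGTERM
    # sent later is reliably ignored.
    proc.stdout.readline()
    return proc
```

In this sketch the second step always succeeds because the parent may signal its own child directly. In the reported case the second step runs through rootwrap, so it also needs a filter that permits "-9".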

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/703366

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/703366
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d6fccd247f70abc84c8a480138e135717836c7b3
Submitter: Zuul
Branch: master

commit d6fccd247f70abc84c8a480138e135717836c7b3
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jan 20 11:48:27 2020 +0100

    Allow to kill keepalived state change monitor process

    Usually Neutron stops neutron-keepalived-state-change-monitor process
    gracefully with SIGTERM.
    But in case if this will not stop process for some time, Neutron will
    try to kill this process with SIGKILL (-9).
    That was causing problem with rootwrap as kill filters for this process
    allowed to send only "-15" to it.
    Now it is possible to kill this process with "-9" too.

    Change-Id: Id019fa7649bd1158f9d56e63f8dad108d0ca8c1f
    Closes-bug: #1860326
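
The effect of a kill filter that lists only "-15" can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not oslo.rootwrap's implementation: `build_kill_cmd` and `kill_allowed` are hypothetical helpers, the `kill -<sig> <pid>` command shape is assumed, and the real filter also checks which executable the target PID is running, which is omitted here.

```python
import signal


def build_kill_cmd(pid, sig):
    """Build a kill command list for a signal number given as a string."""
    return ["kill", "-%s" % sig, str(pid)]


def kill_allowed(cmd, allowed_signals):
    """Return True if cmd is 'kill -<sig> <pid>' with a permitted signal.

    Simplified model: only the signal argument is checked.
    """
    if len(cmd) != 3 or cmd[0] != "kill":
        return False
    return cmd[1] in allowed_signals


# Same signal formatting as the call seen in the traceback:
# pm.disable(sig=str(int(signal.SIGKILL)))
cmd = build_kill_cmd(1234, str(int(signal.SIGKILL)))

before_fix = {"-15"}        # only SIGTERM permitted
after_fix = {"-15", "-9"}   # SIGKILL permitted as well

print(kill_allowed(cmd, before_fix))
print(kill_allowed(cmd, after_fix))
```

With the "before" set the SIGKILL command is rejected, which corresponds to the rootwrap failure in this report; with the "after" set it is accepted.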

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/704593

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/704594

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/704596

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/704597

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/704597
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e6c419b42c1b8c8902bedb02b791c4a2f4ea053b
Submitter: Zuul
Branch: stable/queens

commit e6c419b42c1b8c8902bedb02b791c4a2f4ea053b
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jan 20 11:48:27 2020 +0100

    Allow to kill keepalived state change monitor process

    Usually Neutron stops neutron-keepalived-state-change-monitor process
    gracefully with SIGTERM.
    But in case if this will not stop process for some time, Neutron will
    try to kill this process with SIGKILL (-9).
    That was causing problem with rootwrap as kill filters for this process
    allowed to send only "-15" to it.
    Now it is possible to kill this process with "-9" too.

    Conflicts:
        etc/neutron/rootwrap.d/l3.filters

    Change-Id: Id019fa7649bd1158f9d56e63f8dad108d0ca8c1f
    Closes-bug: #1860326
    (cherry picked from commit d6fccd247f70abc84c8a480138e135717836c7b3)
    (cherry picked from commit f4d05266d21337538a2743a743ee1aa540407ac7)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/704594
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f4d05266d21337538a2743a743ee1aa540407ac7
Submitter: Zuul
Branch: stable/stein

commit f4d05266d21337538a2743a743ee1aa540407ac7
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jan 20 11:48:27 2020 +0100

    Allow to kill keepalived state change monitor process

    Usually Neutron stops neutron-keepalived-state-change-monitor process
    gracefully with SIGTERM.
    But in case if this will not stop process for some time, Neutron will
    try to kill this process with SIGKILL (-9).
    That was causing problem with rootwrap as kill filters for this process
    allowed to send only "-15" to it.
    Now it is possible to kill this process with "-9" too.

    Conflicts:
        etc/neutron/rootwrap.d/l3.filters

    Change-Id: Id019fa7649bd1158f9d56e63f8dad108d0ca8c1f
    Closes-bug: #1860326
    (cherry picked from commit d6fccd247f70abc84c8a480138e135717836c7b3)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/704593
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=79ea54b21dff0baaaaa8f83a60557c0be642e49e
Submitter: Zuul
Branch: stable/train

commit 79ea54b21dff0baaaaa8f83a60557c0be642e49e
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jan 20 11:48:27 2020 +0100

    Allow to kill keepalived state change monitor process

    Usually Neutron stops neutron-keepalived-state-change-monitor process
    gracefully with SIGTERM.
    But in case if this will not stop process for some time, Neutron will
    try to kill this process with SIGKILL (-9).
    That was causing problem with rootwrap as kill filters for this process
    allowed to send only "-15" to it.
    Now it is possible to kill this process with "-9" too.

    Conflicts:
        etc/neutron/rootwrap.d/l3.filters

    Change-Id: Id019fa7649bd1158f9d56e63f8dad108d0ca8c1f
    Closes-bug: #1860326
    (cherry picked from commit d6fccd247f70abc84c8a480138e135717836c7b3)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/704596
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=be03bd225c8243a01208579128741608e8e037fe
Submitter: Zuul
Branch: stable/rocky

commit be03bd225c8243a01208579128741608e8e037fe
Author: Slawek Kaplonski <email address hidden>
Date: Mon Jan 20 11:48:27 2020 +0100

    Allow to kill keepalived state change monitor process

    Usually Neutron stops neutron-keepalived-state-change-monitor process
    gracefully with SIGTERM.
    But in case if this will not stop process for some time, Neutron will
    try to kill this process with SIGKILL (-9).
    That was causing problem with rootwrap as kill filters for this process
    allowed to send only "-15" to it.
    Now it is possible to kill this process with "-9" too.

    Conflicts:
        etc/neutron/rootwrap.d/l3.filters

    Change-Id: Id019fa7649bd1158f9d56e63f8dad108d0ca8c1f
    Closes-bug: #1860326
    (cherry picked from commit d6fccd247f70abc84c8a480138e135717836c7b3)
    (cherry picked from commit f4d05266d21337538a2743a743ee1aa540407ac7)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 15.0.2

This issue was fixed in the openstack/neutron 15.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.1.0

This issue was fixed in the openstack/neutron 14.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.0.0.0b1

This issue was fixed in the openstack/neutron 16.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.7

This issue was fixed in the openstack/neutron 13.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron queens-eol

This issue was fixed in the openstack/neutron queens-eol release.
