While trying to debug a different issue, I encountered a situation where privsep hangs in the process of handling a request from neutron-openvswitch-agent when debug logging is enabled (juju debug-log neutron-openvswitch=true):
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/11
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1895652/comments/12
The issue reproduces reliably on all units in the environment where I encountered it. As a result, neutron-openvswitch-agent services hang while waiting for a response from the privsep daemon and do not progress past basic initialization. They never post any state back to the Neutron server and are therefore marked dead by it.
However, systemd shows the processes as "active (running)", which adds to the confusion, since they do indeed start from systemd's perspective.
systemctl --no-pager status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - Openstack Neutron Open vSwitch Plugin Agent
Loaded: loaded (/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-09-23 08:28:41 UTC; 25min ago
Main PID: 247772 (/usr/bin/python)
Tasks: 4 (limit: 9830)
CGroup: /system.slice/neutron-openvswitch-agent.service
├─247772 /usr/bin/python3 /usr/bin/neutron-openvswitch-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/plugins/ml2/openvswitch_…og
└─248272 /usr/bin/python3 /usr/bin/privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini -…ck
------- ------- ------- ------- ------- ------- ------- -------
An strace shows that the privsep daemon is trying to receive input from fd 3, which is the unix socket it uses to communicate with the client:
# there is one extra neutron-openvswitch-agent running in a LXD container which can be ignored here (there is an octavia unit on the node which has a neutron-openvswitch subordinate)
root@node2:~# ps -eo pid,user,args --sort user | grep -P 'privsep.*openvswitch'
860690 100000 /usr/bin/python3 /usr/bin/privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmp910qakfk/privsep.sock
248272 root /usr/bin/python3 /usr/bin/privsep-helper --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --privsep_context neutron.privileged.default --privsep_sock_path /tmp/tmpcmwn7vom/privsep.sock
363905 root grep --color=auto -P privsep.*openvswitch
root@node2:~# strace -f -p 248453 2>&1
[pid 248786] futex(0x7f6a6401c1d0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
[pid 248475] futex(0x7f6a6c024590, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
[pid 248473] futex(0x7f6a746d9fd0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
[pid 248453] recvfrom(3,
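For readers less familiar with what an unfinished recvfrom() on a unix stream socket means, here is a minimal Python sketch. It is not the oslo.privsep code; daemon_end, client_end and wait_for_request are hypothetical names used only for illustration. It shows that a read on one end of a socket pair only returns once the other end writes something, which is the state the strace output above suggests the daemon is in.

```python
import socket

# Hypothetical stand-ins for the daemon and client ends of the channel;
# the real privsep helper and agent are not involved here.
daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)


def wait_for_request(sock, timeout):
    """Return the received bytes, or None if nothing arrives within timeout seconds."""
    sock.settimeout(timeout)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


# Nothing has been written yet, so without the timeout this read would
# block indefinitely, like the unfinished recvfrom(3, ...) in the trace.
idle = wait_for_request(daemon_end, 0.2)

# Once the peer writes, the same read returns immediately.
client_end.sendall(b"ping")
got = wait_for_request(daemon_end, 0.2)

print(idle, got)
daemon_end.close()
client_end.close()
```

The timeout exists only so that the sketch terminates; the trace above shows no such timeout on the daemon side.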
root@node2:~# lsof -p 248453 | grep 3u
privsep-h 248453 root 3u unix 0xffff8e6d8abdec00 0t0 356522977 type=STREAM
root@node2:~# ss -pax | grep 356522977
u_str ESTAB 0 0 /tmp/tmp2afa3enn/privsep.sock 356522978 * 356522977 users:(("/usr/bin/python",pid=247567,fd=16))
u_str ESTAB 0 0 * 356522977 * 356522978 users:(("privsep-helper",pid=248453,fd=3))
root@node2:~# lsof -p 247567 | grep 16u
/usr/bin/ 247567 neutron 16u unix 0xffff8e6d8abdb400 0t0 356522978 /tmp/tmp2afa3enn/privsep.sock type=STREAM
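The manual lsof/ss matching above can be scripted. The following is a rough Python sketch under the assumption that the ss -pax columns look like the two lines captured in this report (state columns, local address, local inode, peer address, peer inode, users); the helper names parse_ss and peer_owners are made up for illustration and are not part of any existing tool.

```python
import re

# The two lines captured with "ss -pax | grep 356522977" in this report.
SS_SAMPLE = """\
u_str ESTAB 0 0 /tmp/tmp2afa3enn/privsep.sock 356522978 * 356522977 users:(("/usr/bin/python",pid=247567,fd=16))
u_str ESTAB 0 0 * 356522977 * 356522978 users:(("privsep-helper",pid=248453,fd=3))
"""

# Assumed column layout: netid, state, recv-q, send-q, local address,
# local inode, peer address, peer inode, users:((...)).
LINE_RE = re.compile(
    r'^\S+\s+\S+\s+\d+\s+\d+\s+(?P<local>\S+)\s+(?P<inode>\d+)'
    r'\s+\S+\s+(?P<peer>\d+)\s+users:\(\((?P<users>.*)\)\)\s*$'
)
USER_RE = re.compile(r'"(?P<name>[^"]+)",pid=(?P<pid>\d+),fd=(?P<fd>\d+)')


def parse_ss(text):
    """Map each local socket inode to its path, peer inode and owning processes."""
    sockets = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers and lines in a different layout
        owners = [
            (u["name"], int(u["pid"]), int(u["fd"]))
            for u in USER_RE.finditer(match["users"])
        ]
        sockets[int(match["inode"])] = {
            "path": match["local"],
            "peer": int(match["peer"]),
            "owners": owners,
        }
    return sockets


def peer_owners(sockets, inode):
    """Return the processes holding the other end of the socket with this inode."""
    return sockets[sockets[inode]["peer"]]["owners"]


sockets = parse_ss(SS_SAMPLE)
# fd 3 of the privsep helper has inode 356522977 according to lsof.
print(peer_owners(sockets, 356522977))
```

Fed with the inode that lsof reports for fd 3 of the helper, this returns the process on the other end of the socket, which is the same conclusion as the manual steps: PID 247567, fd 16.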