After a successful upgrade of the control plane from Train to Ussuri on Ubuntu Bionic, we upgraded a first compute/network node and immediately ran into issues with Neutron:
We noticed that Neutron is extremely slow at setting up and wiring the network ports. It is so slow that it never finishes and throws all sorts of errors (RabbitMQ connection timeouts, "full sync required", ...).
We have since been able to reproduce the issue on our Ussuri DEV cloud as well:
1) First we ran strace -ffff -p $PID_OF_NEUTRON_LINUXBRIDGE_AGENT and noticed that the data exchange on the Unix socket between the rootwrap-daemon and the main process is extremely slow.
One could literally follow the read calls on the socket's fd line by line.
2) After adding lots of log lines and other intensive manual debugging, we used py-spy (https://github.com/benfred/py-spy) via "py-spy top --pid $PID" on the running neutron-linuxbridge-agent process and noticed that all of the CPU time (the process was at 100% most of the time) was spent in msgpack/fallback.py.
3) Since the issue had not been observed on Train, we compared the msgpack versions in use and found that Train was using 0.5.6, while Ussuri upgraded this dependency to 0.6.2.
4) We then installed version 0.5.6 of msgpack (ignoring the declared dependencies).
--- cut ---
apt policy python3-msgpack
python3-msgpack:
  Installed: 0.6.2-1~cloud0
  Candidate: 0.6.2-1~cloud0
  Version table:
 *** 0.6.2-1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/ussuri/main amd64 Packages
     0.5.6-1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
--- cut ---
And voilà: the neutron-linuxbridge-agent worked just like before (building one port every few seconds) and all network ports eventually converged to ACTIVE.
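Frames in msgpack/fallback.py (step 2) indicate that msgpack's pure-Python implementation is doing the work rather than its compiled C extension. A minimal sketch for checking which implementation a given interpreter loads follows. The helper name msgpack_backend is hypothetical, and the check relies on msgpack.Packer being imported from the C extension when it is available and from msgpack.fallback otherwise.

```python
import importlib


def msgpack_backend():
    """Report (version, backend) for the msgpack this interpreter would import.

    backend is "c-extension", "pure-python" or "not-installed".
    """
    try:
        msgpack = importlib.import_module("msgpack")
    except ImportError:
        return None, "not-installed"
    # msgpack.Packer comes from the compiled extension when it can be imported,
    # and from msgpack.fallback otherwise (or when MSGPACK_PUREPYTHON is set).
    module = msgpack.Packer.__module__
    backend = "pure-python" if module.endswith("fallback") else "c-extension"
    return getattr(msgpack, "version", None), backend


if __name__ == "__main__":
    version, backend = msgpack_backend()
    print("msgpack version:", version, "backend:", backend)
```

Running it with the same python3 interpreter that runs neutron-linuxbridge-agent, once on the upgraded node and once on a Train node, would show whether the two differ in backend as well as in version.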
I have not yet been able to pinpoint which msgpack change (https://github.com/msgpack/msgpack-python/compare/0.5.6...v0.6.2) caused this issue, but I am certain that this is a major issue for Ussuri on Ubuntu Bionic.
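To narrow the regression down between 0.5.6 and 0.6.2, a micro-benchmark run once per installed msgpack version may help. The sketch times a pack/unpack round trip of a synthetic payload through the default msgpack entry points and, for comparison, explicitly through msgpack.fallback. The payload shape, its size, and the helper names are illustrative assumptions, not data taken from the agent.

```python
import time

try:
    import msgpack
    from msgpack import fallback
except ImportError:  # msgpack is not installed in this interpreter
    msgpack = None
    fallback = None


def make_payload(n_items):
    # Hypothetical payload: a list of small dicts, loosely shaped like a large RPC reply.
    return [{"id": i, "name": "port-%d" % i, "tags": ["a", "b"]} for i in range(n_items)]


def best_roundtrip(pack, unpack, payload, repeat=3):
    """Return (best wall-clock seconds, last decoded object) over `repeat` round trips."""
    best, decoded = float("inf"), None
    for _ in range(repeat):
        start = time.perf_counter()
        decoded = unpack(pack(payload))
        best = min(best, time.perf_counter() - start)
    return best, decoded


def compare(n_items=20000):
    """Time the default msgpack backend against the pure-Python fallback."""
    if msgpack is None:
        return None
    payload = make_payload(n_items)
    default_s, default_out = best_roundtrip(
        lambda o: msgpack.packb(o, use_bin_type=True),
        lambda b: msgpack.unpackb(b, raw=False),
        payload,
    )
    fallback_s, fallback_out = best_roundtrip(
        fallback.Packer(use_bin_type=True).pack,  # autoreset: each pack() returns bytes
        lambda b: fallback.unpackb(b, raw=False),
        payload,
    )
    return {
        "version": msgpack.version,
        "default_backend": msgpack.Packer.__module__,
        "default_s": default_s,
        "fallback_s": fallback_s,
        "roundtrip_ok": default_out == payload and fallback_out == payload,
    }


if __name__ == "__main__":
    print(compare())
```

If default_s and fallback_s come out similar, the default entry points are probably already resolving to the pure-Python implementation; default_backend shows which module they come from.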
There are "similar" issues:
* https://bugs.launchpad.net/oslo.privsep/+bug/1844822
* https://bugs.launchpad.net/oslo.privsep/+bug/1896734
Both are related to msgpack or to the size of the messages exchanged.
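One way to picture the strace observation from step 1 together with the message-size angle of these reports: a reader that drains a Unix socket in fixed-size chunks issues one read per chunk, so a large message appears as a long run of read() lines. The following stdlib-only sketch is a hypothetical illustration, not code from rootwrap, oslo.privsep or Neutron; the helper name count_read_calls and the 4096-byte chunk size are assumptions.

```python
import os
import socket
import threading


def count_read_calls(message_size, chunk_size=4096):
    """Send message_size bytes over a Unix socketpair and count the reader's read() calls.

    Each os.read() below issues a read(2) syscall, i.e. one line in strace output.
    """
    reader, writer = socket.socketpair()  # AF_UNIX by default on Linux
    payload = b"x" * message_size

    def send_all():
        writer.sendall(payload)
        writer.shutdown(socket.SHUT_WR)  # signal EOF to the reader

    sender = threading.Thread(target=send_all)
    sender.start()
    calls = received = 0
    while True:
        chunk = os.read(reader.fileno(), chunk_size)
        calls += 1
        if not chunk:  # an empty read means EOF; it still counts as a syscall
            break
        received += len(chunk)
    sender.join()
    reader.close()
    writer.close()
    return calls, received


if __name__ == "__main__":
    for size in (1_000, 100_000, 1_048_576):
        print(size, count_read_calls(size))
```

At this chunk size a 1 MiB message needs at least 256 data reads plus the final empty read; the socket may return fewer bytes than requested, so the real count can be higher. Any per-chunk work on the reading side is paid once per read.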