Comment 7 for bug 1839592

Juul Spies (juul) wrote :

I just came across this bug report and would like to share my experience.

I've been having similar issues with openvswitch on 6 servers since we upgraded from 16.04 to 18.04 about 2 years ago.
Our biggest problem is our inability to reproduce it. We just see openvswitch hang from time to time. Sometimes it takes a day to get stuck, sometimes it takes months.
The only way to recover from it is to restart openvswitch.

Right now we are running a backport of openvswitch from Disco (2.11.0-0ubuntu2) on Bionic. With that backport we see the same issues as with the previously installed 2.9.2-0ubuntu0.18.04.3 version that ships with Bionic.

I have gdb traces from both versions, which I will attach.

Here is a small portion of the ovs log and gdb trace from openvswitch 2.9.2-0ubuntu0.18.04.3:
Sun Aug 25 06:16:14 2019-2019-08-25T04:16:14.943Z|00001|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:15 2019-2019-08-25T04:16:15.943Z|00002|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:50 2019-2019-08-25T04:16:17.943Z|00003|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for revalidator127 to quiesce
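
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that pulls the blocked duration and the thread name out of lines like the ones above. The regex and the names QUIESCE_RE and find_blocked are my own and are based only on these three log lines, so other openvswitch versions may need a different pattern.

```python
import re

# Pattern inferred from the three log lines quoted above (an assumption,
# not taken from the openvswitch sources).
QUIESCE_RE = re.compile(
    r"\|ovs_rcu\((?P<reporter>[^)]+)\)\|WARN\|"
    r"blocked (?P<ms>\d+) ms waiting for (?P<thread>\S+) to quiesce"
)


def find_blocked(lines):
    """Return a (thread, blocked_ms) tuple for each ovs_rcu quiesce warning."""
    hits = []
    for line in lines:
        match = QUIESCE_RE.search(line)
        if match:
            hits.append((match.group("thread"), int(match.group("ms"))))
    return hits


# The three lines from the log excerpt, copied verbatim.
sample = [
    "Sun Aug 25 06:16:14 2019-2019-08-25T04:16:14.943Z|00001|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator127 to quiesce",
    "Sun Aug 25 06:16:15 2019-2019-08-25T04:16:15.943Z|00002|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator127 to quiesce",
    "Sun Aug 25 06:16:50 2019-2019-08-25T04:16:17.943Z|00003|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for revalidator127 to quiesce",
]

for thread, ms in find_blocked(sample):
    print(thread, ms)
```

Running it over a full log should make it easy to see which thread the warnings keep pointing at and how long it stays blocked.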

Small portion of the trace:
32 Thread 0x7f1bfa7fc700 (LWP 1461) "revalidator127" 0x00007f1c61aeb37b in futex_abstimed_wait (private=<optimized out>, abstime=0x0, expected=10, futex_word=0x55e4ed0aa800 <rwlock>) at ../sysdeps/unix/sysv/linux/futex-internal.h:172

The full trace is attached in gdbwrap.1566706577.log.gz (Openvswitch 2.9.2)