udpif_revalidator crash in ofpbuf_resize__

Bug #1916708 reported by Trent Lloyd
Affects: openvswitch (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The udpif_revalidator thread crashed in ofpbuf_resize__ on openvswitch 2.9.2-0ubuntu0.18.04.3~cloud0 (on 16.04 from the xenial-queens cloud archive, backported from the 18.04 release of the same version). Kernel version was 4.4.0-159-generic.

The issue is suspected to still exist in upstream master as of Feb 2021 (v2.15.0) but has not been completely understood. Opening this bug to track future occurrences.

The general issue appears to be that the udpif_revalidator thread tried
to expand a stack-allocated ofpbuf to fit a netlink reply of size 3204,
but the buffer is only 2048 bytes. This intentionally triggers an abort,
because memory on the stack cannot be expanded.

The crash in ofpbuf_resize__ appears to be due to OVS_NOT_REACHED() being
called because b->source == OFPBUF_STACK (the line number points at the
default: case, but this appears to be an optimiser quirk; b->source is
OFPBUF_STACK). The buffer memory cannot be realloc()'d if it was allocated
on the stack.
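To make the failure mode concrete, here is a minimal sketch of a buffer
that records where its memory came from and refuses to grow when that
memory is on the stack. It is a simplified model, not the OVS source: the
names toy_buf, toy_source, toy_reserve_tail and toy_put are hypothetical,
and the model returns -1 at the point where the real code is understood to
abort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the buffer "source" tag; not the OVS enum. */
enum toy_source { TOY_MALLOC, TOY_STACK };

struct toy_buf {
    void *base;             /* Start of the backing memory. */
    size_t allocated;       /* Capacity of the backing memory in bytes. */
    size_t size;            /* Bytes currently in use. */
    enum toy_source source; /* Who owns the backing memory. */
};

/* Points 'b' at caller-owned memory, e.g. an array on the caller's stack. */
void toy_use_stack(struct toy_buf *b, void *mem, size_t cap)
{
    b->base = mem;
    b->allocated = cap;
    b->size = 0;
    b->source = TOY_STACK;
}

/* Ensures at least 'extra' free bytes at the tail.
 * Returns 0 on success, -1 where the modelled code would abort. */
int toy_reserve_tail(struct toy_buf *b, size_t extra)
{
    assert(b->size <= b->allocated);
    if (b->allocated - b->size >= extra) {
        return 0;               /* Already fits, nothing to resize. */
    }
    switch (b->source) {
    case TOY_MALLOC: {
        void *p = realloc(b->base, b->size + extra);
        if (!p) {
            return -1;
        }
        b->base = p;
        b->allocated = b->size + extra;
        return 0;
    }
    case TOY_STACK:
        return -1;              /* Stack memory cannot be realloc()'d. */
    }
    return -1;
}

/* Appends 'len' bytes from 'data', growing the buffer if it is allowed to. */
int toy_put(struct toy_buf *b, const void *data, size_t len)
{
    if (toy_reserve_tail(b, len) != 0) {
        return -1;
    }
    memcpy((char *) b->base + b->size, data, len);
    b->size += len;
    return 0;
}
```

With a 2048-byte stack array behind the buffer, a first toy_put() of 2048
bytes succeeds and a second toy_put() of 1156 bytes fails, which mirrors
the sizes seen in the backtrace below.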

This buffer is provided in frame #7 (nl_sock_transact_multiple__) during
the call to nl_sock_recv__, where it is passed as buf_txn->reply. In this
specific case it seems transactions[0] was available, so it was used
rather than tmp_txn.

The original source of transactions (it is passed through most of the
function calls) appears to be op_auxdata, allocated on the stack at the top
of the dpif_netlink_operate__ function (dpif-netlink.c:1875).

The size of this particular message was 3204, so 2048 bytes went into the
buffer and the remaining 1156 bytes (3204 - 2048) went into the tail iovec
set up inside nl_sock_recv__, which then tried to expand the ofpbuf to hold
them. Various nl_sock_* functions have comments about the buffer ideally
being the right size for optimal performance (I guess to avoid the
reallocation), but it seems like a possible oversight in the
dpif_netlink_operate__ workflow that the nl_sock_* functions may
ultimately try to expand that buffer and then fail because of the stack
allocation.
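The 2048/1156 split can be reproduced outside OVS with a two-element iovec.
The sketch below is an illustration only: it uses an AF_UNIX datagram
socketpair in place of a netlink socket, the function name
scatter_recv_demo is hypothetical, and the 65536-byte tail size is an
assumption about how the overflow area is sized. It sends one message and
reports how many bytes landed in the head buffer and how many spilled into
the tail.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends one 'msg_len'-byte datagram over a socketpair and receives it with
 * recvmsg() into a 'head_cap'-byte head buffer followed by a large tail
 * buffer.  Stores the bytes that landed in each part.
 * Returns 0 on success, -1 on any failure. */
int scatter_recv_demo(size_t msg_len, size_t head_cap,
                      size_t *head_len, size_t *tail_len)
{
    static unsigned char msg[65536];   /* Payload to send (all zeros). */
    static unsigned char head[65536];  /* Models the fixed-size reply buffer. */
    static unsigned char tail[65536];  /* Models the overflow area. */
    int fds[2];
    int rc = -1;

    if (msg_len > sizeof msg || head_cap > sizeof head) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        return -1;
    }

    if (send(fds[0], msg, msg_len, 0) == (ssize_t) msg_len) {
        struct iovec iov[2];
        struct msghdr mh;

        iov[0].iov_base = head;
        iov[0].iov_len = head_cap;
        iov[1].iov_base = tail;
        iov[1].iov_len = sizeof tail;

        memset(&mh, 0, sizeof mh);
        mh.msg_iov = iov;
        mh.msg_iovlen = 2;

        /* The kernel fills iov[0] first and spills the rest into iov[1]. */
        ssize_t n = recvmsg(fds[1], &mh, 0);
        if (n >= 0) {
            size_t got = (size_t) n;
            *head_len = got < head_cap ? got : head_cap;
            *tail_len = got - *head_len;
            rc = 0;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Calling scatter_recv_demo(3204, 2048, &head, &tail) leaves 2048 in head and
1156 in tail. Those 1156 bytes are what the receiver then has to append to
the reply buffer, which is the step that fails when the buffer cannot grow.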

The relevant source tree can be found here:
git clone -b applied/2.9.2-0ubuntu0.18.04.3 https://git.launchpad.net/ubuntu/+source/openvswitch
https://git.launchpad.net/ubuntu/+source/openvswitch/tree/?h=applied/2.9.2-0ubuntu0.18.04.3

Thread 1 (Thread 0x7f3e0ffff700 (LWP 1539131)):
#0 0x00007f3ed30c8428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f3ed30ca02a in __GI_abort () at abort.c:89
#2 0x00000000004e5035 in ofpbuf_resize__ (b=b@entry=0x7f3e0fffb050, new_headroom=<optimized out>, new_tailroom=new_tailroom@entry=1156) at ../lib/ofpbuf.c:262
#3 0x00000000004e5338 in ofpbuf_prealloc_tailroom (b=b@entry=0x7f3e0fffb050, size=size@entry=1156) at ../lib/ofpbuf.c:291
#4 0x00000000004e54e5 in ofpbuf_put_uninit (size=size@entry=1156, b=b@entry=0x7f3e0fffb050) at ../lib/ofpbuf.c:365
#5 ofpbuf_put (b=b@entry=0x7f3e0fffb050, p=p@entry=0x7f3e0ffcf0a0, size=size@entry=1156) at ../lib/ofpbuf.c:388
#6 0x00000000005392a6 in nl_sock_recv__ (sock=sock@entry=0x7f3e50009150, buf=0x7f3e0fffb050, wait=wait@entry=false) at ../lib/netlink-socket.c:705
#7 0x0000000000539474 in nl_sock_transact_multiple__ (sock=sock@entry=0x7f3e50009150, transactions=transactions@entry=0x7f3e0ffdff20, n=1, done=done@entry=0x7f3e0ffdfe10) at ../lib/netlink-socket.c:824
#8 0x000000000053980a in nl_sock_transact_multiple (sock=0x7f3e50009150, transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at ../lib/netlink-socket.c:1009
#9 0x000000000053aa1b in nl_sock_transact_multiple (n=1, transactions=0x7f3e0ffdff20, sock=<optimized out>) at ../lib/netlink-socket.c:1765
#10 nl_transact_multiple (protocol=protocol@entry=16, transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at ../lib/netlink-socket.c:1764
#11 0x0000000000528b01 in dpif_netlink_operate__ (dpif=dpif@entry=0x25a6150, ops=ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif-netlink.c:1964
#12 0x0000000000529956 in dpif_netlink_operate_chunks (n_ops=1, ops=0x7f3e0fffaf28, dpif=<optimized out>) at ../lib/dpif-netlink.c:2243
#13 dpif_netlink_operate (dpif_=0x25a6150, ops=<optimized out>, n_ops=<optimized out>) at ../lib/dpif-netlink.c:2279
#14 0x00000000004756de in dpif_operate (dpif=0x25a6150, ops=<optimized out>, ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif.c:1359
#15 0x00000000004758e7 in dpif_flow_get (dpif=<optimized out>, key=<optimized out>, key_len=<optimized out>, ufid=<optimized out>, pmd_id=<optimized out>, buf=buf@entry=0x7f3e0fffb050, flow=<optimized out>) at ../lib/dpif.c:1014
#16 0x000000000043f662 in ukey_create_from_dpif_flow (udpif=0x229cbf0, udpif=0x229cbf0, ukey=<synthetic pointer>, flow=0x7f3e0fffc790) at ../ofproto/ofproto-dpif-upcall.c:1709
#17 ukey_acquire (error=<synthetic pointer>, result=<synthetic pointer>, flow=0x7f3e0fffc790, udpif=0x229cbf0) at ../ofproto/ofproto-dpif-upcall.c:1914
#18 revalidate (revalidator=0x250eaa8) at ../ofproto/ofproto-dpif-upcall.c:2473
#19 0x000000000043f816 in udpif_revalidator (arg=0x250eaa8) at ../ofproto/ofproto-dpif-upcall.c:913
#20 0x00000000004ea4b4 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:348
#21 0x00007f3ed39756ba in start_thread (arg=0x7f3e0ffff700) at pthread_create.c:333
#22 0x00007f3ed319a41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Tags: sts