nl_recv returned with error: No buffer space available

Bug #2025647 reported by Roberto Bartzen Acosta
This bug affects 1 person
Affects: libvirt (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We are seeing the following error when creating VMs:

Jun 28 19:33:29 SR-0001 libvirtd[5289]: nl_recv returned with error: No buffer space available

This problem occurs in deployments with different OS versions.

Ubuntu 20.04.4 LTS
ii libnl-3-200:amd64 3.4.0-1
ii libvirt-daemon 6.0.0-0ubuntu8.16

Ubuntu 22.04 LTS
ii libnl-3-200:amd64 3.5.0-0.1
ii libvirt-daemon 8.0.0-1ubuntu7.5

The host has around 40 native interfaces (bonds, VLANs, tunnel bridges, mgmt bridges, physical NICs, etc.) plus 40 or more tap interfaces (depending on the number of VMs allocated); we observed errors once more than 40 tap interfaces were added to the host.

I saw with strace that libvirt receives messages larger than 128k (one message of approximately 170k). The MSG_PEEK mechanism appears to be working, but it may fail in some cases.

In a "good received flow" of incoming messages I see something like the log below, multiple messages for the same sequence number until an error occurs.

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1336, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},....

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 2840

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1420, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},.....

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 30740

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1420, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},....

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 31292

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1772, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},....

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 31984

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1716, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517}, ...

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 31296

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1428, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},...

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC},
MSG_PEEK|MSG_TRUNC) = 31280

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1428, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517}, ...

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 24124

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=1428, nlmsg_type=RTM_NEWLINK, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517},

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1688146694, nlmsg_pid=3840517}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20

3840517 sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=48, nlmsg_type=RTM_NEWQDISC, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1688146695, nlmsg_pid=0}, {tcm_family=AF_UNSPEC, tcm_ifindex=if_nametoindex("tapf28271ff-34"), tcm_handle=0, tcm_parent=4294967295, tcm_info=0}, [{nla_len=12, nla_type=TCA_KIND}, "noqueue"]], iov_len=48}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36

3840517 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=1688146695, nlmsg_pid=3840517}, {error=0, msg={nlmsg_len=48, nlmsg_type=RTM_NEWQDISC, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1688146695, nlmsg_pid=0}}], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
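
For reference, the pattern visible in the trace above is the standard two-step netlink read: a recvmsg() with an empty iovec and MSG_PEEK|MSG_TRUNC returns the size of the next queued message without consuming it, and the follow-up recvmsg() reads it for real. A minimal C sketch of that pattern (the helper name and error handling are illustrative, assuming an already-connected NETLINK_ROUTE socket fd):

#include <stdlib.h>
#include <sys/uio.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Illustrative helper: peek the size of the next netlink message,
 * then read it into a buffer of exactly that size. Returns bytes
 * read, or -1 with errno set -- ENOBUFS ("No buffer space available")
 * means the kernel already dropped messages on this socket. */
static ssize_t netlink_peek_then_read(int fd, void **out)
{
    struct sockaddr_nl nladdr;
    struct iovec iov = { .iov_base = NULL, .iov_len = 0 };
    struct msghdr msg = {
        .msg_name = &nladdr, .msg_namelen = sizeof(nladdr),
        .msg_iov = &iov, .msg_iovlen = 1,
    };

    /* Step 1: the "MSG_PEEK|MSG_TRUNC ... = 30740" calls in the trace:
     * the return value is the real message size, nothing is dequeued. */
    ssize_t len = recvmsg(fd, &msg, MSG_PEEK | MSG_TRUNC);
    if (len < 0)
        return -1;

    void *buf = malloc(len);
    if (!buf)
        return -1;

    /* Step 2: read the message for real, now that the buffer fits. */
    iov.iov_base = buf;
    iov.iov_len = len;
    len = recvmsg(fd, &msg, 0);
    if (len < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return len;
}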

There are some abandoned discussions, as pointed out in [1], [2], and [3], about MSG_PEEK and having libnl dynamically allocate larger buffers. libvirt already enables MSG_PEEK at startup and sets the default buffer to 128k, but even so, messages like this are still logged. What could the problem be? Any ideas?
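
For reference, that startup setup corresponds to two libnl-3 calls. A minimal sketch of the equivalent socket setup (the wrapper function is illustrative, not libvirt's actual source; the libnl API names are real):

#include <netlink/netlink.h>
#include <netlink/socket.h>

/* Illustrative wrapper: open a NETLINK_ROUTE socket the way the bug
 * description assumes libvirt does -- 128k rx buffer plus peeking. */
static struct nl_sock *open_route_socket(void)
{
    struct nl_sock *sk = nl_socket_alloc();
    if (!sk)
        return NULL;

    if (nl_connect(sk, NETLINK_ROUTE) < 0)
        goto err;

    /* Grow the kernel-side rx socket buffer to 128k (libnl's default
     * is 32k); 0 leaves the tx buffer at its default. */
    if (nl_socket_set_buffer_size(sk, 131072, 0) < 0)
        goto err;

    /* Tell nl_recv() to peek first and size its user-space buffer to
     * the pending message instead of reading into a fixed buffer. */
    nl_socket_enable_msg_peek(sk);
    return sk;

err:
    nl_socket_free(sk);
    return NULL;
}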

This problem does not occur when we have few interfaces on the hypervisor; could it be a scaling problem with libnl?

Regards,
Roberto

[1] - https://listman.redhat.com/archives/libvir-list/2017-July/msg00566.html
[2] - https://<email address hidden>/msg149554.html
[3] - https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1794997

Revision history for this message
Roberto Bartzen Acosta (rbartzen) wrote :

Netlink drops are observed when new VMs are created (note the non-zero Drops column below).

cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
ff411bb5d82fc000 0 3015708447 00000551 0 0 0 2 47 187536
ff411bb5da735000 15 3749713065 00000001 0 0 0 2 3806 142072
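
For context: the Drops column counts messages the kernel discarded because the socket's receive queue was full, and once that happens the next recvmsg() on the socket fails with ENOBUFS, i.e. "No buffer space available". One low-level mitigation (illustrative only, not something libvirt does itself) is forcing a receive buffer larger than the net.core.rmem_max cap:

#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: enlarge the kernel receive buffer of an open
 * netlink socket fd. SO_RCVBUFFORCE ignores the net.core.rmem_max
 * cap but requires CAP_NET_ADMIN. */
static int netlink_force_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
                   &bytes, sizeof(bytes)) < 0) {
        perror("setsockopt(SO_RCVBUFFORCE)");
        return -1;
    }
    return 0;
}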

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Hi Roberto,

Thanks for taking the time to report this bug and trying to make Ubuntu better.

As you mentioned, this issue does not seem to be reproducible locally (at least I can't reproduce it), which makes it harder to understand what is going on. I went through the links you added, and from this LP bug:

https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1794997

This does not seem to be a libvirt bug but a libnl one. I'll try to find some guidance within the Canonical Server team and we will get in touch here.

Could you share how exactly you are creating the VMs?

Revision history for this message
Roberto Bartzen Acosta (rbartzen) wrote (last edit ):

Hi Lucas,

Thank you for looking at this report.

VMs are created via OpenStack. The problem occurs with both Yoga and Ussuri OpenStack deployments (with the package versions listed in the bug description).

Via OpenStack, these VMs are associated with Networks, Subnets, Security Groups, key pairs, etc.

In my tests, after creating the VMs, I run iperf from one VM to another, which loads the hypervisor's CPU and memory.

After a few VMs are created and there is some load on the host, I see the buffer error messages.

I believe that it is not a problem with libvirt itself, but with the memory allocation that libnl makes during the reception of messages in the netlink event callback.
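
To make that code path concrete: libnl allocates the receive buffer inside nl_recv(), which runs every time the event socket becomes readable. A minimal sketch of that receive path (the callback and function names are illustrative; the libnl-3 API calls are real):

#include <stdio.h>
#include <netlink/netlink.h>
#include <netlink/socket.h>
#include <netlink/msg.h>

/* Illustrative callback, invoked once per valid netlink message. */
static int on_link_event(struct nl_msg *msg, void *arg)
{
    (void)arg;
    nl_msg_dump(msg, stdout);   /* a real consumer would parse RTM_* here */
    return NL_OK;
}

/* nl_recvmsgs_default() -> nl_recv() is where libnl peeks and then
 * malloc()s a buffer sized to the pending message; an ENOBUFS from
 * recvmsg() surfaces here as a negative NLE_* return code. */
static int drain_events(struct nl_sock *sk)
{
    nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, on_link_event, NULL);
    return nl_recvmsgs_default(sk);
}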

I changed a memory reservation parameter for VMs and noticed that the problem no longer occurred (as far as I've tested it), but that doesn't guarantee it can't occur in the future.

sysctl vm.min_free_kbytes
vm.min_free_kbytes = 4194303

I changed this to vm.min_free_kbytes = 8192 (i.e., from roughly 4 GiB reserved down to 8 MiB)...

Do you have any idea of additional parameters that need to be set/tuned so that libvirt's memory allocation can work correctly?

Additional memory footprint info is in the attached file.

Regards,
Roberto

Revision history for this message
Paride Legovini (paride) wrote :

Hello Roberto,

I am not the subject matter expert here, but I'd say this is something that depends on the load OpenStack / libvirt has to handle. Maybe that kind of tuning is documented in those projects?

I'm marking this report as Incomplete for now, to encourage further discussion, but I don't believe we have a bug in Ubuntu here.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
as the others said, thanks for reporting this and for also looking at it from the right angle, as IMHO this is correct:

> I believe that it is not a problem with libvirt itself, but with the memory allocation that libnl makes during the reception of messages in the netlink event callback.

Yeah, but the question for me right now is: "is it a bug, or just too much pressure with no other choice?"

Just a summary (for my own brain next time I come by):
- read [1]
- all that is still true, a history trace would be
  - 1. libnl: systems grew, grow initial buffer [2]
  - 2. libnl: peeking by default [3]
  - 3. libvirt: even more buffer + peeking [4]
- All of that has been in Ubuntu for quite a while; the versions you use have it

> I changed a memory reservation parameter for VMs and
> noticed that the problem no longer occurred (as far
> as I've tested it), but that doesn't guarantee it
> can't occur in the future.
> sysctl vm.min_free_kbytes
> vm.min_free_kbytes = 4194303
> I changed this to vm.min_free_kbytes = 8192 KBytes...

So you reduced vm.min_free_kbytes from ~4 GiB to 8 MiB, causing the kernel to keep far less memory available for critical internal allocations. That sounds like asking for trouble in other cases. And if it really helped, then IMHO that means your system might have been running right on the edge of dying from an OOM.
This is only the starting value used to calculate the low/min/high watermarks per zone/node; see /proc/zoneinfo for the effective outcome on a huge system. I have personally always been wary of large systems consuming memory as fast as they want and then hitting a sudden wall (= all those limits being low), and I often wished that at least low/high would not start out so small, e.g. so CPU cycles get dedicated to scanning and early swapping and memory can be discarded later if needed. Anyway, I'm starting to ramble - this isn't your problem .. :-)

This case is bad because there isn't much we can do to help here; let me outline the options I see:
- a) A battle of numbers: your system might simply exceed the defaults
- b) An actual bug in libnl; maybe a newer version has a fix to test
- c) The load is just too much for the system to handle
Let me go into my thoughts on each of those:

# (a) Battle of big numbers

The need for [4] was triggered by systems with many CPUs and many interfaces. You have many interfaces, and what counted as "many CPUs" in 2016 is normal now; you might have far more. In any case, peeking should fix that unless the initial response by the kernel exceeds the initial buffer (ref [1]).
So it's unlikely, but maybe the setup and size of your system are now enough that even the 128k of [4] doesn't suffice for the very first message, before peeking triggers reallocation.
Pro:
- If that were our theory, we could provide a test build with a bigger default buffer for you to try
Con:
- Then changing vm.min_free_kbytes should not have helped
- With peeking, the size shouldn't matter at all (just a slowdown if it isn't big enough), as libnl would realloc whenever a message doesn't fit

# (b) Bug in libnl

The wording is a bit weak: does setting the buffer + enabling peeking actually peek?
For example [5] could be read as "nl_socket_enable_msg_peek+nl_s...


Revision history for this message
Christian Ehrhardt  (paelzer) wrote (last edit ):

Something that quickly gave me the libc realloc calls, including their size argument, at relatively low overhead was:

$ git clone https://github.com/brendangregg/perf-tools.git
$ cd perf-tools
$ sudo chmod +x /lib/x86_64-linux-gnu/libc.so.6
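# %di/%si are realloc()'s first two args (ptr, size) in the x86-64 SysV ABI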
$ sudo ./bin/uprobe -H 'p:libc:realloc %di %si' | grep libvirtd

# you could do similar with perf trace looking at brk/mmap
$ sudo apt install linux-tools-common linux-tools-generic
$ perf trace

You might need to adapt this to your needs, e.g. restricting to your libvirt PID/thread instead of using grep (first check with the commands above how/if PIDs fluctuate throughout your load).
