nl_cache_refill; rtnl_neigh_get fail to find neighbors in cache

Bug #1312419 reported by Eyal Perry on 2014-04-24
This bug affects 2 people
Affects: libnl3 (Ubuntu)
Importance: Medium
Assigned to: Unassigned
Nominated for Trusty by Mathew Hodson

Bug Description

Retrieving information about configured neighbors fails.
My application runs the following procedure:

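neigh = NULL;
rtnl_neigh_alloc_cache(sk, &cache);   /* libnl3 returns the cache via an out-parameter */
while (NULL == neigh) {
    nl_cache_refill(sk, cache);       /* re-read the neighbour table from the kernel */
    neigh = rtnl_neigh_get(cache, ifindex, dst_addr);
}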

With libnl3 3.2.21-1 this loop never ends, even after adding a static ARP entry.
However, with libnl-3.2.24 the neighbor lookup succeeds.
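
Independently of the library bug, the loop above spins forever when the entry never shows up and ignores refill errors. A minimal sketch of a bounded retry follows. The callback types, `lookup_with_retry`, and the `fake_*` stand-ins are hypothetical names introduced only for illustration; nothing here calls libnl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: refill() refreshes the cache and returns 0 on
 * success or a negative error code; lookup() returns the entry or NULL. */
typedef int   (*refill_fn)(void *ctx);
typedef void *(*lookup_fn)(void *ctx);

/* Refill and look up at most max_attempts times. Returns the entry, or NULL
 * if a refill failed or the entry never appeared. *err receives the status
 * of the last refill. */
static void *lookup_with_retry(refill_fn refill, lookup_fn lookup, void *ctx,
                               int max_attempts, int *err)
{
    assert(refill && lookup && err && max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        *err = refill(ctx);
        if (*err < 0)
            return NULL;            /* refill failed: stop instead of spinning */
        void *entry = lookup(ctx);
        if (entry)
            return entry;
    }
    return NULL;                    /* entry never appeared */
}

/* Stand-in callbacks for demonstration only (no libnl involved). */
struct fake_cache { int refills; int appears_after; int entry; };

static int fake_refill(void *ctx)
{
    ((struct fake_cache *)ctx)->refills++;
    return 0;
}

static void *fake_lookup(void *ctx)
{
    struct fake_cache *c = ctx;
    return c->refills >= c->appears_after ? &c->entry : NULL;
}
```

In the real application the refill callback would presumably wrap nl_cache_refill(sk, cache) and the lookup callback rtnl_neigh_get(cache, ifindex, dst_addr), possibly with a short sleep between attempts. With the affected 3.2.21-1 library such a loop would still fail, but it would fail visibly instead of hanging.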

additional general info:
$ lsb_release -rd
Description: Ubuntu 14.04 LTS
Release: 14.04
$ uname -r
3.13.0-24-generic
$ apt-cache policy libnl-genl-3-200 libnl-route-3-200
libnl-genl-3-200:
  Installed: 3.2.21-1
  Candidate: 3.2.21-1
  Version table:
 *** 3.2.21-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
libnl-route-3-200:
  Installed: 3.2.21-1
  Candidate: 3.2.21-1
  Version table:
 *** 3.2.21-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

Brian Fromme (brianfromme) wrote :

Can you please attach the test case(s) you are using to recreate this bug?

Brian Fromme (brianfromme) wrote :

Also, can you explain what you expect to see as results from these test cases?

Eyal Perry (eyalpe) wrote :

What I checked is a small application called udaddy, which is part of the librdmacm examples.
But I think this information is not useful, since the method that actually revealed this issue is part of my own privately built version of libibverbs.
However, I intend to write a small app to demonstrate this faulty behavior. It should be almost the same as the code snippet plus some setup routines.

Micah Cowan (micahcowan) wrote :

I can confirm this issue in Trusty (libnl3 version 3.2.21-1), relative to Precise (3.2.3-2ubuntu2). I can install the Precise binaries for:
  libnl-3-200
  libnl-3-dev
  libnl-genl-3-200
  libnl-genl-3-dev
  libnl-route-3-200
  libnl-route-3-dev
onto a Trusty system, and the issue goes away.

To reproduce, use the attached netlink-test.c and build/run like this:
  gcc -I/usr/include/libnl3 netlink-test.c -lnl-3 -lnl-route-3 -o netlink-test
  ./netlink-test 1.2.3.4

Or, for much more debugging output, run like:
  NLDBG=4 ./netlink-test 1.2.3.4

where 1.2.3.4 is the IP address of a neighboring machine (I used the output of
  route | sed -n '3{p;q}' | awk '{print $2}'
, which gives my router).
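
If a test program should find the router itself rather than rely on the shell pipeline above, one option is to read the kernel routing table. The sketch below parses text in the /proc/net/route layout and returns the gateway of the first default route. `default_gateway` is a hypothetical helper, and the byte order assumes a little-endian host such as the amd64 system in this report.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan text in /proc/net/route format and write the gateway of the first
 * default route (destination 00000000) to out as a dotted quad.
 * Returns 0 on success, -1 if no default route is present.
 * Assumption: little-endian host, so the lowest byte is the first octet. */
static int default_gateway(const char *table, char *out, size_t outlen)
{
    const char *line = table;

    while (line && *line) {
        char iface[32], dest[16], gw[16];

        /* The first three columns are Iface, Destination, Gateway. */
        if (sscanf(line, "%31s %15s %15s", iface, dest, gw) == 3 &&
            strcmp(dest, "00000000") == 0) {
            unsigned long v = strtoul(gw, NULL, 16);
            snprintf(out, outlen, "%lu.%lu.%lu.%lu",
                     v & 0xff, (v >> 8) & 0xff,
                     (v >> 16) & 0xff, (v >> 24) & 0xff);
            return 0;
        }
        line = strchr(line, '\n');  /* advance to the next line, if any */
        if (line)
            line++;
    }
    return -1;
}
```

A caller would read /proc/net/route into a buffer and pass it in. The sketch does not check the route flags, so a default route without a gateway would come back as 0.0.0.0.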

I recommend you try it in a Precise chroot (or just with the Precise binaries listed above installed) first, so you can verify you have a working IP address for the test. Then try it in Trusty and confirm the failure.

With libnl3 binaries from Precise, I get:
  calling nl_socket_alloc
  calling rtnl_neigh_alloc_cache
  calling rtnl_neigh_get
  calling rtnl_neigh_get_lladdr
  hwLen: 6

But with Trusty's libnl3 stuff I get:
  calling nl_socket_alloc
  calling rtnl_neigh_alloc_cache
  calling rtnl_neigh_get
  rtnl_neigh_get returned NULL

The attached program was written not by me but by Carl Soeder, my coworker at Akamai Technologies.

Since the offending version is a *-1 release, the problem is not Ubuntu-specific. Since, also, Debian has not added any patches that change source code, the problem is not Debian-specific either. I'll open bugs with Debian and the libnl folks, and link to them here.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libnl3 (Ubuntu):
status: New → Confirmed
Micah Cowan (micahcowan) wrote :

I've confirmed that the issue is fixed in Utopic (3.2.24-2) (I didn't know whether Utopic-built binaries work directly - I built from the Utopic source package). Looks like a backport is what's called for.

Obviously, I won't be opening upstream bugs after all, since it's already been fixed for both of them at this point.

Changed in libnl3 (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
Rafael David Tinoco (inaddy) wrote :

Kamal,

As we talked about, I'm making the following PPA available:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1312419

It contains Trusty version + commit:

From 3700cc1ad3a3b507848deb401b9d0f41ff7010bb Mon Sep 17 00:00:00 2001
From: Thomas Graf <email address hidden>
Date: Fri, 1 Feb 2013 10:41:45 +0100
Subject: [PATCH] neigh: Remove check for AF_UNSPEC in rtnl_neigh_get()

Please test this PPA and provide feedback if it fixes the problem. I'll suggest SRU as soon as I get confirmation.
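
For readers wondering how a check for AF_UNSPEC could hide every neighbour, here is a toy model. It is not the libnl source. It assumes, going only by the commit subject, that the old lookup additionally required the entry's address family to be AF_UNSPEC. All `toy_*` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET */

/* Toy stand-in for a cached neighbour entry; NOT the libnl structure. */
struct toy_neigh {
    int           family;   /* e.g. AF_INET for an IPv4 ARP entry */
    int           ifindex;
    unsigned char dst[4];   /* IPv4 destination address */
};

/* Lookup with an extra family filter (the assumed old behaviour). */
static const struct toy_neigh *
toy_get_filtered(const struct toy_neigh *tbl, size_t n,
                 int ifindex, const unsigned char dst[4])
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].family == AF_UNSPEC &&     /* the extra check */
            tbl[i].ifindex == ifindex &&
            memcmp(tbl[i].dst, dst, 4) == 0)
            return &tbl[i];
    return NULL;
}

/* Same lookup without the family filter. */
static const struct toy_neigh *
toy_get(const struct toy_neigh *tbl, size_t n,
        int ifindex, const unsigned char dst[4])
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].ifindex == ifindex &&
            memcmp(tbl[i].dst, dst, 4) == 0)
            return &tbl[i];
    return NULL;
}
```

Under that assumption an ordinary IPv4 entry, whose family is AF_INET, can never match the filtered lookup however often the cache is refilled, which would be consistent with the endless loop in the bug description.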

Thank you

Best Regards

Rafael

Changed in libnl3 (Ubuntu):
status: Confirmed → In Progress
Talat Batheesh (talat-b87) wrote :

Hi Rafael,

 Unfortunately, after installing the PPA, the udaddy test still fails on Ubuntu 14.04.3 LTS.
Tested with ConnectX3-pro.

root # udaddy -b 1.1.1.1
udaddy: starting server
receiving data transfers

^C
root# uname -r
3.13.0-68-generic
root# cat /etc/issue
Ubuntu 14.04.3 LTS \n \l

Client side
# udaddy -s 1.1.1.1 -b 1.1.1.2
udaddy: starting client
udaddy: connecting
udaddy: failure creating address handle
test complete
return status -1

For reference, we may need to add the following patches, which are mentioned in Bug #1364442:

“
Ubuntu is missing some code to make this work. The issue is that libmlx4 and libibverbs are missing the RoCE UD neighbour code.
Here I found the series of patches needed:
For libmlx4:
[PATCH libmlx4 V4 0/2] Add RoCE IP based addressing support for UD QPs
[PATCH libmlx4 V4 1/2] Add ibv_query_port caching support
[PATCH libmlx4 V4 2/2] Add RoCE IP based addressing support for UD QPs
For libibverbs:
[PATCH libibverbs V5 0/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
[PATCH libibverbs V5 1/2] Add ibv_port_cap_flags
[PATCH libibverbs V5 2/2] Use neighbour lookup for RoCE UD QPs Eth L2 resolution
”

Yours,
Talat

Rafael David Tinoco (inaddy) wrote :

Awesome. Thank you, Talat. This was actually a simple example I was giving
to Kamal, explaining to him how the SRU process works and giving him a
real example of how to provide a debdiff and create a PPA. So I really did
not expect the fix to be "good", only to give him some clue about how things work.

It is good that you are catching up with bugs. I'll compile the test case,
backport the commits you mention, and provide you a PPA to test, okay?
Please expect this to take some time (a week) because of my Ubuntu
Advantage workload.

Thank you !!

Cheers

Rafael

On Sun, Nov 22, 2015 at 11:41 AM, Talat Batheesh <email address hidden> wrote:
> [...]

Talat Batheesh (talat-b87) wrote :

Hi Rafael,

After updating libibverbs and libmlx4 to the upstream versions from OpenFabrics, the issue is solved.

git clone git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git libibverbs
git clone git://git.openfabrics.org/~yishaih/libmlx4.git

# apt-get install libnl-3-dev libnl-route-3-dev

We must update libibverbs and libmlx4 to newer versions.

Yours,
Talat

tags: added: trusty
tags: added: testcase
Changed in libnl3 (Ubuntu):
importance: Undecided → Medium
Changed in libnl3 (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in libnl3 (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
Abhik (coldproof) on 2018-04-27
no longer affects: libnl