Comment 3 for bug 1988018

Frode Nordahl (fnordahl) wrote :

I think these are two distinct problems. Hopefully we will get a comment from NVIDIA/Mellanox, as the statements in bug 2020409 contradict the documentation [0] that the current Netplan implementation is based on.

Martin may have more details, but I wanted to mention that one of our suspected culprits is how Netplan lays out the udev rules for VF activation [1]:
1) It takes a long time when many VFs are configured, contrary to the expectation stated in the comment there.
2) The process appears to be executed multiple times, which, combined with how long it takes, may end up clashing with both the networking backend's creation of the bond and the systemd unit rebinding the VFs.
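
To check how many VFs are actually enabled per PF while debugging this, something like the following may help. It is only a minimal sketch and not part of Netplan: the function name `sriov_vf_counts` is made up for illustration, and it assumes the standard kernel sysfs attributes `sriov_numvfs` and `sriov_totalvfs` under each interface's `device` directory.

```python
from pathlib import Path


def sriov_vf_counts(sysfs_net="/sys/class/net"):
    """Return {iface: (numvfs, totalvfs)} for every SR-IOV capable PF.

    Hypothetical helper for debugging; it only reads sysfs and changes nothing.
    """
    root = Path(sysfs_net)
    if not root.is_dir():
        return {}  # not a Linux system, or sysfs is not mounted

    counts = {}
    for iface in sorted(root.iterdir()):
        numvfs = iface / "device" / "sriov_numvfs"
        totalvfs = iface / "device" / "sriov_totalvfs"
        if not (numvfs.is_file() and totalvfs.is_file()):
            continue  # virtual device, a VF, or no SR-IOV support
        counts[iface.name] = (
            int(numvfs.read_text()),
            int(totalvfs.read_text()),
        )
    return counts
```

Calling `sriov_vf_counts()` repeatedly during boot (for example from a loop with timestamps) would give a rough picture of when the VF count changes and whether it changes more than once.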

Bug 2020409 also raises the question of whether there are any bond/LAG-related system bring-up quirks for systems using only Scalable Functions (SFs) or a combination of SFs and VFs. I have yet to see any documentation about that.

0: https://enterprise-support.nvidia.com/s/article/Configuring-VF-LAG-using-TC
1: https://github.com/canonical/netplan/blob/a7e4be03918c986020650743cb6cf0934696ef0c/src/sriov.c#L107-L112