Comment 0 for bug 1755268

kvaps (kvapss) wrote : Kernel panic when using KVM and Mellanox OFED driver (bonding and sriov enabled)

##### System information #####

    # uname -a
    Linux m5c37 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

    # cat /etc/os-release
    NAME="Ubuntu"
    VERSION="16.04.4 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.4 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

    # ethtool -i eno1
    driver: mlx4_en
    version: 4.3-1.0.1
    firmware-version: 2.42.5004
    expansion-rom-version:
    bus-info: 0000:11:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes

    # ethtool -i bond0
    driver: bonding
    version: 3.7.1
    firmware-version: 2
    expansion-rom-version:
    bus-info:
    supports-statistics: no
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: no

    # ethtool -i vmbr0
    driver: bridge
    version: 2.3
    firmware-version: N/A
    expansion-rom-version:
    bus-info: N/A
    supports-statistics: no
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: no

The Mellanox OFED driver was installed from
http://content.mellanox.com/ofed/MLNX_OFED-4.3-1.0.1.0/MLNX_OFED_LINUX-4.3-1.0.1.0-ubuntu16.04-x86_64.tgz

    ./mlnxofedinstall --kernel 4.13.0-36-generic --without-dkms --add-kernel-support

##### Steps to reproduce #####

This is my /etc/network/interfaces file:

    auto lo
    iface lo inet loopback

    auto openibd
    iface openibd inet manual
            pre-up /etc/init.d/openibd start
            pre-down /etc/init.d/openibd force-stop

    auto bond0
    iface bond0 inet manual
            pre-up ip link add bond0 type bond || true
            pre-up ip link set bond0 down
            pre-up ip link set bond0 type bond mode active-backup arp_interval 2000 arp_ip_target 10.36.0.1 arp_validate 3 primary eno1
            pre-up ip link set eno1 down
            pre-up ip link set eno1d1 down
            pre-up ip link set eno1 master bond0
            pre-up ip link set eno1d1 master bond0
            pre-up ip link set bond0 up
            pre-down ip link del bond0

    auto vmbr0
    iface vmbr0 inet static
            address 10.36.128.217
            netmask 255.255.0.0
            gateway 10.36.0.1
            bridge_ports bond0
            bridge_stp off
            bridge_fd 0
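
Before starting the VM it may help to confirm that the bond actually came up in active-backup mode with both ports enslaved. The following is only a minimal sketch: `bond_summary` is a hypothetical helper name, and it assumes the bonding driver exposes its usual status file at `/proc/net/bonding/bond0`.

```shell
# Hypothetical helper: print the bonding mode, the currently active slave
# and the enslaved interfaces from a bonding status file.
# The path defaults to the status file for bond0; pass another path to override.
bond_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    grep -E '^(Bonding Mode|Currently Active Slave|Slave Interface):' "$status_file"
}
```

Running `bond_summary` with no arguments on the affected host should list both `eno1` and `eno1d1` as slave interfaces if the `pre-up` commands succeeded.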

I execute these commands:

    wget http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-virt-3.7.0-x86_64.iso -O alpine.iso
    qemu-system-x86_64 -boot d -cdrom alpine.iso -m 512 -nographic -device e1000,netdev=net0 -netdev tap,id=net0

After a few moments the kernel hangs and these messages appear on the console:

    [74390.187908] mlx4_core 0000:11:00.0: bond for multifunction failed
    [74390.486476] mlx4_en: eno1d1: Fail to bond device
    [74390.750758] cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmalloc-192
    [74391.152326] general protection fault: 0000 [#1] SMP PTI
    [74391.410424] cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmalloc-192

The full kernel trace log is in the attachment.
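
For anyone trying to reproduce this, the console messages quoted above can be picked out of a longer kernel log with a small filter. This is a rough sketch only: `filter_bond_failures` is a hypothetical name, and the patterns are taken verbatim from the messages in this report.

```shell
# Hypothetical helper: read a kernel log on stdin and print only the lines
# that match the failure signatures quoted in this report.
filter_bond_failures() {
    grep -E 'bond for multifunction failed|Fail to bond device|cache_from_obj: Wrong slab cache|general protection fault'
}

# Example usage on a live system (dmesg may require root):
#   dmesg | filter_bond_failures
```

The function prints nothing and returns a non-zero status when none of the signatures are present.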

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-36-generic 4.13.0-36.40~16.04.1
ProcVersionSignature: Ubuntu 4.13.0-36.40~16.04.1-generic 4.13.13
Uname: Linux 4.13.0-36-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Mon Mar 12 19:59:16 2018
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C
 SHELL=/bin/bash
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)