MOS: 9.2
Package: i40e-dkms_1.6.42-1~u14.04+mos1_all.deb
Our customer faces the following problem: VMs with an SR-IOV VF from an Intel Fortville NIC occasionally fail to boot, causing libvirt to die.
Here is a sample of the crash:
(kernel.log)
2017-08-22T15:20:22.801344+02:00 compute-0-4 kernel: [336496.567186] i40e 0000:41:00.0: Setting MAC fa:16:3e:06:e2:86 on VF 2
2017-08-22T15:20:22.877826+02:00 compute-0-4 kernel: [336496.638967] i40e 0000:41:00.0: Reload the VF driver to make this change effective.
2017-08-22T15:20:22.877836+02:00 compute-0-4 kernel: [336496.638975] BUG: scheduling while atomic: libvirtd/48368/0x00000200
2017-08-22T15:20:22.889422+02:00 compute-0-4 kernel: [336496.647435] Modules linked in: btrfs ufs qnx4 minix ntfs msdos jfs i40evf iptable_raw xt_CT xt_tcpmss xt_multiport xt_comment br_netfilter vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables kvm_intel 8021q garp mrp bridge stp llc bonding vfio_pci vfio_virqfd vfio_iommu_type1 vfio openvswitch iptable_filter ip_tables x_tables nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipmi_devintf dcdbas ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core input_leds joydev ipmi_si lpc_ich mei_me ipmi_msghandler mei shpchp wmi 8250_fintek acpi_power_meter acpi_pad mac_hid xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 hid_generic raid0 usbhid hid ixgbe igb i2c_algo_bit i40e(OE) multipath vxlan megaraid_sas ip6_udp_tunnel dca udp_tunnel ahci ptp mdio libahci pps_core fjes linear dm_multipath [last unloaded: kvm_intel]
2017-08-22T15:20:22.889438+02:00 compute-0-4 kernel: [336496.647497] CPU: 24 PID: 48368 Comm: libvirtd Tainted: G W OE 4.4.0-81-generic # 104~14.04.1-Ubuntu
2017-08-22T15:20:22.889440+02:00 compute-0-4 kernel: [336496.647498] Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.5.4 01/22/2016
2017-08-22T15:20:22.889441+02:00 compute-0-4 kernel: [336496.647500] 0000000000000000 ffff88019dac7610 ffffffff813dde9c ffff880fffb16e00
2017-08-22T15:20:22.889452+02:00 compute-0-4 kernel: [336496.647502] 0000000000016e00 ffff88019dac7620 ffffffff8118369a ffff88019dac7668
2017-08-22T15:20:22.889455+02:00 compute-0-4 kernel: [336496.647504] ffffffff81808e73 00ffffff811691f5 ffff88005ff1e200 ffff88019dac8000
2017-08-22T15:20:22.889456+02:00 compute-0-4 kernel: [336496.647506] Call Trace:
2017-08-22T15:20:22.889458+02:00 compute-0-4 kernel: [336496.647511] [<ffffffff813dde9c>] dump_stack+0x63/0x87
2017-08-22T15:20:22.889459+02:00 compute-0-4 kernel: [336496.647514] [<ffffffff8118369a>] __schedule_bug+0x4b/0x59
2017-08-22T15:20:22.889460+02:00 compute-0-4 kernel: [336496.647518] [<ffffffff81808e73>] __schedule+0x8b3/0x980
2017-08-22T15:20:22.889461+02:00 compute-0-4 kernel: [336496.647520] [<ffffffff81808f75>] schedule+0x35/0x80
2017-08-22T15:20:22.889462+02:00 compute-0-4 kernel: [336496.647523] [<ffffffff8180bc6c>] schedule_hrtimeout_range_clock+0xac/0x130
2017-08-22T15:20:22.889463+02:00 compute-0-4 kernel: [336496.647526] [<ffffffff810ea120>] ? hrtimer_init+0x180/0x180
2017-08-22T15:20:22.889464+02:00 compute-0-4 kernel: [336496.647528] [<ffffffff8180bc60>] ? schedule_hrtimeout_range_clock+0xa0/0x130
2017-08-22T15:20:22.889464+02:00 compute-0-4 kernel: [336496.647530] [<ffffffff8180bd03>] schedule_hrtimeout_range+0x13/0x20
2017-08-22T15:20:22.889464+02:00 compute-0-4 kernel: [336496.647532] [<ffffffff8180b6f0>] usleep_range+0x40/0x50
2017-08-22T15:20:22.889465+02:00 compute-0-4 kernel: [336496.647541] [<ffffffffc01263e7>] i40e_asq_send_command+0x477/0x760 [i40e]
2017-08-22T15:20:22.889466+02:00 compute-0-4 kernel: [336496.647545] [<ffffffffc0120080>] ? i40e_get_priv_flags+0x40/0xc0 [i40e]
2017-08-22T15:20:22.889467+02:00 compute-0-4 kernel: [336496.647551] [<ffffffffc0128dfd>] i40e_aq_update_vsi_params+0x5d/0x90 [i40e]
2017-08-22T15:20:22.889467+02:00 compute-0-4 kernel: [336496.647555] [<ffffffffc0115e14>] i40e_vsi_add_pvid+0x124/0x1b0 [i40e]
2017-08-22T15:20:22.889468+02:00 compute-0-4 kernel: [336496.647560] [<ffffffffc013cf4c>] i40e_ndo_set_vf_port_vlan+0x1bc/0x3b0 [i40e]
2017-08-22T15:20:22.889468+02:00 compute-0-4 kernel: [336496.647563] [<ffffffff817191fb>] do_setlink+0x1cb/0xb20
2017-08-22T15:20:22.889469+02:00 compute-0-4 kernel: [336496.647566] [<ffffffff81409e73>] ? nla_parse+0xa3/0x100
2017-08-22T15:20:22.889470+02:00 compute-0-4 kernel: [336496.647568] [<ffffffff81719c15>] rtnl_setlink+0xc5/0x120
2017-08-22T15:20:22.889471+02:00 compute-0-4 kernel: [336496.647571] [<ffffffff817181c5>] rtnetlink_rcv_msg+0x95/0x240
2017-08-22T15:20:22.889471+02:00 compute-0-4 kernel: [336496.647574] [<ffffffff816f426e>] ? __alloc_skb+0x7e/0x280
2017-08-22T15:20:22.889472+02:00 compute-0-4 kernel: [336496.647575] [<ffffffff81718130>] ? rtnetlink_rcv+0x30/0x30
2017-08-22T15:20:22.889472+02:00 compute-0-4 kernel: [336496.647579] [<ffffffff8173a427>] netlink_rcv_skb+0xa7/0xc0
2017-08-22T15:20:22.889473+02:00 compute-0-4 kernel: [336496.647580] [<ffffffff81718128>] rtnetlink_rcv+0x28/0x30
2017-08-22T15:20:22.889473+02:00 compute-0-4 kernel: [336496.647582] [<ffffffff81739ded>] netlink_unicast+0x15d/0x230
2017-08-22T15:20:22.889474+02:00 compute-0-4 kernel: [336496.647584] [<ffffffff8173a1d9>] netlink_sendmsg+0x319/0x390
2017-08-22T15:20:22.889475+02:00 compute-0-4 kernel: [336496.647586] [<ffffffff816ebb08>] sock_sendmsg+0x38/0x50
2017-08-22T15:20:22.889475+02:00 compute-0-4 kernel: [336496.647588] [<ffffffff816ec430>] ___sys_sendmsg+0x270/0x290
2017-08-22T15:20:22.889476+02:00 compute-0-4 kernel: [336496.647592] [<ffffffff81384f38>] ? aa_sk_perm+0x78/0x230
2017-08-22T15:20:22.889476+02:00 compute-0-4 kernel: [336496.647594] [<ffffffff816ecd82>] __sys_sendmsg+0x42/0x80
2017-08-22T15:20:22.889477+02:00 compute-0-4 kernel: [336496.647596] [<ffffffff816ecdd2>] SyS_sendmsg+0x12/0x20
2017-08-22T15:20:22.889478+02:00 compute-0-4 kernel: [336496.647598] [<ffffffff8180c7f6>] entry_SYSCALL_64_fastpath+0x16/0x75
2017-08-22T15:20:22.889478+02:00 compute-0-4 kernel: [336496.647603] i40e 0000:41:00.0: Setting VLAN 100, QOS 0x0 on VF 2
2017-08-22T15:20:22.889479+02:00 compute-0-4 kernel: [336496.647674] ------------[ cut here ]------------
Canonical investigated the crashes and determined that they are caused by a bug in the i40e driver.
Could you please update the i40e driver to the latest version and provide it as a DKMS package?
sla2 for 9.0-updates