Crash when using IPsec VTI interfaces on 4.15 and 4.18.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Medium | Unassigned |
Bug Description
Hey!
After upgrading a few VPN servers to 4.15.0-38.41 (on either Xenial or Bionic), we get random crashes. This also happens with the 4.18 kernel in bionic-proposed. These crashes didn't happen with the 4.4 kernel from Xenial. Here is a stack trace:
[ 31.154360] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 31.162233] PGD 0 P4D 0
[ 31.164786] Oops: 0000 [#1] SMP PTI
[ 31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic #12~18.04.1-Ubuntu
[ 31.176854] Hardware name: Supermicro Super Server/
[ 31.184980] RIP: 0010:vti_
[ 31.189962] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe <f6> 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
[ 31.208916] RSP: 0018:ffffbc6183
[ 31.214160] RAX: 0000000000000000 RBX: ffff9a3504964a00 RCX: 0000000000000002
[ 31.221328] RDX: ffff9a351add4080 RSI: ffff9a351aa08000 RDI: ffff9a3504964a00
[ 31.228485] RBP: ffffbc61832e3940 R08: 0000000000000004 R09: ffffffffc0aa612b
[ 31.235643] R10: 0008f09b99881884 R11: 1884bd4e2d6b1fac R12: ffff9a3507b31900
[ 31.242803] R13: ffff9a3507b31000 R14: 0000000000000000 R15: ffff9a3504964a00
[ 31.249964] FS: 000000000000000
[ 31.258077] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.263848] CR2: 0000000000000038 CR3: 000000041a40a003 CR4: 00000000003606e0
[ 31.271004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 31.278163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 31.285320] Call Trace:
[ 31.287789] xfrm4_rcv_
[ 31.291297] xfrm_input+
[ 31.294807] vti_input+
[ 31.298926] vti_rcv+0x33/0x3c [ip_vti]
[ 31.302783] xfrm4_esp_
[ 31.306375] ip_local_
[ 31.310923] ip_local_
[ 31.314775] ? ip_rcv_
[ 31.318718] ip_rcv_
[ 31.322486] ip_rcv+0x28f/0x360
[ 31.325655] ? inet_del_
[ 31.329686] __netif_
[ 31.334413] ? kmem_cache_
[ 31.338532] ? __build_
[ 31.342128] __netif_
[ 31.346244] ? __netif_
[ 31.350536] netif_receive_
[ 31.355263] napi_gro_
[ 31.359141] mlx5e_handle_
[ 31.364476] ? skb_release_
[ 31.368430] mlx5e_poll_
[ 31.373432] mlx5e_napi_
[ 31.378333] ? __switch_
[ 31.382270] ? __switch_
[ 31.386214] ? __switch_
[ 31.391056] ? __switch_
[ 31.395905] ? __switch_
[ 31.400743] net_rx_
[ 31.405379] ? __switch_
[ 31.409887] __do_softirq+
[ 31.414448] run_ksoftirqd+
[ 31.418862] smpboot_
[ 31.423700] kthread+0x121/0x140
[ 31.427701] ? sort_range+
[ 31.432040] ? kthread_
[ 31.437816] ret_from_
[ 31.442219] Modules linked in: esp6 authenc echainiv xfrm6_mode_tunnel xfrm4_mode_tunnel xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo ip_vti ip_tunnel ip6_vti ip6_tunnel tunnel6 8021q garp mrp stp llc bonding ipt_REJECT nf_reject_ipv4 nfnetlink_log nfnetlink xt_NFLOG xt_hl xt_limit xt_nat xt_TCPMSS xt_HL xt_comment xt_tcpudp xt_multiport xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_connmark xt_mark iptable_mangle xt_CT nf_conntrack xt_addrtype iptable_raw bpfilter ipmi_ssif gpio_ich intel_rapl sb_edac x86_pkg_
[ 31.519488] ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
_core raid1 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ast pcbc ttm drm_kms_helper aesni_intel syscopyarea aes_x86_64 sysfillrect mxm_wmi crypto_simd sysimgblt cryptd glue_helper fb_sys_fops mlx5_core ixgbe igb mpt3sas drm ahci tls libahci i2c_algo_bit mlxfw raid_class dca devlink mdio scsi_transport_sas wmi
[ 31.578877] CR2: 0000000000000038
[ 31.583249] ---[ end trace c4bada38847a0075 ]---
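One detail of the oops can be checked mechanically. The byte the kernel marks with `<f6>` in the `Code:` line starts the faulting instruction, `f6 40 38 04`, which decodes as `test byte [rax+0x38], 0x4`. With RAX = 0 in the register dump, the effective address is 0x38, which is exactly the value in CR2, so the fault is a NULL pointer plus a 0x38-byte field offset. The two instructions just before it (`48 8b 43 58`, `48 83 e0 fe`) load a pointer from `[rbx+0x58]` and clear its low bit; that pattern resembles the kernel's `skb_dst()` helper, but mapping it to a specific struct field is an unverified assumption. The Python sketch below reproduces only the address arithmetic. It is a minimal sketch, assuming no REX prefix and handling only the `F6 /0 ib` encoding with a `[reg + disp8]` operand; it is not a general x86 decoder, and the helper names are hypothetical.

```python
# Values copied verbatim from the oops above.
CODE = ("8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 8b 80 f8 14 00 00 "
        "85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe <f6> 40 38 04 74 7d 44 89 "
        "b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20")
RAX = 0x0000000000000000
CR2 = 0x0000000000000038

MASK64 = (1 << 64) - 1


def faulting_bytes(code: str, count: int = 4) -> list[int]:
    """Return `count` bytes starting at the byte the kernel marked with <..>."""
    tokens = code.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def decode_test_rm8_imm8(insn: list[int]) -> tuple[int, int, int]:
    """Decode `F6 /0 ib` (TEST r/m8, imm8) whose ModRM byte has mod=01.

    Hypothetical helper: assumes no prefixes and handles only this one
    encoding. Returns (base_register_number, signed_disp8, imm8).
    """
    opcode, modrm, disp8, imm8 = insn
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    # rm == 0b100 would mean a SIB byte follows, which this sketch skips.
    if opcode != 0xF6 or reg != 0 or mod != 0b01 or rm == 0b100:
        raise ValueError("not a TEST r/m8, imm8 with a [reg + disp8] operand")
    if disp8 >= 0x80:  # disp8 is sign-extended
        disp8 -= 0x100
    return rm, disp8, imm8


base_reg, disp, mask = decode_test_rm8_imm8(faulting_bytes(CODE))
assert base_reg == 0  # register number 0 is RAX
effective = (RAX + disp) & MASK64

print(f"test byte [rax+{disp:#x}], {mask:#x} -> address {effective:#x}")
print("matches CR2:", effective == CR2)
```

For a full disassembly of the `Code:` line, the kernel tree's `scripts/decodecode` helper is the usual tool.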
Upgrading to mainline 4.18.17 seems to solve the issue. It's difficult to bisect because the crash doesn't happen often. 4.18.17 contains c473a489d409896
Hardware is Mellanox ConnectX-4 Lx (no ESP offload).
May I suggest upgrading 4.18 to 4.18.17 and backporting these two patches to Bionic's 4.15?
Thanks.
Commit fdb06c787b34fd397f28f515105627307d615025 is titled "xfrm: reset transport header back to network header after all input transforms ahave been applied".