PCI Call Traces hw csum failure in dmesg with 4.4.0-2-generic

Bug #1544978 reported by bugproxy on 2016-02-12
This bug affects 2 people
Affects                    Importance   Assigned to
Ubuntu on IBM z Systems    High         Unassigned
linux (Ubuntu)             Wishlist     Tim Gardner
  Xenial                   High         Unassigned
  Yakkety                  Wishlist     Tim Gardner

Bug Description

== Comment: #0 - Helmut Grauer <email address hidden> - 2016-02-12 03:00:03 ==
Hi,
I am getting the following call traces when PCI interfaces are configured:
[ 246.051566] enp0s0: hw csum failure
[ 246.051571] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G E 4.4.0-2-generic #16-Ubuntu
[ 246.051573] 00000000f9793778 00000000f9793808 0000000000000002 0000000000000000
                      00000000f97938a8 00000000f9793820 00000000f9793820 0000000000114182
                      0000000000000166 000000000091e9ca 000000000000000a 000000000000000a
                      00000000f9793868 00000000f9793808 0000000000000000 00000000f9d38000
                      0000000000000000 0000000000114182 00000000f9793808 00000000f9793868
[ 246.051581] Call Trace:
[ 246.051589] ([<00000000001140b8>] show_trace+0x140/0x148)
[ 246.051590] [<0000000000114136>] show_stack+0x76/0xe8
[ 246.051595] [<00000000005172d6>] dump_stack+0x6e/0x90
[ 246.051599] [<0000000000673500>] __skb_checksum_complete+0xd0/0xd8
[ 246.051605] [<000000000076ae24>] icmpv6_rcv+0x124/0x500
[ 246.051608] [<0000000000746e60>] ip6_input_finish+0x170/0x4e0
[ 246.051610] [<000000000074775c>] ip6_input+0x4c/0xd0
[ 246.051611] [<00000000007478ee>] ip6_mc_input+0x10e/0x280
[ 246.051612] [<0000000000747538>] ipv6_rcv+0x368/0x540
[ 246.051616] [<000000000067e5d4>] __netif_receive_skb_core+0x6fc/0xaf8
[ 246.051618] [<0000000000681a56>] netif_receive_skb_internal+0x3e/0xd8
[ 246.051619] [<0000000000682314>] napi_gro_frags+0x17c/0x208
[ 246.051627] [<000003ff805f3a2c>] mlx4_en_process_rx_cq+0x8b4/0xbd0 [mlx4_en]
[ 246.051630] [<000003ff805f3e62>] mlx4_en_poll_rx_cq+0xc2/0x1a0 [mlx4_en]
[ 246.051631] [<00000000006839e2>] net_rx_action+0x2a2/0x418
[ 246.051635] [<0000000000162726>] __do_softirq+0x156/0x300
[ 246.051637] [<0000000000162ace>] irq_exit+0xd6/0xf8
[ 246.051641] [<000000000010cc5a>] do_IRQ+0x6a/0x88
[ 246.051644] [<00000000007a99c2>] io_int_handler+0x112/0x220
[ 246.051646] [<0000000000104856>] enabled_wait+0x56/0xa8
[ 246.051649] ([<0000000000ccb888>] cpu_dead_idle+0x0/0x8)
[ 246.051651] [<0000000000104b5a>] arch_cpu_idle+0x32/0x48
[ 246.051669] [<00000000001a8198>] cpu_startup_entry+0x200/0x278
[ 246.051674] [<00000000001156ba>] smp_start_secondary+0xea/0xf8
[ 246.051679] [<00000000007a9f42>] restart_int_handler+0x62/0x78
[ 246.051680] [<0000000000000000>] (null)

bugproxy (bugproxy) on 2016-02-12
tags: added: architecture-s39064 bugnameltc-137072 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
dann frazier (dannf) on 2016-02-12
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
bugproxy (bugproxy) on 2016-02-12
tags: added: targetmilestone-inin1604
removed: targetmilestone-inin---

------- Comment From <email address hidden> 2016-04-18 05:46 EDT-------
This problem is known to Mellanox: the cause is understood and a fix is available, but it has not been posted upstream.

This can only be solved if Canonical can get this fix from Mellanox!

dann frazier (dannf) on 2016-04-19
Changed in ubuntu-z-systems:
importance: Undecided → High
Dimitri John Ledkov (xnox) wrote :

@hws

Is there a contact at Mellanox, or any Mellanox-specific Linux trees or mailing lists where this fix is available? Could you please put us in touch? Without a fix in hand, we cannot schedule it for inclusion in the Y-series and then SRU it into xenial as per the SRU cadence. For the time being, this bug will be marked Incomplete until a fix is available to us. Please expect this bug to be fixed in an SRU update to the kernel at the earliest.

Regards,

Dimitri.

Changed in linux (Ubuntu):
status: New → Incomplete
importance: High → Wishlist
Changed in ubuntu-z-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-20 09:05 EDT-------
We are still awaiting a link to the upstream commit or mailing list post.

tags: added: targetmilestone-inin16041
removed: targetmilestone-inin1604
Talat Batheesh (talat-b87) wrote :

Hi,
This upstream commit should fix the bug:

commit 82d69203df634b4dfa765c94f60ce9482bcc44d6
Author: Daniel Jurgens <email address hidden>
Date: Wed May 4 15:00:33 2016 +0300

    net/mlx4_en: Fix endianness bug in IPV6 csum calculation

    Use htons instead of unconditionally byte swapping nexthdr. On a little
    endian systems shifting the byte is correct behavior, but it results in
    incorrect csums on big endian architectures.
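
The commit message above can be illustrated with a small user-space sketch. This is not the driver code: the function names are hypothetical and chosen for illustration. It assumes the 16-bit word that closes the IPv6 pseudo-header must appear in memory as the bytes {0x00, nexthdr}. An unconditional shift produces that layout only on little-endian hosts, while htons() produces it on both, which would explain why the failure shows up on s390x (big endian) and not on x86.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unconditional form: always puts nexthdr in the upper byte of the value. */
uint16_t nexthdr_word_shifted(uint8_t nexthdr)
{
    return (uint16_t)((uint16_t)nexthdr << 8);
}

/* htons form: converts the host-order value to network byte order. */
uint16_t nexthdr_word_htons(uint8_t nexthdr)
{
    return htons(nexthdr);
}

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Returns 1 if the word's in-memory bytes are {0x00, nexthdr}. */
int word_is_wire_order(uint16_t word, uint8_t nexthdr)
{
    uint8_t bytes[2];

    memcpy(bytes, &word, sizeof bytes);
    return bytes[0] == 0x00 && bytes[1] == nexthdr;
}

/* Self-check: htons() yields the wire layout on any host, while the
 * shift does so only where the host itself is little endian. */
void check_nexthdr_words(void)
{
    const uint8_t icmpv6 = 58; /* IPv6 next-header value for ICMPv6 */

    assert(word_is_wire_order(nexthdr_word_htons(icmpv6), icmpv6));
    assert(word_is_wire_order(nexthdr_word_shifted(icmpv6), icmpv6)
           == host_is_little_endian());
}
```

Calling check_nexthdr_words() from a test program passes on both byte orders. On a big-endian host the shifted word is laid out as {nexthdr, 0x00}, so a checksum that includes it no longer matches the packet.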

Tim Gardner (timg-tpi) on 2016-05-09
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) on 2016-05-09
Changed in linux (Ubuntu Yakkety):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-10 04:14 EDT-------
Hi,
could you please apply the fix mentioned in Comment 11 for the Mellanox call trace problem? I tried it on an internal driver and it works.

Greetings Helmut

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Mathew Hodson (mhodson) on 2016-05-12
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
milestone: none → ubuntu-16.04.1
Changed in ubuntu-z-systems:
status: Incomplete → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-23 07:32 EDT-------
checked with kernel
root@s83lp18:~# uname -a

dmesg shows no hw csum failure messages anymore:
1636620k SSFS
[ 9.820659] 8021q: 802.1Q VLAN Support v1.8
[ 9.824597] mlx4_en: enP1s41: frag:0 - size:1522 prefix:0 stride:1536
[ 10.204678] audit: type=1400 audit(1464002873.505:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=2127 comm="apparmor_parser"
[ 10.205168] audit: type=1400 audit(1464002873.505:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/dhclient" pid=2125 comm="apparmor_parser"
[ 10.205174] audit: type=1400 audit(1464002873.505:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=2125 comm="apparmor_parser"
[ 10.205177] audit: type=1400 audit(1464002873.505:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=2125 comm="apparmor_parser"
[ 10.205180] audit: type=1400 audit(1464002873.505:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=2125 comm="apparmor_parser"
[ 10.331622] IPv6: ADDRCONF(NETDEV_UP): enP1s41: link is not ready
[ 10.331630] 8021q: adding VLAN 0 to HW filter on device enP1s41
[ 10.334389] mlx4_en: enP1s41: Link Up
[ 10.337072] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s41: link becomes ready
[ 10.340982] 8021q: adding VLAN 0 to HW filter on device enccw0.0.f500
[ 11.370898] mlx4_en: enP1s41: frag:0 - size:1534 prefix:0 stride:1536
[ 11.370902] mlx4_en: enP1s41: frag:1 - size:4096 prefix:1534 stride:4096
[ 11.370903] mlx4_en: enP1s41: frag:2 - size:3392 prefix:5630 stride:3584

Helmut

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-23.41

---------------
linux (4.4.0-23.41) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1582431

  * zfs: disable module checks for zfs when cross-compiling (LP: #1581127)
    - [Packaging] disable zfs module checks when cross-compiling

  * Xenial update to v4.4.10 stable release (LP: #1580754)
    - Revert "UBUNTU: SAUCE: (no-up) ACPICA: Dispatcher: Update thread ID for
      recursive method calls"
    - Revert "UBUNTU: SAUCE: nbd: ratelimit error msgs after socket close"
    - Revert: "powerpc/tm: Check for already reclaimed tasks"
    - RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
    - ipvs: handle ip_vs_fill_iph_skb_off failure
    - ipvs: correct initial offset of Call-ID header search in SIP persistence
      engine
    - ipvs: drop first packet to redirect conntrack
    - mfd: intel-lpss: Remove clock tree on error path
    - nbd: ratelimit error msgs after socket close
    - ata: ahci_xgene: dereferencing uninitialized pointer in probe
    - mwifiex: fix corner case association failure
    - CNS3xxx: Fix PCI cns3xxx_write_config()
    - clk-divider: make sure read-only dividers do not write to their register
    - soc: rockchip: power-domain: fix err handle while probing
    - clk: rockchip: free memory in error cases when registering clock branches
    - clk: meson: Fix meson_clk_register_clks() signature type mismatch
    - clk: qcom: msm8960: fix ce3_core clk enable register
    - clk: versatile: sp810: support reentrance
    - clk: qcom: msm8960: Fix ce3_src register offset
    - lpfc: fix misleading indentation
    - ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p
      initialisation
    - mac80211: fix statistics leak if dev_alloc_name() fails
    - tracing: Don't display trigger file for events that can't be enabled
    - MD: make bio mergeable
    - Minimal fix-up of bad hashing behavior of hash_64()
    - mm, cma: prevent nr_isolated_* counters from going negative
    - mm/zswap: provide unique zpool name
    - ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
    - ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
    - xen: Fix page <-> pfn conversion on 32 bit systems
    - xen/balloon: Fix crash when ballooning on x86 32 bit PAE
    - xen/evtchn: fix ring resize when binding new events
    - HID: wacom: Add support for DTK-1651
    - HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
    - Input: zforce_ts - fix dual touch recognition
    - proc: prevent accessing /proc/<PID>/environ until it's ready
    - mm: update min_free_kbytes from khugepaged after core initialization
    - batman-adv: fix DAT candidate selection (must use vid)
    - batman-adv: Check skb size before using encapsulated ETH+VLAN header
    - batman-adv: Fix broadcast/ogm queue limit on a removed interface
    - batman-adv: Reduce refcnt of removed router when updating route
    - writeback: Fix performance regression in wb_over_bg_thresh()
    - MAINTAINERS: Remove asterisk from EFI directory names
    - x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
    - ARM: cpuidle: Pass on arm_cpuidle_s...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Kamal Mostafa (kamalmostafa) wrote :

Verified per Comment #37.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-24.43

---------------
linux (4.4.0-24.43) xenial; urgency=low

  [ Kamal Mostafa ]

  * CVE-2016-1583 (LP: #1588871)
    - ecryptfs: fix handling of directory opening
    - SAUCE: proc: prevent stacking filesystems on top
    - SAUCE: ecryptfs: forbid opening files without mmap handler
    - SAUCE: sched: panic on corrupted stack end

  * arm64: statically link rtc-efi (LP: #1583738)
    - [Config] Link rtc-efi statically on arm64

 -- Kamal Mostafa <email address hidden> Fri, 03 Jun 2016 10:02:16 -0700

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) on 2016-06-10
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released