Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in ./include/linux/net_dim.h

Bug #1763269 reported by Talat Batheesh on 2018-04-12
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

During regression traffic testing we saw the following UBSAN trace, which reports undefined behaviour at ./include/linux/net_dim.h:243:6:

[12885.292500] UBSAN: Undefined behaviour in ./include/linux/net_dim.h:243:6
[12885.296358] signed integer overflow:
[12885.300100] 358869104 * 100 cannot be represented in type 'int'
[12885.304001] CPU: 2 PID: 19630 Comm: sock_stream_tes Tainted: G OE 4.15.0-rc8-for-upstream-dbg-2018-01-25_19-31-23-61 #1
[12885.311856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[12885.316091] Call Trace:
[12885.320234] <IRQ>
[12885.324366] dump_stack+0xd1/0x159
[12885.328586] ? dma_virt_map_sg+0x147/0x147
[12885.332804] ? val_to_string.constprop.4+0x88/0xd1
[12885.337055] ubsan_epilogue+0x9/0x49
[12885.341345] handle_overflow+0x15e/0x189
[12885.345636] ? __ubsan_handle_negate_overflow+0x108/0x108
[12885.349891] ? kvm_clock_read+0x1f/0x30
[12885.354230] ? ktime_get+0x18d/0x280
[12885.358654] ? getrawmonotonic64+0x320/0x320
[12885.363116] ? mark_lock+0x1cf/0xc50
[12885.367624] ? inet_recvmsg+0x121/0x4a0
[12885.372114] mlx5e_napi_poll+0x1199/0x15c0 [mlx5_core]
[12885.376774] ? mlx5e_rx_dim_work+0x160/0x160 [mlx5_core]
[12885.381406] ? print_irqtrace_events+0x120/0x120
[12885.385907] ? mark_held_locks+0x93/0x100
[12885.392099] ? print_irqtrace_events+0x120/0x120
[12885.396589] ? trace_hardirqs_on_caller+0x206/0x390
[12885.401278] ? kasan_slab_free+0x87/0xc0
[12885.406000] ? pvclock_clocksource_read+0x146/0x280
[12885.410608] ? mark_held_locks+0x71/0x100
[12885.415251] net_rx_action+0x58c/0x10a0
[12885.419873] ? napi_complete_done+0x3d0/0x3d0
[12885.424385] ? check_chain_key+0x150/0x260
[12885.428784] ? debug_check_no_locks_freed+0x200/0x200
[12885.433041] ? match_held_lock+0x8a/0x4f0
[12885.437215] ? match_held_lock+0x8a/0x4f0
[12885.441249] ? lock_downgrade+0x3e0/0x3e0
[12885.445151] ? do_raw_spin_unlock+0x14d/0x230
[12885.448970] ? save_trace+0x1f0/0x1f0
[12885.452664] ? save_trace+0x1f0/0x1f0
[12885.456224] ? match_held_lock+0xa2/0x4f0
[12885.459668] ? pvclock_clocksource_read+0x146/0x280
[12885.463085] ? save_trace+0x1f0/0x1f0
[12885.466361] ? preempt_count_sub+0x14/0xd0
[12885.469566] ? __lock_is_held+0x5d/0x110
[12885.472665] ? preempt_count_sub+0x14/0xd0
[12885.475653] ? __lock_is_held+0x5d/0x110
[12885.478529] ? mark_lock+0x1cf/0xc50
[12885.481276] ? match_held_lock+0xa2/0x4f0
[12885.483984] ? print_irqtrace_events+0x120/0x120
[12885.486679] ? save_trace+0x1f0/0x1f0
[12885.490891] ? irq_exit+0x150/0x150
[12885.493454] ? __napi_schedule+0x1ae/0x220
[12885.495936] ? netdev_master_upper_dev_link+0x20/0x20
[12885.498402] ? check_chain_key+0x150/0x260
[12885.500774] ? __tasklet_schedule+0x22/0xf0
[12885.503086] ? match_held_lock+0xa2/0x4f0
[12885.505431] ? mlx5_eq_int+0x821/0xb50 [mlx5_core]
[12885.507775] ? save_trace+0x1f0/0x1f0
[12885.510082] ? pvclock_clocksource_read+0x146/0x280
[12885.512416] ? pvclock_read_flags+0x80/0x80
[12885.514705] ? save_trace+0x1f0/0x1f0
[12885.516995] ? __handle_irq_event_percpu+0x1b0/0x800
[12885.519305] ? __lock_is_held+0x5d/0x110
[12885.521630] __do_softirq+0x248/0xba9
[12885.523913] ? __irqentry_text_end+0x1f8a70/0x1f8a70
[12885.526234] ? pvclock_clocksource_read+0x146/0x280
[12885.528563] ? pvclock_read_flags+0x80/0x80
[12885.530843] ? do_raw_spin_trylock+0x120/0x120
[12885.533178] ? kvm_clock_read+0x1f/0x30
[12885.535432] ? kvm_sched_clock_read+0x5/0x10
[12885.537702] ? sched_clock_cpu+0x14/0x1f0
[12885.539968] irq_exit+0xf4/0x150
[12885.542186] do_IRQ+0xe8/0x1e0
[12885.544390] common_interrupt+0xa2/0xa2
[12885.546607] </IRQ>

The int overflow is in the following macro in include/linux/net_dim.h:
#define IS_SIGNIFICANT_DIFF(val, ref) \
(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
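
The arithmetic reported in the trace can be checked without performing the overflowing multiplication. The sketch below is a userspace illustration only, not kernel code: the helper name percent_scale_overflows_int is hypothetical, and it assumes a 32-bit int, as on the platforms Ubuntu supports.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not part of net_dim.h).
 * Returns true if computing 100 * diff in type int would overflow.
 * The check divides instead of multiplying, so it never performs
 * the overflowing operation itself.
 */
static bool percent_scale_overflows_int(int diff)
{
    return diff > INT_MAX / 100 || diff < INT_MIN / 100;
}
```

With a 32-bit int, INT_MAX / 100 is 21474836, so the value 358869104 from the trace is far above the largest difference the original macro can scale by 100 safely.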

include/linux/net_dim.h is new in kernel 4.16; in the 4.15 kernel this code was in drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c.

The upstream commit that fixes this issue is:
commit f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
Author: Tal Gilboa <email address hidden>
Date: Thu Mar 29 13:53:52 2018 +0300

    net/dim: Fix int overflow

    When calculating difference between samples, the values
    are multiplied by 100. Large values may cause int overflow
    when multiplied (usually on first iteration).
    Fixed by forcing 100 to be of type unsigned long.

    Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
    Signed-off-by: Tal Gilboa <email address hidden>
    Reviewed-by: Andy Gospodarek <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bebeaad..29ed8fd 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -231,7 +231,7 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
 }

 #define IS_SIGNIFICANT_DIFF(val, ref) \
- (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+ (((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

 static inline int net_dim_stats_compare(struct net_dim_stats *curr,
                                        struct net_dim_stats *prev)
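
The effect of the 100UL change can be tried in userspace. This is a minimal sketch under stated assumptions: it uses the C library abs() from stdlib.h in place of the kernel's abs() macro, the wrapper name is_significant_diff is hypothetical, and ref is assumed to be positive and nonzero.

```c
#include <stdlib.h>

/* Macro as it reads after the upstream fix: the 100UL constant makes
 * the multiplication happen in unsigned long instead of int. */
#define IS_SIGNIFICANT_DIFF(val, ref) \
    (((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

/*
 * Hypothetical wrapper, for illustration only.
 * Assumes ref > 0; a zero ref would divide by zero, and a negative
 * ref would be converted to a large unsigned value.
 */
static int is_significant_diff(int val, int ref)
{
    return IS_SIGNIFICANT_DIFF(val, ref);
}
```

On 64-bit Linux, unsigned long is 64 bits wide, so a difference of 358869104 scaled by 100 (35886910400) is representable. Where unsigned long is 32 bits the product would wrap instead; that is well defined in C but can give a wrong percentage.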

We will send a backported patch, applied to the old location, to the Ubuntu kernel mailing list.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1763269

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Seth Forshee (sforshee) on 2018-04-12
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-19.20

---------------
linux (4.15.0-19.20) bionic; urgency=medium

  * linux: 4.15.0-19.20 -proposed tracker (LP: #1766021)

  * Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers (LP: #1765232)
    - Revert "blk-mq: simplify queue mapping & schedule with each possisble CPU"
    - Revert "genirq/affinity: assign vectors to all possible CPUs"

linux (4.15.0-18.19) bionic; urgency=medium

  * linux: 4.15.0-18.19 -proposed tracker (LP: #1765490)

  * [regression] Ubuntu 18.04:[4.15.0-17-generic #18] KVM Guest Kernel:
    meltdown: rfi/fallback displacement flush not enabled bydefault (kvm)
    (LP: #1765429)
    - powerpc/pseries: Fix clearing of security feature flags

  * signing: only install a signed kernel (LP: #1764794)
    - [Packaging] update to Debian like control scripts
    - [Packaging] switch to triggers for postinst.d postrm.d handling
    - [Packaging] signing -- switch to raw-signing tarballs
    - [Packaging] signing -- switch to linux-image as signed when available
    - [Config] signing -- enable Opal signing for ppc64el
    - [Packaging] printenv -- add signing options

  * [18.04 FEAT] Sign POWER host/NV kernels (LP: #1696154)
    - [Packaging] signing -- add support for signing Opal kernel binaries

  * Please cherrypick s390 unwind fix (LP: #1765083)
    - s390/compat: fix setup_frame32

  * Ubuntu 18.04 installer does not detect any IPR based HDD/RAID array [S822L]
    [ipr] (LP: #1751813)
    - d-i: move ipr to storage-core-modules on ppc64el

  * drivers/gpu/drm/bridge/adv7511/adv7511.ko missing (LP: #1764816)
    - SAUCE: (no-up) rename the adv7511 drm driver to adv7511_drm

  * Miscellaneous Ubuntu changes
    - [Packaging] Add linux-oem to rebuild test blacklist.

linux (4.15.0-17.18) bionic; urgency=medium

  * linux: 4.15.0-17.18 -proposed tracker (LP: #1764498)

  * Eventual OOM with profile reloads (LP: #1750594)
    - SAUCE: apparmor: fix memory leak when duplicate profile load

linux (4.15.0-16.17) bionic; urgency=medium

  * linux: 4.15.0-16.17 -proposed tracker (LP: #1763785)

  * [18.04] [bug] CFL-S(CNP)/CNL GPIO testing failed (LP: #1757346)
    - [Config]: Set CONFIG_PINCTRL_CANNONLAKE=y

  * [Ubuntu 18.04] USB Type-C test failed on GLK (LP: #1758797)
    - SAUCE: usb: typec: ucsi: Increase command completion timeout value

  * Fix trying to "push" an already active pool VP (LP: #1763386)
    - SAUCE: powerpc/xive: Fix trying to "push" an already active pool VP

  * hisi_sas: Revert and replace SAUCE patches w/ upstream (LP: #1762824)
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to
      userspace"
    - Revert "UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES"
    - scsi: hisi_sas: modify some register config for hip08
    - scsi: hisi_sas: add v3 hw MODULE_DEVICE_TABLE()

  * Realtek card reader - RTS5243 [VEN_10EC&DEV_5260] (LP: #1737673)
    - misc: rtsx: Move Realtek Card Reader Driver to misc
    - updateconfigs for Realtek Card Reader Driver
    - misc: rtsx: Add support for RTS5260
    - misc: rtsx: Fix symbol clashes

  * Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in
    ./include/linux/net_dim.h (LP: #1...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released