BUG: scheduling while atomic: ip/1210/0x00000200 on xenial/hwe rumford

Bug #1995870 reported by Luke Nowakowski-Krijger
This bug affects 1 person
Affects               Status         Importance  Assigned to              Milestone
linux (Ubuntu)        Invalid        Undecided   Unassigned
  Xenial              Invalid        Undecided   Unassigned
  Bionic              Fix Released   Medium      Unassigned
linux-hwe (Ubuntu)    Fix Committed  Undecided   Luke Nowakowski-Krijger
  Xenial              Fix Committed  Medium      Unassigned
  Bionic              Invalid        Undecided   Unassigned

Bug Description

[Impact]

The kernel log on xenial/hwe showed "BUG: scheduling while atomic:
ip/1210/0x00000200" with the tg3 ethernet driver when running on the
rumford instance. The driver was trying to sleep while holding a spinlock,
which probably caused some performance degradation.
The stack traces also cluttered the kernel log.

[Fix]

Replace usleep_range() with udelay() in the tg3 driver. udelay() busy-waits
instead of sleeping, so it is safe to call from atomic contexts.

[Test]

Compile tested only. The issue should be resolved in the next cycle on
the xenial/hwe rumford instance.

[Where problems could occur]

Low regression potential, since the driver now uses the correct delay
call for this context.

--------------------------------------------------------------------------

A bug is triggered by a usleep_range() call within the tg3 driver with xenial/linux-hwe 4.15.0-197.208~16.04.1 on rumford.

This has been observed in multiple cycles during boot-testing.

[ 27.437584] BUG: scheduling while atomic: ip/1210/0x00000200
[ 27.468440] Modules linked in: nls_iso8859_1 intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel hpilo kvm irqbypass intel_cstate shpchp ioatdma dca intel_rapl_perf ipmi_si ipmi_devintf lpc_ich ipmi_msghandler acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 i2c_algo_bit ses crct10dif_pclmul enclosure crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper mlx5_core pcbc syscopyarea aesni_intel mlxfw sysfillrect sysimgblt aes_x86_64 fb_sys_fops tg3 devlink crypto_simd hpsa nvme glue_helper ptp drm cryptd pps_core nvme_core scsi_transport_sas wmi
[ 27.468491] CPU: 1 PID: 1210 Comm: ip Not tainted 4.15.0-197-generic #208~16.04.1-Ubuntu
[ 27.468493] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 05/21/2018
[ 27.468494] Call Trace:
[ 27.468508] dump_stack+0x6d/0x8b
[ 27.468515] __schedule_bug+0x54/0x70
[ 27.468519] __schedule+0x635/0x8b0
[ 27.468522] schedule+0x36/0x80
[ 27.468527] schedule_hrtimeout_range_clock+0xbc/0x1b0
[ 27.468532] ? __hrtimer_init+0x90/0x90
[ 27.468536] schedule_hrtimeout_range+0x13/0x20
[ 27.468539] usleep_range+0x62/0x90
[ 27.468547] tg3_ape_event_lock+0x36/0xa0 [tg3]
[ 27.468552] tg3_ape_driver_state_change.part.69+0xc6/0x160 [tg3]
[ 27.468557] tg3_start+0xebc/0x10b0 [tg3]
[ 27.468563] tg3_open+0x130/0x280 [tg3]
[ 27.468568] __dev_open+0xd7/0x150
[ 27.468572] __dev_change_flags+0x186/0x1c0
[ 27.468575] dev_change_flags+0x29/0x70
[ 27.468580] do_setlink+0x355/0xd90
[ 27.468589] ? nla_parse+0xa7/0x120
[ 27.468592] rtnl_newlink+0x773/0x8f0
[ 27.468597] ? do_anonymous_page+0x24b/0x430
[ 27.468605] ? ns_capable_common+0x2b/0x50
[ 27.468608] ? ns_capable+0x10/0x20
[ 27.468612] rtnetlink_rcv_msg+0x205/0x290
[ 27.468615] ? _cond_resched+0x1a/0x50
[ 27.468620] ? __kmalloc_node_track_caller+0x201/0x2c0
[ 27.468623] ? rtnl_calcit.isra.25+0x100/0x100
[ 27.468627] netlink_rcv_skb+0xd9/0x110
[ 27.468630] rtnetlink_rcv+0x15/0x20
[ 27.468633] netlink_unicast+0x198/0x260
[ 27.468637] netlink_sendmsg+0x2ea/0x410
[ 27.468640] sock_sendmsg+0x3e/0x50
[ 27.468642] ___sys_sendmsg+0x2e9/0x300
[ 27.468646] ? lru_cache_add_active_or_unevictable+0x36/0xb0
[ 27.468649] ? do_anonymous_page+0x24b/0x430
[ 27.468653] ? __handle_mm_fault+0xae7/0xc80
[ 27.468657] __sys_sendmsg+0x54/0x90
[ 27.468659] ? __sys_sendmsg+0x54/0x90
[ 27.468662] SyS_sendmsg+0x12/0x20
[ 27.468667] do_syscall_64+0x73/0x130
[ 27.468670] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 27.468673] RIP: 0033:0x7f285dd57590
[ 27.468675] RSP: 002b:00007ffd4370bf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 27.468678] RAX: ffffffffffffffda RBX: 00007ffd43714040 RCX: 00007f285dd57590
[ 27.468679] RDX: 0000000000000000 RSI: 00007ffd4370bf80 RDI: 0000000000000003
[ 27.468681] RBP: 000000006364ddd8 R08: 0000000000000040 R09: 0000000000000008
[ 27.468682] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffd4370bf80
[ 27.468683] R13: 0000000000000000 R14: 00000000006573a0 R15: 00007ffd43714018
[ 27.468703] NOHZ: local_softirq_pending 282
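
The last field of the BUG line (0x00000200) is the preempt count of the task when __schedule_bug() fired. As a rough aid for reading it, the sketch below splits the value into its fields. The masks are an assumption based on the layout in include/linux/preempt.h for kernels of this generation; struct preempt_fields and decode_preempt_count() are hypothetical names used only here.

```c
#include <stdint.h>

/* Assumed field layout (include/linux/preempt.h, 4.15-era kernels). */
#define PREEMPT_MASK	0x000000ffu
#define SOFTIRQ_MASK	0x0000ff00u
#define HARDIRQ_MASK	0x000f0000u
#define NMI_MASK	0x00100000u

/* Hypothetical container for the decoded counters. */
struct preempt_fields {
	unsigned int preempt;	/* preempt_disable() nesting depth */
	unsigned int softirq;	/* softirq / bottom-half disable count */
	unsigned int hardirq;	/* hard interrupt nesting depth */
	unsigned int nmi;	/* non-zero while in NMI context */
};

/* Split a raw preempt count, as printed in the BUG line, into its fields. */
static struct preempt_fields decode_preempt_count(uint32_t pc)
{
	struct preempt_fields f;

	f.preempt = pc & PREEMPT_MASK;
	f.softirq = (pc & SOFTIRQ_MASK) >> 8;
	f.hardirq = (pc & HARDIRQ_MASK) >> 16;
	f.nmi     = (pc & NMI_MASK) >> 20;
	return f;
}
```

For 0x00000200 only the softirq field is non-zero, with the value 2. Under the assumed layout that field is raised by 2 each time bottom halves are disabled, so the value would be consistent with the task holding a lock taken with bottom halves disabled, as opposed to running in interrupt context.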

affects: ubuntu → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1995870

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
no longer affects: linux (Ubuntu)
Changed in linux-hwe (Ubuntu):
status: New → Confirmed
Luke Nowakowski-Krijger (lukenow) wrote :

Confirmed that this patch https://lore.kernel.org/lkml/<email address hidden>/ fixes this issue. Will send something to the mailing list

description: updated
Changed in linux-hwe (Ubuntu):
assignee: nobody → Luke Nowakowski-Krijger (lukenow)
Changed in linux-hwe (Ubuntu):
status: Confirmed → In Progress
description: updated
Changed in linux-hwe (Ubuntu):
status: In Progress → Fix Committed
Stefan Bader (smb)
Changed in linux-hwe (Ubuntu Bionic):
status: New → Invalid
Changed in linux-hwe (Ubuntu Xenial):
importance: Undecided → Medium
status: New → Fix Committed
Changed in linux (Ubuntu Xenial):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-201.212

---------------
linux (4.15.0-201.212) bionic; urgency=medium

  * bionic/linux: 4.15.0-201.212 -proposed tracker (LP: #1997871)

  * Expose built-in trusted and revoked certificates (LP: #1996892)
    - [Packaging] Expose built-in trusted and revoked certificates

  * Bionic update: upstream stable patchset 2022-09-21 (LP: #1990434)
    - s390/archrandom: prevent CPACF trng invocations in interrupt context

  * BUG: scheduling while atomic: ip/1210/0x00000200 on xenial/hwe rumford
    (LP: #1995870)
    - tg3: prevent scheduling while atomic splat

  * Bionic update: upstream stable patchset 2022-10-18 (LP: #1993349)
    - bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()
    - selftests/bpf: Fix test_align verifier log patterns
    - drm/msm/dsi: Fix number of regulators for msm8996_dsi_cfg
    - platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask
    - wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()
    - ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
    - kcm: fix strp_init() order and cleanup
    - serial: fsl_lpuart: RS485 RTS polariy is inverse
    - staging: rtl8712: fix use after free bugs
    - vt: Clear selection before changing the font
    - USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id
    - binder: fix UAF of ref->proc caused by race condition
    - drm/i915/reg: Fix spelling mistake "Unsupport" -> "Unsupported"
    - Input: rk805-pwrkey - fix module autoloading
    - hwmon: (gpio-fan) Fix array out of bounds access
    - thunderbolt: Use the actual buffer in tb_async_error()
    - xhci: Add grace period after xHC start to prevent premature runtime suspend.
    - USB: serial: cp210x: add Decagon UCA device id
    - USB: serial: option: add support for OPPO R11 diag port
    - USB: serial: option: add Quectel EM060K modem
    - USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode
    - usb: dwc2: fix wrong order of phy_power_on and phy_init
    - USB: cdc-acm: Add Icom PMR F3400 support (0c26:0020)
    - usb-storage: Add ignore-residue quirk for NXP PN7462AU
    - s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages
    - s390: fix nospec table alignments
    - USB: core: Prevent nested device-reset calls
    - usb: gadget: mass_storage: Fix cdrom data transfers on MAC-OS
    - wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
    - net: mac802154: Fix a condition in the receive path
    - ALSA: seq: oss: Fix data-race for max_midi_devs access
    - ALSA: seq: Fix data-race at module auto-loading
    - efi: capsule-loader: Fix use-after-free in efi_capsule_write
    - wifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in
      il4965_rs_fill_link_cmd()
    - fs: only do a memory barrier for the first set_buffer_uptodate()
    - Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"
    - drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup.
    - drm/radeon: add a force flush to delay work when radeon
    - parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
    - parisc: Add runtime check to pre...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/4.15.0-201.212 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-bionic-linux verification-needed-bionic
Po-Hsu Lin (cypressyew) wrote :

Verified on node rumford with linux/4.15.0-201.212: this issue no longer occurs, thanks!

tags: added: verification-done-bionic
removed: verification-needed-bionic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-4.15/4.15.0-1162.177 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-bionic-linux-azure-4.15 verification-needed-bionic
removed: verification-done-bionic