linux-4.4 scheduling while atomic on Azure

Bug #1960059 reported by Thadeu Lima de Souza Cascardo
This bug affects 1 person
Affects                 Status          Importance   Assigned to    Milestone
linux (Ubuntu)          Invalid         Undecided    Unassigned
  Xenial                Fix Committed   Medium       Tim Gardner

Bug Description

SRU Justification

[Impact]

Xenial Azure instances of types Standard_F32s_v2 and Standard_D8d_v4 running the linux-lowlatency kernel frequently panic at boot while loading mlx4_core:

[ 11.338145] BUG: scheduling while atomic: systemd-udevd/891/0x00000002
[ 11.343832] Modules linked in: edac_core mlx4_core(+) kvm_intel input_leds kvm irqbypass nf_conntrack_ipv4 serio_raw hv_balloon pci_hyperv i2c_piix4 nf_defrag_ipv4 8250_fintek xt_conntrack joydev nf_conntrack mac_hid xt_owner xt_tcpudp iptable_security ip_tables x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic hid_hyperv hv_utils ptp hv_storvsc hyperv_keyboard hid hv_netvsc pps_core scsi_transport_fc hyperv_fb crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper psmouse pata_acpi cryptd hv_vmbus floppy fjes
[ 11.343869] CPU: 10 PID: 891 Comm: systemd-udevd Not tainted 4.4.0-219-lowlatency #252-Ubuntu
[ 11.343870] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 11.343872] 0000000000000286 1002a46b615445cb ffff881095a8f598 ffffffff81861c01
[ 11.343874] ffff88109f096280 0000000000000000 ffff881095a8f5a8 ffffffff810ac88b
[ 11.343876] ffff881095a8f5f8 ffffffff8186c241 ffff881095a8f5c8 ffffffff0000000a
[ 11.343877] Call Trace:
[ 11.343883] [<ffffffff81861c01>] dump_stack+0x6d/0x8b
[ 11.343887] [<ffffffff810ac88b>] __schedule_bug+0x4b/0x60
[ 11.343890] [<ffffffff8186c241>] __schedule+0x641/0x830
[ 11.343892] [<ffffffff8186c46c>] schedule+0x3c/0x90
[ 11.343894] [<ffffffff8186f86d>] schedule_timeout+0x21d/0x2b0
[ 11.343896] [<ffffffff8186da55>] wait_for_completion+0xa5/0x120
[ 11.343899] [<ffffffff810b2ca0>] ? wake_up_q+0x70/0x70
[ 11.343902] [<ffffffffc03b6116>] hv_compose_msi_msg+0x1d6/0x290 [pci_hyperv]
[ 11.343904] [<ffffffffc03b5150>] ? put_hvpcibus+0x20/0x20 [pci_hyperv]
[ 11.343908] [<ffffffff810e726b>] irq_chip_compose_msi_msg+0x4b/0x60
[ 11.343910] [<ffffffff810eb64c>] msi_domain_activate+0x2c/0x70
[ 11.343912] [<ffffffff810ea054>] irq_domain_activate_irq+0x44/0x50
[ 11.343914] [<ffffffff810e6878>] irq_startup+0x38/0x90
[ 11.343916] [<ffffffff810e5142>] __setup_irq+0x5a2/0x650
[ 11.343928] [<ffffffffc074915d>] ? mlx4_free_cmd_mailbox+0x2d/0x40 [mlx4_core]
[ 11.343932] [<ffffffff811fb8fd>] ? kmem_cache_alloc_trace+0x1ed/0x210
[ 11.343940] [<ffffffffc0750590>] ? mlx4_interrupt+0x80/0x80 [mlx4_core]
[ 11.343942] [<ffffffff810e538b>] request_threaded_irq+0xfb/0x1a0
[ 11.343948] [<ffffffffc0751608>] mlx4_init_eq_table+0x3f8/0x630 [mlx4_core]
[ 11.343956] [<ffffffffc075c248>] mlx4_setup_hca+0x1f8/0x770 [mlx4_core]
[ 11.343962] [<ffffffffc075d507>] mlx4_load_one+0xb67/0x1670 [mlx4_core]
[ 11.343967] [<ffffffffc075e538>] mlx4_init_one+0x528/0x6c0 [mlx4_core]
[ 11.343970] [<ffffffff814640ea>] local_pci_probe+0x4a/0xa0
[ 11.343972] [<ffffffff814650f0>] ? pci_match_device+0xe0/0x110
[ 11.343973] [<ffffffff814655b3>] pci_device_probe+0x103/0x150
[ 11.343976] [<ffffffff8157ed0e>] driver_probe_device+0x1be/0x4b0
[ 11.343978] [<ffffffff8157f087>] __driver_attach+0x87/0x90
[ 11.343980] [<ffffffff8157f000>] ? driver_probe_device+0x4b0/0x4b0
[ 11.343982] [<ffffffff8157c992>] bus_for_each_dev+0x72/0xc0
[ 11.343983] [<ffffffff8157e52e>] driver_attach+0x1e/0x20
[ 11.343985] [<ffffffff8157e072>] bus_add_driver+0x1e2/0x280
[ 11.343986] [<ffffffffc078e000>] ? 0xffffffffc078e000
[ 11.343988] [<ffffffff8157fa20>] driver_register+0x60/0xe0
[ 11.343990] [<ffffffff81463a0c>] __pci_register_driver+0x4c/0x50
[ 11.343996] [<ffffffffc078e115>] mlx4_init+0x115/0x1000 [mlx4_core]
[ 11.343998] [<ffffffff81002135>] do_one_initcall+0xb5/0x200
[ 11.344003] [<ffffffff811ddcc5>] ? __vunmap+0xa5/0x100
[ 11.344005] [<ffffffff811fc1c6>] ? kfree+0x166/0x180
[ 11.344007] [<ffffffff8185f6ab>] do_init_module+0x5f/0x1da
[ 11.344010] [<ffffffff811139d2>] load_module+0x1712/0x1c90
[ 11.344012] [<ffffffff8110fd70>] ? __symbol_put+0x70/0x70
[ 11.344014] [<ffffffff81224860>] ? kernel_read+0x50/0x80
[ 11.344016] [<ffffffff81114194>] SYSC_finit_module+0xb4/0xe0
[ 11.344018] [<ffffffff811141de>] SyS_finit_module+0xe/0x10
[ 11.344020] [<ffffffff81870d1b>] entry_SYSCALL_64_fastpath+0x22/0xd0
[ 11.344383] BUG: scheduling while atomic: systemd-udevd/891/0x00000000
[ 11.349908] Modules linked in: edac_core mlx4_core(+) kvm_intel input_leds kvm irqbypass nf_conntrack_ipv4 serio_raw hv_balloon pci_hyperv i2c_piix4 nf_defrag_ipv4 8250_fintek xt_conntrack joydev nf_conntrack mac_hid xt_owner xt_tcpudp iptable_security ip_tables x_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic hid_hyperv hv_utils ptp hv_storvsc hyperv_keyboard hid hv_netvsc pps_core scsi_transport_fc hyperv_fb crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper psmouse pata_acpi cryptd hv_vmbus floppy fjes
[ 11.349951] CPU: 10 PID: 891 Comm: systemd-udevd Tainted: G W 4.4.0-219-lowlatency #252-Ubuntu
[ 11.349953] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 11.349954] 0000000000000286 1002a46b615445cb ffff881095a8f738 ffffffff81861c01
[ 11.349957] ffff88109f096280 0000000000000000 ffff881095a8f748 ffffffff810ac88b
[ 11.349959] ffff881095a8f798 ffffffff8186c241 ffffffff81293b26 ffff88100000000a
[ 11.349961] Call Trace:
[ 11.349967] [<ffffffff81861c01>] dump_stack+0x6d/0x8b
[ 11.349970] [<ffffffff810ac88b>] __schedule_bug+0x4b/0x60
[ 11.349974] [<ffffffff8186c241>] __schedule+0x641/0x830
[ 11.349978] [<ffffffff81293b26>] ? proc_mkdir_data+0x66/0x90
[ 11.349980] [<ffffffff8186c46c>] schedule+0x3c/0x90
[ 11.349983] [<ffffffff8186f779>] schedule_timeout+0x129/0x2b0
[ 11.349986] [<ffffffff810f5bf0>] ? del_timer_sync+0x50/0x50
[ 11.349989] [<ffffffff8186d3be>] wait_for_completion_timeout+0x9e/0x130
[ 11.349991] [<ffffffff810b2ca0>] ? wake_up_q+0x70/0x70
[ 11.350007] [<ffffffffc074a922>] mlx4_comm_cmd+0x1e2/0x350 [mlx4_core]
[ 11.350016] [<ffffffffc074b064>] __mlx4_cmd+0x5d4/0xa00 [mlx4_core]
[ 11.350025] [<ffffffffc0755849>] mlx4_NOP+0x39/0x40 [mlx4_core]
[ 11.350035] [<ffffffffc075c2e9>] mlx4_setup_hca+0x299/0x770 [mlx4_core]
[ 11.350042] [<ffffffffc075d507>] mlx4_load_one+0xb67/0x1670 [mlx4_core]
[ 11.350049] [<ffffffffc075e538>] mlx4_init_one+0x528/0x6c0 [mlx4_core]
[ 11.350052] [<ffffffff814640ea>] local_pci_probe+0x4a/0xa0
[ 11.350054] [<ffffffff814650f0>] ? pci_match_device+0xe0/0x110
[ 11.350056] [<ffffffff814655b3>] pci_device_probe+0x103/0x150
[ 11.350059] [<ffffffff8157ed0e>] driver_probe_device+0x1be/0x4b0
[ 11.350062] [<ffffffff8157f087>] __driver_attach+0x87/0x90
[ 11.350064] [<ffffffff8157f000>] ? driver_probe_device+0x4b0/0x4b0
[ 11.350066] [<ffffffff8157c992>] bus_for_each_dev+0x72/0xc0
[ 11.350068] [<ffffffff8157e52e>] driver_attach+0x1e/0x20
[ 11.350070] [<ffffffff8157e072>] bus_add_driver+0x1e2/0x280
[ 11.350072] [<ffffffffc078e000>] ? 0xffffffffc078e000
[ 11.350074] [<ffffffff8157fa20>] driver_register+0x60/0xe0
[ 11.350076] [<ffffffff81463a0c>] __pci_register_driver+0x4c/0x50
[ 11.350084] [<ffffffffc078e115>] mlx4_init+0x115/0x1000 [mlx4_core]
[ 11.350087] [<ffffffff81002135>] do_one_initcall+0xb5/0x200
[ 11.350090] [<ffffffff811ddcc5>] ? __vunmap+0xa5/0x100
[ 11.350094] [<ffffffff811fc1c6>] ? kfree+0x166/0x180
[ 11.350096] [<ffffffff8185f6ab>] do_init_module+0x5f/0x1da
[ 11.350100] [<ffffffff811139d2>] load_module+0x1712/0x1c90
[ 11.350102] [<ffffffff8110fd70>] ? __symbol_put+0x70/0x70
[ 11.350106] [<ffffffff81224860>] ? kernel_read+0x50/0x80
[ 11.350109] [<ffffffff81114194>] SYSC_finit_module+0xb4/0xe0
[ 11.350111] [<ffffffff811141de>] SyS_finit_module+0xe/0x10
[ 11.350113] [<ffffffff81870d1b>] entry_SYSCALL_64_fastpath+0x22/0xd0
[ 11.370387] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
[ 11.370976] mlx4_en 0001:00:02.0: Activating port:1
[ 11.382735] blk_update_request: I/O error, dev fd0, sector 0
[ 11.383648] floppy: error -5 while reading block 0
[ 11.462172] blk_update_request: I/O error, dev fd0, sector 0
[ 11.463084] floppy: error -5 while reading block 0
[ 11.632792] mlx4_en: 0001:00:02.0: Port 1: Using 256 TX rings
[ 11.632795] mlx4_en: 0001:00:02.0: Port 1: Using 8 RX rings
[ 11.632797] mlx4_en: 0001:00:02.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
[ 11.640259] mlx4_en: 0001:00:02.0: Port 1: Initializing port
[ 11.642243] hv_netvsc 000d3a3a-c481-000d-3a3a-c481000d3a3a eth0: VF registering: eth1
[ 11.646155] mlx4_core 0001:00:02.0 enP1p0s2: renamed from eth1

[Fix]

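hv_compose_msi_msg() is reached from __setup_irq() through irq_startup() (see the first trace above), which is atomic context, yet it sleeps in wait_for_completion(). Apply commit 80bfeeb9dd6b54ac108c884c792f0fc7d4912bee ("PCI: hv: Do not sleep in compose_msi_msg()").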

[Test plan]

Provision Azure Xenial instances of types Standard_F32s_v2 and Standard_D8d_v4. Reboot each instance 10 times and check the kernel log after every boot for "BUG: scheduling while atomic".

[Where things could go wrong]

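If the fix is incomplete, boot panics could still happen. The change alters how pci_hyperv waits for the host while composing MSI messages, so interrupt setup for passed-through devices, such as the Mellanox VF in the log above, is the area most exposed to regressions.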

Changed in linux (Ubuntu):
status: New → Invalid
Thadeu Lima de Souza Cascardo (cascardo) wrote :

80bfeeb9dd6b54ac108c884c792f0fc7d4912bee ("PCI: hv: Do not sleep in compose_msi_msg()") might fix this issue.

Tim Gardner (timg-tpi) wrote :

I applied commit 80bfeeb9dd6b54ac108c884c792f0fc7d4912bee ("PCI: hv: Do not sleep in compose_msi_msg()") and rebooted a Standard_D8d_v4 instance 10 times without a BUG panic.

Tim Gardner (timg-tpi)
description: updated
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Tim Gardner (timg-tpi)
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed