[Regression] Failed to boot disco kernel built from master-next (kernel NULL pointer dereference)

Bug #1853981 reported by Po-Hsu Lin on 2019-11-26
This bug affects 1 person
Affects          Status          Importance   Assigned to    Milestone
linux (Ubuntu)   Invalid         Undecided    Unassigned
Disco            Fix Committed   High         Stefan Bader

Bug Description

Build environment: Tangerine build server with dchroot disco-amd64
Build command: fakeroot debian/rules do_tools=0 no_dumpfile=1 clean binary-generic binary-headers

A Disco test kernel built from the master-next branch (head: e903d6a UBUNTU: upstream stable to v4.19.83, v5.3.10) fails with the following error when booting on an AMD64 KVM node:
[ 2.724234] [TTM] Initializing pool allocator
[ 2.725770] [TTM] Initializing DMA pool allocator
[ 2.727918] [drm] fb mappable at 0xFC000000
[ 2.729462] [drm] vram aper at 0xFC000000
[ 2.729980] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 2.730902] [drm] size 33554432
[ 2.735026] #PF error: [normal kernel read fault]
[ 2.735028] PGD 0 P4D 0
[ 2.735032] Oops: 0000 [#1] SMP PTI
[ 2.735036] CPU: 1 PID: 211 Comm: systemd-udevd Not tainted 5.0.0-37-generic #40
[ 2.736230] [drm] fb depth is 16
[ 2.738490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 2.739526] [drm] pitch is 2048
[ 2.740751] RIP: 0010:attempt_merge+0x145/0x9d0
[ 2.740753] Code: 84 a2 02 00 00 41 8b 7c 24 24 4d 8b 44 24 30 c1 ef 09 89 f8 4c 01 c0 48 3b 43 30 0f 85 0e ff ff ff 48 8b 53 38 4d 8b 7c 24 40 <8b> 42 20 45 8b 5f 24 89 44 24 3c 4d 85 ff 74 20 41 8b 47 30 85 c0
[ 2.740754] RSP: 0018:ffffb4964067f2f8 EFLAGS: 00010246
[ 2.753490] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4
[ 2.758417] RAX: 0000000000000000 RBX: ffff9a1b33261200 RCX: 0000000000000000
[ 2.758418] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2.758418] RBP: ffffb4964067f378 R08: 0000000000000000 R09: 0000000000000004
[ 2.758419] R10: 0000000000000001 R11: 0000000000000800 R12: ffff9a1b33260000
[ 2.758420] R13: ffff9a1b330c7070 R14: ffff9a1b330c7070 R15: 0000000000000000
[ 2.758421] FS: 00007fd50d4748c0(0000) GS:ffff9a1b7db00000(0000) knlGS:0000000000000000
[ 2.758422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.758423] CR2: 0000000000000020 CR3: 000000003377e000 CR4: 00000000000006e0
[ 2.758426] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.786182] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.786183] Call Trace:
[ 2.786188] ? insert_work+0x6c/0x80
[ 2.786193] ? __sbitmap_get_word+0x31/0x90
[ 2.786194] blk_attempt_req_merge+0xe/0x30
[ 2.786198] elv_attempt_insert_merge+0x35/0x90
[ 2.786201] blk_mq_sched_try_insert_merge+0x42/0x50
[ 2.786203] dd_insert_requests+0x90/0x1c0
[ 2.786206] blk_mq_sched_insert_request+0x12d/0x1a0
[ 2.786208] blk_mq_make_request+0x36c/0x4d0
[ 2.786210] generic_make_request+0x19e/0x400
[ 2.786212] submit_bio+0x49/0x140
[ 2.786215] ? guard_bio_eod+0x32/0x100
[ 2.786217] submit_bh_wbc+0x185/0x1b0
[ 2.786219] block_read_full_page+0x24d/0x330
[ 2.786222] ? check_disk_change+0x70/0x70
[ 2.786225] ? count_shadow_nodes+0x140/0x140
[ 2.786228] blkdev_readpage+0x18/0x20
[ 2.786232] do_read_cache_page+0x374/0x7a0
[ 2.786234] ? blkdev_writepages+0x10/0x10
[ 2.786237] ? get_page_from_freelist+0xefe/0x1440
[ 2.786240] ? __radix_tree_replace+0x59/0xf0
[ 2.786243] read_cache_page+0x12/0x20
[ 2.786244] read_dev_sector+0x2d/0xd0
[ 2.786246] read_lba+0xcd/0x220
[ 2.786248] efi_partition+0x1e4/0x6de
[ 2.786250] ? vsnprintf+0x103/0x520
[ 2.786252] ? snprintf+0x49/0x60
[ 2.786253] ? is_gpt_valid.part.7+0x470/0x470
[ 2.786255] check_partition+0x13e/0x238
[ 2.786256] rescan_partitions+0xaf/0x2a0
[ 2.786259] ? _cond_resched+0x19/0x30
[ 2.786260] __blkdev_get+0x393/0x550
[ 2.786262] blkdev_get+0x10c/0x330
[ 2.786264] ? wake_up_bit+0x42/0x50
[ 2.786267] ? unlock_new_inode+0x53/0x70
[ 2.786269] ? bdget+0x111/0x130
[ 2.786270] __device_add_disk+0x321/0x480
[ 2.786272] device_add_disk+0x13/0x20
[ 2.786275] virtblk_probe+0x4ef/0x79f [virtio_blk]
[ 2.786278] virtio_dev_probe+0x172/0x230
[ 2.786282] really_probe+0xfe/0x3b0
[ 2.786284] driver_probe_device+0xba/0x100
[ 2.786286] __driver_attach+0xf1/0x120
[ 2.786288] ? driver_probe_device+0x100/0x100
[ 2.786290] bus_for_each_dev+0x79/0xc0
[ 2.786292] ? kmem_cache_alloc_trace+0x1bd/0x1d0
[ 2.786294] driver_attach+0x1e/0x20
[ 2.786295] bus_add_driver+0x159/0x230
[ 2.786297] ? 0xffffffffc0410000
[ 2.786299] driver_register+0x70/0xc0
[ 2.786300] ? 0xffffffffc0410000
[ 2.786302] register_virtio_driver+0x20/0x30
[ 2.786304] init+0x56/0x1000 [virtio_blk]
[ 2.786308] do_one_initcall+0x4a/0x1c4
[ 2.786310] ? _cond_resched+0x19/0x30
[ 2.786311] ? kmem_cache_alloc_trace+0x167/0x1d0
[ 2.786314] do_init_module+0x60/0x220
[ 2.786316] load_module+0x1797/0x1a00
[ 2.786319] __do_sys_finit_module+0xbd/0x120
[ 2.786321] ? __do_sys_finit_module+0xbd/0x120
[ 2.786323] __x64_sys_finit_module+0x1a/0x20
[ 2.786325] do_syscall_64+0x5a/0x110
[ 2.786329] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.786330] RIP: 0033:0x7fd50d7012e9
[ 2.786332] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 cb 0c 00 f7 d8 64 89 01 48
[ 2.786332] RSP: 002b:00007fff6dd12248 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 2.786334] RAX: ffffffffffffffda RBX: 000055b98c946aa0 RCX: 00007fd50d7012e9
[ 2.786335] RDX: 0000000000000000 RSI: 00007fd50d5e2cad RDI: 0000000000000005
[ 2.786335] RBP: 00007fd50d5e2cad R08: 0000000000000000 R09: 000055b98c926ab0
[ 2.786336] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000
[ 2.786337] R13: 000055b98c934800 R14: 0000000000020000 R15: 000055b98c946aa0
[ 2.786338] Modules linked in: psmouse cirrus(+) ttm virtio_blk(+) drm_kms_helper virtio_net(+) syscopyarea sysfillrect net_failover sysimgblt failover fb_sys_fops drm floppy i2c_piix4 pata_acpi
[ 2.786345] CR2: 0000000000000020
[ 2.786379] ---[ end trace 1c7dd9ecceb0cff9 ]---
[ 2.786382] RIP: 0010:attempt_merge+0x145/0x9d0
[ 2.786383] Code: 84 a2 02 00 00 41 8b 7c 24 24 4d 8b 44 24 30 c1 ef 09 89 f8 4c 01 c0 48 3b 43 30 0f 85 0e ff ff ff 48 8b 53 38 4d 8b 7c 24 40 <8b> 42 20 45 8b 5f 24 89 44 24 3c 4d 85 ff 74 20 41 8b 47 30 85 c0
[ 2.786384] RSP: 0018:ffffb4964067f2f8 EFLAGS: 00010246
[ 2.786385] RAX: 0000000000000000 RBX: ffff9a1b33261200 RCX: 0000000000000000
[ 2.786386] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2.786386] RBP: ffffb4964067f378 R08: 0000000000000000 R09: 0000000000000004
[ 2.786387] R10: 0000000000000001 R11: 0000000000000800 R12: ffff9a1b33260000
[ 2.786388] R13: ffff9a1b330c7070 R14: ffff9a1b330c7070 R15: 0000000000000000
[ 2.786389] FS: 00007fd50d4748c0(0000) GS:ffff9a1b7db00000(0000) knlGS:0000000000000000
[ 2.786390] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.786391] CR2: 0000000000000020 CR3: 000000003377e000 CR4: 00000000000006e0
[ 2.786394] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.786395] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.786728] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
[ 2.787939] fbcon: cirrusdrmfb (fb0) is primary device
[ 2.802935] Console: switching to colour frame buffer device 128x48
[ 2.969597] virtio_net virtio1 ens4: renamed from eth0
[ 3.078798] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device
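
The fault address in the trace can be read from the register dump. The instruction bytes marked `<8b> 42 20` in the Code: line decode to a 32-bit load from `[rdx+0x20]`, and RDX is 0, which matches CR2 = 0x20: a field at offset 0x20 was read through a NULL pointer. Below is a minimal C sketch of that arithmetic. The structure and its field names are invented for illustration; the report does not identify which kernel structure or field is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x20 offset matters, the names are invented. */
struct example {
    uint8_t  pad[0x20];   /* 32 bytes standing in for earlier fields */
    uint32_t field;       /* sits at offset 0x20 */
};

/* Address the CPU touches when reading `field` through a pointer equal to `base`. */
uintptr_t access_address(uintptr_t base)
{
    return base + offsetof(struct example, field);
}
```

With `base` equal to 0 (a NULL pointer), `access_address()` yields 0x20, the same value the oops reports.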

Po-Hsu Lin (cypressyew) on 2019-11-26
description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1853981

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Disco):
status: New → Incomplete
tags: added: disco
Po-Hsu Lin (cypressyew) wrote :

Current bisect result (not finished yet):
git bisect start
# bad: [e903d6a85368180af5f09bf5d0be0308b757694d] UBUNTU: upstream stable to v4.19.83, v5.3.10
git bisect bad e903d6a85368180af5f09bf5d0be0308b757694d
# good: [e421062fe0c6fba6d639c9ca57e3587171e7664f] UBUNTU: Ubuntu-5.0.0-37.40
git bisect good e421062fe0c6fba6d639c9ca57e3587171e7664f
# bad: [120233013313edb21f4003433af1661e8961739d] tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
git bisect bad 120233013313edb21f4003433af1661e8961739d
# good: [482f7e77e9d7193bc165fab34bb54f142157d270] ARM: dts: Fix wrong clocks for dra7 mcasp
git bisect good 482f7e77e9d7193bc165fab34bb54f142157d270
# bad: [cc337c66ceb8c0bd641a821889fd06016106422f] dm snapshot: rework COW throttling to fix deadlock
git bisect bad cc337c66ceb8c0bd641a821889fd06016106422f
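
The bisect log above narrows the range by repeatedly testing a commit between the newest known-good and the oldest known-bad one. A rough sketch of that search in C follows. It assumes a linear history stored as an array, which is a simplification (git bisect itself works on the commit graph), and `first_bad_commit()` and `is_bad` are hypothetical names, not part of any tool.

```c
#include <stddef.h>

/*
 * is_bad[i] is nonzero if commit i shows the bug. Commits are ordered from
 * oldest to newest; is_bad[0] must be good and is_bad[n - 1] must be bad.
 * Returns the index of the first bad commit.
 */
size_t first_bad_commit(const int *is_bad, size_t n)
{
    size_t good = 0;       /* newest commit known to be good */
    size_t bad = n - 1;    /* oldest commit known to be bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad[mid])
            bad = mid;     /* the regression is at mid or earlier */
        else
            good = mid;    /* the regression is after mid */
    }
    return bad;
}
```

Each iteration corresponds to one build-and-boot test, so the number of test kernels grows only logarithmically with the number of commits between the good and bad endpoints.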

Po-Hsu Lin (cypressyew) wrote :

Bisect complete:
# first bad commit: [1078686a99eabb5f627cf0f1e4d9f4a05993917a] blk-mq: honor IO scheduler for multiqueue devices

Complete bisect log can be found in the attachment.

I have a test kernel with this patch reverted, and it boots fine.

Po-Hsu Lin (cypressyew) wrote :

The issue still exists with a kernel built from b99079a8045a084722a6a33cca3c396634e6d241 (UBUNTU: upstream stable to v4.19.86, v5.3.13).

A more complete dmesg output can be found in the attachment.

Stefan Bader (smb) wrote :

I think that "blk-mq: honor IO scheduler for multiqueue devices" is broken in disco because disco is missing the following commit, which it depends on:

commit 970d168de636ddac8221cbd4a11d7678943e7379
blk-mq: simplify blk_mq_make_request()

Move the blk_mq_bio_to_request() call in front of the if-statement.

Either this has to be picked before the mq change or there must be a call to blk_mq_bio_to_request(rq, bio) before calling blk_mq_sched_insert_request(rq, false, true, true):

                }

                blk_add_rq_to_plug(plug, rq);
+       } else if (q->elevator) {
+               blk_mq_bio_to_request(rq, bio);
+               blk_mq_sched_insert_request(rq, false, true, true);
        } else if (plug && !blk_queue_nomerges(q)) {
                blk_mq_bio_to_request(rq, bio);

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Disco):
status: Incomplete → In Progress
Stefan Bader (smb) on 2019-12-02
Changed in linux (Ubuntu Disco):
importance: Undecided → High
assignee: nobody → Stefan Bader (smb)
Sean Feole (sfeole) wrote :

I boot-tested a kernel built by smb, using the same test environment as in the original report:

linux-buildinfo-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-cloud-tools-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-headers-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-image-unsigned-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-modules-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-modules-extra-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb
linux-tools-5.0.0-37-generic_5.0.0-37.40+mnext1_amd64.deb

dmesg is attached. I did not see the failure noted in the original description, nor any other failures.

Linux automation-vm1 5.0.0-37-generic #40+mnext1 SMP Mon Dec 2 13:37:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

The system booted successfully.

Stefan Bader (smb) on 2019-12-02
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco