[4.8.0-14/ppc64el regression] rmmod scsi_debug keeps causing kernel oops

Bug #1626737 reported by Martin Pitt on 2016-09-22
This bug affects 1 person
Affects           Status        Importance  Assigned to  Milestone
linux (Ubuntu)    Invalid       Medium      Unassigned
systemd (Ubuntu)  Fix Released  Medium      Martin Pitt

Bug Description

Since upgrading to 4.8.0-14, the "storage" autopkgtest of systemd is broken. The test uses scsi_debug to provide a test hard drive, which is reset between tests by unloading and reloading the module. This has worked fine so far (and still works on amd64/i386), but on ppc64el it now regularly triggers a kernel oops:

[ 161.120362] Unable to handle kernel paging request for data at address 0x00000000
[ 161.120468] Faulting instruction address: 0xc000000000538ecc
[ 161.120517] Oops: Kernel access of bad area, sig: 11 [#1]
[ 161.120555] SMP NR_CPUS=2048 NUMA pSeries
[ 161.120595] Modules linked in: dm_crypt dm_mod xts algif_skcipher af_alg sd_mod sg xt_TCPMSS xt_tcpudp iptable_mangle ghash_generic gf128mul vmx_crypto virtio_balloon ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache crc32c_generic btrfs xor raid6_pq ohci_pci ehci_pci ohci_hcd virtio_blk virtio_net ehci_hcd usbcore crc32c_vpmsum usb_common virtio_pci virtio_ring virtio [last unloaded: scsi_debug]
[ 161.121016] CPU: 0 PID: 5473 Comm: rmmod Not tainted 4.8.0-15-generic #16-Ubuntu
[ 161.121067] task: c00000005ae51980 task.stack: c00000005ef58000
[ 161.121110] NIP: c000000000538ecc LR: c000000000538ee0 CTR: c0000000000f7250
[ 161.121162] REGS: c00000005ef5b9f0 TRAP: 0300 Not tainted (4.8.0-15-generic)
[ 161.121213] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28002444 XER: 20000000
[ 161.121390] CFAR: c00000000009a8e0 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000538e98 c00000005ef5bc70 c000000000f67b00 ffffffffffffffff
               GPR04: d000000001302018 0000000000000002 0000000000000000 c0000000010d7b00
               GPR08: c000000000fa7b00 0000000000000063 0000000000000073 0000000000000004
               GPR12: 0000000028002844 c00000000fb80000 0000000000000000 0000000000000000
               GPR16: 0000000000000000 00000100331f11f0 00000000384b3890 00000000384b3848
               GPR20: 00000000384b3830 00000000384b3870 00000000384b38a8 00000000384b3888
               GPR24: 00003fffd23d6e70 c000000000ebdec8 fffffffffffffffe d000000001302018
               GPR28: c000000000ebdeb8 0000000000000000 0000000000000000 0000000000000000
[ 161.122099] NIP [c000000000538ecc] ddebug_remove_module+0x8c/0x160
[ 161.122143] LR [c000000000538ee0] ddebug_remove_module+0xa0/0x160
[ 161.122186] Call Trace:
[ 161.122205] [c00000005ef5bc70] [c000000000538e98] ddebug_remove_module+0x58/0x160 (unreliable)
[ 161.122280] [c00000005ef5bd10] [c00000000018961c] free_module+0x21c/0x3c0
[ 161.122333] [c00000005ef5bd60] [c000000000189a38] SyS_delete_module+0x278/0x2f0
[ 161.122394] [c00000005ef5be30] [c0000000000095e0] system_call+0x38/0x108
[ 161.122445] Instruction dump:
[ 161.122472] 3d42fff5 e92a63b8 7fa9e000 7d3d4b78 ebe90000 419e00bc 7d3e4b78 3b40fffe
[ 161.122561] 48000018 7fbfe000 7ffdfb78 7ffefb78 <ebff0000> 419e0060 e87e0010 7f64db78
[ 161.122651] ---[ end trace 5f19b96c7077a0e0 ]---
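
For triage, the symbol+offset pairs in the NIP, LR and Call Trace lines are the interesting part of the oops above. A minimal sketch of how they could be pulled out of a saved dmesg excerpt follows; the names `FRAME_RE`, `parse_frames` and `SAMPLE` are made up for this illustration, and the regex only assumes the `symbol+0xOFF/0xSIZE` notation seen in this log.

```python
import re

# Matches "symbol+0xOFF/0xSIZE" as printed in the NIP, LR and Call Trace lines.
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(oops_text):
    """Return (symbol, offset, size) for each line that carries a frame."""
    frames = []
    for line in oops_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


# Lines copied from the oops in this report.
SAMPLE = """\
[ 161.122099] NIP [c000000000538ecc] ddebug_remove_module+0x8c/0x160
[ 161.122143] LR [c000000000538ee0] ddebug_remove_module+0xa0/0x160
[ 161.122205] [c00000005ef5bc70] [c000000000538e98] ddebug_remove_module+0x58/0x160 (unreliable)
[ 161.122280] [c00000005ef5bd10] [c00000000018961c] free_module+0x21c/0x3c0
[ 161.122333] [c00000005ef5bd60] [c000000000189a38] SyS_delete_module+0x278/0x2f0
[ 161.122394] [c00000005ef5be30] [c0000000000095e0] system_call+0x38/0x108
"""

if __name__ == "__main__":
    for symbol, offset, size in parse_frames(SAMPLE):
        print(symbol, hex(offset), hex(size))
```

Read top to bottom, the frames show the fault inside ddebug_remove_module, reached from free_module via the delete_module syscall, which matches the rmmod process named in the "Comm:" field.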

This isn't reproducible by merely loading and unloading the module; it apparently needs to get some actual exercise. Tomorrow morning I'll try to find a simpler reproducer than running the systemd test.
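
For anyone who wants to experiment, the kind of load / exercise / unload cycle described here could be scripted roughly as below. This is only a sketch and not a confirmed reproducer: the helper names `build_cycle` and `run_cycles`, the read-only `dd` workload, and the device path passed in are all assumptions for illustration, and the real systemd test does considerably more with the disk.

```python
import subprocess


def build_cycle(device, blocks=1024):
    """Commands for one load / exercise / unload cycle.

    `device` is the block device that scsi_debug creates. The caller has to
    supply it (assumption: it is known in advance), since the name depends
    on the system.
    """
    return [
        ["modprobe", "scsi_debug"],
        ["udevadm", "settle"],
        # Some read I/O, so the disk sees more than a bare load/unload.
        ["dd", f"if={device}", "of=/dev/null", "bs=4096", f"count={blocks}"],
        ["udevadm", "settle"],
        ["rmmod", "scsi_debug"],
    ]


def run_cycles(device, cycles=10, dry_run=True):
    """Collect the command sequence `cycles` times; execute it unless dry_run."""
    executed = []
    for _ in range(cycles):
        for cmd in build_cycle(device):
            executed.append(cmd)
            if not dry_run:
                # Needs root; stops at the first failing command.
                subprocess.run(cmd, check=True)
    return executed
```

With the default `dry_run=True` nothing is executed and the function just returns the planned commands, which makes it safe to inspect first. Running it for real should only be done on a disposable test machine, given that the failure mode is a kernel oops.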

Martin Pitt (pitti) on 2016-09-22
tags: added: bot-stop-nagging
summary: - [4.8 regression] rmmod scsi_debug keeps causing kernel oops
+ [4.8.0-14/ppc64el regression] rmmod scsi_debug keeps causing kernel oops
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: kernel-4.8
Martin Pitt (pitti) wrote :

I adjusted the test to avoid "rmmod scsi_debug": https://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=be77e470d8

So there's still a bug there, but it won't block testing any more at least. And rmmod is always a bit brittle anyway, so let's avoid it.

Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Medium
status: New → Fix Committed
Brad Figg (brad-figg) on 2016-09-23
Changed in linux (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 231-9

---------------
systemd (231-9) unstable; urgency=medium

  * pid1: process zero-length notification messages again.
    Just remove the assertion, the "n" value was not used anyway. This fixes
    a local DoS due to unprocessed/unclosed fds which got introduced by the
    previous fix. (Closes: #839171) (LP: #1628687)
  * pid1: Robustify manager_dispatch_notify_fd()
  * test/networkd-test.py: Add missing writeConfig() helper function.

 -- Martin Pitt <email address hidden> Thu, 29 Sep 2016 23:39:24 +0200

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

I didn't find a simpler reproducer on the CLI, and the systemd test no longer calls rmmod, so there is no way left to trigger this.

Changed in linux (Ubuntu):
status: Confirmed → Invalid