Kernel panic after "rmmod cx23885", caused by upstream commit 2f1ea29f / "[media] si2157: implement signal strength stats"

Bug #1532412 reported by Ernst Martin Witte on 2016-01-09
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

[Reported also on <email address hidden>]

  * relevance / use-case:

    HTPC that goes to sleep when not in use (this currently requires
    the rmmod isolated here).

  * base system:

    - Ubuntu 15.10 (arch: amd64)
    - PCIe DVB-C card DVBSky T982 (2 DVB-C/T tuners)
    - starting point: Ubuntu kernel 4.2.x / x86_64

  * reliably trigger kernel panic (analyzed with kdb / gdb):

    - run tvheadend 4.x using the DVB-C tuners (from git/master in my
      case, but the exact release should not matter much); most likely
      any other DVB-C application would do as well.

    - stop tvheadend

    - rmmod cx23885

      Please note: It seems to be relevant that the DVB-C tuners were
      in use (and released) before the call to rmmod. During testing,
      I had situations where I could successfully do the rmmod if
      tvheadend was not yet started after a fresh boot.

      The kernel panic happens shortly after the rmmod (less than a
      second later), in parts of the kernel that are completely
      unrelated but reproducible, e.g. with a stack trace like:

          call_timer_fn+0x35/0xf0
          __run_timers (inlined)
          run_timer_softirq+0x221/0x2d0
          __do_softirq+0x0f6/0x250

          (unable to handle paging request, aka invalid pointer)

      or

          call_timer_fn+0x35/0xf0
          __run_timers (inlined)
          run_timer_softirq+0x162/0x2d0
          __do_softirq+0x0f6/0x250

          (null pointer dereference)

      The faulting line is a call through a NULL or invalid function
      pointer, namely line 1178 in media_tree/kernel/time/timer.c:

      1177 trace_timer_expire_entry(timer);
      1178 fn(data);
      1179 trace_timer_expire_exit(timer);

      fn is a null pointer if called within the "if (irqsafe)" block
      or an invalid pointer if called from the respective else branch
      (ll. 1284 in media_tree/kernel/time/timer.c)

  * Affected:

       - v4.2.x

       - v4.3.x

       - v4.4-rcX

  * Not affected: v4.1.x

  * Bisect result on git://github.com/torvalds/linux.git/master
    between v4.1 and v4.2

    2f1ea29fca781b8e6600f3ece1f2dd98ae276294 is the first bad commit
    commit 2f1ea29fca781b8e6600f3ece1f2dd98ae276294
    Author: Antti Palosaari <crope <at> iki.fi>
    Date: Sun Sep 7 11:20:34 2014 -0300

        [media] si2157: implement signal strength stats

        Implement DVBv5 signal strength stats. Returns dBm.

        Signed-off-by: Antti Palosaari <crope <at> iki.fi>
        Tested-by: Adam Baker <linux <at> baker-net.org.uk>
        Signed-off-by: Mauro Carvalho Chehab <mchehab <at> osg.samsung.com>

    :040000 040000 1fc70a3d18532f91289e8b581081ee59feefc321 5130e9b011e9c4ba683cd4db3eae8dca67c3ef0e M drivers

  * Workaround:

    Reverting this commit on all kernels I tested (incl. 4.4-rc8)
    prevents the kernel panic.

BR,
  Martin

P.S.: Bisect log:

user <at> host:/a/b/c$ git bisect log
# bad: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect start 'v4.2' 'v4.1'
# bad: [c11d716218910c3aa2bac1bb641e6086ad649555] Merge tag 'armsoc-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad c11d716218910c3aa2bac1bb641e6086ad649555
# good: [8a8c35fadfaf55629a37ef1a8ead1b8fb32581d2] mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
git bisect good 8a8c35fadfaf55629a37ef1a8ead1b8fb32581d2
# good: [14738e03312ff1137109d68bcbf103c738af0f4a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 14738e03312ff1137109d68bcbf103c738af0f4a
# good: [4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 4570a37169d4b44d316f40b2ccc681dc93fedc7b
# bad: [78aad7f81aa6dfccdb2804ac35db6fc371d265cf] [media] vivid-tpg: precalculate colorspace/xfer_func combinations
git bisect bad 78aad7f81aa6dfccdb2804ac35db6fc371d265cf
# good: [356484cabe44984d2dc66a90bd5e3465ba1f64fb] [media] dw2102: resync fifo when demod locks
git bisect good 356484cabe44984d2dc66a90bd5e3465ba1f64fb
# good: [e4aa18d33c3a05f9ac51a8c8c7863318c807650f] [media] DocBook: Improve the description of the properties API
git bisect good e4aa18d33c3a05f9ac51a8c8c7863318c807650f
# good: [e01dfc01914ab9a078ca8d08287c19c6663b5438] [media] videodev2.h: add COLORSPACE_DEFAULT
git bisect good e01dfc01914ab9a078ca8d08287c19c6663b5438
# good: [dc9ef7d11207a04514ca195f0c9f4d2ac56696e1] [media] DocBook media: rewrite frontend open/close
git bisect good dc9ef7d11207a04514ca195f0c9f4d2ac56696e1
# bad: [5ac417efe66ddd7cd70a98f7f4e32a14ae40a651] [media] sh_vou: avoid going past arrays
git bisect bad 5ac417efe66ddd7cd70a98f7f4e32a14ae40a651
# bad: [171fe6d1270d535eae798e4b5acc9f5d25e6e17e] [media] media: davinci_vpfe: set minimum required buffers to three
git bisect bad 171fe6d1270d535eae798e4b5acc9f5d25e6e17e
# good: [d2b72f6482b9a3c57f036c11786a2489dcc81176] [media] si2168: Implement own I2C adapter locking
git bisect good d2b72f6482b9a3c57f036c11786a2489dcc81176
# bad: [694f9963edd831e4ed6fdbcb7134525cf5715a79] [media] media: davinci_vpfe: clear the output_specs
git bisect bad 694f9963edd831e4ed6fdbcb7134525cf5715a79
# bad: [2f1ea29fca781b8e6600f3ece1f2dd98ae276294] [media] si2157: implement signal strength stats
git bisect bad 2f1ea29fca781b8e6600f3ece1f2dd98ae276294
# first bad commit: [2f1ea29fca781b8e6600f3ece1f2dd98ae276294] [media] si2157: implement signal strength stats

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: linux-image-4.2.0-23-generic 4.2.0-23.28
ProcVersionSignature: Ubuntu 4.2.0-23.28-generic 4.2.6
Uname: Linux 4.2.0-23-generic x86_64
ApportVersion: 2.19.1-0ubuntu5
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D2c', '/dev/snd/pcmC1D3p', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/controlC1', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D3p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Sat Jan 9 10:53:16 2016
HibernationDevice: RESUME=UUID=9b9f7b19-cb13-45bc-8d5f-acd5610a93dd
InstallationDate: Installed on 2014-07-09 (548 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: ASUS All Series
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-23-generic.efi.signed root=UUID=9613d690-85bf-4888-aa33-e0df9be775e7 ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.2.0-23-generic N/A
 linux-backports-modules-4.2.0-23-generic N/A
 linux-firmware 1.149.3
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/15/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2404
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H97M-E
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2404:bd05/15/2015:svnASUS:pnAllSeries:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH97M-E:rvrRevX.0x:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: All Series
dmi.product.version: System Version
dmi.sys.vendor: ASUS

Changed in linux (Ubuntu):
status: New → Confirmed

patch set submitted to <email address hidden>:
         [PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree

The cause of the kernel panic seems to be a missing call to
cancel_delayed_work_sync() in si2157_remove() before the call to kfree().
After adding cancel_delayed_work_sync(&dev->stat_work), rmmod no longer
triggers the kernel panic.
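
To illustrate why the ordering matters, here is a minimal user-space
sketch in plain C. It is not the driver code and not the kernel API: all
model_* names are made up for this illustration, the "timer" is a single
slot, and everything is single-threaded. It only models the idea that a
scheduled stat_work leaves a timer entry pointing into the device struct,
so freeing the struct without cancelling the work first leaves that entry
dangling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for a delayed work item (hypothetical, not the kernel type). */
struct model_delayed_work {
    void (*fn)(struct model_delayed_work *w);
    bool pending;
};

/* Hypothetical device state with an embedded statistics work item. */
struct model_dev {
    struct model_delayed_work stat_work;
    int polls;
};

/* One-slot "timer wheel": the work item the timer would run next. */
static struct model_delayed_work *model_timer_slot;

/* Arm the timer for this work item. */
void model_schedule(struct model_delayed_work *w)
{
    w->pending = true;
    model_timer_slot = w;
}

/* Models the effect of cancelling: the timer no longer references w. */
bool model_cancel_sync(struct model_delayed_work *w)
{
    bool was_pending = w->pending;

    w->pending = false;
    if (model_timer_slot == w)
        model_timer_slot = NULL;
    return was_pending;
}

/* Statistics poll: counts one poll, then re-arms itself. */
void model_stat_work(struct model_delayed_work *w)
{
    struct model_dev *dev =
        (struct model_dev *)((char *)w - offsetof(struct model_dev, stat_work));

    dev->polls++;
    model_schedule(w);
}

/* Fire the timer once; this is where a stale entry would be called. */
void model_run_timer(void)
{
    struct model_delayed_work *w = model_timer_slot;

    if (!w)
        return;
    model_timer_slot = NULL;
    w->pending = false;
    assert(w->fn != NULL);
    w->fn(w);
}

struct model_dev *model_probe(void)
{
    struct model_dev *dev = calloc(1, sizeof(*dev));

    if (dev)
        dev->stat_work.fn = model_stat_work;
    return dev;
}

/*
 * Free the device. Returns true if the timer still referenced the device
 * at the moment of the free, i.e. the situation that would later crash.
 * The model clears the slot in that case so it never keeps a stale pointer.
 */
bool model_remove(struct model_dev *dev, bool cancel_first)
{
    bool dangling;

    if (cancel_first)
        model_cancel_sync(&dev->stat_work);

    dangling = (model_timer_slot == &dev->stat_work);
    if (dangling)
        model_timer_slot = NULL;
    free(dev);
    return dangling;
}
```

In this model, a device whose work was scheduled and then removed with
cancel_first == false reports a dangling timer entry, while the same
sequence with cancel_first == true does not. A device whose work was
never scheduled is also removed cleanly, which would be consistent with
rmmod succeeding after a fresh boot before tvheadend was started. Note
that the real cancel_delayed_work_sync() additionally waits for a
callback that is already running; the sketch does not model that.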

However, a look through drivers/media/tuners and
drivers/media/dvb-frontends turned up very similar issues in other
modules:

   ts2020
   af9013
   af9033
   rtl2830

Therefore, a patch set has been submitted to <email address hidden>
which contains fixes for those modules, too. The submitted patch set is:

   [PATCH 0/5] [media] cancel_delayed_work_sync before device removal / kfree

I hope these patches completely fix the issue and are ok for inclusion
in the kernel.

BR and thx!
  Martin

tags: added: patch
tags: added: bios-outdated-2602 bisect-done
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged

FYI a (rephrased ;-) note from Antti Palosaari on <email address hidden>:

It might be that the patches above are not the correct solution. If I understood correctly, the root cause might be more complicated: it seems that si2157_sleep (which contains the only original call to cancel_delayed_work_sync) was not called when it should have been.

BR, Martin
