BUG: soft lockup - CPU#3 stuck for 22s! [tvheadend:29245] - flush_tlb_others_ipi

Bug #1171096 reported by Bianco Veigel
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

This is the continuation of bug 986392.

Since I updated tvheadend, the server gets stuck after a few hours and responds neither to network input nor to the hardware keyboard. The attached monitor won't even wake from power save. When tvheadend is not running, the server works flawlessly. This bug probably broke my RAID two days ago; I had to rebuild it by hand (hopefully I haven't lost any data).
I've restarted a few times in the last two days, and whenever I forgot to stop tvheadend the server died after some hours.

Once when this happened I had an open SSH session; this is the last output I got:

Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344029] BUG: soft lockup - CPU#3 stuck for 22s! [tvheadend:29245]
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344145] Modules linked in: xfs nfsd nfs lockd fscache auth_rpcgss binfmt_misc nfs_acl sunrpc stv6110x lnbp21 snd_hda_codec_hdmi snd_hda_codec_via snd_hda_intel snd_hda_codec bridge stp snd_hwdep snd_pcm snd_seq_midi stv090x snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device usblp ngene sp5100_tco dvb_core i2c_piix4 edac_core snd edac_mce_amd psmouse soundcore serio_raw shpchp snd_page_alloc k10temp ppdev asus_atk0110 parport_pc cxd2099(C) mac_hid hwmon_vid lp parport xts gf128mul dm_crypt raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid0 multipath linear raid1 radeon ttm drm_kms_helper drm usbhid hid wmi i2c_algo_bit r8169 pata_atiixp sata_mv [last unloaded: kvm]
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344270] CPU 3
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344273] Modules linked in: xfs nfsd nfs lockd fscache auth_rpcgss binfmt_misc nfs_acl sunrpc stv6110x lnbp21 snd_hda_codec_hdmi snd_hda_codec_via snd_hda_intel snd_hda_codec bridge stp snd_hwdep snd_pcm snd_seq_midi stv090x snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device usblp ngene sp5100_tco dvb_core i2c_piix4 edac_core snd edac_mce_amd psmouse soundcore serio_raw shpchp snd_page_alloc k10temp ppdev asus_atk0110 parport_pc cxd2099(C) mac_hid hwmon_vid lp parport xts gf128mul dm_crypt raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid0 multipath linear raid1 radeon ttm drm_kms_helper drm usbhid hid wmi i2c_algo_bit r8169 pata_atiixp sata_mv [last unloaded: kvm]
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344377]
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344384] Pid: 29245, comm: tvheadend Tainted: G C 3.2.0-40-generic #64-Ubuntu System manufacturer System Product Name/M5A78L-M/USB3
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344397] RIP: 0010:[<ffffffff81046977>] [<ffffffff81046977>] flush_tlb_others_ipi+0x127/0x140
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344416] RSP: 0000:ffff8803936efc18 EFLAGS: 00000246
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344421] RAX: 0000000000000000 RBX: 57ffc3cb668d28c0 RCX: 0000000000000004
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344427] RDX: ffffffff81806c60 RSI: 0000000000000100 RDI: ffffffff81e0c918
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344433] RBP: ffff8803936efc58 R08: 0000000000000007 R09: 0000000000000004
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344438] R10: 57ffc3cb668d28c0 R11: 0000000000000002 R12: ffff88041fcd3830
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344444] R13: ffff8803936efc58 R14: ffffffff811ab057 R15: ffff8803936efb88
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344451] FS: 00007fc935c30700(0000) GS:ffff88041fcc0000(0000) knlGS:0000000000000000
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344462] CR2: 000000000246d8b8 CR3: 00000003894d0000 CR4: 00000000000006e0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344468] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344474] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344480] Process tvheadend (pid: 29245, threadinfo ffff8803936ee000, task ffff880403790000)
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344485] Stack:
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344525] ffff880404235c48 ffff880403790000 ffff8803936efc88 ffff880403a13480
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344536] ffff880403a13748 000000000246d8b8 000000000246d8b8 ffff880389549368
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344546] ffff8803936efc68 ffffffff81046b0e ffff8803936efc98 ffffffff81046c7e
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344555] Call Trace:
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344608] [<ffffffff81046b0e>] native_flush_tlb_others+0xe/0x10
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344618] [<ffffffff81046c7e>] flush_tlb_page+0x5e/0xb0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344628] [<ffffffff8104597c>] ptep_set_access_flags+0x6c/0x70
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344637] [<ffffffff8113cefa>] do_wp_page+0x37a/0x740
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344646] [<ffffffff8165c58f>] ? schedule+0x3f/0x60
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344654] [<ffffffff8113eebb>] handle_pte_fault+0x1cb/0x200
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344663] [<ffffffff81140059>] handle_mm_fault+0x269/0x370
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344673] [<ffffffff81661f90>] do_page_fault+0x150/0x520
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344682] [<ffffffff810a1a88>] ? do_futex+0xd8/0x1b0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344690] [<ffffffff810a1ca7>] ? sys_futex+0x147/0x1a0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344698] [<ffffffff8117ae65>] ? fput+0x25/0x30
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344707] [<ffffffff8165ebf5>] page_fault+0x25/0x30
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.344712] Code: 00 00 00 00 48 8b 05 e9 7a c9 00 41 8d b6 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 be 00 01 00 00 <4c> 89 e7 e8 61 5f 2d 00 85 c0 74 ed e9 59 ff ff ff 0f 1f 84 00
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345566] Call Trace:
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345575] [<ffffffff81046b0e>] native_flush_tlb_others+0xe/0x10
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345584] [<ffffffff81046c7e>] flush_tlb_page+0x5e/0xb0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345593] [<ffffffff8104597c>] ptep_set_access_flags+0x6c/0x70
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345601] [<ffffffff8113cefa>] do_wp_page+0x37a/0x740
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345608] [<ffffffff8165c58f>] ? schedule+0x3f/0x60
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345616] [<ffffffff8113eebb>] handle_pte_fault+0x1cb/0x200
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345624] [<ffffffff81140059>] handle_mm_fault+0x269/0x370
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345633] [<ffffffff81661f90>] do_page_fault+0x150/0x520
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345642] [<ffffffff810a1a88>] ? do_futex+0xd8/0x1b0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345649] [<ffffffff810a1ca7>] ? sys_futex+0x147/0x1a0
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345656] [<ffffffff8117ae65>] ? fput+0x25/0x30
Apr 19 22:46:38 dana.bln.wg1337.de kernel: [17872.345664] [<ffffffff8165ebf5>] page_fault+0x25/0x30
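
For anyone comparing this trace against their own logs, here is a minimal Python sketch that pulls the key fields out of a soft lockup report like the one above. The regular expressions are written against the line shapes shown in this log only, and the names `summarize_lockup`, `SOFT_LOCKUP` and `FRAME` are my own, not part of any kernel tooling. Frames marked with "?" are skipped on the assumption that they are unreliable unwinder guesses.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#3 stuck for 22s! [tvheadend:29245]"
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# Matches e.g. "[<ffffffff81046c7e>] flush_tlb_page+0x5e/0xb0",
# with an optional "? " in front of the symbol.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockup(lines):
    """Return (info, traces): the lockup header fields and one symbol list per call trace."""
    info = None
    traces = []
    in_trace = False
    for line in lines:
        header = SOFT_LOCKUP.search(line)
        if header:
            info = {
                "cpu": int(header["cpu"]),
                "secs": int(header["secs"]),
                "comm": header["comm"],
                "pid": int(header["pid"]),
            }
            in_trace = False
            continue
        if "Call Trace:" in line:
            traces.append([])
            in_trace = True
            continue
        frame = FRAME.search(line)
        if in_trace and frame:
            # Keep only frames the kernel did not mark with "?".
            if not frame["guess"]:
                traces[-1].append(frame["sym"])
        else:
            # Any other line (registers, Code:, blank) ends the current trace.
            in_trace = False
    return info, traces
```

Fed the log above, the first trace would start with `native_flush_tlb_others` and `flush_tlb_page`, which makes it easier to spot whether two reports hang in the same TLB flush path.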

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-40-generic 3.2.0-40.64
ProcVersionSignature: Ubuntu 3.2.0-40.64-generic 3.2.40
Uname: Linux 3.2.0-40-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.2
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: SB [HDA ATI SB], device 0: VT1708S Analog [VT1708S Analog]
   Subdevices: 2/2
   Subdevice #0: subdevice #0
   Subdevice #1: subdevice #1
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe5f4000 irq 16'
   Mixer name : 'VIA VT1708S'
   Components : 'HDA:11060397,1043836c,00100000'
   Controls : 44
   Simple ctrls : 21
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfe7e8000 irq 19'
   Mixer name : 'ATI RS690/780 HDMI'
   Components : 'HDA:1002791a,00791a00,00100000'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [off]
Date: Sun Apr 21 10:19:55 2013
HibernationDevice: RESUME=
MachineType: System manufacturer System Product Name
MarkForUpload: True
ProcEnviron:
 LANGUAGE=de_DE:de:en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=de_DE.utf8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: root=/dev/mapper/md1_crypt ro single
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-40-generic N/A
 linux-backports-modules-3.2.0-40-generic N/A
 linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
StagingDrivers: cxd2099
UpgradeStatus: Upgraded to precise on 2012-04-30 (355 days ago)
WifiSyslog:

dmi.bios.date: 03/23/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1103
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M5A78L-M/USB3
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1103:bd03/23/2012:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM5A78L-M/USB3:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Rob van Hoboken (rob-on-lists) wrote :

I had this same crash too on Mythbuntu 12.04 (updated to 3.2.0-40).
My dvb adapter is Digital Devices Octopus (ddbridge driver) and DuoFlex S2.
The crash/hang was preceded by "I2C timeout" messages.

The problems were resolved by installing the latest dvb build from Roger Endriss http://linuxtv.org/hg/~endriss/media_build_experimental/ (on April 11, 2013). This also fixed other bugs with the adapter.

penalvch (penalvch)
tags: added: needs-upstream-testing
Bianco Veigel (binco) wrote :

I've found a corresponding bug entry for tvheadend at https://tvheadend.org/issues/1701.

I've searched my log files and haven't found any "I2C timeout", but you may be correct that there is an issue with the DVB driver. What makes me suspicious is that I only updated tvheadend, and the previous version was working flawlessly.
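
In case someone wants to repeat that log search, a minimal sketch in Python is below. The helper name `find_i2c_timeouts` is hypothetical, the match is simply the phrase "I2C timeout" in any letter case, and the log path in the usage comment is an assumption (kernel messages may be stored elsewhere on your system).

```python
import re

# Case-insensitive match for the phrase mentioned in this thread.
I2C_TIMEOUT = re.compile(r"i2c timeout", re.IGNORECASE)


def find_i2c_timeouts(lines):
    """Return (line_number, text) for every line that mentions an I2C timeout."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if I2C_TIMEOUT.search(line)
    ]


# Usage (path is an assumption, adjust to where your kernel log lives):
#
#     with open("/var/log/kern.log", errors="replace") as log:
#         for number, text in find_i2c_timeouts(log):
#             print(number, text)
```

An empty result only means the phrase is absent from that file; rotated or compressed logs would have to be checked separately.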

Rob van Hoboken (rob-on-lists) wrote : Re: [Bug 1171096] Re: BUG: soft lockup - CPU#3 stuck for 22s! [tvheadend:29245] - flush_tlb_others_ipi

Hi Bianco
I had the same thing (as I described in https://tvheadend.org/issues/1678):

    gc338802 does not have problem (same hardware, same actions, no
    timeout).
    The last 2 GIT version cause I2C timeouts and module traces pointing
    to tvheadend. After a while Linux becomes unresponsive and you need
    a hardware reset to boot.

No changes to the kernel; the older version of tvheadend is OK. The newer
version changed the way tuning or mutex locking is used, and runs into a
latent kernel bug.
Update the DVB drivers in the kernel, and the problem is solved.

Thank you for reporting it to ubuntu. Maybe someday I can use plain
mythbuntu again.
Rob


Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc8-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Bianco Veigel (binco) wrote :

I've tested with the v3.9 kernel, and the server has been up and running for 14 hours without any error or lockup. Is there any reason why I should switch back to the older kernel?
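
Before reporting results from a mainline build, it can help to confirm which kernel is actually running. A small Python sketch of that check follows; `parse_release` and `is_at_least` are hypothetical helper names, and the 3.9 threshold is just the version discussed in this report.

```python
import platform
import re


def parse_release(release):
    """Turn a release string such as '3.2.0-40-generic' into (3, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(release, minimum=(3, 9, 0)):
    """True if the given release is at or above the minimum version tuple."""
    return parse_release(release) >= minimum


# Usage: check the kernel this interpreter is running on.
#     print(platform.release(), is_at_least(platform.release()))
```

Only the leading numeric part is compared, so distribution suffixes such as `-40-generic` are ignored.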

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
tags: added: kernel-fixed-upstream-v3.9-rc8
removed: needs-upstream-testing
Gero Graubner (graubner) wrote :

Hello,

Same problem here, running Ubuntu 12.04 Server x64 with kernel 3.2.0-41-generic, XBMC 12.2 and tvheadend 3.4.
I have an NVIDIA GeForce 210 with the 304.84 driver and an L4M Cine S2 V.6.2.

03:00.0 Multimedia controller: Digital Devices GmbH Octopus LE DVB adapter
    Subsystem: Digital Devices GmbH Device 0020
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fb100000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [70] MSI: Enable- Count=1/2 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [90] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range A, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: 6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [100 v1] Vendor Specific Information: ID=0000 Rev=0 Len=00c
    Kernel driver in use: DDBridge
    Kernel modules: ddbridge

In TVHeadend the card is detected as STV090x.

With tvheadend 3.2 everything ran smoothly, but since the upgrade to version 3.4 there have been random lockups.

When the server dies, there is no way to get a message from it (via SSH), and there are no helpful entries in the logs.

If kernel 3.9 runs smoothly, there must be something wrong with the STV090x frontend driver or the ddbridge backend in the older kernels 3.2-3.8.

Gero

Joseph Salisbury (jsalisbury) wrote :

Many updates including the 3.2.62 upstream updates have been applied to Precise.

Would it be possible for you to apply the latest updates to Precise and confirm whether this bug still exists?

tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired