Regression in dib0700 dvb-t driver

Bug #1088733 reported by Stephen Thirlwall
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Joseph Salisbury

Bug Description

I have two dib0700-based dvb-t usb tuners. A Sony PlayTV dual tuner, and an Asus U3100+ single tuner. Since the 3.2.0-32 kernel, both these tuners have been failing to record.

The problem first appeared with the 3.2.0-32 kernel. Every kernel up to 3.2.0-31 has worked, and the problem has persisted with every kernel since 3.2.0-32.

I've bisected the problem down to this commit:
  http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-precise.git;a=commit;h=1cef533d4e688acbf048b5b32e6f3ca9a265ed4e

Every build before this works, every one from here on fails.
Reverting this commit onto the current git head fixes the problem.
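
The bisection described above amounts to a binary search over an ordered list of builds. A minimal sketch of that search, assuming a hypothetical `is_good` callback that stands in for installing a build and running the recording test; the build names are placeholders, not real kernel versions or commits:

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first failing build, assuming builds[0] is good,
    builds[-1] is bad, and every build after the first bad one also fails."""
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]


# Placeholder build names; "c5" plays the role of the offending commit.
builds = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad(builds, lambda b: builds.index(b) < builds.index("c5"))
print(culprit)
```

With an intermittent failure like this one, each `is_good` check would need several recordings rather than one, since a single success does not prove a build is good.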

Symptoms: zero-byte recordings. The system wakes via BIOS timer and boots cold from poweroff (not sleeping or hibernating).

There are no error messages in the kernel logs, syslog, or mythbuntu logs.

This is not 100% failure. If I try to manually kick off a recording, or watch live tv, it works some of the time, maybe 25% to 50%. I find that if a recording has worked once it will tend to continue to work. Care must be taken in testing to avoid false positives.

Unfortunately I do not have a simple test case. My test case is to schedule a series of recordings overnight (usually ten), spaced apart so that the machine powers off between recordings.
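
Because a failing kernel still records successfully some of the time, a single good recording proves little. A rough sketch of the arithmetic, assuming each recording after a cold boot is an independent trial and that a failing kernel succeeds with probability between 0.25 and 0.5 (the range quoted above); both are simplifying assumptions, not measurements:

```python
def false_pass_probability(p_success_when_broken: float, recordings: int) -> float:
    """Chance that a broken kernel passes every recording in a run,
    treating recordings as independent trials."""
    return p_success_when_broken ** recordings


# Ten overnight recordings, each after a cold boot.
for p in (0.25, 0.5):
    print(f"p={p}: {false_pass_probability(p, 10):.6f}")
```

Under these assumptions a broken kernel would pass all ten recordings less than once in a thousand runs, which is why a full clean night is a much stronger signal than one successful recording.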

When failing, I typically get all recordings zero-byte length. Occasionally one will work.
When passing, they all work. (Once a month I get a zero-byte recording, but they are pretty rare).
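
A run like this can be tallied by checking recording sizes on disk. A minimal sketch, assuming the recordings land as `.mpg` files in a single directory; the directory path and file extension are assumptions for illustration, not details from this report:

```python
from pathlib import Path


def tally_recordings(directory: Path, pattern: str = "*.mpg") -> tuple[int, int]:
    """Return (non_empty, zero_byte) counts for recordings matching pattern."""
    non_empty = zero_byte = 0
    for path in sorted(directory.glob(pattern)):
        if path.stat().st_size == 0:
            zero_byte += 1
        else:
            non_empty += 1
    return non_empty, zero_byte


if __name__ == "__main__":
    # Hypothetical recordings directory; adjust to the local setup.
    ok, empty = tally_recordings(Path("/var/lib/mythtv/recordings"))
    print(f"{ok} recordings with data, {empty} zero-byte recordings")
```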

I get exactly the same behaviour with the Asus tuner and the Sony tuner, so neither one appears to be the culprit.

I'm more than happy to run any test cases for this.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-34-generic-pae 3.2.0-34.53
ProcVersionSignature: Ubuntu 3.2.0-34.53-generic-pae 3.2.33
Uname: Linux 3.2.0-34-generic-pae i686
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu15
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: sdt 2034 F.... xfce4-volumed
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xfae78000 irq 21'
   Mixer name : 'Nvidia MCP79/7A HDMI'
   Components : 'HDA:10ec0885,18495890,00100101 HDA:10de0007,10de0101,00100100'
   Controls : 41
   Simple ctrls : 19
CurrentDmesg: [ 33.456031] eth0: no IPv6 routers present
Date: Tue Dec 11 11:53:51 2012
HibernationDevice: RESUME=UUID=cbd9edb8-460f-42e8-acc8-7e325f087a52
InstallationMedia: Mythbuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100427.1)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 003: ID 1415:0003 Nam Tai E&E Products Ltd. or OmniVision Technologies, Inc.
 Bus 002 Device 002: ID 1997:0409
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
MarkForUpload: True
ProcEnviron:
 TERM=xterm-color
 PATH=(custom, user)
 LANG=en_AU.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-34-generic-pae root=UUID=411557e4-14b3-44a5-8100-438f51b49711 ro quiet splash vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-34-generic-pae N/A
 linux-backports-modules-3.2.0-34-generic-pae N/A
 linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: Upgraded to precise on 2012-10-23 (48 days ago)
UserAsoundrc:
 # # contents of .asoundrc # #
 pcm.!default {
 type plug
 slave.pcm "iec958"
 }
dmi.bios.date: 08/17/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.20
dmi.board.name: AMCP7AION-HT
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.20:bd08/17/2010:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnAMCP7AION-HT:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Stephen Thirlwall (l-sdt) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Thanks for bisecting and finding the commit that introduced this regression. It would also be good to know if this bug is fixed upstream.

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.7 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Stephen Thirlwall (l-sdt) wrote :
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Joseph Salisbury (jsalisbury) wrote :

I built a Precise test kernel with commit 1cef533d4e688acbf048b5b32e6f3ca9a265ed4e reverted. The test kernel can be downloaded from:

http://people.canonical.com/~jsalisbury/lp1088733/

Can you test that kernel and report back if it fixes this bug?

Thanks in advance!

Changed in linux (Ubuntu):
status: Triaged → Incomplete
status: Incomplete → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Stephen Thirlwall (l-sdt) wrote :

I can confirm that this kernel fixes the bug.

I can also confirm that the bug is present in the stock 3.2.0-35 kernel.

Thanks for looking into this Joseph!

Joseph Salisbury (jsalisbury) wrote :

Before I submit a request to have that commit reverted, can you test the latest mainline kernel, which is v3.8-rc2? Just to confirm the bug isn't already fixed upstream. The kernel is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc2-raring/

Stephen Thirlwall (l-sdt) wrote :

The problem still occurs in the v3.8-rc2 kernel.

Joseph Salisbury (jsalisbury) wrote :

@Stephen,

I've reported this issue upstream. The upstream developers would like to see what happens in both a failed and a successful recording attempt, while collecting USB trace data.

Would it be possible for you to collect some USB Tracedata for both cases? There are some web pages that describe how this can be done:

https://wiki.ubuntu.com/Kernel/Debugging/USB#Getting_USB_Tracedata
http://www.kernel.org/doc/Documentation/usb/usbmon.txt

Stephen Thirlwall (l-sdt) wrote :

Happy to collect that data, but I'm on vacation right now and won't be back until the first week of February.

Shall gather some usb tracedata when I return.

Thanks again.

Stephen Thirlwall (l-sdt) wrote :

These are the results of making recordings with mythtv, capturing the output from /sys/kernel/debug/usb/usbmon/1u.

It is looking likely from this testing that usually only the first recording after boot fails. Subsequent recordings succeed.

Captures of both working and failed recordings are included for comparison.

First run is with patched kernel 3.2.0-36-generic-pae #57-Ubuntu.
The subsequent runs are using the stock linux-image-3.2.0-36-generic-pae kernel. The system was rebooted between runs.

The third run (run2) had no recording failures - I seem to get this about 20% of the time.
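
For anyone comparing captures like these, usbmon text lines can be split into their leading fields. A minimal sketch, assuming the "1u" text format described in Documentation/usb/usbmon.txt (URB tag, timestamp in microseconds, event type, then an address word of the form type-and-direction:bus:device:endpoint); the sample lines follow the style of that document's examples and are not taken from the captures attached here:

```python
from collections import Counter
from typing import NamedTuple


class UsbmonEvent(NamedTuple):
    tag: str
    timestamp_us: int
    event: str       # "S" submission, "C" callback, "E" submission error
    urb_type: str    # e.g. "Ci", "Co", "Bi", "Bo", "Ii", "Io", "Zi", "Zo"
    bus: int
    device: int
    endpoint: int
    rest: list[str]  # remaining fields, left unparsed


def parse_line(line: str) -> UsbmonEvent:
    """Parse the leading fields of one usbmon text line."""
    tag, timestamp, event, address, *rest = line.split()
    urb_type, bus, device, endpoint = address.split(":")
    return UsbmonEvent(tag, int(timestamp), event, urb_type,
                       int(bus), int(device), int(endpoint), rest)


sample = """\
d5ea89a0 3575914555 S Ci:1:001:0 s a3 00 0000 0003 0004 4 <
d5ea89a0 3575914560 C Ci:1:001:0 0 4 = 01050000
"""
events = [parse_line(line) for line in sample.splitlines() if line.strip()]
# Count events per (device, event type) to compare a good run with a bad one.
counts = Counter((e.device, e.event) for e in events)
print(counts)
```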

Alan Stern (stern) wrote :

Stephen, please try this debugging patch. Let's see if the output it produces can be matched up to the usbmon data for both working and non-working cases.

Joseph Salisbury (jsalisbury) wrote :

Hi Stephen,

I'll build a test kernel for you to test with this patch applied.

Joseph Salisbury (jsalisbury) wrote :

I created a Raring test kernel with the patches in comment #12. Can you test this kernel and collect data similar to what you did in comment #11?

Thanks again for all the help testing!

Joseph Salisbury (jsalisbury) wrote :

The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1088733/

Joseph Salisbury (jsalisbury) wrote :

Also, for the 3.8 kernel, you will need to install both the linux-image and linux-image-extra packages.

Stephen Thirlwall (l-sdt) wrote :

Thanks for the kernel Joseph - that's a big help.

Attached is another set of usbmon captures, as well as the dmesg output for each session.

Unlike last time, I was not able to get a recording to work the first time through. I'll keep trying though, and post the results if I get one to happen.

Alan Stern (stern) wrote : Re: [Bug 1088733] Re: Regression in dib0700 dvb-t driver

On Wed, 6 Feb 2013, Stephen Thirlwall wrote:

> Attached is another set of usbmon captures, as well as the dmesg output
> for each session.

I get the impression that this is a hardware bug, but there's not enough
data to be sure. Attached is a revised version of the diagnostic patch;
it will provide a lot more detail about what's going on. Once again, I'll
need to see both the dmesg logs and the usbmon traces.

Before starting each run, do "dmesg -C" to clear the kernel's log buffer.
(The boot-up messages aren't particularly relevant.) It's not necessary
to have more than one recording attempt per run.

Alan Stern

P.S.: Let me know if the attached patch doesn't survive the transition
from email.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the updated patch, Alan.

Stephen, I built a new test kernel with Alan's latest patch. The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1088733/

Can you test this kernel per Alan's instructions and collect the data Alan requested in comment #18?

Thanks again!

Stephen Thirlwall (l-sdt) wrote :

Sorry for the delay.

Attached is new tracedata with Alan's second patch.

Two failed recording runs were captured, as well as one successful run included for comparison.
All of these runs were the first recording after a reboot.

Notably, with this diagnostic patch, approximately 50% of the first-time recordings succeeded.

Alan Stern (stern) wrote :

I finally had a chance to take a look at the new trace data set. Unfortunately it appears that the kernel was built without CONFIG_USB_DEBUG, making the data useless.
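
Whether a given kernel was built with the option can be checked before capturing anything. A minimal sketch, assuming the kernel's build configuration is installed as a plain-text file at `/boot/config-<release>` (the usual Ubuntu location; whether a test-kernel install places it there is an assumption):

```python
import platform
from pathlib import Path


def config_enabled(config_text: str, option: str) -> bool:
    """True if option is set to y or m in kernel .config text."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False


if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        print(config_enabled(path.read_text(), "CONFIG_USB_DEBUG"))
    else:
        print(f"{path} not found")
```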

Joseph Salisbury (jsalisbury) wrote :

I'll rebuild the kernel with CONFIG_USB_DEBUG enabled and post a link to it shortly.

Joseph Salisbury (jsalisbury) wrote :

Stephen, I built a new test kernel with Alan's latest patch and CONFIG_USB_DEBUG enabled. The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1088733/

Can you test this kernel per Alan's instructions and collect the data Alan requested in comment #18? Sorry for asking you to collect the data an additional time.

Thanks again!

Stephen Thirlwall (l-sdt) wrote :

Not a problem Joseph.

With any luck I'll be able to give that a run later today.

Stephen Thirlwall (l-sdt) wrote :

Here's another set of captures, using the kernel from comment #23.

There are only failed-recording captures here.

Alan, let me know if you'd like me to try capturing a successful recording.

Alan Stern (stern) wrote :

This is good; I think I see what the problem is.

It's actually very complicated, involving two software bugs and a hardware bug. The commit you identified fixes one of the software bugs, so it can't simply be reverted. But in doing so, it exposed the other software bug. My attempts at fixing that one have run afoul of the hardware bug, which cannot be fixed but only worked around.

This will take more time. I'll get back to you next week.

Stephen Thirlwall (l-sdt) wrote :

Thanks for taking the time to look into this Alan, and for all your help Joseph.

Alan Stern (stern) wrote :

Here's a patch to try; it is based on vanilla 3.8. It contains an attempted fix for the second software bug plus a work-around for the hardware bug.

If this doesn't work, I'd like to see another combined usbmon/dmesg output from a kernel built with this patch plus the most recent debugging patch (which should apply on top of this one with only minor offsets).

tags: added: patch
Joseph Salisbury (jsalisbury) wrote :

Thanks for the updated patch, Alan.

Stephen, I built a new test kernel with Alan's latest patch. The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1088733/

Can you test this kernel per Alan's instructions and collect the data Alan requested in comment #28?

Thanks again!

Stephen Thirlwall (l-sdt) wrote :

Early indications are good!

I've installed this kernel and have done a few manual tests. Three reboots, and multiple recordings done, no failures as yet.

I'll now schedule a few days' worth of wakeup-record-shutdown cycles and see what we get.

Stephen Thirlwall (l-sdt) wrote :

No recording failures so far, but see the next comment.

Stephen Thirlwall (l-sdt) wrote :

Joseph, could you please build me a kernel *without* Alan's patches?

I've been getting sporadic kernel panics during startup, and I want to make sure these aren't related to these patches.

I'm pretty sure they're not, as I've been getting them with all the 3.8 kernels, and have always assumed it's because I'm running 12.04 with a much newer kernel. For example, my nvidia dkms modules don't build with 3.8 kernels.

I'd build one myself, but I'm only set up at the moment to build the ubuntu kernels (as per https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel), not the upstream mainline kernels.

Joseph Salisbury (jsalisbury) wrote :

Hi Stephen,

I built a kernel without Alan's patches applied. This would be a stock Raring kernel based off v3.8.

The test kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1088733/

I've been seeing some random panics with the 3.8 kernel as well. If you are seeing what I am seeing they are related to the scheduler.

Stephen Thirlwall (l-sdt) wrote :

Thanks for that Joseph, but unfortunately that kernel's been built for amd64, and I'm on i386.

Joseph Salisbury (jsalisbury) wrote :

Sorry, Stephen. I'll build a 32 bit kernel now.

Joseph Salisbury (jsalisbury) wrote :

A 32 bit kernel without Alan's patches can now be downloaded from:

http://people.canonical.com/~jsalisbury/lp1088733/

Stephen Thirlwall (l-sdt) wrote :

Thanks again Joseph.

Alan - I'm quite certain your patch fixes the problem for me.

I've now tried at least ten first-after-reboot recordings, and every single one has worked. I've also tried plenty of other combinations, including multiple simultaneous recordings, and no failures there either.

The kernel panics also happen in the unpatched kernel - this appears to be some unrelated issue.

Alan Stern (stern) wrote :

Good. I will submit the changes for inclusion in an upcoming stable kernel release.

Joseph Salisbury (jsalisbury) wrote :

Thanks for all your hard work on this bug, Alan. And thank you, Stephen, for finding this bug and all the help testing.

Stephen Thirlwall (l-sdt) wrote :

I'll second that - thanks Alan for the great detective work.

And Joseph, thanks for building all those kernels, and most importantly, for acting on this bug report in the first place.

Joseph Salisbury (jsalisbury) wrote :

Fixed by upstream commit:

commit feca7746d5d9e84b105a613b7f3b6ad00d327372
Author: Alan Stern <email address hidden>
Date: Fri Mar 1 10:51:15 2013 -0500

    USB: EHCI: don't check DMA values in QH overlays

Changed in linux (Ubuntu):
status: In Progress → Fix Released