raring-updates: regression: 3.8.0-24.35 causes sporadic failure of USB audio devices

Bug #1191603 reported by Steve Markgraf on 2013-06-16
92
This bug affects 15 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned

Bug Description

The 3.8.0-24.35 kernel update in raring introduced sporadic failure of USB audio devices. The symptoms are that the device suddenly stops working and you get the following dmesg output:

[ 7532.838274] cannot submit urb (err = -18)
[ 7532.838283] cannot submit urb (err = -18)
[ 7532.838291] cannot submit urb (err = -18)
[ 7532.842224] delay: estimated 288, actual 96
[ 7532.922160] delay: estimated 432, actual 96

This bug has been introduced by the following commits (UHCI: implement new semantics for URB_ISO_ASAP, OHCI: implement new semantics for URB_ISO_ASAP):
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c44b225077bb1fb25ed5cd5c4f226897b91bedd4
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6a41b4d3fe8cd4cc95181516fc6fba7b1747a27c

A fix has been released (USB: fix latency in uhci-hcd and ohci-hcd):
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1944017839d7dfbf7329fac4bdec8b4050edf5e

The fix hasn't been backported to 3.8 yet, but as this bug was introduced to raring by pulling upstream kernel changes, it would be nice to backport the fix to raring as soon as possible.

For further reference:
http://markmail.org/message/gt554vejrjkutsfw#query:+page:1+mid:gt554vejrjkutsfw+state:results
https://bbs.archlinux.org/viewtopic.php?id=163891
http://comments.gmane.org/gmane.linux.usb.general/85343
http://lists.opensuse.org/opensuse-bugs/2013-05/msg01617.html

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1191603

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: raring
papukaija (papukaija) on 2013-06-16
tags: added: kernel-sound regression-update
removed: audio backport raring regression usb
tags: added: cherry-pick
tags: added: raring
Steve Markgraf (steve-m) wrote :

With regard to #1: There is no additional log output other than the kernel messages I already posted.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Steve Markgraf (steve-m) on 2013-06-17
tags: added: kernel-bug

I am seeing this as well, and it seems similar to what is described in http://comments.gmane.org/gmane.linux.usb.general/86939

There's a suggestion of a patch (already in 3.10-rc) that might help there, I'll try out one of the 3.10 saucy kernels (http://kernel.ubuntu.com/~kernel-ppa/mainline/) to see if they help first.

FWIW, I am on raring and this started happening only recently.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Joseph Salisbury (jsalisbury) wrote :

I performed a cherry pick of commit e194401 into Raring. I created a test kernel, which can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1191603/

Can you test that kernel and report back if it resolves this bug?

Thanks in advance!

Steve Markgraf (steve-m) wrote :

Yes, that kernel solves the issue. I tested recording and playback from an USB audio adapter for 7 hours, and the problem did not appear. It didn't last that long before the fix.

Julian Wiedmann (jwiedmann) wrote :

This commit just landed in raring-proposed (3.8.0-26.38).

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
papukaija (papukaija) on 2013-06-18
tags: removed: kernel-bug
Leander Koornneef (bugs-h) wrote :

I was affected by this issue and installed kernel 3.8.0-26.38 from raring-proposed. This did not appear to have fixed the problem.

Steve Markgraf (steve-m) wrote :

I've seen the issue appear again with 3.8.0-26.38 now as well, but it happens much less often. Commit 6a41b4d added a new codepath that returns -EXDEV (which is -18), which seems to get triggered sometimes...
So I still consider this a kernel bug and we should let the guys at the lkml know.

See: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/host/ohci-hcd.c#n247

Changed in linux (Ubuntu):
status: Fix Committed → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Steve and Leander, does the kernel I posted in comment #4 eventually exhibit this bug, only sometimes?

Steve Markgraf (steve-m) wrote :

Yes, it just happened again with the kernel from #4:

Linux sleeper 3.8.0-25-generic #37~lp1191603v1 SMP Mon Jun 17 18:19:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[46066.221371] cannot submit urb (err = -18)
[46066.222047] cannot submit urb (err = -18)
[46066.222055] cannot submit urb (err = -18)
[46066.222060] cannot submit urb (err = -18)
[46066.222065] cannot submit urb (err = -18)
[46066.222070] cannot submit urb (err = -18)
[46066.222074] cannot submit urb (err = -18)
[46066.222090] cannot submit urb (err = -18)
[46066.261031] delay: estimated 240, actual 96
[46066.301006] delay: estimated 240, actual 96
[46066.372960] delay: estimated 384, actual 96
[46066.428941] delay: estimated 384, actual 96
[46066.484890] delay: estimated 288, actual 96
[46066.524886] delay: estimated 336, actual 96
[46066.636879] delay: estimated 432, actual 96

Joseph Salisbury (jsalisbury) wrote :

Can you test the v3.10-rc6 kernel which is available at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc6-saucy/

We should notify upstream if the bug still exists in v3.10-rc6.

Thanks in advance.

Leander Koornneef (bugs-h) wrote :

Hi Joseph,

I had not tried your kernel from #4 yet, but I will give 3.10-rc6 a go and let you know the result.

Leander

Anton Kosarev (a-p-kosarev) wrote :

I'm also getting these messages when using jackd, on both 3.10-rc6 and 3.8~lp1191603v1 kernels as well:

[ 124.118600] cannot submit urb (err = -18)
[ 124.118617] cannot submit urb (err = -18)
[ 156.537839] cannot submit urb (err = -18)
[ 156.542296] cannot submit urb (err = -18)

Steve Markgraf (steve-m) wrote :

I can confirm that the issue still appears with 3.10-rc6.

Leander Koornneef (bugs-h) wrote :

I can also confirm that the problem still exists with 3.10-rc6

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Alan Stern (stern) wrote :

How easy is it to duplicate the problem? If it isn't too hard, can somebody provide a usbmon trace showing what happens when the problem occurs?

Joseph Salisbury (jsalisbury) wrote :

I'd like to confirm that commits c44b225 and 6a41b4d are the cause of this bug. I created a test kernel, based off of v3.8-rc1, with those two commits reverted. This test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1191603/reverted/

Can folks affected by this bug confirm that kernel resolves this bug? Also, just for completeness, can the stock v3.8-rc1 kernel also be tested, to confirm it exhibits the bug? It can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc1-raring/

Joseph Salisbury (jsalisbury) wrote :

Please place priority on collecting the data requested by Alan in comment #16 versus the testing I requested in comment #17.

Steve Markgraf (steve-m) wrote :

@jsalisbury: Did you revert https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c75c5ab575af7db707689cdbb5a5c458e9a034bb too? You need to do that to restore the old codepath, since the old code checks for (urb->transfer_flags & URB_ISO_ASAP) as well.

Steve Markgraf (steve-m) wrote :

@stern: Since the latency was changed back to 1ms the bug has become quite rare for me (one or twice in 12 hours of using the audio device), so the usbmon file will get quite big. Maybe it happens more often for someone of the other users affected. Also, what happens is that when the bug occurs just the URBs to the input endpoint (microphone) fail for me, the audio output continues to work. How is this for the other affected users?

I was only using the USB microphone (well, a webcam), so that was the only thing that stopped working. I haven't seen the problem resurface since I upgraded to 3.8.0-26-generic from raring-proposed:

Linux avetinja 3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I am, however, not using the device more than one hour a day, and it doesn't happen in that time anymore (it used to happen 5 minutes in of a video call).

Also, would a usbmon log show any sensitive data like passwords being typed on a USB keyboard (it's connected to the same bus as my webcam)? I'd have to hit the problem again for it to be any use, of course.

Just got this to happen again for me, I'll be trying out relevant kernels.

FWIW, one thing that does happen for me is that after some time (a couple hours usually, though I haven't tracked it carefully, sometimes it was mere ~10 minutes), USB microphone starts working again for me.

Alan Stern (stern) wrote :

usbmon traces can include sensitive data. However, I don't need to see the entire trace. Just the portion in the vicinity of when the error occurred would be enough.

Also, it would be best if during the test, no other devices on the same bus were being used. Just the microphone.

I couldn't reproduce it with 3.8.0-26 from raring-proposed (as mentioned in #6) and restricted use of the keyboard and mouse, and no camera use in a test of 30 minutes. I'll go back to an unpatched kernel since it happens much more easily there anyway.

James Stone (jamesmstone) wrote :

I am having the same problem - particularly with trying to use low latency applications (jackd). It was reported in bug #1185563 but I think the two bugs are related.

I am still getting the same performance with the 1ms patch applied (3.8.0-26-lowlatency).

I am attaching the output from lsusb

James Stone (jamesmstone) wrote :

.. I mean usbmon!

Steve Markgraf (steve-m) wrote :

For a quick test I patched my kernel to print more info, maybe it's of some help: http://steve-m.de/files/ohci-hcd_log.txt

Alan Stern (stern) wrote :

James, your problem appears to be completely different,because the usbmon data you attached shows a device running at high speed and therefore not attached to a UHCI controller. In this case, the problem is caused by your application -- it is attempting to use a latency of 1 ms for its audio input, and the USB subsystem just isn't fast enough for that. (By contrast, the audio output uses a latency of 7 ms, and the sync channel uses a latency of 4 ms).

Steve, your output merely shows that the system fell behind, which we already know. It would help to see a usbmon trace, even if it doesn't show the problem. The first few seconds of a session would be a good start; at least I would be able to tell what the expected latencies are.

In all, cases, it should be possible to configure the application to avoid these problems by making the latency requirements less restrictive.

James Stone (jamesmstone) wrote :

Thanks for looking at my usbmon output and sorry if i am confusing this bug report with a different issue. I would be surprised if mine is a completely different bug as it appeared at the same time (in terms of kernel releases) and gives the same error message. I am obviously using lower latency but that is required for realtime audio. Sure I can increase the latency but then the system is no longer usable for realtime audio tasks (while 3.5 is fine)

Alan Stern (stern) wrote :

It's hard to believe that a latency of 1 ms is usable but a latency of 2 ms (or even 1.5 ms) is not.

The difference between the current kernel and 3.5 is that the driver in 3.5 violated the hardware requirements in the EHCI spec. It may have worked for you, but the result was by no means guaranteed.

James Stone (jamesmstone) wrote :

The difference is on 3.5 I can start jack with 64 frames/period, which is a round trip latency of 2.7 ms. On 3.8.0-19 and later kernels, the device will not start with jack with less than 512 frames/period, which is a round trip latency of 21.3 ms - an order of magnitude different. This is not really usable for some audio applications where realtime monitoring of the processed signal is required.

Alan Stern (stern) wrote :

I'm not sure what you mean by frames and periods, but 21.3 ms is much larger than the minimum latency is supposed to be -- it is supposed to be around 1 - 2 ms. The usbmon trace attached to comment #25 showed three channels in use:

   The audio-out channel had a pipeline of 8 transfers, each containing 7 packets worth of data, with an interval between packets of 1 microframe (a microframe = 1/8 ms). Thus the latency was 7 ms.
(Each packet contained 96 bytes of data.)

   The audio-sync channel had a pipeline of 4 transfers, each containing 1 packet with an interval of 8 microframes, for a latency of 4 ms.

  The audio-in channel had a pipeline of 8 transfers, each containing 1 packet with an interval of 1 microframe, for a latency of 1 ms -- which was too small and resulted in the errors you saw. (Oddly, each packet had room for 1024 bytes of data but the device sent only 48 bytes.)

Maybe you can figure out from this how your frames and periods are related to the transfers, packets, and microframes used on the USB bus. In any case, if the number of packets in each of the audio-in transfers was increased to 2, the latency would go up to 2 ms and everything should work.

If it doesn't, please attach another usbmon trace showing the problem.

Steve Markgraf (steve-m) wrote :

Alan, here are my usbmon traces with both the old working 3.8 kernel before your changes, and 3.10-rc7:
http://steve-m.de/files/ohci/

Note that those are just the first few seconds of a VoIP-call, and the problem didn't occur. I'll try to get traces of the problem appearing.

Alan Stern (stern) wrote :

The two traces are very similar. They show that both the audio-out and audio-in channels used latencies of 8 ms, which should have been absolutely fine. The only reasons I can think of for a problem to occur would be a bug in the OHCI hardware (which is not unheard of) or something disabling interrupts for more than 8 ms (which seems unlikely).

A usbmon trace showing the problem would help.

James Stone (jamesmstone) wrote :

I'm not really sure what to do, because the next step down from 512 in terms of jack latency is 256, and with that I get the errors I posted above. Would it be worth me posting a usbmon output for the 3.5 kernel at that latency?

James Stone (jamesmstone) wrote :

Just in case it's useful, here is the output from 3.5 with jack running at 256. It generated a lot more data in a short period of time (12M vs 1.5M in about the same time).

Alan Stern (stern) wrote :

Of course you got a lot more output. That's because the audio-in worked under 3.5 but not under 3.8. It generated something like 7 times as much data from usbmon as the audio-out, partly because of the ridiculously low latency.

I assume the 3.8 usbmon trace was also done with jack set to 256. Can you provide a short trace (1000 lines would be more than enough) showing jack at 512?

Part of the difficulty here is that I don't understand your terminology. What do you mean by a "frame" and a "period"? You wrote above than 256 "frames/period" gives a round-trip time of 21.3 ms, implying that one "frame" is 21.3 ms / 512 = 41.6 us. Where does that number come from and what does it mean? Is it a coincidence that this value is 1 / (24 KHz)?

For that matter, what sampling parameters did you use for the audio-in? That is, bits per data value, data values per sample (i.e., mono vs. stereo), and samples per second?

Finally, it's worth mentioning that what I wrote earlier wasn't quite right. For the audio-in channel, the total pipeline length was indeed 1 ms. But for incoming data, the latency is the time between transfers, not the pipeline length, so the latency was actually 125 us. Neither value is at all close to your implication that 256 "frames/period" gives a round-trip time of 10.7 ms.

James Stone (jamesmstone) wrote :

Here is 3.8 usbmon with jackd set at a period of 512.

This is what the jackd manual page has to say about these things:

      -p, --period int
              Specify the number of frames between JACK process() calls. This
              value must be a power of 2, and the default is 1024. If you
              need low latency, set -p as low as you can go without seeing
              xruns. A larger period size yields higher latency, but makes
              xruns less likely. The JACK capture latency in seconds is
              --period divided by --rate.

(this is what is set to 256 or 512 etc..)

-- rate is set to 44100 (in the last capture - sometimes it is set to 48000)

Her e is some more (user-level) documentation: https://help.ubuntu.com/community/HowToJACKConfiguration

As you can see from this Ubuntu documentation, a normal/good setting is considered to be a period of 128 (and many people try to use 32 or 64). 512 is really too high for realtime audio purposes.

What is strange is that 3.5 starts fine down to these low period sizes, even though performance might be degraded (with xruns or whatever), but 3.8 does not even start - the device seems to lock up with lights flashing on and off.

Alan Stern (stern) wrote :

You didn't say what your data value size was. From looking at the data, it appears to be 4 bytes (i.e., 32-bit audio -- maybe only 24 of those 32 bits are actually used).

You didn't say how many data values per sample you were using. From looking at the data, it appears to be one (i.e., mono).

The documentation file you mentioned talks about frames, periods, and buffers. The manual page explains what a period is but not a buffer or a frame. It appears that a frame is the same as what I've been calling a "sample".

You didn't say what you were using for periods/buffer. From what you wrote in an earlier comment, it appears to be 2.

Maybe you can understand why I'm feeling a little frustrated. It would be nice to have real answers to my questions instead of having to guess.

The documentation file doesn't explain how the latency is calculated. From looking at the example, it appears to be frames/period times periods/buffer divided by the sample rate. If I understand correctly, this is the right formula for the output latency, but it is wrong for the input latency. The input latency should be frames/period divided by the sample rate.

Putting together the numbers I guessed for your session, the calculated latency would be 256 * 2 / 44100 = 11.6 ms. But the calculated value is wrong; the correct value is 256 / 44100 = 5.8 ms. Raising the frames/period to 512 will double these values.

Why are input latencies computed differently from output latencies? To explain, let's consider a simple example. Suppose we use 256 samples/period and 8 periods/buffer. If we try to change the output right now, the changed value goes at the end of the current buffer and it doesn't reach the hardware until 256*8 sample times have elapsed. Hence the output latency is 2048 sample times (and of course, the sample time is one over the sampling rate).

Now suppose we use the same parameters for input. If the input changes right now, the changed value will get stored in the current period, and we will see it when the current period elapses. Thus the latency will be 256 sample times, not 2048.

Part of our problem here is that the settings you specify affect how jack communicates with the kernel's audio driver, but they don't directly reflect how the kernel's audio driver communicates with the USB driver. The usbmon trace for 3.5 shows that the audio driver was really using the equivalent of 5.5 frames/period and 8 periods/buffer. This means the actual latency (as far as the kernel is concerned) was 5.5 / 44100 = 0.125 ms, as I mentioned earlier. The usbmon trace for 3.8 shows that the audio driver was really using the equivalent of 11 frames/period and 8 periods/buffer, for a latency of 11 / 44100 = 0.25 ms. Undoubtedly, either of these is a value you could easily live with.

I don't know how the audio driver converts the settings it gets from jack to the settings it sends to the USB driver, but that appears to be where the problem lies.

James Stone (jamesmstone) wrote :

Hi, sorry for not providing all the required info. You were correct in all your guesses except the audio is stereo. My understanding is the latency is double what you calculated as "correct" because the input and output latencies (as jack sees them) have to be added to get the "round trip" latency.

Thanks so much for your help with this so far. What is the next step? Is there anything more I can do?

Alan Stern (stern) wrote :

I guess it's a matter of what you mean by "latency".

For the next step, I'm going to have to ask for help from somebody who understands the snd-usb-audio driver. In the meantime, there is one more piece of information you can provide. We should see what lsusb has to say about your sound device; please attach the output from "lsusb -v -s 1:2".

Alan Stern (stern) wrote :

By the way, what about the audio-out stream? The usbmon traces show that it uses about twice the bandwidth of the audio-in. Is it 4-channel instead of stereo? Or does it have a sampling rate of 88 KHz?

James Stone (jamesmstone) wrote :

lsusb -v -s 1:2 report attached.

(It is actually a Focusrite Scarlett 2i4)

And it has 4 channels of output.
Thanks!

James, your device reports that it could send 8192000 bytes per second:
> wMaxPacketSize 0x0400 1x 1024 bytes

The driver believes this, gets confused, and uses URB sizes that are much smaller than appropriate.

I'll fix this in the driver.

Steve Markgraf (steve-m) wrote :

Alan, here's a capture of the issue happening:
http://steve-m.de/files/ohci/3.10.0-031000rc7-generic_problem.tar.gz

Note that the microphone is opened a few seconds later than the output.
I also can post the output of lsusb if it helps.

Kernel output:

[12908.902956] delay: estimated 240, actual 96
[12921.326969] cannot submit urb (err = -18)
[12921.327741] cannot submit urb (err = -18)
[12921.327761] cannot submit urb (err = -18)
[12921.327776] cannot submit urb (err = -18)
[12921.327790] cannot submit urb (err = -18)
[12921.327802] cannot submit urb (err = -18)
[12921.327835] cannot submit urb (err = -18)
[12921.327849] cannot submit urb (err = -18)
[12921.645567] delay: estimated 336, actual 96
[12938.884972] delay: estimated 240, actual 96

Steve Markgraf (steve-m) wrote :

See the attachment for my lsusb output.

Alan Stern (stern) wrote :

The error shows up clearly in the usbmon trace. It resulted from the fact that no events occurred during an 8-ms time period; normally an audio-in transfer would finish up every ms.

Why were there no events during that time? I can't tell; the two most likely explanations are a hardware bug in your OHCI USB controller, or some other driver leaving interrupts disabled for far too long.

The attached patch may help. It will attempt to print some information to the system log when one of these gaps occurs. This ought to pin down where the source of the problem lies.

Alan Stern (stern) wrote :

Oops, I attached the wrong patch file by mistake. Here's the right one.

tags: added: patch
Steve Markgraf (steve-m) wrote :

Here's the output with your patch applied:
http://steve-m.de/files/ohci/3.10.0-2.9_alan-debug.tar.gz

Steve Markgraf (steve-m) wrote :

And attached to this comment you'll find the output of lspci -v and /proc/interrupts.

Alan Stern (stern) wrote :

Something went wrong with that last test. I'm not sure why -- maybe the patch was tracking the wrong endpoint. As just one example, the output should have stopped after those "cannot submit urb" errors.

Anyway, here's a revised version of the patch. It will provide slightly different output, and hopefully not so much as before. When running the test, you should try to avoid using any other USB devices as much as possible. In fact, unplug as many of them as you can.

The first line of output from the patch will be labelled "Alan start"; I'll need to see it to verify that it is working properly. After that, there should not be much more output until the problem occurs. If there is a lot of output in the first few seconds of the test, you might as well stop right there and post the results.

Steve Markgraf (steve-m) wrote :

Unfortunately your latest patch causes a kernel lockup when attaching the USB audio device:
http://steve-m.de/files/ohci/alan_patch2_hang.txt

Alan Stern (stern) wrote :

Oops, my fault. This one should work better.

SnakeEyes1105 (sfischer3) wrote :

What's the current status?
I'm also affected by this bug. I'm running Ubuntu Studio with the latest raring-proposed low latency kernel but same problem here.
Using the Focusrite 2i2 Interface

James Stone (jamesmstone) wrote :

SnakeEyes1105 - there seem to be at least 2 problems with the Focusrite devices, which Allan and Clemens have been working with me on. One is that the Focusrite Scarlett devices report wMaxPacketSize of 0x0400 1x 1024 bytes, which is too ridiculously large. Fixing this bug has got rid of the cannot submit URB error messages, but the performance is still quite degraded compared to the 3.5.x series of kernels, with increased xruns and inability to start the device at very low latencies. I am not sure if it is best to report further info on this bug, or on bug #1185563 (I suspect the latter, as bug #1191603 also occurs with non-realtime uses, which Allan is continuing to investigate). However, progress on the degraded performance for realtime purposes on Focusrite Scarlett devices seems to have ground to a halt for now.

For practical recording/music-making purposes, I am sticking with the 3.5.x low latency kernels from UbuntuStudio.

James

Steve Markgraf (steve-m) wrote :

Alan, unfortunately your debug patch is spamming my klog, even if the audio device isn't opened or I disconnect it again. See the attachment.

Alan Stern (stern) wrote :

Sorry about that. This new version has been tested on my own machine, so it definitely won't crash or otherwise mess up.

Steve Markgraf (steve-m) wrote :

Okay, it seems to work this time. See the attached file for the klog output of the issue happening.

Alan Stern (stern) wrote :

The debugging output indicates that interrupts were disabled for something like 9 ms, which means the problem is caused by some other part of the kernel, not by ohci-hcd.

To debug farther, use the ftrace facility. Complete instructions are in Documentation/trace/ftrace.txt, if you're interested. For our purposes here, you should start the irqsoff tracer just before running a test:

   cd /sys/kernel/debug/tracing
   echo 0 > options/function-trace
   echo irqsoff > current_tracer
   echo 1 > tracing_on
   echo 0 > tracing_max_latency
   ... run the test ...
   echo 0 > tracing_on
   cat trace

The contents of the "trace" file are what we need to see. You can do this with the regular ohci-hcd driver; the diagnostic patch isn't necessary.

James Stone (jamesmstone) wrote :

Some more info - just got a hard lockup while playing a game (F4 BMS on wine) due to what looks like the same bug. This is the last entry in syslog before the lockup happened:

Jul 22 23:35:02 blueberry rtkit-daemon[2850]: Supervising 3 threads of 3 processes of 1 users.
Jul 22 23:36:37 blueberry pulseaudio[2846]: [alsa-sink] alsa-sink.c: ALSA woke us up to write new data to the device, but there was actually nothing to write!
Jul 22 23:36:37 blueberry pulseaudio[2846]: [alsa-sink] alsa-sink.c: Most likely this is a bug in the ALSA driver 'snd_usb_audio'. Please report this issue to the ALSA developers.
Jul 22 23:36:37 blueberry pulseaudio[2846]: [alsa-sink] alsa-sink.c: We were woken up with POLLOUT set -- however a subsequent snd_pcm_avail() returned 0 or another value < min_avail.
Jul 22 23:39:49 blueberry kernel: [32399.785562] cannot submit urb (err = -18)
Jul 22 23:39:49 blueberry kernel: [32399.785576] cannot submit urb (err = -18)
Jul 22 23:39:49 blueberry kernel: [32399.785585] cannot submit urb (err = -18)
Jul 22 23:39:49 blueberry kernel: [32399.785592] cannot submit urb (err = -18)
Jul 22 23:46:27 blueberry kernel: imklog 5.8.11, log source = /proc/kmsg started.

Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Joseph Salisbury (jsalisbury) wrote :

It appears bug 1103018 could be a duplicate.

James Stone (jamesmstone) wrote :

Reading #1103018, I realised my realtime bug with Scarlett 2i4 is definitely not the same. The bug I reported is present in 3.7.0 (which #1103018 is not) - but not on 3.6.11 so I will continue my reporting on #1185563. Still not sure which commit lead to the bug on my device. - Still, please ignore my posts above, as they are related to a separate issue.

Alan Stern (stern) wrote :

Steve, any progress on the irqsoff tracing?

Steve Markgraf (steve-m) wrote :

I gave it a short try last week, but failed. After taking a look at the default Ubuntu kernel configuration I found out why:
# CONFIG_IRQSOFF_TRACER is not set

Since I'm currently on vacation and don't have the hardware with me, I'll give it a try next week with the irqsoff-tracer enabled.

Paul Tagliamonte (paultag) wrote :

This bug hit me real hard, and *is* present up to Debian's 3.10~rc7-1~exp1

Note: This *INCLUDES* e1944017839d7dfbf7329fac4bdec8b4050edf5e as far as I can tell.

Alan Stern (stern) wrote :

Paul, what kind of USB controller is your sound card attached to (UHCI, OHCI, or EHCI)?

Paul Tagliamonte (paultag) wrote :

Alan, thanks for the quick response

I'm using either EHCI or XHCI (I *think* it's XHCI? It's plugged into my Lenovo dock), I'm at work and can't figure out if usbtree will work with modern kernels :))

Bus 001 Device 003: ID 17ef:100a Lenovo ThinkPad Mini Dock Plus Series 3

(is the dock)

Bus 001 Device 010: ID 1395:0020 Sennheiser Communications

(is the device)

I just switched this machine to Debian, so this is just a data point, I found this bug researching the issue -- I can test with a saucy CD later tonight to verify it's present in Ubuntu. (in case someone else stumbled upon this as I did)

Alan Stern (stern) wrote :

If it's xHCI then there's nothing I can do. If it's EHCI then the attached patch ought to help. For debugging purpose it writes a warning message to the kernel log whenever an underrun occurs, but it prevents the underrun from causing a failure.

It turns out there are two bugs in the snd-usb-audio driver, either of which may be affecting you. There's a patch for one of the bugs at <http://marc.info/?l=linux-usb&m=137288343602140&w=2>, but there's no fix for the other bug (not enough URBs submitted).

I'll test this out tonight, thanks!

On Thu, Aug 1, 2013 at 2:58 PM, Alan Stern <email address hidden> wrote:
> If it's xHCI then there's nothing I can do. If it's EHCI then the
> attached patch ought to help. For debugging purpose it writes a warning
> message to the kernel log whenever an underrun occurs, but it prevents
> the underrun from causing a failure.
>
> It turns out there are two bugs in the snd-usb-audio driver, either of
> which may be affecting you. There's a patch for one of the bugs at
> <http://marc.info/?l=linux-usb&m=137288343602140&w=2>, but there's no
> fix for the other bug (not enough URBs submitted).
>
> ** Attachment added: "Accept URB submissions during underruns, with a warning"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1191603/+attachment/3757297/+files/ehci-iso-accept
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1191603
>
> Title:
> raring-updates: regression: 3.8.0-24.35 causes sporadic failure of USB
> audio devices
>
> Status in “linux” package in Ubuntu:
> Triaged
>
> Bug description:
> The 3.8.0-24.35 kernel update in raring introduced sporadic failure of
> USB audio devices. The symptoms are that the device suddenly stops
> working and you get the following dmesg output:
>
> [ 7532.838274] cannot submit urb (err = -18)
> [ 7532.838283] cannot submit urb (err = -18)
> [ 7532.838291] cannot submit urb (err = -18)
> [ 7532.842224] delay: estimated 288, actual 96
> [ 7532.922160] delay: estimated 432, actual 96
>
> This bug has been introduced by the following commits (UHCI: implement new semantics for URB_ISO_ASAP, OHCI: implement new semantics for URB_ISO_ASAP):
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c44b225077bb1fb25ed5cd5c4f226897b91bedd4
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6a41b4d3fe8cd4cc95181516fc6fba7b1747a27c
>
> A fix has been released (USB: fix latency in uhci-hcd and ohci-hcd):
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1944017839d7dfbf7329fac4bdec8b4050edf5e
>
> The fix hasn't been backported to 3.8 yet, but as this bug was
> introduced to raring by pulling upstream kernel changes, it would be
> nice to backport the fix to raring as soon as possible.
>
> For further reference:
> http://markmail.org/message/gt554vejrjkutsfw#query:+page:1+mid:gt554vejrjkutsfw+state:results
> https://bbs.archlinux.org/viewtopic.php?id=163891
> http://comments.gmane.org/gmane.linux.usb.general/85343
> http://lists.opensuse.org/opensuse-bugs/2013-05/msg01617.html
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1191603/+subscriptions

--
All programmers are playwrights, and all computers are lousy actors.

#define sizeof(x) rand()
:wq

Daniel Stöckner (da567l) wrote :
Download full text (5.3 KiB)

This bug also affects the Lexicon Omega USB audio interface on Precise with linux-generic-lts-raring 3.8.0.27.27. This happens during ordinary mp3 playback with pulseaudio/alsa.

kernel: [ 1139.189653] usb 1-1.3.3.1: new full-speed USB device number 16 using ehci-pci
kernel: [ 1139.332530] usb 1-1.3.3.1: New USB device found, idVendor=1210, idProduct=0002
kernel: [ 1139.332552] usb 1-1.3.3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=7
kernel: [ 1139.332556] usb 1-1.3.3.1: Product: Lexicon Omega
kernel: [ 1139.332559] usb 1-1.3.3.1: Manufacturer: Lexicon
kernel: [ 1139.332561] usb 1-1.3.3.1: SerialNumber: Lex1.00
kernel: [ 1139.346377] 16:1:1: cannot get freq at ep 0x1
kernel: [ 1139.350325] 16:1:2: cannot get freq at ep 0x1
kernel: [ 1139.355423] 16:2:1: cannot get freq at ep 0x83
kernel: [ 1139.359895] 16:2:2: cannot get freq at ep 0x83
kernel: [ 1139.375996] 16:4:1: cannot get freq at ep 0x2
kernel: [ 1139.380042] 16:4:2: cannot get freq at ep 0x2
kernel: [ 1139.384874] 16:5:1: cannot get freq at ep 0x85
kernel: [ 1139.389146] 16:5:2: cannot get freq at ep 0x85
kernel: [ 1139.407356] 16:9:1: cannot get freq at ep 0x1
kernel: [ 1139.411508] 16:9:2: cannot get freq at ep 0x1
kernel: [ 1139.416279] 16:10:1: cannot get freq at ep 0x83
kernel: [ 1139.420440] 16:10:2: cannot get freq at ep 0x83
mtp-probe: checking bus 1, device 16: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3.3/1-1.3.3.1"
mtp-probe: bus: 1, device: 16 was not an MTP device
AptDaemon: INFO: Quitting due to inactivity
AptDaemon: INFO: Quitting was requested
kernel: [ 1940.846601] 16:1:1: cannot get freq at ep 0x1
kernel: [ 1940.850312] 16:1:1: cannot get freq at ep 0x1
kernel: [ 1957.006708] delay: estimated 266, actual 1
kernel: [ 1966.935031] delay: estimated 353, actual 1
kernel: [ 1966.941558] delay: estimated 90, actual 1
kernel: [ 2001.866327] delay: estimated 353, actual 0
kernel: [ 2066.780996] cannot submit urb (err = -18)
kernel: [ 2066.781004] delay: estimated 354, actual 1
kernel: [ 2066.781028] cannot submit urb (err = -18)
kernel: [ 2066.781032] delay: estimated 353, actual 0
kernel: [ 2066.781037] cannot submit urb (err = -18)

On linux-generic-lts-quantal 3.5.0.23.35 to 3.5.0.37.43 the log is different, because of the different source code:

kernel: [ 1471.409908] delay: estimated 353, actual 0
kernel: [ 1501.400538] delay: estimated 353, actual 0
kernel: [ 1501.406980] delay: estimated 89, actual 0
kernel: [ 1541.283393] delay: estimated 353, actual 1
kernel: [ 1551.324519] delay: estimated 354, actual 1
kernel: [ 1551.324555] delay: estimated 354, actual 1
kernel: [ 1553.385436] delay: estimated 309, actual 1
pulseaudio[2456]: [alsa-sink] alsa-sink.c: ALSA weckte uns auf, um neue Daten auf das Gerät zu schreiben, doch es gab nichts zum Schreiben!
pulseaudio[2456]: [alsa-sink] alsa-sink.c: Dies ist höchstwahrscheinlich ein Fehler im ALSA-Treiber 'snd_usb_audio'. Bitte melden Sie diesen Fehler den ALSA-Entwicklern.
pulseaudio[2456]: [alsa-sink] alsa-sink.c: Wir wurden durch das POLLOUT-Set geweckt, allerdings lieferte ein anschliessender snd_pcm_avail() den Wert 0 oder einen anderen Wert < min_avail.
kernel: [ 1556.317540] delay: estimat...

Read more...

### I seem to be affected by this bug with the current raring kernel:

david@skyhawk:~$ uname -r
3.8.0-27-generic

Aug 4 21:24:32 skyhawk kernel: [21591.992070] usb 1-6: new high-speed USB device number 15 using ehci-pci
Aug 4 21:24:33 skyhawk kernel: [21592.220296] usb 1-6: New USB device found, idVendor=041e, idProduct=4095
Aug 4 21:24:33 skyhawk kernel: [21592.220306] usb 1-6: New USB device strings: Mfr=3, Product=1, SerialNumber=2
Aug 4 21:24:33 skyhawk kernel: [21592.220313] usb 1-6: Product: Live! Cam Sync HD VF0770
Aug 4 21:24:33 skyhawk kernel: [21592.220319] usb 1-6: Manufacturer: Creative Technology Ltd.
Aug 4 21:24:33 skyhawk kernel: [21592.220325] usb 1-6: SerialNumber:
Aug 4 21:24:33 skyhawk kernel: [21592.225765] uvcvideo: Found UVC 1.00 device Live! Cam Sync HD VF0770 (041e:4095)
Aug 4 21:24:33 skyhawk kernel: [21592.237697] input: Live! Cam Sync HD VF0770 as /devices/pci0000:00/0000:00:1d.7/usb1/1-6/1-6:1.0/input/input18
Aug 4 21:25:15 skyhawk kernel: [21634.298726] cannot submit urb (err = -18)
Aug 4 21:26:32 skyhawk kernel: last message repeated 7 times

### The video is not affected, but the audio recording by the microphone just stops when the "cannot submit urb" error shows
### up. It only works for a short before it stops. It sounds like Alan's patch above might workaround the problem in this case.
### I'll try it when I get a chance and get back to you...

Alan Stern (stern) wrote :

Will the people who are experiencing trouble with audio devices attached to EHCI controllers please try the attached patch? It won't fix the underlying problems (underruns in the data stream) but it will prevent them from causing fatal errors.

I can provide a similar patch for OHCI controllers, if anyone needs it.

Daniel Stöckner (da567l) wrote :

Results for post #17 from Joseph and #72 from Alan for Lexicon Omega:

http://kernel.ubuntu.com/~jsalisbury/lp1191603/reverted/
ok for 10 hours, test aborted

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc1-raring/
ok for 3.5 hours, test aborted

git://kernel.ubuntu.com/ubuntu/ubuntu-raring.git 3.8.13.5 WITHOUT patch
audio stopped after 9 minutes:
cannot submit urb (err = -18)

git://kernel.ubuntu.com/ubuntu/ubuntu-raring.git 3.8.13.5 WITH patch
ok for 9 hours, still running

So Alans patch works for me, thanks!

I have applied Alan's patch to the 3.8.0-27-generic version of the kernel. It appears to alleviate the problem so far. I am able to use the microphone on my webcam for more than a few seconds. This is what appears in dmesg and the kernel log, but the mic still works:

[ 460.699992] retire_capture_urb: 48 callbacks suppressed

I'll let you know if the patch stops working or causes any other problems I haven't seen yet. Looking forward to a permanent fix...

Thanks!

Alan Stern (stern) wrote :

Steve: Since you're using an OHCI controller, the fixes for EHCI won't help you. Please try the attached patch and let me know how it works. If you can run the irqsoff tracer too, that would be great.

Federico M. Pires (fpires82) wrote :

I've been following this, since I was affected using Arch Linux too. It seems a commit has been included in 3.10.8 for the workaround of accepting late URBs submissions:

https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.8

commit c72a0e036f9d80c609e608a723751343f1f5e9fc

USB: EHCI: accept very late isochronous URBs

Alan Stern (stern) wrote :

Steve, as the original bug reporter, it's up to you to verify whether or not the patch attached to comment #75 fixes your problem.

Daniel Stöckner (da567l) wrote :

Tested kernel 3.10.13 with EHCI sound adapter, commit c72a0e036f9d80c609e608a723751343f1f5e9fc included. Works perfectly. Thank you all!

Joseph Salisbury (jsalisbury) wrote :

The following commit is now in Raring:
07b7319 - USB: EHCI: accept very late isochronous URBs

I'll move this bug to Incomplete for now. Steve Markgraf, can you confirm this bug is resolved by testing with the latest updates? If it is, please change the status to "Fix Released".

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Steve Markgraf (steve-m) wrote :

As this issue was very annoying, I bought a headset-adapter and switched away from using a USB audio device to using the built-in audio adapter of my Thinkpad X121e, so this issue was off my radar. Sorry for the lack of feedback. I've just given the latest kernel a test and it still happens, which is to be expected, given that my device is connected to an OHCI-controller, not EHCI. I'll give Alan's latest OHCI-patch a try.

Steve Markgraf (steve-m) wrote :

It's looking good so far, I couldn't reproduce the issue with Alan's OHCI-patch (#75) applied.

Minsu Park (kingssa) wrote :

Hi,
I got a side-effect because of this patch. My webcam use EHCI not OHCI or UHCI, And I'm using linux-3.8.xx. After patching this issue, the failure of urb submit problem was gone. But I got a side-effect. Too many warning messages makes my system to be hang.
A lot of kernel warning messages were appeared.
How can I solve this problem??
Thanks,
Minsu

Alan Stern (stern) wrote :

Please be more specific. Which patch causes the side effect?

Also, please attach a portion of your system log showing the warning messages.

Minsu Park (kingssa) wrote :

I applied below patch in my linux-3.8.13.
https://lkml.org/lkml/2013/8/18/203
<"[ 35/45] USB: EHCI: accept very late isochronous URBs">

After then, I got a lot of warning messages like as below.
[ 133.760978] ehci 100f0000.ehci: iso underrun e07c1840 (3375+0 < 5472)
[ 133.760999] ehci 100f0000.ehci: iso underrun e07c1d80 (3383+0 < 5473)
[ 133.761023] ehci 100f0000.ehci: iso underrun e07c1e40 (3391+0 < 5473)
[ 133.761046] ehci 100f0000.ehci: iso underrun e07c16c0 (3399+0 < 5473)
[ 133.761067] ehci 100f0000.ehci: iso underrun e07c10c0 (3407+0 < 5473)
[ 133.761088] ehci 100f0000.ehci: iso underrun e07c1f00 (3415+0 < 5473)
[ 133.761110] ehci 100f0000.ehci: iso underrun e07c1000 (3423+0 < 5474)
[ 133.761132] ehci 100f0000.ehci: iso underrun e07c1900 (3431+0 < 5474)
[ 133.761152] ehci 100f0000.ehci: iso underrun e07c1840 (3439+0 < 5474)
[ 133.761172] ehci 100f0000.ehci: iso underrun e07c1d80 (3447+0 < 5474)
[ 133.761191] ehci 100f0000.ehci: iso underrun e07c1e40 (3455+0 < 5474)
[ 133.761211] ehci 100f0000.ehci: iso underrun e07c16c0 (3463+0 < 5474)
[ 133.761233] ehci 100f0000.ehci: iso underrun e07c10c0 (3471+0 < 5475)
[ 133.761253] ehci 100f0000.ehci: iso underrun e07c1f00 (3479+0 < 5475)
[ 133.761271] ehci 100f0000.ehci: iso underrun e07c1000 (3487+0 < 5475)
[ 133.761292] ehci 100f0000.ehci: iso underrun e07c1900 (3495+0 < 5475)
[ 133.761312] ehci 100f0000.ehci: iso underrun e07c1840 (3503+0 < 5475)
[ 133.761331] ehci 100f0000.ehci: iso underrun e07c1d80 (3511+0 < 5475)
.......

When audio process was dead, my webcam does not recognize about process dead. So webcam continuously send audio data to my system.
At that time, continuously above warning messages were occured, and my system was slowed.
As I knew, this message is in irq routine. If system print a lot of messages, system will be slowed.
How can I solve this problem?

Thanks
Minsu

Alan Stern (stern) wrote :

First, those are not warning messages; they are debugging messages. Therefore they should not be stored in your system log or printed out.

Second, if you are using a USB webcam then the webcam can't send audio data to the system if no processing is asking for it.

Third, the messages indicate that the audio I/O was blocked for more than 0.25 second. That's a very long time for this sort of thing. You might try to find out what caused that blockage and fix it.

Minsu Park (kingssa) wrote :

Thanks for your kind response.
Yes, the printing level of my system was "debug", I did not know about it.-);
And the point of my question is. When these debug messages were printed, Is system possible to be slowed?
When my audio process was dead, these log were printed and the time of making coredump was delayed.
Anyway, thanks a lot.

Alan Stern (stern) wrote :

Printing the debug messages shouldn't slow down your system very much. However, the audio driver could indeed slow down your system when a delay occurs like this.

If you want to check whether printing the debug messages slows down your system, all you have to do is change the system's printing level and see what happens.

tags: added: kernel-da-key
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.