HID controller thread hangs randomly on Windows

Bug #1426925 reported by shalty
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Triaged
Medium
Unassigned

Bug Description

Hi

This is a problem i have seen/suffered since some months, first with a Hercules Dj Control MP3 and now with a Gemini G2V (always on windows environment, w7 or greater, both x86 and x64), and it's a random loss of the controller (Mixxx works fine, but controller seems off, no input or output, and hangs at Mixxx exit waiting for the controller thread to stop)

To keep things simple, what i think i've found looking at the source and doing my own tests, is that you can't (with lastest hidapi and Mixxx 1.12 from two days ago) send and retrieve data at the same time from a HID controller (under windows, at least!). Removing the hid_write() calls the code doesn't hangs ever (or so it seems).

The call to hid_read_timeout(), at least in windows, seems to return almost immediatly, this generates also a lot of traffic for the mapping script (incomingData unnecessary calls).

So i have done some changes on HidReader::run() to work around this:

-The first thing is to make a copy of the incoming data, to see if the next call is any different. If not, don't emit incomingData signal.
-Next, put a simple mutex to control access to the HID device: on the HidReader, i modify a variable while reading, and before calling hid_write(), i look up if i can or not. If i cannot i simply "drop" the packet (output is not as "vital").

With those changes, i could see that the hid_write call doesn't get a lot of opportunities to run (almost none), because the hid_read_timeout() call returns inmediatly on a continous loop.

So, i put a little usleep(1) after hid_read_timeout().

With those changes, debug build and all, seems to go fine (i have still to test it more time, but have sense): low latency, responsive, 20-30% cpu while performing... good! :). And as it doesn't hang, Mixxx can quit just fine.

The controller thread is shared between midi and HID devices, that explains why when i was using both at the same time (nanokontrol with midi and hercules with hid), i will always lost both at the same time.

I don't know if it's an issue with hidapi on windows (i would think so if it's not a know problem), but with those changes seems workable from the outside.

I also don't know much about multithread programming, so probably my workaround it's not the best for this case ;)

PS1: I'm doing now a HID mapping for the G2V, and because it has 4 vumeters, generates more feedback/output than the old Hercules, so now the probabilty of locks is worst than ever. This bug happens equally on 1.11 and 1.12.
PS2: I have posted about this on the forums some months ago : http://www.mixxx.org/forums/viewtopic.php?f=3&t=6461, but i think have better info now ;)

Tags: hid windows
Revision history for this message
shalty (neogeo-dc) wrote :

Well, after those changes, it has been autodj'ing for almost 7 hours with no problems (i have stopped the test :) ).... more than 578k packets sent (i only send when data changes), and 817 dropped ones (so, 817 possibly chances to hang "saved"). I don't count the incoming ones, but should be almost two orders of magnitude greater.

Revision history for this message
shalty (neogeo-dc) wrote :

Well, after some days, i can confirm that's the case.

With that kludge i haven't had any problem with this again :)

Revision history for this message
shalty (neogeo-dc) wrote :

After all those days, the issue it's totally solved with my kludge code ;) , so i want to make an "official" source patch to be integrated into Mixxx (seriously, nobody uses HID under windows or what? :))

I have taken a look at hidapi tests (from outside Mixxx), and the issue it's the same: they just not block, ever, the reads (with or without controller movements). I don't have more HID devices right now, so i can't report if it's a "hardware thing", but reading about it gives me the impression that i'm not alone:

https://msdn.microsoft.com/en-us/library/windows/hardware/ff542426%28v=vs.85%29.aspx

"Using ReadFile

An application uses the open file handle it obtained by using CreateFile to open a file on the collection. When the application calls ReadFile, it does not have to specify overlapped I/O because the HID Client Drivers buffers reports in a ring buffer. However, an application can use overlapped I/O to have more than one outstanding read request."

Here it says clearly that ReadFile will return inmediatly ever. So, there is a need (always speaking about the possibility that this is a "windows only" thing) of a little delay on the controller thread, to avoid collapsing the CPU reading the same data over and over. Apart from that, there is the thing of emit only changes to javascript (an optimization), and controlling the access to the hid file to avoid simultaneous read/write (avoid lockups).

Another thing reading the Mixxx code, it's the call to ..

hid_set_nonblocking(m_pHidDevice, 1); // Quit blocking

...on closing the hid device. That does absolutelly nothing on windows, as it's used only for hid_read, and hid_read it's not called throught the code (Mixxx uses hid_read_timeout directly instead). Always speaking about the hidapi windows code, don't know about the linux implementation (or mac or whatever)

Well, i'll try to look into this "git" thing and make a patch...(i'm not at all used to version controlling systems :(, too old for that)

Revision history for this message
jus (jus) wrote :

Hi shalty,
you might want to try http://www.sourcetreeapp.com/
It takes the scary out of git.

Revision history for this message
Sean M. Pappalardo (pegasus-renegadetech) wrote :

Shalty: thank you for your detailed work! We have been in contact with the HIDAPI maintainer in the past and he's very friendly and receptive to patches. Since you said you suspect the root problem is in that library, I'd rather you work with him to resolve it, since it will help other users of HIDAPI as well (and I'm afraid your changes may have negative effects for Mixxx users on other platforms.) The maintainer's E-mail address is at the bottom of this page: http://www.signal11.us/oss/hidapi/ Feel free to CC mixxx-devel on your conversation.

tags: added: hid windows
summary: - HID controller thread hangs randomly
+ HID controller thread hangs randomly on Windows
Changed in mixxx:
assignee: nobody → shalty (neogeo-dc)
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
Sean M. Pappalardo (pegasus-renegadetech) wrote :

As for Mixxx using hid_read_timeout, that's because Mixxx would stall on shutdown waiting for some input from the controller with hid_read so it could continue and send any shutdown messages. Using hid_read_timeout avoids that problem, but has the side effect of using more CPU (and apparently triggering your problem.) Mr. Ott of HIDAPI had mentioned a possible alternative solution that I haven't had the chance to investigate. Perhaps it's time for me to do so. :)

Revision history for this message
shalty (neogeo-dc) wrote :

I have sent a mail to Alan refering to this bug, i will update accordingly.

In the meantime, i have also cleaned the code and created (i hope so ;) a patch that i'm using now.

I have though about it, and the use of QTAtomics to avoid active locking (in fact i could let the reading have all the priority to the device and then only send packets if available.. the way it is on my mapping, that's not an issue), but as i'm not proficient with multithread programming and i don't know what others do on their mappings, i have taken the safest route (than i know).

Revision history for this message
blak (joan-ardiaca) wrote :

I have a similar problem when using two controllers simultaneously (a Pioneer DDJ-SB using MIDI and an Hercules DJ Control MP3 LE over HID) on Linux using the latest Mixxx master. The DDJ-SB stops working randomly after some time, but only when both controllers are connected, and happens even if the controllers are not being used. But your patch seems to solve this issue, thanks a lot for that!

I can't judge if this patch is an appropriate solution to this problem, but it definitely improved my experience with Mixxx.

Revision history for this message
blak (joan-ardiaca) wrote :

Damn, forget it, it just happened again :( I did not wait long enough.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Due to lack of progress, marking Triaged and clearing assignee. Feel free to revert if it is in fact still in progress :).

I think this might be fixed though? Shalty do you know?

Changed in mixxx:
assignee: shalty (neogeo-dc) → nobody
status: In Progress → Triaged
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/7877

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.