EngineBufferE2ETest.KeylockReverseTest is flapping

Bug #1740697 reported by Be
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Fix Released
Critical
Unassigned

Bug Description

EngineBufferE2ETest.KeylockReverseTest fails randomly. It has been failing on the Jenkins build servers but not on Travis and AppVeyor. Most of the time when I run tests locally it works, but occasionally it fails and the output buffer is all 0s.

Be (be.ing)
Changed in mixxx:
status: New → Confirmed
importance: Undecided → High
milestone: none → 2.1.0
Revision history for this message
Be (be.ing) wrote :

I'm guessing this has something to do with changing the rateRange ControlObject to a ControlPotmeter with a finite range in https://github.com/mixxxdj/mixxx/pull/1377 , but I'm not really sure what's going on...

Revision history for this message
Be (be.ing) wrote :

Ah-ha, I think I found the problem. I see this above the dumping of the buffer samples when the test fails:

SoundSourceProxy - Opening file "file:///home/be/sw/mixxx/src/test/sine-30.wav" with provider "libsndfile" using mode Strict
CachingReader - Cache miss for chunk with index 0
didn't get what we wanted 0 1038
CachingReader - Cache miss for chunk with index 0
didn't get what we wanted 0 1038

I see similar errors after running the "continue" command in GDB when debugging Mixxx.

Revision history for this message
Be (be.ing) wrote :

I presume this has to do with the recent changes to the SoundSources? It seems the test fails more often when the machine is under high load.

Revision history for this message
Be (be.ing) wrote :

Raising this to Critical priority because it is preventing us from getting builds published reliably.

Changed in mixxx:
importance: High → Critical
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Cache misses only occur after seeking to an uncached position. The CachingReader tries twice to fetch the result before giving up. This should have nothing to do with the failing test which never occurred in this branch during development, neither locally nor on the build server.

The failing test appeared during the final preparations of the 2.1 beta release. I was able to reproduce it locally, but currently all tests are succeeding again.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Since decoding is done by a background race conditions might occur depending on the usage patterns. Slight changes in timing might cause unexpected behavior. I will check that.

Revision history for this message
Be (be.ing) wrote :

I can reproduce this when my machine is under high load. For example, if I compile Mixxx and run the test while compiling, the test will fail more often.

Might there be a way to force an order of how things are executed only for testing purposes to avoid the race condition?

Changed in mixxx:
assignee: nobody → Uwe Klotz (uklotzde)
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

This commit slightly changed the behavior of CachingReader::read() by propagating cache misses properly to the caller instead of implicitly filling the remaining sample buffer with silence:

844cf3505c6512e5c62d8c01a2b99f18464cb8fc

Test results should be unaffected. Now the buffer will be filled with silence in one of the EngineBufferScale classes and tests will fail if the decoded sample data is not available in time.

The CachingReader API has no proper way to propagate errors. Currently both permanently unreadable and temporarily unavailable sample data cannot be distinguished by the caller. I already mentioned that this layer above the SoundSources is the next candidate for an overhaul, 2nd round.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Various EngineBufferE2ETest test cases reliably fail on my development system when adding a sleep of about 4 ms at the beginning of SoundSourceSndFile::readSampleFramesClamped().

Drop outs due to delayed input data under heavy load might occur and are expected. The test results would only be reliable and independent of the host system when executed in a non-real-time configuration that prevents race conditions. It's a conceptual problem, not bug that could be fixed easily.

Changed in mixxx:
assignee: Uwe Klotz (uklotzde) → nobody
Revision history for this message
Be (be.ing) wrote :

So I think our options here are:
1. Make the tests execute in a non-real-time configuration that prevents race conditions
2. Disable the test
3. Ensure the build servers are not under heavy load when running the test

I don't know which makes most sense.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote : Re: [Bug 1740697] Re: EngineBufferE2ETest.KeylockReverseTest is flapping

IMO option 1 is most desirable.

On Mon, Jan 1, 2018, 1:40 PM Be <email address hidden> wrote:

> So I think our options here are:
> 1. Make the tests execute in a non-real-time configuration that prevents
> race conditions
> 2. Disable the test
> 3. Ensure the build servers are not under heavy load when running the test
>
> I don't know which makes most sense.
>
> --
> You received this bug notification because you are a member of Mixxx
> Development Team, which is subscribed to Mixxx.
> https://bugs.launchpad.net/bugs/1740697
>
> Title:
> EngineBufferE2ETest.KeylockReverseTest is flapping
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/mixxx/+bug/1740697/+subscriptions
>

Revision history for this message
Be (be.ing) wrote :

Maybe we could implement a mock CachingReader that keeps everything in one thread instead of starting a new thread?

Revision history for this message
Be (be.ing) wrote :

Taking a quick look at the code, it does not seem that would be very easy. We still need to decide what to do here to get the Ubuntu builds working on the build server.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I thought about a temporary hack by repeating each failing test and only compare buffers after the second run. Hoping to get cached input in-time, if the cached results have not been discarded in the meantime. I might give it a try, since I already have some experience with CachingReader. I don't have any other ideas for a quick fix.

Changed in mixxx:
assignee: nobody → Uwe Klotz (uklotzde)
Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

quickest fix is to disable this test until we have a good solution.

Revision history for this message
Be (be.ing) wrote :

If it's still possible for the test to fail, I don't know if it makes much sense to add complexity to the tests instead of just disabling it.

I'm quite confused why this is only happening for KeylockReverseTest when there are other tests that rely on EngineBuffer reading from a test file.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Ok, let's disable the test until we have a reliable solution:

https://github.com/mixxxdj/mixxx/pull/1478

I've added some comments that explain why.

Be (be.ing)
Changed in mixxx:
status: Confirmed → Fix Committed
Changed in mixxx:
status: Fix Committed → Fix Released
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/9040

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.