(win) very long start when network shares are unavailable

Bug #1487138 reported by Sébastien BLAISOT
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Fix Released
Undecided
Sébastien BLAISOT

Bug Description

Under Windows, using 1.12-beta1, mixxx startup takes a very long time (>30 seconds) when you have network shares mounted (drive letter assigned) that are unavailable at the time you launch mixxx (which happens if you put your laptop on sleep at home with a network share mounted and start it when you arrive at a gig for example)

Step to reproduce :
1. assign a drive letter to a network share
2. Change the IP address of your network card to be in another network without routing capabilities (simply shutting down the interface won't work, you must have some network but not the possibility to access network share)
3. launch mixxx

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

It's like Mixxx somehow try to scan all drives when launching.

Changed in mixxx:
status: New → In Progress
assignee: nobody → Sébastien BLAISOT (sblaisot)
Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

Also tested on windows 10, Mixxx takes a little more time to launch that normal when mounted network shares are unavailable, but far less than under windows seven.

Can anyone test under linux/macos with an NFS share mounted that is no more accessible (using soft mount of course) when Mixxx launch ?

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

I just tested on windows 10 and couldn't reproduce. It seems that this problem is only shown on windows XP to seven.

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

ok, the time is taken here :
https://github.com/mixxxdj/mixxx/blob/1.12/src/mixxx.cpp#L327

I managed to reproduce it with 52 seconds taken by this sole line.

adding some debug print before/after gives me in the log :
Debug [Main]: "19:52:23.803" Start progress: 32
Debug [Main]: Available QtSQL drivers: ("QSQLITE")
Debug [Main]: DB status: "C:/Users/Sébastien/AppData/Local/Mixxx/mixxxdb.sqlite" = true
Debug [Main]: SchemaManager::upgradeToSchemaVersion already at version 24
Debug [Main]: TrackDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection"
Debug [Main]: CrateDAO::initialize()
Debug [Main]: CueDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection"
Debug [Main]: DirectoryDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection"
Debug [Main]: LibraryHashDAO::initialize QThread(0x415cd0, name = "Main") "qt_sql_default_connection"
Debug [Main]: Default quick links: ("D:/Documents/Sébastien/music/soirées/", "D:/Documents/Sébastien/music/", "C:/Users/Sébastien/Downloads/", "D:/Documents/Sébastien/Desktop/", "D:/Documents/Sébastien/Mes Documents/")
Debug [Main]: Appending Quick Link: "soirées" --- "D:/Documents/Sébastien/music/soirées/"
Debug [Main]: Appending Quick Link: "music" --- "D:/Documents/Sébastien/music/"
Debug [Main]: Appending Quick Link: "Downloads" --- "C:/Users/Sébastien/Downloads/"
Debug [Main]: Appending Quick Link: "Desktop" --- "D:/Documents/Sébastien/Desktop/"
Debug [Main]: Appending Quick Link: "Mes Documents" --- "D:/Documents/Sébastien/Mes Documents/"
Debug [Main]: Committing transaction on "qt_sql_default_connection" result: true
Debug [Main]: Traktor Library Location=[ "C:/Users/Sébastien/collection.nml" ]
Debug [Main]: Checking for access to user's library directories:
Debug [Main]: Checking for access to "D:\Documents\Sébastien\music\soirées" : true
Debug [Main]: "19:53:15.791" Start progress: 33

I have to dig a bit more inside this block but it seems that somewhere in it there is a drive scan that happens and waits for a system timeout.

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :
Revision history for this message
Owen Williams (ywwg) wrote :

The left-hand pane includes a "browse" item that enumerates all of the devices on the system. If there are network drives, it will scan them. Perhaps we should lazily populate that list and not enumerate the devices until that part of the tree is twirled down? And then it could populate them in the background, or show them one at a time, making it more clear to the user what's taking so long.

I considered suggesting that we skip network devices, but I could imagine a radio station setup that uses a NAS to store music, and we'd want to support that. So finding some way to do this work in the background seems like the best option.

Revision history for this message
Owen Williams (ywwg) wrote :

(by scan, I don't mean it will scan all the directories on the remote device, I believe it just sees that it exists. But something in that process is taking a long time)

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

I'm trying to find the line of code that takes a long time.

It seems that QDir::drives() is not reponsible.

It's probably drive.canonicalPath() of drive.filePath() that is taking extra delay here. So with a little change, I can find a way to avoid trying to read to the unavailable drive while populating the left-hand pane.

on the other hand, I also have to check what happens if you click in the left-hand pane on an unavailable drive to be sure we don't hang mixxx. I'll report here what I find.

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

sorry, I meant drive.canonicalPath() OR drive.filePath()

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

If I click in the left-hand pane on a drive that is not accessible, I sometimes managed to hang mixxx interface (waveform doesn't move anymore, vu-meter hang, nothing move), but the playback continue without any problem.

the interface becomes responsive again after some seconds. When you are in the middle of a mixx, this seconds can appear very long ;)

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

ok, finally got it.

the delay is taken here : https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L70

and the responsible is drive.canonicalPath()

on my system (win seven); the two functions return the same thing : X:/

only the first call to drive.canonicalPath() takes a long time (waiting for the system timeout). If I add a second call, this one is fast.

I'm considering changing drive.canonicalPath() for drive.filePath(). This, if a drive hang mixxx interface for a few seconds after a user clicked on it, he knew which drive if having access problems.

what do you think ?

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

I made a try.

replacing drive.canonicalPath() with drive.filePath() make mixxx launch much more rapidly. but on the other hand, this delays the first faulty drive acces to when expanding the "peripheral" item on the left-hand pane, which totally freezes mixxx interface for some loooong seconds. playback continue, but the interface is totally unresponsive. After system timeout, it comes back.

so I think it's better to have a long time at startup than during a mix...

the real bug is you have a system with a unaccessible drive.
What about setting a 1 second timeout on drive.canonicalPath() and skipping the drive that is causing this timeout with an warning message in the log (something like "drive X:/ is not accessible") ? I mean not showing it in the peripheral list of the left-hand pane.

We should also have some timeout like this when browsing with the left-hand pane to avoid interface freeze.

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

ok, finally, no matter what time it gets on launch and if launching tries to access unavailable shares or not, you will have sometimes interface freezes when expanding the "drives" part of the left pane when network shares are unavailable.

So I made a PR to have a quicker launch, which should not have any side effect on the interface freeze when expanding drives, but will accelerate launching time.

https://github.com/mixxxdj/mixxx/pull/686

Revision history for this message
Owen Williams (ywwg) wrote :

The problem is that canonicalpath and filepath are not interchangeable. By forcibly using the filepath there could be new issues with symbolic links and ".." in the filename. Or, we have to allow for more time to do testing to discover if filepath is a suitable replacement. But the best solution would be to come up with a more robust fix that still uses canonicalpath() and does the work in a worker thread.

Since this is more of a bug with the user's system and not Mixxx, I think we should target master instead of 1.12 for working around this issue. But for 1.12 we should definitely print out an error in the log if a drive is broken (can we trap the error somehow?), and we should not try to access that drive again for that session if possible.

Revision history for this message
Owen Williams (ywwg) wrote :

If the user is seeing slow startup, we'll be able to look at the log and see that they have a broken network drive, and tell them to fix it.

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

Owen, there can not be any problem with .. and symlinks :

1_ We are in an ifdef __WINDOWS__ so there is no symlink
2_ we only apply canonicalPath/filePath to foreach drive in QDir::drives(), so we only work on a list of drive letters

so we are in a special case here where filePath and canonicalPath are exactly the same.

a more robust fix should target master, but a quick fix for 1.12 is a good option I think.

We cannot trap and error, because there is no error, only a long time to access broken drive. The only way to trap the error is to add some sort of timeout.
Even if we don't populate the "devices" rootItem with the broken drive, we rebuild the drives list on each item expansion in onLazyChildExpandation:300

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

and last but not least, in this portion of code, canonicalPath is only used for display purposes. Used only as the first parameter of treeItem constructor. So there is indeed absolutely no problem using something else.

Currently, the comment on https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L72 is inaccurate : canonicalPath returns C:/ and not C:
so canonicalPath is exactly the same than filePath in this particular situation.

using filePath and removing trailing / is IMHO the best way to show the user a list of X: drive letters.

Revision history for this message
Owen Williams (ywwg) wrote :

Although the string is only for display, I believe the display string is used to determine what filename to load. But even so, the ifdef that applies this only to windows makes this safer. I just have a couple questions on the PR

Revision history for this message
Sébastien BLAISOT (sblaisot) wrote :

Owen,

this display string is not used to determine what filename to load, only for left-hand pane item name as far as I understand.

copying https://github.com/mixxxdj/mixxx/blob/1.12/src/library/browse/browsefeature.cpp#L95
    // Just a word about how the TreeItem objects are used for the BrowseFeature:
    // The constructor has 4 arguments:
    // 1. argument represents the folder name shown in the sidebar later on
    // 2. argument represents the folder path which MUST end with '/'
    // 3. argument is the library feature itself
    // 4. the parent TreeItem object

we are talking here about the 1st argument only.

Revision history for this message
Owen Williams (ywwg) wrote :

Ah I was thinking of the library table view, where the "location" item is used as the canonical location of the file and is not just a display value.

Changed in mixxx:
status: In Progress → Fix Committed
RJ Skerry-Ryan (rryan)
Changed in mixxx:
status: Fix Committed → Fix Released
RJ Skerry-Ryan (rryan)
Changed in mixxx:
milestone: none → 2.0.0
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/8201

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.