Multiple IndexedDB connections (via tabs) can result in data inconsistency

Bug #1775719 reported by Bill Erickson
36
This bug affects 14 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
High
Unassigned
3.0
Fix Released
High
Unassigned

Bug Description

Spinning this off from bug #1768947

Recap:

https://github.com/google/lovefield/issues/172#issuecomment-256284033 (and reading down)

https://github.com/google/lovefield/blob/master/docs/spec/03_life_of_db.md#32-multi-process-connection

Multiple connections across tabs can lead to data inconsistency.

Consider moving lovefield actions into a shared webworker / serviceworker.

---

While we may want to apply other stop-gap measures, moving the DB connection handling into a shared worker is ultimately the safest data integrity solution because it ensures there can only be one connection. I have a fairly extensive proof of concept of moving the lovefield code into a shared worker to resolve this issue. Will post a branch when I have something that plays nicely with the browser client.

Revision history for this message
Bill Erickson (berick) wrote :

Code push'd:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1768947-offline-db-webworkers

This moves the DB access logic into a shared worker. The egLovfield API is unchanged, but all DB reads/writes are now shuttled off to the shared worker.

I also moved the full download block list logic into the web worker since it's so heavy. That way it does not lock the browser up when it's building. Consistent with bug #1727557, I am also displaying an (indeterminate) progress dialog as the block list builds.

As a reminder, shared workers run in their own process, so debugging them means finding the right console logs.

In Chrome, navigate to chrome://inspect/#workers -- Look for the offline-db-worker.js entry and click "Inspect".

In FF, they're under about:debugging#workers

Eyes and testing appreciated. I think I got all the bugs -- the behavior I know to test is working -- but it's big enough others could be lurking.

tags: added: offline pullrequest
Changed in evergreen:
milestone: none → 3.1.3
status: In Progress → New
assignee: Bill Erickson (berick) → nobody
Revision history for this message
Bill Erickson (berick) wrote :

Marking as webstaffblocker since its parent ticket was marked as such.

tags: added: webstaffblocker
Revision history for this message
Jeff Davis (jdavis-sitka) wrote :

Bill's branch is working well for me. The test from bug 1768947 comment #12 no longer produces a white screen and I haven't noticed the offline DB error elsewhere in testing yet.

I do see the following console error on the login page:

lovefield.js:55 worker request failed with createSchema(offline) call required

Functionality seems unaffected, though -- I can still login, and offline seems the same as before. The error doesn't seem to show up anywhere but on the login page.

Revision history for this message
Bill Erickson (berick) wrote :

Thanks, Jeff. Researching the error now.

Revision history for this message
Bill Erickson (berick) wrote :

Fix to previously noted error pushed to same branch.

The issue was the login page wants to report when any pending offline transactions exist. However, the new code only initializes the offline transaction and block tables in the offline UI to minimize the speed/load issues noted in bug #1727557.

My solution was to maintain an entry to the object date cache table indicating the time of the most recent offline transaction entry. This data is used on the login page to determine if offline transactions exist, so the staff logging in can be notified.

As a bonus, since we know the last transaction add time, the date of the most recent transaction is now displayed in the login page offline xact alert panel.

When offline transactions are cleared, the last xact date is removed as well.

Changed in evergreen:
milestone: 3.1.3 → 3.1.4
Changed in evergreen:
status: New → Confirmed
Revision history for this message
Jeff Davis (jdavis-sitka) wrote :

After some testing, I put this in production at Sitka a few days ago and haven't had any complaints yet. Signoff for both commits here:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/jeffdavis/lp1768947-offline-db-webworkers-signoff

tags: added: signedoff
Revision history for this message
Mike Rylander (mrylander) wrote :

Bill, I think insertOfflineBlocks has some reversed logic ... the inside of this if-block:

+ if (currentChunk.length >= chunkSize) {
+ currentChunk = [];
+ chunks.push(currentChunk);
+ }

looks like it should be the other way around. Otherwise we're pushing an empty chunk onto the chunks array. IOW, we process but don't store the block list.

Separately, do we need to bump the database version (currently 2)? I did a visual scan in an attempt to confirm that there were no schema changes at all, and that seems to be true. If that's correct then no version change is necessary.

Revision history for this message
Bill Erickson (berick) wrote :

Thanks for the eyes, Mike.

I can confirm no schema changes were made. We do change when the offline xacts/block tables are built, but as I understand it IndexedDB will ignore tables that are not explicitly initialized within the page, so that alone should not prompt a version bump.

I believe the currentChunk logic is doing what's it's supposed to. Once the current array is full, create a new one and toss it onto the array of arrays. The block list data is added to the new array just below there.

Revision history for this message
Mike Rylander (mrylander) wrote :

Re tables, right, as long as the structure is the same, we're fine.

And pardon the noise. I wasn't taking the surrounding code into account, and expecting a chunk to be added once full (and then one final addition after) rather than pre-adding a reference to the current chunk.

Thanks!

Changed in evergreen:
milestone: 3.1.4 → 3.1.5
Kathy Lussier (klussier)
Changed in evergreen:
importance: Undecided → High
assignee: nobody → Kathy Lussier (klussier)
Revision history for this message
Kathy Lussier (klussier) wrote :

I've loaded this branch on a test VM with Concerto. When I try to download the offline patron block list, I'm just getting a progress bar with no download success message. Typically, I have not trouble downloading the block list with Concerto data.

I inspected the service worker via chrome://inspect/#workers and saw the following error message.

Uncaught TypeError: Cannot read property 'then' of undefined
    at dispatchRequest (offline-db-worker.js:392)
    at MessagePort.<anonymous> (offline-db-worker.js:403)

Kathy Lussier (klussier)
Changed in evergreen:
assignee: Kathy Lussier (klussier) → nobody
Revision history for this message
Bill Erickson (berick) wrote :

Thanks, Kathy. I'll rebase and re-test.

Changed in evergreen:
assignee: nobody → Bill Erickson (berick)
Revision history for this message
Bill Erickson (berick) wrote :

Kathy, I am unable to reproduce this issue unfortunately. The only scenario where that error can happen is if the Download button were clicked while a download was already in progress. Any chance a previous download failed and the Download button was clicked again to re-test (or errant double-click)?

To rule that out, I have rebased the signoff branch to master and pushed one more commit which disables the Download button as soon as it's clicked, for the duration of the download. I also improve the error reporting should that happen.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1768947-offline-db-webworkers-signoff-rebase1

Changed in evergreen:
assignee: Bill Erickson (berick) → nobody
Revision history for this message
Kathy Lussier (klussier) wrote :

Yes, I did have a previous download that failed. I'll take another look.

Thanks Bill!

Revision history for this message
Kathy Lussier (klussier) wrote :

I've replicated the initial failed download that preceded the issue with the stuck progress bar.

After adding the top commit from your branch to the test server, I exited the browser, cleared out my IndexedDB database and logged back in. When I click the button to download blocked patrons, I get a message telling me the download failed.

I see this in the main Console:

core.bundle.js:1 worker request failed with Error fetching offline block list
(anonymous) @ core.bundle.js:1

When I inspect the service worker via chrome://inspect/#workers, I see the following:

offline-db-worker.js:340 shared worker replying with error Error fetching offline block list
replyError @ offline-db-worker.js:340

The Download block list button is NOT disabled. If I click it again, I do now get a Console message telling me that a download is already in progress.

Revision history for this message
Bill Erickson (berick) wrote :

Thanks, Kathy.

Can you confirm a file exists at https://HOSTNAME/standalone/list.txt that your logged in account has access to?

Revision history for this message
Bill Erickson (berick) wrote :

I push-squashed another commit to address the Download button not getting disabled, but not exactly how you might expect. If the download fails outright, the button should be re-enabled, but that bit was only partially working, which lead you down this path in the first place.

If you confirm a file exists (see my previous comment), then I can add more debugging to the file retrieval code. As it stands, it looks like the file's not accessible to the browser (i.e. does not return HTTP 200).

Kathy Lussier (klussier)
Changed in evergreen:
assignee: nobody → Kathy Lussier (klussier)
Revision history for this message
Kathy Lussier (klussier) wrote :

Thanks Bill!

I finally was able to replicate the white screen issue by following the steps previously suggested by Jeff Davis. I confirmed that it fixed the problem and that my earlier download problems were not a result of the new code but of a missing standalone/list.txt file.

I've merged the branch to master and 3.1.

There is were a few conflicts when trying to backport the branch to 3.0, and I'm not comfortable with resolving all of them. If you resolve those, I have time today to take a look at it and backport.

Changed in evergreen:
assignee: Kathy Lussier (klussier) → nobody
status: Confirmed → Fix Committed
Revision history for this message
Bill Erickson (berick) wrote :

Here's a 3.0 backport branch:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1775719-indexeddb-3.0-backport

I installed this on a 3.0 system and everything is working for me. I'll leave the branch here for a bit in case anyone would like to test.

Revision history for this message
Kathy Lussier (klussier) wrote :

I installed the 3.0 branch, but I'm getting karma unit test failures that are similar to the web client build test failures that arose earlier tonight during our regular test runs. I didn't see those errors when I tested the lp1768947-offline-db-webworkers-signoff-rebase1 branch earlier today.

Revision history for this message
Kathy Lussier (klussier) wrote :

Actually, I suspect the failures were there when I tested the branch on master, but I missed them. The failures were more obvious in the 3.0 test because the grunt all step fails unless I force it to continue.

Revision history for this message
Bill Erickson (berick) wrote :

Thanks for catching that, Kathy.

Here's a branch that fixes the PhantomJS problems, including a bit to avoid logging errors when PhantomJS is unable to connect to the offline shared worker to limit unit testing log noise.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1775719-phantomjs-repairs

Revision history for this message
Bill Erickson (berick) wrote :

Note, the above fixes tests in master (and likely 3.1). I'm still getting an error in 3.0 with this patch, though. Grunt is unhappy. Still digging.

Revision history for this message
Bill Erickson (berick) wrote :

However, I am getting the same errors in rel_3_0, without any of this code applied, so I suspect that's a different issue.

Revision history for this message
Bill Erickson (berick) wrote :

My issues with rel_3_0 are very likely related to my use of Node v8 and not an issue specifically with rel_3_0. So, my patch above should resolve test issues in 3.0, 3.1, and master.

Revision history for this message
Kathy Lussier (klussier) wrote :

Thanks Bill! The tests pass for me on master. I've merged the phantomjs repairs branch to master and 3.1.

I'm having a bit of trouble with my 3.0 testing that I think may be unrelated to your code. I'll report back on that one shortly.

Revision history for this message
Kathy Lussier (klussier) wrote :

And the 3.0 branch looks good too.

Thanks again!

Changed in evergreen:
status: Fix Committed → Fix Released
Revision history for this message
Scott Thomas (scott-thomas-9) wrote :

We just installed 3.0.11 and tried to download the Block List. It continues to hang.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.