pcrud search can fail to retrieve rows that the user has access to

Bug #1847805 reported by Galen Charlton
42
This bug affects 9 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
High
Unassigned

Bug Description

PCRUD search works by constructing an executing a query, then scanning the results and skipping any rows that the user does not have permission to retrieve.

In cases where the user is likely to have permission to access all of the rows retrieved by the query, this isn't a problem as far as accuracy of the results are concerned.

However, it's often the case that the user may not have permission to retrieve a significant number of rows in the results. This can lead to incomplete results in AngularJS and Angular grids, which directly or (via Flattener) indirectly run broad PCRUD queries with offsets and limits. It is particularly problematic for grids (e.g., the EDI messages viewer) that don't have an OU filter that's automatically set to the user's workstation.

For example, imagine a consortium where one system generates 95% of the EDI message traffic and another generates only 5%. A user from the second system whose provider retrieval permissions are restricted to only their own system might open the EDI message page... and not see any results because all of the rows in the initial query window belong to the first system.

It would be better if PCRUD search could construct the base SQL query so it includes the joins and filter conditions necessary to restrict the results to rows that the user has permission to access. While this probably wouldn't obviate the need for post-query permissions checks for each row in the results, it would make it much less likely that the user wouldn't see results that they are entitled to.

Evergreen 3.2+

Galen Charlton (gmc)
Changed in evergreen:
importance: Undecided → High
tags: added: pcrud
Revision history for this message
Galen Charlton (gmc) wrote :

Some brainstorming Mike Rylander and I had about this:

<miker> gmcharlt: off my call now, looking at your bug. yeah, empty first page... gotcha. another option is to 1) move to cursors (so we don't retrieve ALL THE COPIES into pcrud RAM up front) 2) fetch a row or 10 at a time 3) implement limit/offset in pcrud, post-perm-check
<miker> some pros/cons for that:
<miker> pros: generally PG picks a "fast-start" plan for cursors (conditions apply), first page of results never empty, first page generally faster, lower RAM use (no chance of blowing out the backend osrf object slab allocator)
<miker> cons: subsequent pages /might/ take longer to return, especially when perm+location caching is defeated by a very broad object ownership spread
<gmcharlt> as well as the degenerate case that only a handful of rows meet the permissions conditions and you've effectively done a full scan of the table, though at least not without pooching pcrud's memory usage unnecessarily
<gmcharlt> so the cursor-based approach might be a win regardless
<miker> biggest problem I see with embedding perm checking in the query is plan stability/optimization ... we're introducing a significant variable to PG planning that we haven't really done before
<gmcharlt> yeah, at minimum I figure we'd turn up cases where we would want additonal indexes on context OU columns
<miker> aye. and user-ownership is tricky to OR into that...
<miker> see: subscription buckets :)
<gmcharlt> and as yet another edge case, object-owership
<gmcharlt> not, fortunately, that we have a ton of it in active use, but we do have some
<gmcharlt> of course, for specific cases, a workaround is to embed the additional OU-ownership conditions in the client's source pcrud query
<gmcharlt> but I'd hate for that to be done more than absolutely necessary
<miker> yeah, not a fan of that... we can make the tools better

Revision history for this message
Mike Rylander (mrylander) wrote :

I have a branch implementing what Galen and I chatted about in IRC, available at:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1847805-pcrud_cursor_implementation

Time for more eyeballs, please!

tags: added: pullrequest
Revision history for this message
Mike Rylander (mrylander) wrote :

I've updated the above-linked branch to avoid using cursors when we know we're only going to retrieve one row. This should reduce any measurable, new overhead.

From the first commit message, in case it will produce some interest:

Because we check authz for rows retrieved by open-ils.pcrud calls after the underlying query is executed, including any limit/offset-based paging, we can fail to return results when the first "page" is full of rows that the calling user cannot see.

With this commit, we switch to SQL cursors and page through results in the application logic, which guarantees that the situation above will not impact legitimate row visibility.

More generally, this will should improve memory use by pcrud (along with cstore and other similar services) by avoiding initial full retrieval of results. Only those that are needed for the final result will consume memory for the full lifetime of the request.

Additionally, this may prove to increase speed of some queries, especially because Postgres has the option to optimize for "fast-start" plans.

Changed in evergreen:
milestone: none → 3.next
status: New → Confirmed
Changed in evergreen:
milestone: 3.next → 3.5-alpha
Revision history for this message
Galen Charlton (gmc) wrote :

I've done some initial testing and have some comments:

- using the following query on a Concerto database with a BR3 circulator, I got the following results:

  request open-ils.pcrud open-ils.pcrud.search.au "$SES", {"id":{"!=":null}}, {}

  * the number of records returned were the same both pre- and post- patch
  * pre-patch, it took ~2 seconds on my test system; post-patch, ~3.5 seconds
  * bumping the default cursor window size from 5 to 50 brought it back down to ~2.2 seconds
  * it made no difference to the timing whether or not the query to fetch each row for permissions checks used an interior cursor or not; the additional time it took seemed to be caused strictly by the outer cursor

- I have a major concern about opening the cursor "WITH HOLD". Per https://www.postgresql.org/docs/9.6/sql-declare.html, "In the current implementation, the rows represented by a held cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions." I've verified that creating such cursors does indeed create files in pgsql_tmp for large files; a glitch (or attack) that fired off a number of cstore or pcrud searches of large tables could cause disk usage and/or I/O problems on a database server.

Creating a cursor WITHOUT HOLD inside of a transaction should work without creating temporary files as often as WITH HOLD could, but if that were done, cursors either become something that gets used only within atomic cstore and pcrud calls or making all searches effectively atomic, which would have its own tradeoffs.

- as a minor point, I think it's leaking the cursor_name string

Revision history for this message
Mike Rylander (mrylander) wrote :

Thanks for testing, Galen.

I've pushed an update that addresses your WITH HOLD concern by making sure that we're either inside an explicit transaction initiated by the caller, or creating a transaction to wrap the DECLARE/FETCH/CLOSE portion of the logic. With a transaction in place, we now use WITHOUT HOLD.

I also confirmed that cursor_name was leaking, and fixed that. The overall simplest thing to do is make it a global, since we don't currently interleave fetching and fleshing, though cursors do open that option, and could mean a faster TTFB with immediate streaming responses. That's a patch for another day, though.

If testing the two commits works well, I'd recommend we squash them into one before merging to master.

Thanks!

Michele Morgan (mmorgan)
Changed in evergreen:
milestone: 3.5-beta → 3.5.0
Changed in evergreen:
milestone: 3.5.0 → none
Changed in evergreen:
milestone: none → 3.5.1
no longer affects: evergreen/3.5
Changed in evergreen:
milestone: 3.5.1 → 3.5.2
Revision history for this message
Mike Rylander (mrylander) wrote :

Just a note about the forced atomic-ness Galen mentioned above (a transaction is required for WITHOUT HOLD cursors), we should actually be fine there.

If we're already inside a transaction, we're already OK, and if we're not, we'll start one before the main query and roll it back after. That will be a read-only transaction, which is allowed on readonly replicas (if anyone's using those for load balancing) and won't consume transaction IDs (XIDs) in modern PG (using XIDs here could lead to premature forced full vacuums, but that only happens for read-write transactions).

The only overhead that the "internal" transactions will create is the round-trip to issue the BEGIN.

Changed in evergreen:
milestone: 3.5.2 → 3.6.1
Changed in evergreen:
milestone: 3.6.1 → 3.6.2
Revision history for this message
Mike Rylander (mrylander) wrote :

I've added a commit to the above branch that protects the cursor from recursive calls used in open-ils.pcrud to check permissions on retrieved objects.

Changed in evergreen:
milestone: 3.6.2 → 3.6.3
Changed in evergreen:
milestone: 3.6.3 → 3.6.4
Changed in evergreen:
milestone: 3.6.4 → 3.7.2
no longer affects: evergreen/3.2
no longer affects: evergreen/3.3
no longer affects: evergreen/3.4
no longer affects: evergreen/3.5
Changed in evergreen:
milestone: 3.7.2 → 3.7.3
Changed in evergreen:
milestone: 3.7.3 → 3.9-beta
Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
tags: added: needswork
removed: pullrequest
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I was able to verify the bug with produciton data using the following in srfsh:

request open-ils.pcrud open-ils.pcrud.search.au "authtoken", {"id": {"!=": null}}, {"limit": 10}

I get 0 results back. If I change the search to use the home_ou of patrons equal to the work/permission ou of the logged in user, then I get 10 results.

After I apply the branch, neither prcrud search works for me. I get a database error:

Apr 6 09:12:46 egdev open-ils.pcrud: [ERR :16277:oils_sql.c:6134:16492507311661
12] open-ils.pcrud: Error retrieving actor::user with query [DECLARE "kaSXSIwPNO
2lNblD" NO SCROLL CURSOR WITHOUT HOLD FOR SELECT "au".active, "au".alert_messag
e, "au".barred, "au".billing_address, "au".card, "au".claims_returned_count, "au
".claims_never_checked_out_count, "au".create_date, "au".credit_forward_balance,
 "au".day_phone, "au".dob, "au".email, "au".evening_phone, "au".expire_date, "au
".family_name, "au".first_given_name, "au".home_ou, "au".id, "au".ident_type, "a
u".ident_type2, "au".ident_value, "au".ident_value2, "au".last_xact_id, "au".mai
ling_address, "au".master_account, "au".net_access_level, "au".other_phone, "au"
.photo_url, "au".prefix, "au".profile, "au".second_given_name, "au".standing, "a
u".suffix, "au".super_user, "au".usrgroup, "au".usrname, "au".alias, "au".juveni
le, "au".last_update_time, "au".pref_prefix, "au".pref_first_given_name, "au".pr
ef_second_given_name, "au".pref_family_name, "au".pref_suffix, "au".guardian, "a
u".name_keywords, "au".name_kw_tsvector, "au".deleted FROM actor.usr AS "au" WHERE "au".id = '424484';]: 6844179 6844179: ERROR: cursor "kaSXSIwPNO2lNblD" already exists

and

Apr 6 09:13:18 egdev open-ils.pcrud: [ERR :16277:oils_sql.c:6257:1649250731166113] open-ils.pcrud: Error retrieving actor::user with query [DECLARE "rXfnIqKJukEGhAaM" NO SCROLL CURSOR WITHOUT HOLD FOR SELECT "au".active, "au".alert_message, "au".barred, "au".billing_address, "au".card, "au".claims_returned_count, "au".claims_never_checked_out_count, "au".create_date, "au".credit_forward_balance, "au".day_phone, "au".dob, "au".email, "au".evening_phone, "au".expire_date, "au".family_name, "au".first_given_name, "au".home_ou, "au".id, "au".ident_type, "au".ident_type2, "au".ident_value, "au".ident_value2, "au".last_xact_id, "au".mailing_address, "au".master_account, "au".net_access_level, "au".other_phone, "au".photo_url, "au".prefix, "au".profile, "au".second_given_name, "au".standing, "au".suffix, "au".super_user, "au".usrgroup, "au".usrname, "au".alias, "au".juvenile, "au".last_update_time, "au".pref_prefix, "au".pref_first_given_name, "au".pref_second_given_name, "au".pref_family_name, "au".pref_suffix, "au".guardian, "au".name_keywords, "au".name_kw_tsvector, "au".deleted FROM actor.usr AS "au" WHERE "au".id IS NOT NULL;]: 5225472 5225472: ERROR: cursor "(null)" does not exist

The patch was applied to both Evergreen 3.7.2 and 3.5.3 with production data on a PostgreSQL 10 database. The above errors are from the Evergreen 3.7.2 installation, but 3.5.3 gave similar errors.

Changed in evergreen:
milestone: 3.9-beta → none
no longer affects: evergreen/3.6
Revision history for this message
Galen Charlton (gmc) wrote :

Mike and I had a discussion about an alternative implementation and landed on teaching open-ils.pcrud how to automatically filter by permissions along these lines:

- Add an option that's peer to flesh (et al) in PCRUD search call to specify that the query should get filter conditions automatically added to whatever the client has supplied.
- This would include a dedicated function that can generate an appropriate where clause addition ( {$context_field : { 'in' : $perms_at_array }} or { -exists : { $linked_context_subquery }} or, for no-perms required, just TRUE ) to be unconditionally ANDed by inserting {'-and':[$generated,$user_supplied}] above the user-supplied filter hash.
- All this should require is the core class hint and the user context.
- The list of OUs could be generated by a version of permission.usr_has_perm_at_all() that knows how to check multiple permissions at once. Depending on how this performs, it could either be invoked right before running the main query or directly included in the where conditions.
- The most complication will come from the few linked perm contexts that use "jump". hold notification, acq admin, payments/billings, surveys, link checking, and serials are the users of that construct. acq LID in particular specifies 2 "jump"ed context fields and 4 possible perms, so a general solution would have to support multiple OR'd generated filters.
- We should also support the owning_user perm construct, but that seems a natural extension to the above.
- Even the user-object perm map should be easy to support, and since the table behind that in normally empty or tiny, it shouldn't have a performance impact in practice.
- It would be good, rather than doing all the work on every call, to invent pcrud-mode startup logic that generates an appropriate set of check templates for each class. The problem with that is knowing how to stick the user id and org list into all the right places of the cloned perm-checking templates at call time. Likely we can record the jsonPath for those when needed and use jsonObjectFindPath() or findMultiPath[Recurse]() from libosrf_json (osrf_json_tools.c) at call time.

Another thing to look at during the implementation would be to see if adding this permission-based filtering logic can be cheaply done when retrieving rows for fleshing.

Revision history for this message
Galen Charlton (gmc) wrote :

Noting that doing permissions-based prefiltering in open-ils.pcrud could obviate a need for a patch for bug 2007322, but I can imagine client-side implementations coming sooner, as adding this to PCRUD will require at least medium-sized surgery.

Revision history for this message
Mike Rylander (mrylander) wrote :

Coming back to this because it's making new dev harder than it should be, I've addressed the issue Jason pointed out in comment 8. The branch is squashed and rebased to main at:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1847805-pcrud_cursor_implementation-updated-squashed

While I generally still agree that pushing permission testing closer to the data is worth investigating and attempting, the size of the PCRUD surgery necessary to support that is significantly larger than what this branch does, so I'm advocating for this route in the mean time.

tags: added: pullrequest
removed: needswork
no longer affects: evergreen/3.7
no longer affects: evergreen/3.8
Changed in evergreen:
milestone: none → 3.14-beta
Revision history for this message
Mike Rylander (mrylander) wrote (last edit ):

I've pushed one more commit to make the LIMIT syntax be more forgiving in cursor mode. Now you can pass -1 or a JSON null for the limit and get the "no limit" behavior as before.

That should be squashed into the main commit for merge into main.

Revision history for this message
Jason Etheridge (phasefx) wrote :

Mike's branch has my blessing/signoff. Thanks!

tags: added: signedoff
Revision history for this message
Michele Morgan (mmorgan) wrote :

I took a look at this patch and was able to reproduce the issue Galen described in the description. On a system with production data, logged in as a 'less busy than some' acq library, navigating to Administration -> Acquisitions Adminstration -> EDI Messages does not load all expected rows.

After applying the patch, viewing EDI Messages does appear to retrieve the expected number of rows.

This patch merged cleanly for my testing, but now needs a rebase since Angular Buckets (bug 2063146) has been merged.

Also adding a note that I was hoping this patch would address bug 1960674. My testing, however, showed no difference in Acquisitions -> General Search results.

In addition to a rebase, this branch also needs release notes. Once those are taken care of, I'll be happy to review it again.

tags: added: needsrebase needsreleasenote
Revision history for this message
Mike Rylander (mrylander) wrote :

Angular Buckets (bug 2063146) actually contains the code from my initial branch, with some additional improvements (and some other pcrud work).

I'm not sure if we want to just note that the cursor-based approach is in now as an improvement and leave this bug open for future work to get permission testing closer to the data, or open a new bug for that...

Changed in evergreen:
milestone: 3.14-beta → 3.14-rc
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I'm removing the signedoff and pullrequest tags on this bug since code for one approach has gone in as part of another bug and this bug will therefore need to be rethought if it is necessary to continue working on it.

I'm not going to change the status or anything else at this time, since I have not had time to review the new behavior.

tags: removed: pullrequest signedoff
Changed in evergreen:
milestone: 3.14-rc → none
Revision history for this message
Jason Stephenson (jstephenson) wrote :

Oh! I decided to remove the milestone as well.

Revision history for this message
Michele Morgan (mmorgan) wrote :

I lost about an hour to this bug today, which could have been much better spent bugsquashing!

Still willing to review it!

Revision history for this message
Mike Rylander (mrylander) wrote :

Here is a branch that implements the larger goal of pushing permission checking down into the initial query used for search or retrieve. It also adds support for a permission-validated count variant of search, and uses the in-query permission testing for update and delete calls, as well.

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/multi-bug-pcrud-and-record-buckets-improvements

tags: added: pullrequest
removed: needsrebase
Revision history for this message
Mike Rylander (mrylander) wrote :

I've added one additional commit to the above branch to handle XSD validation for the new permission-related logic.

As a note for other devs/committers, so they can avoid bumping into a (minor, but surprising-at-the-time) corner like I did: currently the way the XSD is structured, with an <xs:sequence> over the permacrud action verb elements, the order of those matters for validation. Thus there's a change to the IDL here. Evergreen itself does not assign any meaning to the element order in that particular case, and but for validation (and, marginally, style conformity) I wouldn't have touched fm_IDL.xml.

Revision history for this message
Mike Rylander (mrylander) wrote :

And, finally, I've added one more commit to address the need of an internal helper query used to find the top of the org tree. The new commit allows that to proceed in all cases.

NOTE: This is currently on the festivus feedback fest test server, at: https://festivus.evergreencatalog.com/eg2/en-US/staff/cat/bucket/record/user ... every user has 6 buckets with various visibility options, all containing every concerto bib.

If this goes in soon (as in, before another rebase is required due to merge conflicts) then I ask the committer to please squash the XSD validation commit into the "Angular bucket updates" commit, and the newest "Always allow" commit into the previous "in-query pcrud" commit. I'm not forcing a rebase or pushing a new branch in order to reduce churn while feedback-fest testing.

Thanks!

Revision history for this message
Mike Rylander (mrylander) wrote :

And FINALLY finally, I have added one more commit to address complex open-ils.fielder pcrud-backed queries that use join structures for up-front filtering. Same branch. That commit (as of this writing) is NOT on fesitvus. I will work to make that happen tomorrow morning.

Revision history for this message
Jane Sandberg (sandbergja) wrote :

Hi Mike, thanks for the branch! Many of the livetests failed for me on this branch, and I cannot save a MARC record in the marc editor. It also seems that the new "count" method would be worth adding a test for. Could you please take a look?

Revision history for this message
Jane Sandberg (sandbergja) wrote :

...I also cannot check materials out on this branch, so it definitely seems that the test failures are pointing to some serious regressions.

Revision history for this message
Mike Rylander (mrylander) wrote :

Hi Jane,

There's a specific bug that I'm dealing with that, I believe, is the root cause of the issues you're seeing.

I will add some tests once I've squashed that, and push an update.

*sad face*

Revision history for this message
Mike Rylander (mrylander) wrote :

Note to self: do not leave small chunks from commit-early-and-often just lying about, you will stub your toe on them.

I've rebase and squashed the branch down to 2 commits, one for the backend infrastructure and one for the buckets UI. It fixes the issues (they weren't happening on the dev box, which is REALLY annoying, because 2 fixes from several days ago got lost in a storm of small commits) and adds 9 concerto-focused live perl tests of both cstore (count function, and data gathering for pcrud comparison) and pcrud (perm logic, count function).

It's happy now, and the 2 commits should replace all that came from any previous version of the branch.

Revision history for this message
Mike Rylander (mrylander) wrote :

I've rebased the code above and improved support for modern Postgres functionality (in particular, RETURNING clauses on UPDATE and DELETE statements) so that the changes here are not only transparent, but also remove some situational roadblocks -- and make use of those newly-opened code paths.

The commit labeled with this bug has a lot of detail in the commit message, more than needs to be repeated here. A fresh branch is available at:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1847805-pcrud-and-record-buckets-improvements

Revision history for this message
Galen Charlton (gmc) wrote :

I did some testing with one of my "favorite" use cases: acq.edi_message having hundreds of thousands of rows, only two of which are visible (via the provider's OU) to a given user.

This branch by itself didn't improve the speed a PCRUD fetch of those two rows, but two additional changes did make performance acceptable:

* Marking permission.usr_has_perm() as stable. That makes sense anyway, since in the context of a given query run that function changes no data and is guaranteed to always return the same value for any combination of user, permission, and target OU.
* Adding the ignore_object_perms="true" decorator to the IDL. I think this is justifiable since there's currently no business logic that would activate per-object permission grants for rows in that table; I'd go so far as to say that ignore_object_perms should be sprinkled liberally in the IDL, as it looks like it's an easy optimization for any large tables where we know in advance aren't going to have object-level permissions.

Mike, if both of those make sense to you, the follow-up is straightforward.

Also, I'm identifying this as a bugfix that, once it's fully tested, should be considered for backporting given the performance issues it addresses.

Revision history for this message
Galen Charlton (gmc) wrote :

(And in particular, something that may be worth considering doing a 3.16 beta2 to give it some broader testing.)

Revision history for this message
Mike Rylander (mrylander) wrote :

Unsurprisingly, I'm happy with both of those changes. I'm going to go a bit further and mark all the various permission.usr_has_* functions as stable.

Galen, did you happen to test with a stable usr_has_object_perm, but without the ignore_object_perms addition to acq.edi_message in the IDL in your large dataset?

Branch coming soon...

Revision history for this message
Mike Rylander (mrylander) wrote :

I've pushed a commit to the same branch with the updates suggested (including my additional changes to the other permission functions). The additional commit can definitely just be squashed into the existing commit labeled with this bug's number.

Thanks, Galen!

Galen Charlton (gmc)
Changed in evergreen:
milestone: none → 3.16-rc
Revision history for this message
Mike Rylander (mrylander) wrote :
Revision history for this message
Galen Charlton (gmc) wrote :

Committed for inclusion in 3.16-rc. I consider this a candidate for backporting to 3.15, but some additional testing is probably in order before we do that.

Thanks, Mike!

Changed in evergreen:
status: Confirmed → Fix Committed
Galen Charlton (gmc)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.