Bug #1638295 “Propose SQL ACLs and filtered queries” : Bugs : KARL4

Paul Everitt (paul-agendaless) on 2016-11-01

summary:

- Propose SQL ACLs and filteried queries
+ Propose SQL ACLs and filtered queries

Paul Everitt (paul-agendaless) wrote on 2016-11-01:

#1

Jim, one thing I wasn't clear about...I pointed you at https://github.com/pauleveritt/pyramid_sqltraversal/tree/master/docs which uses the Adjacency List approach for hierarchies. There are other approaches, e.g. materialized path. I used adjacency list because it was well supported and seemed to have good performance.

Point is, you can use whatever you want. These things all have deep implications for which I'm not a good arbiter.
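To make the two hierarchy representations mentioned above concrete, here is a minimal sketch in plain Python. The four-node tree, the `/`-separated path format, and the function names are illustrative assumptions; they are not taken from pyramid_sqltraversal or KARL.

```python
# Adjacency list: each node stores only its parent's id (None for the root).
parent_of = {1: None, 2: 1, 3: 2, 4: 2}


def ancestors_adjacency(docid):
    """Walk parent pointers one hop at a time (a recursive query in SQL)."""
    chain = []
    current = parent_of[docid]
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain


# Materialized path: each node stores its full path from the root.
path_of = {1: "1", 2: "1/2", 3: "1/2/3", 4: "1/2/4"}


def ancestors_path(docid):
    """Read the ancestors straight out of the stored path, nearest first."""
    ids = [int(part) for part in path_of[docid].split("/")]
    return ids[-2::-1]  # drop the node itself, then reverse
```

Both functions return the same lineage. The adjacency list needs one lookup per level, while the materialized path needs none but has to be rewritten for a whole subtree when a node moves.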

Jim Fulton (jim-zope) wrote on 2016-11-03:

#2

OK, I have something working and it seems to provide reasonable performance.

I tested with queries as I believe they'd be generated by the pgtextindex. With a query for text containing "zuma", and also limiting to the african-advocacy community, the basic search takes around 10ms and yields 519 rows on the staging server.

Adding security filtering adds 10-20ms, depending on the filter.

If I don't filter for community, the raw search is ~18ms and the security-filtered search takes around 105ms. Note though that I only added ACE data for the african-advocacy community. It's possible/likely the filtered search would be faster for the larger data set if the full ACE data were present.

I added 2 (temporary for my testing) tables:

parents (docid int, parent_docid int)

ace (docid int, allowed bool, who varchar, permission varchar, ord int)

I used a fairly straightforward recursive query, which I won't include here. Currently, I have a Python function that generates the query. I have some basic tests that use psycopg2 and connect to a Postgres database (unspecified, so local by default, or controlled via environment variables).
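The actual query is not included in this thread, so the following is only a sketch of what a recursive ACL filter over the two tables above could look like. It assumes first-match-wins ACL semantics (nearest object first, then `ord`), a NULL `parent_docid` for the root, and at least one principal. It uses the standard-library `sqlite3` module in place of psycopg2 and PostgreSQL so that it runs anywhere. The function names and sample data are hypothetical.

```python
import sqlite3


def visible_docids_sql(n_principals):
    """Generate the filter query for a given number of principals."""
    marks = ", ".join("?" for _ in range(n_principals))
    return f"""
    WITH RECURSIVE lineage(docid, ancestor, depth) AS (
        SELECT docid, docid, 0 FROM parents
        UNION ALL
        SELECT l.docid, p.parent_docid, l.depth + 1
        FROM lineage l JOIN parents p ON p.docid = l.ancestor
        WHERE p.parent_docid IS NOT NULL
    ),
    matching AS (
        -- every ACE on the doc or an ancestor that names one of our principals
        SELECT l.docid, l.depth, a.ord, a.allowed
        FROM lineage l JOIN ace a ON a.docid = l.ancestor
        WHERE a.permission = ? AND a.who IN ({marks})
    )
    SELECT DISTINCT m.docid
    FROM matching m
    WHERE m.allowed
      AND NOT EXISTS (              -- no earlier ACE overrides this one
          SELECT 1 FROM matching e
          WHERE e.docid = m.docid
            AND (e.depth < m.depth OR (e.depth = m.depth AND e.ord < m.ord))
      )
    ORDER BY m.docid
    """


def visible_docids(conn, principals, permission):
    sql = visible_docids_sql(len(principals))
    return [row[0] for row in conn.execute(sql, [permission, *principals])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (docid int, parent_docid int);
    CREATE TABLE ace (docid int, allowed bool, who varchar,
                      permission varchar, ord int);
""")
conn.executemany("INSERT INTO parents VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 2)])
conn.executemany("INSERT INTO ace VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "group.staff", "view", 0),      # root: allow staff
    (2, 1, "alice", "view", 0),            # doc 2: allow alice ...
    (2, 0, "system.Everyone", "view", 1),  # ... then deny everyone else
    (4, 0, "alice", "view", 0),            # doc 4: deny alice
])

alice_docs = visible_docids(conn, ["alice", "system.Everyone"], "view")
staff_docs = visible_docids(conn, ["bob", "group.staff", "system.Everyone"], "view")
```

In a real search, the final SELECT would presumably be joined against the text-index results rather than returned on its own.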

Note that pgtextindex doesn't use sqlalchemy, so that shouldn't be an issue. :)

Would you like me to check what I have in somewhere? Wanna get together online to discuss?

Jim Fulton (jim-zope) wrote on 2016-11-03:

#3

BTW, I chose an approach that required minimal database updates.

In particular, ACEs are only stored on the objects for which they were defined in the app. This requires the recursive queries.

Another option would be to flatten them, to avoid the recursive query. This would require a lot of updates when ACLs are updated on nodes with lots of descendants.
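The trade-off of flattening can be sketched in plain Python. The tree, the ACE tuples, and both helper names are assumptions made up for illustration. `flatten` copies each ancestor's ACEs onto every descendant, and `docs_to_rewrite` lists the documents whose flattened ACL has to be rebuilt after one node's ACL changes.

```python
parent_of = {1: None, 2: 1, 3: 2, 4: 2}
local_acl = {
    1: [("Allow", "group.staff", "view")],
    2: [("Allow", "alice", "view")],
}


def flatten(parent_of, local_acl):
    """Give every doc its own full ACL: its ACEs, then each ancestor's."""
    flat = {}
    for docid in parent_of:
        aces = []
        node = docid
        while node is not None:
            aces.extend(local_acl.get(node, []))
            node = parent_of[node]
        flat[docid] = aces
    return flat


def docs_to_rewrite(parent_of, changed):
    """Docs whose flattened ACL must be rebuilt when `changed` is edited."""
    affected = []
    for docid in parent_of:
        node = docid
        while node is not None:
            if node == changed:
                affected.append(docid)
                break
            node = parent_of[node]
    return affected


flat = flatten(parent_of, local_acl)
```

Lookups against `flat` need no recursion, but a change near the root touches the whole tree: here `docs_to_rewrite(parent_of, 1)` covers all four documents.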

The materialized path idea sounds interesting. I didn't explore that, but would be happy to. Let me know if you want me to pursue that.

Paul Everitt (paul-agendaless) wrote on 2016-11-03: Re: [Bug 1638295] Re: Propose SQL ACLs and filtered queries

#4

Some questions:

1) How would you envision getting class ACL information into the query?

2) Let’s say you do a search that says, give me the BlogEntry types in SomeCommunity with a text search of “africa”, ranked by relevance, that I have permission to see, and just the first 20. Are you confident that the PG optimizer does the right thing at the right time? Meaning, the expensive step (ACL lookup) is done late?

3) Did you manage to think of any kind of index, e.g. an expression index, that could be used?

4) When you say “the raw search is ~18ms”, does that include the relevance ranking ordering? (FWIW, that’s the part that kills us in PG, requires a table scan.)

5) For the materialized path, I think we might have you look at it, just to have in our back pocket, but after we’ve gone through some other things.

6) Does your query do anything like “cache” ACL lookups on intermediate nodes? (Or that might be my stupid procedural thinking still kicking around.)

7) Re: your idea about flattening…rather than store the actual ACL on all the children, could you store a single ACL, and point all the children at it?

- parent1 has an ACL with an ID of root-parent1
- child1, child2, grandchild3 all point at ACL root-parent1
- Later, parent1 changes its ACL information, but child1, etc. don’t have to change anything and just have 1 join to do, no recursion
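The shared-ACL idea in the list above might look roughly like this. It is a sketch only: the `doc_acl` and `acl_entry` tables, their columns, and the `can` helper are hypothetical names, `sqlite3` stands in for PostgreSQL, and the stored ACL is assumed to already hold the merged root-plus-parent1 entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acl_entry (acl_id TEXT, allowed INTEGER, who TEXT,
                            permission TEXT, ord INTEGER);
    CREATE TABLE doc_acl (docid TEXT PRIMARY KEY, acl_id TEXT);
""")
# parent1 and all of its descendants point at the same shared ACL.
conn.executemany("INSERT INTO doc_acl VALUES (?, ?)", [
    ("parent1", "root-parent1"),
    ("child1", "root-parent1"),
    ("child2", "root-parent1"),
    ("grandchild3", "root-parent1"),
])
conn.execute(
    "INSERT INTO acl_entry VALUES ('root-parent1', 1, 'alice', 'view', 0)")


def can(conn, docid, who, permission):
    """One join, no recursion: the first matching ACE decides."""
    row = conn.execute("""
        SELECT e.allowed
        FROM doc_acl d JOIN acl_entry e ON e.acl_id = d.acl_id
        WHERE d.docid = ? AND e.who = ? AND e.permission = ?
        ORDER BY e.ord LIMIT 1
    """, (docid, who, permission)).fetchone()
    return bool(row and row[0])


before = can(conn, "grandchild3", "alice", "view")

# parent1's ACL changes: only acl_entry is updated, doc_acl is untouched.
conn.execute("""UPDATE acl_entry SET allowed = 0
                WHERE acl_id = 'root-parent1' AND who = 'alice'""")
after = can(conn, "grandchild3", "alice", "view")
```

One cost this sketch does not show: if child1 later gets an ACL of its own, child1 and its descendants would presumably need to be repointed at a new shared ACL.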

Yep, I can chat any time this afternoon. I just hopped on #freenode, poke zopepaul when you’re around.

—Paul

> Title:
> Propose SQL ACLs and filtered queries
>
> Status in KARL4:
> New
>
> Bug description:
> Jim will do some R&D...