Evergreen

Dynamic Weighting for INDB Matchpoints

Bug #677074 reported by Thomas Berezansky on 2010-11-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Evergreen	Fix Released	Wishlist	Unassigned	Evergreen 2.1.0
	2.1	Fix Released	Undecided	Unassigned	Evergreen 2.1.0

Bug Description

Currently INDB circ and holds use a pre-defined weighting set for rule ordering. This can be changed via replacing the relevant "find" functions in the database, but this is not easily done for most people.

The attached patch is my first attempt at making INDB circ and holds use dynamic weighting in trunk.

The weight set for circ matchpoints is obtained based on the context ou of the circ (aka, where the circ is happening).

The weight set for hold matchpoints is obtained based on the item's circ library (aka, where the item lives).

Optionally, add an enabled circ.holds.weight_owner_not_circ internal flag to have the weight set for hold matchpoints be obtained based on the item's owning library (owner of the call number).

Warnings about this patch:

1 - I did not (yet) figure out how to add a permission (or set thereof) for editing, even though I added GUIs for doing so and links to them from the staff client.
2 - It is "lightly" tested, as in things still seemed to work and basic checks for the new functionality taking effect worked as expected, but that is about all I did.

I may write dokuwiki formatted documentation on how the weights end up working if it looks like this will be or is accepted.

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2010-11-18:

First (lightly tested) attempt Edit (72.1 KiB, text/plain)

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2010-11-18:

DCO visible by quick page search:

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Signed-off-by: Thomas Berezansky <email address hidden>

Revision history for this message

Mike Rylander (mrylander) wrote on 2010-11-29:

My personal plan is to give this a thorough reviewing as soon as EG2.0RC1 is cut. This is expected in the first half of December.

Revision history for this message

Mike Rylander (mrylander) wrote on 2011-01-20:

This is an initial review, focusing on obvious (to me) issues. Not final, by any means, just intended to move things along. I'm working from Open-ILS/src/sql/Pg/upgrade/0XXX.config.matrix_weights.sql, which should include the full changes needed to apply the patch to a live db.

* permission.grp_ancestors_distance and friends should use the prevailing "explicit JOIN" style instead of WHERE-clause joins in order to match the general style for maintenance purposes.

* Need an fkey on config.weight_assoc.org_unit (perhaps it should be called "owner"?) and all fkeys on this table should have cascade clauses -- CASCADE on org_unit and SET NULL on the weights links.

* action.find_circ_matrix_matchpoint and action.find_hold_matrix_matchpoint
- Instead of using COALESCE to supply the default weighting, perhaps just set the fields of the "weights" variable if no weighting is found for the context_ou. Should be simpler (or as simple) to maintain, and removes the cost of many COALESCEs.
- Apply a LIMIT to the "Grab the closest set circ weight setting." Not strictly necessary, but more self-documenting

* Should probably go ahead and surface the user_home_ou bits in this patch, since refactoring it later seems, now, like make-work, when it's clear that users want it.

So mostly just style cleanup. The wait seems anti-climatic now, eh?

If you'll polish those bits I will go ahead and commit to trunk.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-20:

Just to add another perspective, I think the "fall through" idea is a definite improvement, and sorely needed, but I am still wondering about this dynamic weight idea. Not having giving it nearly as much thought as what is shown here, I am left thinking we might be better served by simply providing for ordering the rules in a more manual fashion. The logic would then be something like:

1. Process the rules for the context org_unit in the order as set.
2. If a rule matches, set whichever attributes (circulate, duration, etc.) it sets. These are now frozen.
3. Proceed to following rules, climbing the org_unit tree if needed.
4. When you have a complete set of attributes, stop.

It also seems to me there are cases where you might (ideally) want the columns weighted differently depending on the actual values of the rule. At that point the simplicity of manually ordering the rules seems to become more worth it.

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-20:

On "manual ordering", I assume by virtue of specifying a value for each row, and then using that for ordering (once rows are eliminated from conditions).

Is there a desire for an additional "weight adjustment" column? Combined with the provided "all equal" weight set you could (with proper values) effectively manually order rows.

I would likely implement it as a "take the weighting and add the adjustment value", which should allow for saying a given rule is much more (or less) important than others as long as it matches. Likely on the same scale as the existing weightings. Thus, a rule could be considered excessively important by giving it a very large weight adjustment value, ensuring it will come first whenever it matches.

And take note that this patch will not be implementing the fall through, but the fall through will be built upon this later.

Any thoughts?

Revision history for this message

Mike Rylander (mrylander) wrote on 2011-01-20:

I'm not opposed to a "weight" or "importance" column on the matchpoint tables, to be used as a Big Black Knob, where turning a matchpoint up to eleven would give it higher value, but ... I'm trying to image a case where that wouldn't simply force that matchpoint to be the only one ever chosen, all else being equal. In which case, why have more than one matchpoint like that -- the ordering of one matchpoint will always be the same. :)

Dan, could you provide a circ example (user, location, copy) where you have multiple matchpoints that could be chosen and a weighting field would be a better choice than, say, adding more detail to one or the other matchpoint (or both), or where more detail can't be provided (allowing better match choice) and a weighting field would /not/ simply force the same matchpoint to be chose every time?

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-20:

Assuming you had two all but entirely filled in rulesets that differ in their duration rules and copy circ lib/user home lib setting. One has the copy circ lib, the other has the user home lib. Normally the two rows are equal in weight, but if they both match the user/item in the same way (lets say exactly) then the two rules are in an ambiguous state.

The code as I presented it would order the two rules by the database id to ensure consistent ordering every time. But, without knowing that, you have no way of determining which is which. And the database id may be lower on the one you care less about.

With a small weight adjustment you could push one to be slightly more important than the other, but perhaps not enough to change things when they don't match quite so exactly.

When fallthrough comes along I could see a major adjustment being used for something like a hard due date setting. Turn the dial to eleven for a specific permission group but only specify the hard due date option, not the duration or fine rule options. That rule will trump other rules for the hard due date, but regular ordering would resume afterward.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-20:

I'm probably way in left field here, but let me start by saying I am thinking primarily in fallthrough terms, which I realize now are not necessarily part of the picture yet. If our goal is finding a single row to supply all the circ attribute values, then you are right, my comment doesn't make much sense. Instead, for a given circ, I am thinking along the lines of:

1) select all the rules which do not exclude the item in question
2) process the rules in a given order, filling in the circ attributes as you go

I am bringing this up now because I think this simplified setup might render the column weighting moot (the only evaluation is pass/fail), and also that it would translate pretty directly into an understandable interface. It probably doesn't meet a 'least possible number of rules' design goal, but it would certainly allow one to reduce some of the redundancy in the current matrix.

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-20:

#10

I question how your "given order" would be defined.

If by a column in the rules, then each rule needs to be evaluated, and distinctions between different org unit based settings become harder to decide on overall. Not to mention the headaches of keeping all the ordering values up to date.

If by what column(s) are filled in, that is a weighting (and thus this still applies).

If by DB order, then in most cases the system default entry (hopefully with all result columns filled in) will win every time.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-20:

#11

Yes, I am talking about plain old ordering values. What they lack in elegance they make up for in simplicity (which is a kind of elegance all its own :) It is also a common enough requirement in software that I can't think handling it would be a barrier.

I am not sure what you mean when you say "distinctions between different org unit based settings become harder to decide on overall". I'd expect that any rule set in the current ou will trump rules in its parent, and so on up the line.

If an administrator can simply set the order in which the rules are processed, then rules which are either important or unusual will be near the top and rules which are less important or more broad will be closer to the bottom. I think it provides the necessary flexibility while still keeping it pretty easy and natural to understand.

I'm well aware that I may be exposing my ignorance, as I haven't worked out every possible detail, but consider the following really simplified example. Say a library (BR1) has the following in mind. They want to circulate books for 2 weeks, cdroms for 1 week, DVDs for 3 days, and reference items (of any type) for 1 hour. However, they don't have DVD players, so reference DVDs are not subject to the one hour rule. If using rule order to determine importance, we might have a matrix like:

CIRC_MOD REFERENCE DURATION_RULE

dvd ANY 3 days
ANY TRUE 1 hour
book ANY 2 weeks
cdrom ANY 1 week
...

(ANY simply means the value is not set, and therefore matches everything)

We know that, no matter what, DVDs circ for 3 days, so we put that at the top. We also know that, other than that case, all other reference items circ for 1 hour, so we put that next. The last two lines then catch the other non-reference cases. For any given item, we find the first 'true' rule which sets duration rule, set it, then presumably proceed to other rules which will each set one or more of the other needed attributes while ignoring any specified durations.

Yes, I am talking about plain old ordering values.  What they lack in elegance they make up for in simplicity (which is a kind of elegance all its own :)  It is also a common enough requirement in software that I can't think handling it would be a barrier.

I am not sure what you mean when you say "distinctions between different org unit based settings become harder to decide on overall".  I'd expect that any rule set in the current ou will trump rules in its parent, and so on up the line.

If an administrator can simply set the order in which the rules are processed, then rules which are either important or unusual will be near the top and rules which are less important or more broad will be closer to the bottom.  I think it provides the necessary flexibility while still keeping it pretty easy and natural to understand.

I'm well aware that I may be exposing my ignorance, as I haven't worked out every possible detail, but consider the following really simplified example.  Say a library (BR1) has the following in mind.  They want to circulate books for 2 weeks, cdroms for 1 week, DVDs for 3 days, and reference items (of any type) for 1 hour.  However, they don't have DVD players, so reference DVDs are not subject to the one hour rule.  If using rule order to determine importance, we might have a matrix like:

CIRC_MOD        REFERENCE         DURATION_RULE

dvd             ANY               3 days
ANY             TRUE              1 hour
book            ANY               2 weeks
cdrom           ANY               1 week
...

(ANY simply means the value is not set, and therefore matches everything)

We know that, no matter what, DVDs circ for 3 days, so we put that at the top.  We also know that, other than that case, all other reference items circ for 1 hour, so we put that next.  The last two lines then catch the other non-reference cases.  For any given item, we find the first 'true' rule which sets duration rule, set it, then presumably proceed to other rules which will each set one or more of the other needed attributes while ignoring any specified durations.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-20:

#12

The system apparently collapses spaces, so here is a second attempt at that table (fingers-crossed):

CIRC_MOD----REFERENCE----DURATION_RULE

dvd---------ANY----------3 days
ANY---------TRUE---------1 hour
book--------ANY----------2 weeks
cdrom-------ANY----------1 week

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-21:

#13

I see where you are coming from, and I think that would work wonderfully in a small system with one or two libraries.

I don't think it would scale to 5, 10, 25, 50, 100, 200, or more libraries easily, especially as you have included a weighting to begin with. Your proposal would walk up the org tree, which means that the org tree has just become weighted, ignoring all else in the process.

That is the kind of weighting I want to get rid of, and both permission group and org unit do that without my patch.

Speaking of walking up the org tree, there are a number of trees involved in the matchpoints, and my patch will add one more to the circ matchpoints.

Ignoring holds for the moment, circ has or will have:
The "Context" OU, where the circ is happening
The Copy Owning Library, where the call number is located
The Copy Circ Library, where the item circulates from
The User Home Library, where the user is based
The User's Permission group, as assigned in the user profile

Each of those can match a rule exactly or by being a parent to the actual value. More than one can be set, and they may match differently. Static ordering can't easily take into account distance on those values, especially if they refer to a different part of the trees than the library owning the rules. Walking up all the trees would be very difficult to do code-wise for static ordering, and thus you have to pick and choose which trees to walk up first or only pick certain trees to walk up (and in what order). This is, again, partially how the current code works, walking up the permission tree and then the org tree. Copy owning and circ libraries are considered equally, but walked up together in a way.

All of this amounts to something being weighted before static numbering comes into play, unless I have overlooked something about how static numbering would allow for all of this to be accounted for without driving an admin or set of admins insane.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-21:

#14

You are right, the ideas I have expressed to this point do not account for prioritizing all the different stakeholders which might be involved in any given circ. That is a problem that needs to be addressed, and maybe that is primarily what we are trying to do here, and I certainly support that.

My bigger concern is that we should be doing so in the context of simplifying the setup in general. After all, circulation policies are written by people and need to be understandable by patrons. Because of these natural limits, they tend to not be overly complicated, and we should take advantage of as many reasonable assumptions as we can, then provide fallback mechanisms only when the simple ones fail.

The primary questions we are trying to answer are:

1) Whose policies count in this situation?
2) What are those policies?

Right now the matrix tries to answer both questions at once, but maybe we could be better served by asking and answering them separately.

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-21:

#15

I can't think of a good way to answer them separately without putting a hard-coded weighting in place somewhere or introducing administrative headaches.

If we ask "which org unit's policies apply?" then we may be limiting ourselves to weighting on org units.

If we ask "which permission group's policies apply?" then we may be limiting ourselves to weighting on permission groups.

If we remove some checks from the circ matchpoint table and instead place them in a rules table we are calling those checks less important, across the board, than the checks left in the circ matchpoint table.

I am willing to think about alternate ways of doing it, but I can't think of a decent way that doesn't put more emphasis on some pieces compared to others.

The entire goal of this patch is to ensure that any given consortium, or even any given library system or branch, can decide for themselves what fields in the matchpoint tables are most important to them.

I know there are some groups that consider the user's home library to be more important than their permission group or the current checkout location, for example. Others may think the owning or circ library should trump when possible because the library who owns the item should be the one deciding the policies on it going out.

If you look at some of the commented out portions of the patch you may notice I even have commented out code for making the permission group and org unit pieces optional as well. I wanted to make sure that if someone wanted to do that they had the code in front of them, rather than having to adapt other parts of the function, even if those two being optional may not work out right now in the base code. Some minor schema/IDL changes (I should document those) could make it so that permissions are based on the user home library field instead of the checkout location, for example, for those consortia that put more weight on that.

Revision history for this message

Dan Wells (dbw2) wrote on 2011-01-21:

#16

When it comes down to it, as long as we have some sort of "weight" column when fallthrough comes along, there is nothing to prevent me from zeroing any column weights which do not make sense for my situation and use this new column to sort as needed. Maybe down the road a 'basic' interface could be added which puts a pretty face on similar setup.

Along different lines, I am wondering if we could right now benefit from a 'global' value setting in the matchpoint rows. It would enter consideration like a null column value (i.e. all the time), but sort like a direct match.

So, in my previous example, we could have something like:

dvd---------GLOBAL----------3 days

so if reference was weighted higher than circ_mod, this would keep it above the generic ANY reference rule, and thereby not require redundant dvd/true and dvd/false rows. Make sense? Worth it?

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2011-01-21:

#17

Second (lightly tested) version Edit (78.0 KiB, text/plain)

Discussion about future changes can happen on the dev list (or in new wishlist bug entries), for now here is an updated patch.

Again, lightly tested. Appears to go in cleanly and upgrade a pre-patch database, functionality appears to still be working.

Revision history for this message

Mike Rylander (mrylander) wrote on 2011-01-28:

#18

Lightly edited and committed to trunk. Thanks, Thomas!

Changed in evergreen:
status:	New → Fix Committed

Jason Stephenson (jstephenson) on 2011-10-05

Changed in evergreen:
milestone:	none → 2.1.0
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.