Evergreen

Bug #1251394
Comment #38

Mike Rylander (mrylander) wrote on 2017-06-10:

Hi Bill,

First, thanks for continually pushing this forward. I personally really appreciate it.

One of the near-term benefits of display fields, which I'm hoping to work on in the next few months, is the ability to highlight search terms in the fields that were actually searched and matched by the user query. If we add display-centered adornments to the existing searched/browsed/faceted fields, then we know, for certain, that a field matched by search (or browse, or facet) is the one we need to highlight within for display. A separate display class, though, would keep us from having a 1-to-1 correspondence between search and display.

We could add a different, external mapping table to tie search and display fields together, but then we're back to having a mapping table and we lose the direct knowledge.
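To make the highlighting point concrete, here is a minimal Python sketch, not Evergreen code. It assumes display values and search matches are keyed by the same field id, which is exactly the 1-to-1 correspondence described above. The function names, field ids, and `<b>` tags are all hypothetical.

```python
import re

def highlight(text, terms, open_tag="<b>", close_tag="</b>"):
    """Wrap each whole-word, case-insensitive occurrence of a term in tags."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_tag + m.group(1) + close_tag, text)

def render_display(display_values, matched_field_ids, terms):
    """Highlight terms only in the fields the search actually matched.

    display_values: {field_id: text}  (hypothetical shape)
    matched_field_ids: set of field ids reported by the search
    """
    return {
        fid: highlight(text, terms) if fid in matched_field_ids else text
        for fid, text in display_values.items()
    }

# Hypothetical field ids: 6 = title, 8 = author.
display = {6: "Harry Potter and the Goblet of Fire", 8: "Rowling, J. K."}
result = render_display(display, {6}, ["potter"])
```

Because the field id is shared, no lookup through a separate search-to-display mapping is needed to decide where to highlight.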

I don't actually agree that display and search fields will be mostly orthogonal. In fact, display will be primarily a subset of search. We may certainly display less than we search (think: subfield h of 245), but I don't think we'd ever show more than we'd allow searching on, except in circumstances that would likely be covered by existing (or useful) facet or browse fields. For instance, facets are a step toward display fields already, in that the values we extract for facet limiting are the same we use for display of facet data. The only facet field in stock that is not also a search field is the authority id facet. Browse is in a similar position: there are 5 of 17 browse fields that are not search fields, but one is because of a sorting need special to browse and could probably be addressed without the separate field, and the others are driven by how cataloging is performed rather than separate needs for search/browse as performed by a user per se.

I do see where you're coming from WRT reingest, but I think we have much bigger fish to fry in that area. Separating out display fields for that reason will not make any of the existing TODOs any easier, and they still need design and doing. (I think queued ingest along with a wholesale rewrite of the ingest process will be needed, and I hope that starts sooner rather than later, but that's a separate concern and I don't think it should have much weight on this bug's direction.) That said, I do think special care should be taken to make sure that display-only reingest is as efficient as possible on day one.

WRT config, I don't see the UI issue the way you describe. It may be just a difference in design philosophy, but I don't see why we need to have a UI per table. I think the existing (or, perhaps, angularized) CMF config UI can be made to deal with display mapping. When configuring a field, if display_field is true, offer a modal action that creates or changes the mapping. We save a UI, and lead the user to what they need at the time they need it. And as far as having a mapping table in place, it will be tiny and will not factor into performance (not that you claimed it would; I just want to make that clear).

Also, I'm glad you mention the code/maintenance effect of adding a new class. Currently, the classes are not special -- they can all be used for any purpose, and their semantic importance is only defined by the data mapping that humans require. That is, we need (at least) 6 "slot types" and then an arbitrary set of named "keys" within each slot. Elevating the display fields to a slot type, rather than a way of using the data (is it a facet? can I search it? can I browse it?), significantly changes where the "specialness" lies. (Of course, there is higher level code that cares about classes, but it's fine for higher level code to care. Code at the level of CMF should /not/ treat classes differently, only do with the data what the configuration tells it to do.)

Put another way, instead of saying "I can search, browse, facet, and display values in key=FOO of slot=BAR" as we do today, we'd be saying "I can do those things for these 6 slots, but slot 7 is special and I need to act differently with that one."
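As a rough illustration of "display as a way of using the data" rather than a new slot type, here is a small Python sketch. It is not the actual CMF schema: the `FieldConfig` class, its flag names, and the sample slots and keys are assumptions made only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldConfig:
    """One named key within a slot (class), plus how it may be used."""
    slot: str              # e.g. "title", "author" (hypothetical values)
    key: str               # named field within the slot
    search: bool = False
    browse: bool = False
    facet: bool = False
    display: bool = False  # just another usage flag, not a seventh slot

def fields_for(configs, usage):
    """Return (slot, key) pairs enabled for a usage, identically for every slot."""
    return [(c.slot, c.key) for c in configs if getattr(c, usage)]

configs = [
    FieldConfig("title", "proper", search=True, display=True),
    FieldConfig("author", "personal", search=True, browse=True, facet=True),
    FieldConfig("subject", "topic", search=True, facet=True, display=True),
]

display_fields = fields_for(configs, "display")
```

The low-level code never branches on the slot name; it only does what the flags tell it to do, so adding display as a usage leaves the existing assumptions about slots intact.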

I think one of the strengths of the Evergreen codebase is that we have managed to maintain a pretty high degree of conceptual layering purity, and once you understand the /concept/ of, say, "there are classes into which we separate data from bib records, and within those classes there are named fields that you can configure for use in search/browse/faceting", then you can both configure the system and, potentially, extend the system in new ways that don't break old assumptions. And, yes, there are tons of details to CMF and friends configuration, and we haven't been perfect in this regard (we assume USMARC), but those details are /the same/ for all cases within the problem domain. I see making a new class for a special purpose that breaks those assumptions as increasing the total maintenance cost and (more importantly) conceptual burden on current and future devs.

So, those are my thoughts. Thanks, if you read this far... :)
