authority.extract_headings & authority.heading_field.component_xpath not parsing headings as intended
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | New | Undecided | Unassigned |
Bug Description
I want to talk about this blob from the authority.
raw_text := NULL;
-- now iterate over components of heading
FOR component_node IN SELECT x FROM unnest(
-- XXX much of this should be moved into oils_xpath_
);
IF raw_text IS NOT NULL THEN
END IF;
END LOOP;
xfrm is the alias here for the database table "config.
For the unfamiliar, what the full authority.
Then, through the blob I've posted here, it is supposed to take each sequestered heading node, XPath out the child nodes (i.e., the subfields), XPath out their text, clean up some whitespace, and concatenate the text components into one heading string.
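To make the intended behavior concrete, here is a minimal sketch in Python (standard library only, outside the database) of the extract, clean, and concatenate steps described above. The sample heading XML, the namespace URI bound to the mads21 prefix, the `build_heading` helper, and the `./*` expression are all assumptions made for illustration; none of them are taken from the Evergreen function itself.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI bound to the mads21 prefix.
NS = {"mads21": "http://www.loc.gov/mads/v2"}

# Hypothetical heading node: a topical term with a geographic subdivision,
# with some messy whitespace to exercise the cleanup step.
HEADING_XML = """
<mads21:authority xmlns:mads21="http://www.loc.gov/mads/v2">
  <mads21:topic>  Art,
     Modern </mads21:topic>
  <mads21:geographic>France</mads21:geographic>
</mads21:authority>
""".strip()


def build_heading(heading_node, component_xpath, joiner=" "):
    """Concatenate the cleaned text of every component the XPath selects."""
    parts = []
    for component in heading_node.findall(component_xpath, NS):
        raw_text = "".join(component.itertext())
        cleaned = " ".join(raw_text.split())  # collapse runs of whitespace
        if cleaned:
            parts.append(cleaned)
    return joiner.join(parts)


heading = ET.fromstring(HEADING_XML)
# "./*" selects every direct child of the heading node, whatever its type.
full_heading = build_heading(heading, "./*")
print(full_heading)
```

With every child component selected, the topic and its geographic subdivision end up in a single heading string, which is the outcome the function is described as aiming for.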
With the default settings, this is not what happens.
The XPath used to extract the component nodes from the heading is stored in the "component_xpath" column of authority.heading_field. Currently, that value is "//mads21:*", where * is not a wildcard but stands for one of name, title, topic, temporal, geographic, or genre, depending on the type of heading (so topical terms have "//mads21:topic"). But in both MARCXML and MADS/XML, subdivisions are not structured as child nodes of the component that precedes them; they are following siblings on the same level. So what actually happens is that only the components matching the heading type are extracted and concatenated, while non-matching components are extracted separately. For example, if you have a Topical Term authority record with a built-in geographic subdivision, the function will split them into separate strings.
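The sibling problem can be reproduced outside the database. The following is a rough Python sketch, assuming a hypothetical MADS-style heading and an assumed namespace URI for the mads21 prefix; the `component_texts` helper is made up for this example. Note that ElementTree requires the relative form ".//" where the stored value uses "//".

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI bound to the mads21 prefix.
NS = {"mads21": "http://www.loc.gov/mads/v2"}

# Hypothetical topical heading with a geographic subdivision. The two
# components are siblings; neither is nested inside the other.
record = ET.fromstring(
    '<mads21:authority xmlns:mads21="http://www.loc.gov/mads/v2">'
    "<mads21:topic>Art</mads21:topic>"
    "<mads21:geographic>France</mads21:geographic>"
    "</mads21:authority>"
)


def component_texts(node, xpath):
    """Return the text of each element matched by the given XPath."""
    return ["".join(el.itertext()).strip() for el in node.findall(xpath, NS)]


# A type-specific expression only ever sees components of its own type.
topic_only = component_texts(record, ".//mads21:topic")
geographic_only = component_texts(record, ".//mads21:geographic")

# Selecting all children of the heading returns the siblings together.
all_children = component_texts(record, "./*")

print(topic_only, geographic_only, all_children)
```

Because the geographic subdivision is a sibling of the topic rather than a descendant, the topic expression cannot reach it, so the two pieces never land in the same string.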
This function is used mainly to populate authority.
Thankfully, there is already a viable solution as utilized elsewhere and explained here: https:/
I am suggesting that the default values of authority.
One somewhat significant change to this (aside from the function doing what it is supposed to do) is that wanting to include a particular name type would make the fixed component_xpath value a bit more cumbersome. //mads21: