MARC editor strips leading and trailing spaces from tabs

Bug #887263 reported by Jason Etheridge
Affects: Evergreen
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Nominated for 2.0 by Jason Etheridge
Nominated for 2.1 by Jason Etheridge
Nominated for Main by Jason Etheridge

Bug Description

This basically happens anytime .toXMLString() is called on xml_record, such as when you save the record or toggle the flat-text editor, and is likely a consequence of our use of E4X (for having XML be a native datatype) and the XML standard not treating such space as significant by default. It's particularly heinous when a fixed field is thus stripped.

As a fix, I tried setting @xml:space="preserve" on the root node, and on the actual controlfield and datafield elements, but to no avail. Perhaps we can do something with CDATA?

This is what I last tried with xml:space:

diff --git a/Open-ILS/xul/staff_client/server/cat/marcedit.js b/Open-ILS/xul/staff_client/server/cat/marcedit.js
index 066a084..1e28639 100644
--- a/Open-ILS/xul/staff_client/server/cat/marcedit.js
+++ b/Open-ILS/xul/staff_client/server/cat/marcedit.js
@@ -492,6 +492,14 @@ function setFocusToNextTag (row, direction) {

 function createMARCTextbox (element,attrs) {

+ try {
+ var xml = new Namespace("xml", "http://www.w3.org/XML/1998/namespace");
+ element.addNamespace(xml);
+ element.@xml::space = "preserve";
+ } catch(E) {
+ alert('Error setting @xml:space: ' + E);
+ }
+
     var box = createComplexXULElement('textbox', attrs, Array.prototype.slice.apply(arguments, [2]) );
     box.addEventListener('keypress',function(ev) { netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect"); if (!(ev.altKey || ev.ctrlKey || ev.metaKey)) { oils_lock_page(
     box.onkeypress = function (event) {

Jason Etheridge (phasefx) wrote :

Dan pointed out the following:
http://old.nabble.com/E4X-and-whitespace-td2832695.html

Strangely, I get a syntax error whenever I try to include it in marcedit.js, though it executes without error in the JavaScript Shell or in the debug toolbar in the MARC Editor context. Even there, it doesn't seem to fix the behavior.

https://developer.mozilla.org/en/E4X_Tutorial/The_global_XML_object

Jason Etheridge (phasefx) wrote :

I was able to set ignoreWhitespace within each of the key handlers for the MARC textboxes without exceptions being thrown, but the spaces still get trimmed.

Thomas Berezansky (tsbere) wrote :

It would appear to be a combination of two issues:

XML.ignoreWhitespace defaulting to true, which can kill whitespace when parsing XML
XML.prettyPrinting, which kills whitespace on .toXMLString

I think this branch solves both problems:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/tsbere/fix_marc_edit_whitespace

Jason Etheridge (phasefx) wrote :

This fix also requires a modern xulrunner (that's what was causing my trouble earlier).

We decided in IRC that it was better to let the fix break older xulrunners than to come up with a workaround and allow folks to continue mangling MARC records, since we can't actually fix the problem with older xulrunners.

Committed Thomas's change to master, rel_2_1, and rel_2_0.

Thanks!

Changed in evergreen:
status: New → Fix Committed
Jason Etheridge (phasefx) wrote :

Clarification: read "modern" as the latest 3.6.x/1.9.2.x series of xulrunner, since later versions cripple the use of remote XUL.

Jason Etheridge (phasefx) wrote :

Thomas committed an additional change to the branch, and I've cherry picked it into master, rel_2_1, and rel_2_0.

Jason Stephenson (jstephenson) wrote :

We have found this seems to be a problem mostly with 008 tags because our MARC templates start with 6 blanks in the 008. (We have since changed this.)

The attached script will find invalid 008 tags in your database. It will fix those that it thinks it can and will output to standard output a list of the record ids of those that it can't fix. This list could be used to manually fix the records with bad 008s.

You'll probably want to run this at night sometime, maybe via at, and capture the output in a file.

Jason Etheridge (phasefx) wrote :

In my case, it was the loss of trailing spaces in the 008 after language that got noticed.

Jason Stephenson (jstephenson) wrote : Re: [Bug 887263] Re: MARC editor strips leading and trailing spaces from tabs

My script fixes those, too.

If the length is 38 it appends 2 blanks on the end.
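
For anyone who wants to see that rule spelled out, here is a minimal sketch in plain JavaScript. It is not the attached script: the function name repair008 and the returned status strings are made up for illustration, and it only covers the one rule stated above (a 38-character 008 gets 2 blanks appended). It assumes the 40-character 008 used in MARC 21 bibliographic records; anything else is reported as unfixable rather than guessed at.

```javascript
// Hypothetical helper for illustration; NOT the script attached to this bug.
// Assumption: a MARC 21 bibliographic 008 is exactly 40 characters long.
const FIELD_008_LENGTH = 40;

function repair008(value) {
  // Already the right length: nothing to do.
  if (value.length === FIELD_008_LENGTH) {
    return { status: "ok", value: value };
  }
  // Exactly 2 short: treat it as the two trailing blanks having been
  // stripped and put them back.
  if (value.length === FIELD_008_LENGTH - 2) {
    return { status: "fixed", value: value + "  " };
  }
  // Any other length: do not guess, flag the record for manual review.
  return { status: "unfixable", value: value };
}
```

Calling repair008 on each record's 008 and collecting the ids whose status comes back "unfixable" would give a list for manual cleanup, similar in spirit to what the attached script prints.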

Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released