Need function to canonicalize XML

Bug #1076919 reported by Chris Hillery
Affects: Zorba
Status: Fix Released
Importance: High
Assigned to: Juan Zacarias

Bug Description

We need an XQuery function to canonicalize XML (that is, implement the "canonical XML" spec: http://www.w3.org/TR/xml-c14n ). libxml2 already provides this functionality, and it is used by the current Zorba testdriver. We just need a way to access it from XQXQ.

I propose the following signature:

  declare function xml:canonicalize($xml as xs:string) as xs:string;

That would allow people to feed the output of fn:serialize() directly into this function.
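
To illustrate the string-in, string-out shape of the proposed function, here is a minimal sketch in Python. It is not Zorba code: it assumes Python 3.8+ and uses the standard library's xml.etree.ElementTree.canonicalize(), which implements C14N 2.0 rather than the C14N 1.0 spec linked above, and canonicalize_xml is a hypothetical name chosen to mirror xml:canonicalize.

```python
from xml.etree.ElementTree import canonicalize


def canonicalize_xml(xml_text: str) -> str:
    """Hypothetical stand-in for the proposed
    xml:canonicalize($xml as xs:string) as xs:string."""
    # With no `out` argument, canonicalize() returns the result as a string.
    return canonicalize(xml_text)


# Two serializations of the same document: different attribute order,
# different quoting, an XML declaration, and an empty-element tag.
serialized_a = '<?xml version="1.0"?><doc b="2" a="1"><empty/></doc>'
serialized_b = "<doc a='1'   b='2'><empty></empty></doc>"

canonical_a = canonicalize_xml(serialized_a)
canonical_b = canonicalize_xml(serialized_b)

# Both inputs should reduce to the same canonical text.
print(canonical_a == canonical_b)
```

Because the function takes and returns plain strings, any serializer output can be passed straight in, which is the point of the proposed signature.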

Another option would be to have a Zorba-specific option for fn:serialize(); however, I think this would be more work, and from what I can see of the libxml2 interface, it wouldn't necessarily allow for more efficient code anyway. But if it can be done that way, great!

Chris Hillery (ceejatec) wrote :

This functionality needs to be in a core Zorba module. It would make sense for it to be in the core XML module http://www.zorba-xquery.com/modules/xml - although that module currently uses the prefix "parse-xml", which perhaps should be changed.

However, implementing functions in a built-in module is difficult. I therefore propose the following plan:

1. Luis implements a core module to do this, http://www.zorba-xquery.com/modules/xml/canonicalize . Use the normal DECLARE_ZORBA_MODULE() method of having the C++ code in canonicalize.xq.src and so on.

2. After that is done, if we feel it is necessary, Matthias or someone else with Zorba runtime knowledge can either do the work to make it a true built-in module, or explain the necessary steps to Luis to do it.

Changed in zorba:
status: New → Confirmed
importance: Undecided → High
milestone: none → 2.8
Chris Hillery (ceejatec) wrote :

FYI, Luis, see zorba/test/commons/testdriver_comparator.cpp, function canonicalizeAndCompare(). This uses the libxml canonicalization interface to canonicalize two files on the filesystem by reading them into in-memory documents and then saving them back out in canonical form. I believe there are similar functions in libxml to accept XML from an in-memory string, and to canonicalize to an in-memory string, which is what you would need. In fact, it looks like canonicalizeAndCompare() also uses xmlReadMemory() to load a document from an in-memory string in some cases.

This actually should be a pretty small amount of code; I don't think the body of the main evaluate() function for xml:canonicalize() will need to be more than a couple dozen lines.
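
As a rough illustration of the canonicalize-then-compare idea described above, here is a sketch in Python rather than in C++ with libxml2. It assumes Python 3.8+ and the standard library's xml.etree.ElementTree.canonicalize() (C14N 2.0); canonicalize_and_compare and write_temp are hypothetical helpers and do not reproduce the testdriver code.

```python
import os
import tempfile
from xml.etree.ElementTree import canonicalize


def canonicalize_and_compare(path_a, path_b):
    """Hypothetical helper: True if both files share one canonical form."""
    # from_file parses each file; the canonical text is returned in memory.
    return canonicalize(from_file=path_a) == canonicalize(from_file=path_b)


def write_temp(text):
    """Write text to a temporary .xml file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


expected = write_temp('<r x="1" y="2"><e/></r>')
actual = write_temp("<r y='2' x='1'><e></e></r>")
different = write_temp('<r x="1" y="3"><e/></r>')

same = canonicalize_and_compare(expected, actual)
differs = not canonicalize_and_compare(expected, different)

for path in (expected, actual, different):
    os.remove(path)

print(same, differs)
```

The first pair differs only in attribute order, quoting, and empty-element syntax, so it compares equal after canonicalization; the third file changes an attribute value and does not.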

tags: added: fots-driver
Changed in zorba:
assignee: nobody → Luis Rodriguez Gonzalez (kuraru)
Changed in zorba:
status: Confirmed → In Progress
Chris Hillery (ceejatec) wrote :

Re-assigning to Juan as he is finishing up the implementation.

Changed in zorba:
assignee: Luis Rodriguez Gonzalez (kuraru) → Juan Zacarias (juan457)
Chris Hillery (ceejatec)
Changed in zorba:
milestone: 2.8 → 2.9
Chris Hillery (ceejatec)
tags: added: hotlist
Chris Hillery (ceejatec) wrote :

Juan - just added this to your hotlist. Please address Matthias' comments - definitely rewrite and re-organize the comment, and if it is not a huge amount of work, add the canonicalize#2 function with the arguments Matthias mentioned.

Chris Hillery (ceejatec)
Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released