Leap2A importer: improve robustness of <content> importing

Bug #984575 reported by Nigel McNie
This bug affects 1 person
Affects: Mahara
Status: Won't Fix
Importance: Medium
Assigned to: Nigel McNie
Milestone: none

Bug Description

In import/leap/lib.php, function fix_artefact_reference.

If a <content> tag in a Leap2A import has more than one child node, the importer gets confused and only imports one of them (either the first or the last, not sure which).

E.g.:

        <content type="html">&lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;a paragraph&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>

This is:

<content>
  <p>
  <p>
  <p>

This isn't handled properly. I think the Leap2A spec mentions that content like this is not a good idea, but the fix seems pretty easy. Patch attached.
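
For illustration only, here is a minimal Python sketch (not Mahara's PHP code from import/leap/lib.php) showing what the example decodes to. Once the XML parser has unescaped the entities, the text of <content> holds three sibling top-level elements, so an importer that keeps only one node loses the rest. The names `TopLevelTags` and `top_level_tags` are hypothetical helpers invented for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TopLevelTags(HTMLParser):
    """Collect the names of elements that open at nesting depth 0."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            self.tags.append(tag)
        # Simplification: void elements written without a closing tag
        # (e.g. <img>) would leave the depth counter too high.
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth = max(0, self.depth - 1)


SAMPLE = (
    '<content type="html">&lt;p&gt;&amp;nbsp;&lt;/p&gt; '
    '&lt;p&gt;a paragraph&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>'
)


def top_level_tags(content_xml):
    """Return the top-level tag names inside an escaped-HTML <content>."""
    # ElementTree unescapes &lt; and &gt;, so .text is the HTML markup itself.
    markup = ET.fromstring(content_xml).text or ""
    parser = TopLevelTags()
    parser.feed(markup)
    parser.close()
    return parser.tags


print(top_level_tags(SAMPLE))
```

All three paragraphs are siblings at the top level, which is the case the importer needs to handle.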

Tags: import leap2a
Nigel McNie (nigel-mcnie) wrote :
Changed in mahara:
status: New → In Progress
importance: Undecided → Medium
milestone: none → 1.6.0
assignee: nobody → Nigel McNie (nigel-mcnie)
Hugh Davenport (hugh-davenport) wrote :
Kristina Hoeppner (kris-hoeppner) wrote :
Son Nguyen (ngson2000) wrote :

I have run several tests on this patch, but it did not work as expected.
Some HTML child tags of the <content> tag, such as <p>, <a>, <img>, <table>, <ol>, and <ul>, were not imported.

See my import file in the attached file.

Changed in mahara:
milestone: 1.6.0 → 1.7.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 1.7.0 → 1.8.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 1.8rc1 → 1.8.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 1.8.0 → 1.8.1
Robert Lyon (robertl-9) wrote :

According to the documentation at http://www.leapspecs.org/2A/literals#content_or_description, we should not be importing <content></content> as HTML (fully escaped) but rather as XHTML (not escaped).

E.g., instead of this:
<content type="html">&lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;a paragraph&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>

it should be more like this:

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p> </p> <p>a paragraph</p> <p> </p>
</div>
</content>

where the containing <div> gets stripped on import.

So on import, should we take the contents of <content type="html">, unescape them, run them through HTML Tidy, and import the result as XHTML?
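
A rough Python sketch of that idea, offered as an illustration rather than as Mahara's implementation: for type="html" it returns the unescaped markup, and for type="xhtml" it strips the wrapping <div> and keeps everything inside it. The function name `content_markup` is hypothetical, and the HTML Tidy step is left out because the Python standard library has no equivalent.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def content_markup(content_xml):
    """Return the markup carried by a single <content> element."""
    content = minidom.parseString(content_xml).documentElement
    ctype = content.getAttribute("type")

    if ctype == "html":
        # The XML parser has already unescaped the entities, so the text
        # nodes hold the HTML markup itself. A tidy/cleanup pass would go
        # here; it is omitted in this sketch.
        return "".join(
            node.data
            for node in content.childNodes
            if node.nodeType == node.TEXT_NODE
        )

    if ctype == "xhtml":
        # Find the wrapping XHTML <div> and keep all of its children,
        # not just the first or the last one.
        for node in content.childNodes:
            if (
                node.nodeType == node.ELEMENT_NODE
                and node.namespaceURI == XHTML_NS
                and node.localName == "div"
            ):
                return "".join(c.toxml() for c in node.childNodes).strip()

    raise ValueError("unsupported or malformed <content> element")


HTML_FORM = (
    '<content type="html">&lt;p&gt;&amp;nbsp;&lt;/p&gt; '
    '&lt;p&gt;a paragraph&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>'
)
XHTML_FORM = (
    '<content type="xhtml">\n'
    '<div xmlns="http://www.w3.org/1999/xhtml">\n'
    '<p> </p> <p>a paragraph</p> <p> </p>\n'
    '</div>\n'
    '</content>'
)

print(content_markup(HTML_FORM))
print(content_markup(XHTML_FORM))
```

Both forms yield all three paragraphs. The escaped form still needs a cleanup pass before it can be treated as well-formed XHTML (for instance, &nbsp; is not a predefined XML entity).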

Changed in mahara:
milestone: 1.8.1 → 1.9.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 1.9.0 → 1.10.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 1.10.0 → 1.11.0
Changed in mahara:
status: In Progress → Won't Fix
Robert Lyon (robertl-9)
Changed in mahara:
milestone: 15.04.0 → 15.04.1
Robert Lyon (robertl-9)
Changed in mahara:
milestone: 15.04.1 → none