Leap2A import problem: "simplexml_load_file()... parser error : PCDATA invalid Char value..."
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mahara | Fix Released | Medium | Aaron Wells |
15.04 | Fix Released | Medium | Unassigned |
15.10 | Fix Released | Medium | Aaron Wells |
Bug Description
We received a report of a Mahara-generated Leap2A file that produced the following call stack when someone attempted to import it:
```
[WAR] 38 (import/
Call stack (most recent first):
log_
error(2, "simplexml_
simplexml_
PluginImpor
PluginImpor
PluginImpor
import_
call_
Pieform-
Pieform:
pieform(
print_
```
On investigation, it turned out that the Leap2A XML file had a vertical tab character (ASCII 0x0B, decimal 11) in one of the page titles. XML 1.0 allows only three ASCII control characters (tab, line feed, and carriage return); every other one causes a parser error in SimpleXML. If any of them appear in a Mahara page title, they are copied into the exported Leap2A file, and Mahara then fails with the error above when it attempts to import that file.
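The report does not say how the fix was implemented. As a minimal sketch, assuming titles are valid UTF-8, an exporter could drop every character outside the XML 1.0 `Char` production before writing a title into the Leap2A file. The function name `strip_invalid_xml_chars()` is hypothetical and is not part of Mahara.

```php
<?php
// Hypothetical helper (not Mahara's actual fix): remove every character that
// the XML 1.0 "Char" production forbids. Assumes $text is valid UTF-8,
// which the /u modifier requires.
function strip_invalid_xml_chars(string $text): string
{
    // Allowed: tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$title = "My\x0Bpage";                     // title containing a vertical tab
$safe  = strip_invalid_xml_chars($title);  // "Mypage"
$xml   = '<title>' . htmlspecialchars($safe, ENT_XML1) . '</title>';
var_dump(simplexml_load_string($xml) !== false);  // bool(true)
```

Dropping the characters is only one option; replacing them with a space or U+FFFD would also yield well-formed XML, at the cost of visibly altering the title.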
Changed in mahara: status: Fix Committed → Fix Released
Using a for loop and PHP's chr() function, I tested each ASCII character individually by placing it in the middle of an otherwise valid XML file. I tested each one unescaped, after passing it through htmlspecialchars(), and after passing it through htmlentities(). Below are the decimal codes of the ASCII characters that cause SimpleXML to fail. None of them are escaped by htmlspecialchars() or htmlentities(), and escaping would not help anyway: XML 1.0 also forbids these characters when they are written as numeric character references (for example, &#11;).
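```php
// Decimal codes of the ASCII control characters that make SimpleXML fail (29 in total)
$baddies = array(0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
```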
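The brute-force test described above can be sketched in PHP as follows. This is a reconstruction, not the reporter's script: it assumes a UTF-8 document and covers only codes 0 to 127, because single bytes above 127 are not valid UTF-8 on their own. Only the htmlspecialchars() pass is shown, and libxml_use_internal_errors() keeps the parser warnings out of the output.

```php
<?php
// Embed each ASCII character (0-127), escaped with htmlspecialchars(), in an
// otherwise valid XML document and record the codes SimpleXML refuses to parse.
libxml_use_internal_errors(true);  // collect libxml errors instead of emitting warnings

$baddies = array();
for ($i = 0; $i <= 127; $i++) {
    $xml = '<?xml version="1.0" encoding="UTF-8"?>'
         . '<root>a' . htmlspecialchars(chr($i)) . 'b</root>';
    if (simplexml_load_string($xml) === false) {
        $baddies[] = $i;  // parse failed: this character is forbidden
    }
    libxml_clear_errors();  // discard this iteration's errors
}

echo implode(', ', $baddies), "\n";
```

If the behaviour matches the report, the printed list equals `$baddies` above. Removing the htmlspecialchars() call produces the same 29 failures plus `&` (38) and `<` (60), which fail because they are markup characters, not because they are forbidden.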
To help with testing (these characters are not always easy to generate), I've attached a file, controlcharacters.txt, which contains all 29 of these characters between <bad> tags. Depending on your text editor, you may see only "<bad></bad>" when you open it. But if you select the whole contents and paste them into a Mahara page title, you should be able to replicate the problem.
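If the attachment is not at hand, a file with equivalent content can be generated. This sketch assumes the file should hold exactly the 29 codes listed above, in ascending order, between `<bad>` and `</bad>`; writing it to the system temporary directory is an arbitrary choice.

```php
<?php
// The 29 forbidden ASCII codes: 0-8, 11, 12 and 14-31.
$codes = array_merge(range(0, 8), array(11, 12), range(14, 31));

// Wrap the raw characters in <bad></bad>, like the attached test file.
$payload = '<bad>' . implode('', array_map('chr', $codes)) . '</bad>';

$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'controlcharacters.txt';
file_put_contents($path, $payload);

printf("Wrote %d bytes (%d control characters) to %s\n", strlen($payload), count($codes), $path);
```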
To replicate:
1. Create a Mahara page with one or more of the forbidden characters in its page title
2. Export the page to Leap2a
3. Import the Leap2a file back into Mahara
Expected result: A copy of the page is imported.
Actual result: Mahara displays an error stack that includes the SimpleXML parser error.
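To confirm that an exported file is affected before step 3, the exported Leap2A XML can be scanned for forbidden characters. This is a hypothetical diagnostic, not a Mahara feature: `find_invalid_xml_chars()` is an illustrative name, and the sketch assumes the file is UTF-8. It reports byte offsets, which an editor with a hex view can jump to.

```php
<?php
// Hypothetical pre-import check: list the byte offset and hex bytes of every
// character that XML 1.0 forbids. Assumes $xml is valid UTF-8.
function find_invalid_xml_chars(string $xml): array
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    if (preg_match_all($pattern, $xml, $matches, PREG_OFFSET_CAPTURE) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $found = array();
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = array('offset' => $offset, 'hex' => bin2hex($char));
    }
    return $found;
}

$sample = "<entry><title>Bad\x0Btitle</title></entry>";
foreach (find_invalid_xml_chars($sample) as $hit) {
    printf("byte %d: 0x%s\n", $hit['offset'], $hit['hex']);  // prints: byte 17: 0x0b
}
```

For a real export, pass the file contents instead, e.g. `find_invalid_xml_chars(file_get_contents($path))`, where `$path` points at the exported XML file.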