DOMDocument::loadHTML() expecting ';'

Bug #1997291 reported by Gold
This bug affects 1 person
Affects: Mahara 23.04
Status: In Progress
Importance: High
Assigned to: Unassigned

Bug Description

While running /lib/cron.php I noticed a lot of PHP Warnings on the next page load. These are also in the error log.

My current suspicion is that these are triggered when trying to send e-mail about forum activity.

The actual error:

DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity.

Because this is only a PHP warning, it isn't causing crashes. However, it will fill up the error logs and may be causing unexpected behaviour in other places.

The warning is raised whenever html2text() is called. html2text() calls HtmltoText, which in turn calls DOMDocument::loadHTML(). That call emits one warning for every unencoded ampersand found in the document, i.e. & rather than &amp;

Showing the error in an interactive shell:

php > # Example 1:
php > $s = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head><p>Forum topic</p><p><img width="1024" height="" style="" alt="body_fire.jpg" src="https://dev.mahara.local/artefact/file/download.php?file=193&embedded=1&group=1&topic=1&post=1"></p>';
php >
php > # Example 2 is to demonstrate a working version of the string:
php > $t = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head><p>Forum topic</p><p><img width="1024" height="" style="" alt="body_fire.jpg" src="https://dev.mahara.local/artefact/file/download.php?file=193&amp;embedded=1&amp;group=1&amp;topic=1&amp;post=1"></p>';
php >
php > $doc = new DOMDocument;
php > $doc->loadHTML($s);
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 2 in php shell code on line 1
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 2 in php shell code on line 1
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 2 in php shell code on line 1
PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 2 in php shell code on line 1
php > $doc->loadHTML($t);
php >
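
To inspect the same parser complaints without having them written to the error log, libxml's internal error buffer can be used. The following is a minimal sketch, not code from Mahara: the helper name entity_ref_errors() and the shortened sample strings are made up for illustration, and whether libxml reports anything for a given input may depend on the libxml2 version in use.

```php
<?php
// Hypothetical helper (not part of Mahara): parse an HTML string and return
// libxml's entity-reference complaints instead of letting PHP emit warnings.
function entity_ref_errors(string $html): array {
    // Buffer libxml errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Keep only the htmlParseEntityRef messages discussed in this report.
        if (strpos($error->message, 'htmlParseEntityRef') !== false) {
            $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
        }
    }

    // Empty the buffer and restore the caller's error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Shortened, made-up variants of the two strings from the shell session.
$bare    = '<p><img src="download.php?file=193&embedded=1&group=1"></p>';
$encoded = '<p><img src="download.php?file=193&amp;embedded=1&amp;group=1"></p>';

print_r(entity_ref_errors($bare));
print_r(entity_ref_errors($encoded));
```

This only hides the symptom. The unencoded ampersands are still present in the stored post.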

The examples I've been finding are interaction_forum_post entries with images in them.

The specific code that causes this looks to be in prepare_post_body() in htdocs/interaction/forum/lib.php. When a post is saved, it explicitly replaces &amp; with a bare & character in any tag that contains a call to download.php.
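
One possible workaround, sketched here under assumptions and not taken from the submitted patch, is to re-encode bare ampersands before the HTML reaches DOMDocument::loadHTML(). The function name encode_bare_ampersands() is hypothetical. The pattern leaves anything that already looks like a complete named or numeric character reference untouched.

```php
<?php
// Hypothetical helper (not part of Mahara): turn every "&" that does not
// start a complete character reference (e.g. "&amp;", "&#38;", "&#x26;")
// into "&amp;".
function encode_bare_ampersands(string $html): string {
    $result = preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/',
        '&amp;',
        $html
    );
    // preg_replace() returns null on a regex engine error; fall back to the input.
    return $result ?? $html;
}

// Made-up sample mixing one bare and one already-encoded ampersand.
$src = 'download.php?file=193&embedded=1&amp;group=1';
echo encode_bare_ampersands($src), "\n";
// prints: download.php?file=193&amp;embedded=1&amp;group=1
```

The replacement is idempotent, so running it on text that is already encoded changes nothing. It is a blunt instrument, though: it is applied to the whole string, not just to attribute values, so fixing the stripping in prepare_post_body() itself would address the cause more directly.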

Gold (gold.catalyst)
Changed in mahara:
milestone: none → 23.04.0
milestone: 23.04.0 → 22.10.1
Mahara Bot (dev-mahara) wrote : A patch has been submitted for review
Kristina Hoeppner (kris-hoeppner) wrote :

To replicate the issue:

1. Set up a group and add at least two members.
2. Make sure that everyone is subscribed to the standard forum. That should be the default setting.
3. Create a forum post that includes an embedded image. You can also attach an image to verify that this part is still working. However, since the & is involved, it's the image appearing in TinyMCE that is in question.
4. Run the cron and then refresh the page.

Results:
- Expected: The cron runs through without warnings and an email notification is sent to the subscribed people.
- Actual: Error messages referring to the DOM are displayed. The email is sent but displays neither the image nor an active link to it, just a reference. The notification in the Mahara inbox shows something like '![filename]'.

Robert Lyon (robertl-9)
Changed in mahara:
milestone: 23.04.0 → none
no longer affects: mahara
no longer affects: mahara/22.10