HTML elements not always fully stripped

Bug #684922 reported by NIXin
This bug affects 1 person
Affects: Eventum
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

Clients often send us e-mails in HTML. The problem is that some e-mail or webmail software adds a default <style> block to the message. Eventum removes the tags, but everything inside them remains. As a result, the reported issue starts with, for example:

blockquote {padding-left: 1ex; margin: 0px 0px 0px 0.8ex; border-left: #cccccc 1px solid;} p {margin: 0px;padding: 0px;}
...and then we get the proper text.

Often the issue also contains undecoded HTML entities, such as &nbsp; or &oacute;, in the middle of the text.
This makes the issues harder to read.
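
The behaviour described above matches what PHP's strip_tags() does on its own: it removes the tags but keeps the text between <style> and </style>. A minimal sketch of one possible workaround follows. The helper name html_body_to_text is hypothetical and is not part of Eventum, and the regular expression is an assumption that only covers simple, well-formed <style> and <script> elements.

```php
<?php
// Hypothetical helper, not Eventum code: reduce an HTML mail body to plain text.
function html_body_to_text($html)
{
    // Drop <style> and <script> elements together with their contents;
    // strip_tags() alone removes only the tags and keeps the CSS text.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);

    // Remove the remaining markup (strip_tags() also strips HTML comments).
    return trim(strip_tags($html));
}

$sample = '<style>p {margin: 0px;}</style><p>Text.</p>';

echo strip_tags($sample), "\n";        // the CSS rule is still present
echo html_body_to_text($sample), "\n"; // only the paragraph text remains
```

Entities such as &nbsp; are not touched by this sketch; decoding them is a separate step.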

Elan Ruusamäe (glen666) wrote :

Please attach a sample email as a bug attachment (obfuscate it first for your privacy).

Also, which email integration method are you using: download_emails.php or route_emails.php?

Elan Ruusamäe (glen666) wrote :

We'd like to figure out what's causing this bug for you, but we haven't heard back from you in a while. Could you please provide the requested information? Thanks!

Changed in eventum:
status: New → Incomplete
NIXin (nixin) wrote :

Hey, sorry for not replying, I have been quite busy lately.
I'm using download_emails.php.

Here's an example e-mail that we received just a couple of hours ago (there are loads of them with unfiltered HTML).
Another thing is that &nbsp; is also left as is, not converted to a space.

Raw e-mail:

Return-Path: <email address hidden>
X-Original-To: <email address hidden>
Delivered-To: <email address hidden>
Received: from localhost (localhost.localdomain [127.0.0.1]) by eventum.eventumsite.com (Postfix) with ESMTP id E003EA7C304 for <email address hidden>; Tue, 18 Jan 2011 23:42:00 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at eventumsite.com
Received: from eventum.eventumsite.com ([127.0.0.1]) by localhost (eventum.eventumsite.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YBqDvPKNRmBY for <email address hidden>; Tue, 18 Jan 2011 23:42:00 +0100 (CET)
Received: from mx4.origin.com (mx4.origin.com [212.77.101.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by eventum.eventumsite.com (Postfix) with ESMTPS id 7DC19A7C2FC for <email address hidden>; Tue, 18 Jan 2011 23:42:00 +0100 (CET)
Received: (wp-smtpd smtp.origin.com 17802 invoked from network); 18 Jan 2011 23:41:59 +0100
Received: from out.poczta.origin.com (HELO localhost) ([212.77.101.240]) (envelope-sender <email address hidden>) by smtp.origin.com (WP-SMTPD) with SMTP for <email address hidden>; 18 Jan 2011 23:41:59 +0100
Date: Tue, 18 Jan 2011 23:41:58 +0100
From: "Some Guy" <email address hidden>
To: Us <email address hidden>
Subject: =?ISO-8859-2?Q?PD=3A_My=3A_Go=B3?= =?ISO-8859-2?Q?blahblah?=
Message-ID: <email address hidden>

In-Reply-To: <email address hidden>
References: <email address hidden>
MIME-Version: 1.0
Content-Type: multipart/related; boundary="part4d361736dd7165.17138776"
X-Mailer: Interfejs WWW nowej poczty Wirtualnej Polski
X-User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.237 Safari/534.10
Organization: Poznaj Poczte WP http://poczta.origin.com/info-start.html
X-WP-IP: 87.206.240.222
X-WP-AV: skaner antywirusowy poczty Wirtualnej Polski S. A.
X-WP-SPAM: NO 0000000 [ISME]

This is a multi-part message in MIME format.

--part4d36173add7165.17138776
Content-Type: text/html; charset=iso-8859-2
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

<style>blockquote {padding-left: 1ex; margin: 0px 0px 0px 0.8ex; border-left: #cccccc 1px solid;} p {margin: 0px;padding: 0px;} </style>
<p><br />
<blockquote><!-- blockquote {padding-left: 1ex; margin: 0px 0px 0px 0.8ex; border-left: #cccccc 1px solid;} p {margin: 0px;padding: 0px;} -->
<p>Text.&nbsp;</p>
<p>&nbsp;</p>
<p>More Text<img title="Laughing" src="cid:smiley-laughing.gif" border="0" alt="Laughing" /></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<br /></blockquote>
<br /></p><br />

--part4d361736dd7165.17138776
Content-Type: image/gif; name="smiley-laughing.gif"
Content-Transfer-Encoding: base64
Content-ID: <smiley-laughing.gif>
Content-Disposition: attachment; filename="smiley-laughing.gif"

R0lGODlhEgASANQTAFI9Da6qpuPQHKOGBqGVjPXnLO/v7r+qTnJeSPPwWdK+H8OrGsjHxWpTEsS8
nfbYEPryR+DYurebE/r6+v79cf32N4h1WtjV07qfLsq2at7SMnljE9XDMcKmMv77W...


Elan Ruusamäe (glen666)
Changed in eventum:
assignee: nobody → Elan Ruusamäe (glen666)
status: Incomplete → Triaged
Andre (andre-champagne)
Changed in eventum:
status: Triaged → Fix Committed
Joseph Overocker (joverocker) wrote :

Can someone tell me what was fixed and committed? I recently upgraded from 2.3 to 2.3.1 and I still have the same issue. Here is what I show in the Description field:

Testing 2.3.1 upgrade and to see if the HTML issues are gone or not.&nbsp;Testing 1 2 3.&nbsp;&nbsp;
Description is currently collapsed. Click to expand.

Looking around online, I am not seeing much help in resolving the issue, which is a huge burden because we are often unable to read the emails that are submitted for tickets.

Elan Ruusamäe (glen666) wrote :

Nothing has been done. I don't know why random people like Andre (andre-champagne) change the bug status to "Fix Committed".

You can help here by providing a patch or, if you can't do that, by attaching more sample emails. If somebody tries to fix this someday, they can include your email as a test case too.

Changed in eventum:
assignee: Elan Ruusamäe (glen666) → nobody
status: Fix Committed → Triaged
Gavin Foster (gavinleefoster) wrote :

I have experienced a similar issue and my workaround is below. I'm not proposing this as a patch as I have not tested this fully, but it resolves the issue for me.

Add an extra line to the function insertIssue in class.issue.php as below.

private function insertIssue($prj_id, $usr_id, $data)
{
    // decode html-encoded entities
    $data['description'] = html_entity_decode($data['description']);
    // ... rest of insertIssue() unchanged

Elan Ruusamäe (glen666) wrote :

You probably want to specify the application charset as well:

string html_entity_decode ( string $string [, int $quote_style = ENT_COMPAT [, string $charset = 'UTF-8' ]] )

i.e.:
$data['description'] = html_entity_decode($data['description'], ENT_COMPAT, APP_CHARSET);
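
A minimal standalone sketch of the effect of passing the charset, assuming the application charset is UTF-8 (the actual value of APP_CHARSET in a given installation may differ, and html_entity_decode() only supports a fixed list of charsets). The sample string is made up from the entities mentioned in this report. Note that &nbsp; decodes to a non-breaking space (U+00A0), not an ASCII space, so it has to be replaced explicitly if plain spaces are wanted.

```php
<?php
// Assumption: the application charset is UTF-8.
$charset = 'UTF-8';

// Made-up sample using the entities mentioned in this report.
$description = 'Testing 1 2 3.&nbsp;&oacute;';

// Decode named entities into characters of the target charset.
$decoded = html_entity_decode($description, ENT_COMPAT, $charset);

// &nbsp; becomes U+00A0 (bytes C2 A0 in UTF-8), so map it to a regular
// space in a separate step.
$plain = str_replace("\xC2\xA0", ' ', $decoded);

echo $plain, "\n";
```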

Gavin Foster (gavinleefoster) wrote :

Thanks.

Elan Ruusamäe (glen666)
Changed in eventum:
milestone: none → 2.3.4
Elan Ruusamäe (glen666)
Changed in eventum:
milestone: 2.3.4 → 2.4
Elan Ruusamäe (glen666) wrote :

Cannot reproduce with your sample email: I get empty text for the issue description. Tested with git master and the Eventum 2.3 branch.
