Email validation rejects top-level domains longer than 4 characters

Bug #1615280 reported by Mahara User on 2016-08-20
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mahara
Medium
Aaron Wells
15.04
Medium
Aaron Wells
15.10
Medium
Aaron Wells
16.04
Medium
Aaron Wells
16.10
Medium
Aaron Wells

Bug Description

This one has existed since 2006, but only become an issue with the opening up of TLDs over the past few years. (https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains)

Currently the email validation in pieform limits the TLD to between 2-4 characters (see pieform_rule_email() in htdocs/lib/pieforms/pieform/rules/email.php.)
That means people from .horse, for example, can't register. Changing the regex fixed my immediate problem, haven't tested how the other email validation points react. They use FILTER_VALIDATE_EMAIL and PHPMailer::ValidateAddress, so might be better.

Aaron Wells (u-aaronw) wrote :

Thanks for reporting this! I wasn't even aware that Mahara had code that was trying to validate email addresses via a simple regex like that. The code in question is pretty old, originally written in 2007 and last updated in 2009 (to allow a "+" sign in the alias part of the email address.)

Of course since then, the number of top-level domain names has exploded. There used to be only a few rare ones longer than four characters; now there are dozens: https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains#ICANN-era_generic_top-level_domains.

I find a few issues mentioned with PHP's "FILTER_VALIDATE_EMAIL" option, specifically:

1. Requires a "." in the domain name, so it fails on eg "root@localhost"
2. Doesn't work with emails containing non-ASCII characters. You can convert the domain part to punycode first, but I'm not sure if that works for the alias part.

On the other hand, both of those bugs are present in the current implementation! So switching to FILTER_VALIDATE_EMAIL would at least be an improvement.

Aaron Wells (u-aaronw) wrote :

Alternately, we could copy Moodle (again) and use a really liberal regex. Moodle 3.1 uses this one, which is nearly unchanged from 2001! The most recent functional update to it was in 2006, when it was changed to reject email addresses where the alias ends in a ".".

Basically, this regex checks for a list of acceptable characters for the alias, a "@", and a list of acceptable characters for the domain that includes at least one "."

/**
 * Validates an email to make sure it makes sense.
 *
 * @param string $address The email address to validate.
 * @return boolean
 */
function validate_email($address) {
    return (preg_match('#^[-!\#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+'.
                 '(\.[-!\#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+)*'.
                  '@'.
                  '[-!\#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.
                  '[-!\#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$#',
                  $address));
}

Aaron Wells (u-aaronw) wrote :

After doing a quick survey of what other PHP projects are doing (including our own PHPMailer library), I think for now we should probably just use FILTER_VALIDATE_EMAIL throughout. As mentioned above, this is an improvement, but with the following known flaws:

1. No support for one-part domains (root@localhost)

2. No support for email addresses containing unicode characters (See https://en.wikipedia.org/wiki/International_email )

PHPMailer now includes a function to punycode the domain-part of an email address if it contains unicode, but it's not exposed as a static function, apparently because it's reliant on knowing the character set of the PHPMailer instance.

None of the big PHP projects currently support email addresses with unicode in the local part (before the "@"), although there are bugs raised with several of them, so we'll probably need to revisit this in a few years.

Aaron Wells (u-aaronw) wrote :

Hm, actually for constistency's sake, because all these email addresses ultimately get validate by PHPMailer, we should probably validate them via PHPMailer's validateAddress instead.

Aaron Wells (u-aaronw) wrote :
Aaron Wells (u-aaronw) wrote :

Test instructions:

It seems that not all email fields in Mahara are actually validate at the time of input. One that I have verified to be validated, is the one on the admin's "Account settings" screen when viewing the account settings for another user.

A useful utility for checking whether an email address is a valid format is https://isemail.info . For the specific use-case described in this bug, you can use the domain name "example.supplies", which uses the newish ".supplies" TLD.

1. Log in as admin
2. Create a new Mahara user with username "user1"
3. Go to Administration -> Users -> User search
4. Locate "user1" and click on their username, to bring up their "Account settings" screen.
5. Set the "Primary email" field to "<email address hidden>" (or another exotic but valid email address)
6. Click "Save changes"

Expected result: The email is updated
Actual result: The form is rejected with the message "Email address is invalid".

summary: - Email validation bug (long domains)
+ Email validation rejects top-level domains longer than 4 characters
Aaron Wells (u-aaronw) wrote :

Our testers found that the PHPMailer email validation accepts a few addresses that are actually invalid:

<email address hidden> (Joe Smith)
email@example
<email address hidden>
email@111.222.333.44444

However, it did not reject any valid addresses. So we could potentially tighten up the validation in the future, but I think preventing false-rejections is more important. Plus, Mahara sends out confirmation email messages whenever a user self-updates their email address, so that's a kind of "ultimate test" for validity.

Reviewed: https://reviews.mahara.org/6876
Committed: https://git.mahara.org/mahara/mahara/commit/1192c05d2a316dcb4f45cbca28b5a6f1f6acfdac
Submitter: Aaron Wells (<email address hidden>)
Branch: master

commit 1192c05d2a316dcb4f45cbca28b5a6f1f6acfdac
Author: Aaron Wells <email address hidden>
Date: Mon Aug 22 12:34:26 2016 +1200

Bug 1615280: More robust email validation

Because all of our emails need to pass PHPMailer's
validation method before they get sent (due to the way
PHPMailer is written) it makes the most sense to use
that for validation.

Change-Id: I232ab9496ce8fc295a49625c999b48215305216c
behatnotneeded: Covered by phpunit

Mahara Bot (dev-mahara) wrote :

Patch for "15.10_STABLE" branch: https://reviews.mahara.org/6909

Mahara Bot (dev-mahara) wrote :

Patch for "15.04_STABLE" branch: https://reviews.mahara.org/6910

Aaron Wells (u-aaronw) wrote :

The older version of phpmailer in 15.10_STABLE and 15.04_STABLE doesn't validate email addresses tightly enough. Since this is only a medium-priority bug, I'm going to kill the backport to those versions.

Anyone who's interested in fixing this for 15.04 or 15.10 yourself, you could probably start with my 15.04 patch, which centralizes all the email validation into the "sanitize_email()" function in htdocs/lib/mahar.php. Then just change the implementation of sanitize_email() to something that works for you, perhaps a regex or FILTER_VALIDATE_EMAIL.

Reviewed: https://reviews.mahara.org/6908
Committed: https://git.mahara.org/mahara/mahara/commit/9754e8d3b221c19f19f545338e9e8727a9aa82d0
Submitter: Aaron Wells (<email address hidden>)
Branch: 16.04_STABLE

commit 9754e8d3b221c19f19f545338e9e8727a9aa82d0
Author: Aaron Wells <email address hidden>
Date: Mon Aug 22 12:34:26 2016 +1200

Bug 1615280: More robust email validation

Because all of our emails need to pass PHPMailer's
validation method before they get sent (due to the way
PHPMailer is written) it makes the most sense to use
that for validation.

Change-Id: I232ab9496ce8fc295a49625c999b48215305216c
behatnotneeded: Covered by phpunit
(cherry picked from commit 1192c05d2a316dcb4f45cbca28b5a6f1f6acfdac)

Robert Lyon (robertl-9) on 2016-10-21
Changed in mahara:
milestone: 16.10.0 → none
Robert Lyon (robertl-9) on 2016-10-25
Changed in mahara:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers