Domain names containing emoji characters are not supported in console applications

Bug #1771109 reported by augeus
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
glibc (Ubuntu)       New     Undecided   Unassigned
libidn2 (Ubuntu)     New     Undecided   Unassigned
libidn2-0 (Ubuntu)   New     Undecided   Unassigned

Bug Description

Ubuntu release:
user@machine:~$ lsb_release -rd
Description: Ubuntu 18.04 LTS
Release: 18.04

Description:
What I did: Attempted to access a domain name containing an emoji character using curl and other terminal applications
What I expected to happen: Successfully interact with the server represented by said domain name
What happened instead: Got error saying that the domain contains a disallowed character.
Notes: Accessing such a domain using firefox works normally. Interacting with the domain using its punycode form works just fine. Accessing IDNs containing national characters (such as hxxp://ąćęłńóśźż.pl/) works as intended.

Steps to reproduce:
user@machine:~$ curl 📙.la
curl: (3) Failed to convert 📙.la to ACE; string contains a disallowed character

Disclosure: I own such a domain name, but it is not the domain provided in the example.
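
For reference, the punycode form mentioned in the notes can be derived by hand. The following is a minimal sketch using Python's built-in "punycode" codec (RFC 3492); the helper name to_ace is hypothetical, and the sketch deliberately skips all IDNA2008/TR46 validation and mapping, so it is not equivalent to what libidn2 does.

```python
def to_ace(hostname: str) -> str:
    """Convert non-ASCII labels to their xn-- form using raw Punycode.

    Assumption: no IDNA validation, case mapping, or normalization is
    applied; this only shows the encoding step itself.
    """
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            # Plain ASCII labels are passed through unchanged.
            labels.append(label)
        else:
            # Encode the label and add the ACE prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_ace("📙.la"))  # prints xn--yt8h.la
```

The resulting name can be passed to curl directly, which avoids the conversion that fails in the steps above.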

Tags: patch
augeus (augeus) wrote :

Curl is linked to libidn2.so.0, so tagging this as affecting this library.

affects: ubuntu → libidn2-0 (Ubuntu)
augeus (augeus) wrote :

Attached is a patch enabling resolution of emoji domain names in programs that use libidn2. It patches the script gen-tables-from-iana.pl to set the state of emoji characters to PVALID. It uses the experimental Perl "smartmatch" operator, which may make it unacceptable. Nevertheless, I believe it proves that a solution is possible, and it might be useful to people who have to resolve such domains from a console application, so I am posting it here. It is provided with ABSOLUTELY NO WARRANTY. However, this patch does not affect applications that use glibc to resolve domains, so I am tagging this as an issue in glibc, too.

Of course, I understand that IANA disallows the use of emojis in domain names. Perhaps a solution could be to resolve these domains, but print a warning on the user's console? I do not have time to think about this.

augeus (augeus) wrote :

> Of course, I understand that IANA disallows the use of emojis in domain names. Perhaps a solution could be to resolve these domains, but print a warning on the user's console? I do not have time to think about this.

I say this because, although these domains are prohibited by IANA, some ccTLD registrars (such as .ws/.fm) allow their registration.
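
To illustrate why national characters pass while emoji are rejected, here is a rough sketch based on Unicode general categories. It assumes a heavy simplification of the IDNA2008 derivation in RFC 5892: only the "LetterDigits" category test from section 2.1 is checked, while the exceptions, the "unstable" rule (which excludes uppercase letters, among others), the separate rule that permits the hyphen, and the contextual rules are all ignored. The function name outside_letter_digits is hypothetical.

```python
import unicodedata

# General categories that RFC 5892 section 2.1 groups as "LetterDigits".
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def outside_letter_digits(label):
    """Return (character, category) pairs that fail the category test.

    Assumption: this is only the first filter of the real derivation,
    not a full PVALID/DISALLOWED computation.
    """
    return [
        (ch, unicodedata.category(ch))
        for ch in label
        if unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES
    ]


print(outside_letter_digits("ąćęłńóśźż"))  # all lowercase letters (Ll)
print(outside_letter_digits("📙"))  # the emoji is category So (Symbol, other)
```

Characters in category So fall outside the letter and digit categories, which is consistent with the "disallowed character" error reported above.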

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "gen-tables-from-iana.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
augeus (augeus) wrote :

Reuploading patch.

Robie Basak (racb) wrote :

Has this patch been accepted upstream? If not, I'm not sure it's appropriate for Ubuntu to diverge from upstream behaviour on this. Particularly because IDN type things are especially security sensitive.

augeus (augeus) wrote :

I opened an issue upstream.

Simon Déziel (sdeziel) wrote :

wget has no problem with it:

root@b1:~# wget 📙.la
--2019-01-17 20:53:17-- http://xn--yt8h.la/
Resolving xn--yt8h.la (xn--yt8h.la)... 62.116.130.8
Connecting to xn--yt8h.la (xn--yt8h.la)|62.116.130.8|:80... connected.
...

root@b1:~# curl 📙.la
curl: (3) Failed to convert 📙.la to ACE; string contains a disallowed character

Yet, both link to libidn2:

root@b1:~# ldd $(which wget) | grep idn
 libidn2.so.0 => /usr/lib/x86_64-linux-gnu/libidn2.so.0 (0x00007fc8ce5ba000)
root@b1:~# ldd $(which curl) | grep idn
 libidn2.so.0 => /usr/lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f295d476000)

So maybe it's a bug in curl itself?

Additional info:

root@b1:~# apt-cache policy curl wget
curl:
  Installed: 7.58.0-2ubuntu3.5
  Candidate: 7.58.0-2ubuntu3.5
  Version table:
 *** 7.58.0-2ubuntu3.5 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     7.58.0-2ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages
wget:
  Installed: 1.19.4-1ubuntu2.1
  Candidate: 1.19.4-1ubuntu2.1
  Version table:
 *** 1.19.4-1ubuntu2.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1.19.4-1ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Simon Josefsson (simon-josefsson) wrote :

Which libidn2 version were you using?

I believe this has been fixed in libidn2 by now, but curl should consider enabling Unicode TR46 processing from libidn2.

This is on an Ubuntu 22.04 clone (Trisquel):

jas@kaka:~$ idn2 📙.la
idn2: toAscii: string contains a disallowed character
jas@kaka:~$ idn2 -T 📙.la
xn--yt8h.la
jas@kaka:~$ curl --verbose 📙.la
* Trying 62.116.130.8:80...
* Connected to 📙.la (62.116.130.8) port 80 (#0)
> GET / HTTP/1.1
> Host: xn--yt8h.la
...

Simon Josefsson (simon-josefsson) wrote :

Sorry, to clarify: I think curl already changed here too, since the curl command now works.
