regexec/regcomp fails on regular expression containing UTF-8 multi-byte characters

Bug #1428091 reported by Joep Jansen
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
eglibc (Ubuntu)  New     Undecided   Unassigned

Bug Description

I want to do a regular expression match on UTF-8 formatted strings.
A simple example is matching a string that consists of 1 or 2 uppercase characters, including Ä, Ë, Ï, Ö and Ü.
The extended regular expression I use is:

'^[A-ZÄ-Ü]{1,2}$'

Expected behaviour:

Input   Expected
----------------
Ä       Match
ÄB      Match
ABC     Fail

Testing with grep gives the expected results:
$ echo Ä |grep -E '^[A-ZÄ-Ü]{1,2}$'
Ä
$ echo ÄB |grep -E '^[A-ZÄ-Ü]{1,2}$'
ÄB
$ echo ABC |grep -E '^[A-ZÄ-Ü]{1,2}$'

The same test with a simple test program that uses regcomp/regexec:

$ ./regex Ä '^[A-ZÄ-Ü]{1,2}$'
MATCH (Ä) (^[A-ZÄ-Ü]{1,2}$)

$ ./regex ÄB '^[A-ZÄ-Ü]{1,2}$'
MISS (ÄB) (^[A-ZÄ-Ü]{1,2}$)

$ ./regex ABC '^[A-ZÄ-Ü]{1,2}$'
MISS (ABC) (^[A-ZÄ-Ü]{1,2}$)

It seems that the single character Ä is counted as two characters here, because the following also matches:

$ ./regex Ä '^[A-ZÄ-Ü]{2}$'
MATCH (Ä) (^[A-ZÄ-Ü]{2}$)

Additional information:

$ lsb_release -rd
Description: Ubuntu 14.04.2 LTS
Release: 14.04

libc6:amd64 version 2.19-0ubuntu6.5

Locale: en_US.UTF-8.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: libc6 2.19-0ubuntu6.5
ProcVersionSignature: Ubuntu 3.13.0-35.62-gatso 3.13.11.6
Uname: Linux 3.13.0-35-gatso x86_64
ApportVersion: 2.14.1-0ubuntu3.7
Architecture: amd64
CurrentDesktop: Unity
Date: Wed Mar 4 11:51:24 2015
Dependencies:
 gcc-4.9-base 4.9.1-0ubuntu1
 libc6 2.19-0ubuntu6.5
 libgcc1 1:4.9.1-0ubuntu1
 multiarch-support 2.19-0ubuntu6.5
InstallationDate: Installed on 2014-09-26 (158 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
SourcePackage: eglibc
UpgradeStatus: No upgrade log present (probably fresh install)
