proposal for optimization in Unicode comparisons

Bug #867369 reported by Sorin Marian Nasoi
This bug affects 1 person
Affects  Status  Importance  Assigned to         Milestone
Zorba    New     Medium      Sorin Marian Nasoi

Bug Description

I had a look today into the ICU documentation and I found something that I think we can use to optimize our usage of ICU:

Starting with ICU 4.2 they introduced the StringPiece class
http://icu-project.org/apiref/icu4c/classStringPiece.html
which lets the collator compare UTF-8 strings directly.

This means that we do not need to do conversions from
"const char*" (our UTF-8 encoded strings) to
"UnicodeString" (ICU's internal UTF-16 strings)
for comparing 2 strings using ICU.

So, instead of using:

Collator::compare(const UnicodeString &, const UnicodeString &)

we can use:

Collator::compareUTF8(const StringPiece &, const StringPiece &, UErrorCode &)

What I propose is this:
- if the user has linked against ICU version 4.2 or later, use the "StringPiece" class
- if the ICU version is less than 4.2, keep the current behaviour

For example, in src/util/utf8_util.h starting at line 682, instead of:

 unicode::string us1;
 unicode::string us2;

 unicode::to_string(s1, &us1);
 unicode::to_string(s2, &us2);

 Collator::EComparisonResult result = ::Collator::EQUAL;

 result = static_cast<Collator*>(collation->getCollator())->compare(us1, us2);

 return result;

do

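 ::UCollationResult result = ::UCOL_EQUAL;
 UErrorCode status = U_ZERO_ERROR;

 // const char* converts implicitly to StringPiece, so no UnicodeString
 // conversion is needed here
 result = static_cast<Collator*>(collation->getCollator())->compareUTF8(
     s1.c_str(), s2.c_str(), status);

 // treat any ICU error as a programming error
 if (U_FAILURE(status))
 {
   assert(false);
 }

 return result;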
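
The version check proposed above could be sketched roughly as follows. This is only an illustration: icu_has_compare_utf8 is a hypothetical helper name, not something that exists in ICU or Zorba, and the sketch deliberately does not include any ICU header so that it stands on its own. In a real build the two numbers would presumably come from ICU's U_ICU_VERSION_MAJOR_NUM and U_ICU_VERSION_MINOR_NUM macros (unicode/uvernum.h).

```cpp
// Sketch only: icu_has_compare_utf8 is a hypothetical helper, not part of
// ICU or Zorba.

// Returns true when the given ICU version is 4.2 or newer, i.e. when
// StringPiece and Collator::compareUTF8 are expected to be available.
constexpr bool icu_has_compare_utf8(int major, int minor) {
    return major > 4 || (major == 4 && minor >= 2);
}

// In real code the arguments would be ICU's version macros, e.g.
//   icu_has_compare_utf8(U_ICU_VERSION_MAJOR_NUM, U_ICU_VERSION_MINOR_NUM)
// or the same condition written as an #if around the two code paths.
static_assert(icu_has_compare_utf8(4, 2), "4.2 is the first usable version");
static_assert(!icu_has_compare_utf8(4, 0), "4.0 must keep the old code path");
```

Written as a preprocessor condition, the same test would select the compareUTF8 branch at compile time and leave the UnicodeString conversion in place for older ICU versions.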

Chris Hillery (ceejatec)
tags: removed: runtime