Zorba

proposal for optimization in Unicode comparisons

Bug #867369 reported by Sorin Marian Nasoi on 2010-12-15

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Zorba	New	Medium	Sorin Marian Nasoi

Bug Description

I had a look today into the ICU documentation and I found something that I think we can use to optimize our usage of ICU:

Starting with ICU 4.2 they introduced the StringPiece class
http://icu-project.org/apiref/icu4c/classStringPiece.html
a lightweight wrapper around UTF-8 ("const char*") data that can be passed directly to string comparisons.

This means that we do not need to do conversions from
"const char*" (our UTF-8 encoded strings) to
"UnicodeString" (ICU's internal UTF-16 strings)
for comparing 2 strings using ICU.

So, instead of using:
Collator::compare(const UnicodeString & , const UnicodeString & )

we can use:

Collator::compareUTF8( const StringPiece & , const StringPiece & , UErrorCode & )

What I propose is this:
- if the user has linked against an ICU version greater than or equal to 4.2, use the "StringPiece" class
- if the ICU version is less than 4.2, keep the current behaviour

For example, in src/util/utf8_util.h starting at line 682, instead of:

unicode::string us1;
unicode::string us2;

unicode::to_string(s1, &us1);
unicode::to_string(s2, &us2);

Collator::EComparisonResult result = ::Collator::EQUAL;

result = static_cast<Collator*>(collation->getCollator())->compare(us1, us2);

return result;

we can use:

::UCollationResult result = ::UCOL_EQUAL ;
UErrorCode status = U_ZERO_ERROR;

result = static_cast<Collator*>(collation->getCollator())->compareUTF8(s1.c_str(),
s2.c_str(), status);

if(U_FAILURE(status))
{
assert(false);
}

return result;
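
The version check proposed above could be sketched roughly as follows. This is only an illustration: the names SKETCH_ICU_AT_LEAST_4_2, SKETCH_USE_COMPARE_UTF8 and sketch_uses_compare_utf8 are hypothetical and exist in neither ICU nor Zorba. The only ICU names used are the U_ICU_VERSION_MAJOR_NUM and U_ICU_VERSION_MINOR_NUM macros from unicode/uvernum.h. The 4.2 threshold simply mirrors the proposal; the release in which Collator::compareUTF8 itself became available should be confirmed against the ICU API reference before relying on it.

```cpp
// Pull in ICU's version macros when the header is available; the sketch still
// compiles without ICU and then selects the existing UnicodeString path.
#if defined(__has_include)
#  if __has_include(<unicode/uvernum.h>)
#    include <unicode/uvernum.h>
#  endif
#endif

// Hypothetical helper macro (not part of ICU or Zorba): nonzero when the
// version major.minor is 4.2 or newer.
#define SKETCH_ICU_AT_LEAST_4_2(major, minor) \
    ((major) > 4 || ((major) == 4 && (minor) >= 2))

#if defined(U_ICU_VERSION_MAJOR_NUM) && defined(U_ICU_VERSION_MINOR_NUM) && \
    SKETCH_ICU_AT_LEAST_4_2(U_ICU_VERSION_MAJOR_NUM, U_ICU_VERSION_MINOR_NUM)
#  define SKETCH_USE_COMPARE_UTF8 1  // new path: Collator::compareUTF8
#else
#  define SKETCH_USE_COMPARE_UTF8 0  // old path: convert to UnicodeString first
#endif

// Reports which comparison path this build would select.
constexpr bool sketch_uses_compare_utf8() {
    return SKETCH_USE_COMPARE_UTF8 != 0;
}
```

In utf8_util.h the two code variants shown above would then sit in the corresponding "#if SKETCH_USE_COMPARE_UTF8 ... #else ... #endif" branches, so that builds against older ICU releases keep the current behaviour unchanged.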

Chris Hillery (ceejatec) on 2013-04-02

tags:

removed: runtime
