Created attachment 112107
Remove combining characters from normalized text
This patch changes normalization so that combining characters are removed from the normalized text. This makes searching through TextPage::findText insensitive to these characters.
It also renames unicodeNormalizeNFKC to unicodeNormalizeSearch to make it clear that it no longer performs a regular NFKC normalization.
Renames decomp_compat to decomp_compat_base because, in addition to compatibility decomposition, it now strips combining characters, leaving only base characters.
Removes UnicodeCompTables.h and some compose functions. They're no longer needed since we're not recomposing the characters.
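To illustrate the idea described above (decompose, drop combining marks, do not recompose), here is a minimal standalone sketch. It is not the code from the patch: the names normalizeForSearch, isCombining and kDecomp are hypothetical, the decomposition table holds only three sample entries, and combining characters are approximated by the U+0300..U+036F block rather than by the full Unicode data.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Unicode = std::uint32_t;

// Hypothetical, deliberately tiny decomposition table (assumption:
// the real tables cover all of Unicode).
static const std::map<Unicode, std::vector<Unicode>> kDecomp = {
    {0x00E9, {0x0065, 0x0301}}, // e with acute -> e + combining acute
    {0x00F1, {0x006E, 0x0303}}, // n with tilde -> n + combining tilde
    {0xFB01, {0x0066, 0x0069}}, // fi ligature  -> f + i (compatibility)
};

// Assumption: only the Combining Diacritical Marks block is treated
// as "combining" in this sketch.
static bool isCombining(Unicode u) {
    return u >= 0x0300 && u <= 0x036F;
}

// Decompose each code point, keep only base characters, and skip
// the recomposition step entirely.
std::vector<Unicode> normalizeForSearch(const std::vector<Unicode>& in) {
    std::vector<Unicode> out;
    for (Unicode u : in) {
        auto it = kDecomp.find(u);
        if (it == kDecomp.end()) {
            // No decomposition known: keep it unless it is a combining mark.
            if (!isCombining(u)) out.push_back(u);
            continue;
        }
        for (Unicode d : it->second) {
            if (!isCombining(d)) out.push_back(d);
        }
    }
    return out;
}
```

With this sketch, a precomposed U+00E9 and the sequence U+0065 U+0301 both normalize to a plain U+0065, which is what makes a search insensitive to the accent.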
I'm not sure if UnicodeTypeTable.h and UnicodeCompTables.h are considered part of the public interface. They're included in the xpdf headers. Albert, is it OK to change these files in this way?