Comment 25 for bug 233528

Michał Sawicz (saviq) wrote:

Forgot to paste comments for reviewer:

This bundle implements three methods of subtitle encoding detection,
tried in order:
a) user-defined list of encodings
b) chardet for automatic encoding detection
c) i18n-defined list of encodings

The user can define a list of encodings to try in the config file; the
first encoding that successfully loads the file will be used.

On initial install, a) won't be used because the default encoding list
is empty. Automatic detection is done by the python-chardet module [1].
Currently chardet is used in try ... except blocks, so it's not a hard
dependency, although it is strongly encouraged.
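
As a rough illustration of the soft dependency (a sketch only, not the
code from this bundle; the helper name detect_encoding is hypothetical),
the try ... except pattern could look like this:

```python
# Sketch: treat chardet as optional. `detect_encoding` is a hypothetical
# helper name, not necessarily what the bundle uses.
try:
    import chardet
except ImportError:
    chardet = None


def detect_encoding(data):
    """Return (encoding, confidence) for a bytes object.

    Returns (None, 0.0) when chardet is not installed or has no guess.
    """
    if chardet is None:
        return None, 0.0
    result = chardet.detect(data)
    return result.get("encoding"), result.get("confidence") or 0.0
```

Without chardet installed the helper simply reports "no guess", so the
caller can move on to the other methods.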

If a) is not used or fails, and b) fails or reports a confidence below
0.9 on a [0, 1] scale, c) is tried: a list of encodings defined by the
translator for the current locale is used just as in a). If this also
fails, the encoding detected in b) is used.
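
The order described above can be sketched roughly as follows. This is an
illustration under assumptions, not the bundle's code: the names
first_working and pick_encoding are hypothetical, "loads the file" is
approximated by "decodes without error", and the detector is passed in
as a callable returning (encoding, confidence).

```python
def first_working(data, encodings):
    """Return the first encoding in `encodings` that decodes `data`, else None."""
    for name in encodings:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
        return name
    return None


def pick_encoding(data, user_encodings=(), locale_encodings=(),
                  detect=None, threshold=0.9):
    # a) user-defined list from the config file
    found = first_working(data, user_encodings)
    if found:
        return found
    # b) automatic detection, trusted only at or above the threshold
    guess, confidence = detect(data) if detect else (None, 0.0)
    if guess and confidence >= threshold:
        return guess
    # c) translator-defined list for the current locale
    found = first_working(data, locale_encodings)
    if found:
        return found
    # last resort: the low-confidence guess from b); None means
    # "nothing found, keep the default behaviour"
    return guess
```

For example, with an empty user list and a detector that is only 50%
sure, a locale list of ['cp1250'] would win over the detector's guess.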

The careful reviewer will see that this bundle does not introduce any
regressions - all subtitles are loaded as usual. The routines
implemented in this bundle can be tested as follows:

* on initial install, with no user- or i18n-defined encodings and
without python-chardet installed, subtitles will be loaded with the
default gstreamer locale. Then config-file support should be tried, both
with a correct list of encodings and with one that will fail (e.g.
['ascii']). In both cases the subtitles will load, but they will be
displayed correctly only in the first case;
* after updating the translation template file (setup.py pot_update) and
the catalog file for your preferred locale (setup.py update_catalog),
and subsequently building the catalogs (setup.py build_po), the two
previous tests for user-defined encodings should be repeated. In this
case the failing example should be corrected by i18n support, as long as
the right encoding was added in the language catalog;
* it's now time to install python-chardet (packaged for most major
distros) and run your tests again. Empty the config encodings list and
remove the compiled catalog files (*.mo); the encoding should still be
detected properly and the subtitles displayed correctly. Most of the
known issues involve differentiating WINDOWS-1250 from ISO-8859-2
(Central European).
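
A small illustration of why these two are hard to tell apart, using only
Python's standard codecs (the sample sentence is just an example, not
taken from the test files): text encoded as WINDOWS-1250 also decodes as
ISO-8859-2 without any error, it just comes out with some wrong
characters.

```python
text = "Zażółć gęślą jaźń"
raw = text.encode("cp1250")

# Decoding with the right codec round-trips the text.
as_cp1250 = raw.decode("cp1250")

# Decoding with the wrong codec raises no error, so "it loads" does not
# prove the encoding was right; some letters are silently replaced.
as_latin2 = raw.decode("iso-8859-2")

# Many letters share the same byte in both encodings (here: 'ł'),
# which leaves a detector little evidence to work with.
same_byte = "ł".encode("cp1250") == "ł".encode("iso-8859-2")
```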

IMPORTANT:
Applying this bundle should be followed by adding python-chardet [1] to
the Windows build.

[1] http://chardet.feedparser.org/

Cheers