option to specify desired output file encoding
Bug #1042146 reported by Stefan Eriksson
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
irclog2html | Triaged | Wishlist | Unassigned | |
Bug Description
Hi, I'm having issues with charset encoding: the Apache web server is using the iso-8859-1 charset, but the filesystem on the server is UTF-8, so my *log.html files are encoded as UTF-8. Even though I edited irclog2html.py to have UTF-8 headers, it's not enough to show "åäö" correctly. Is there a way to make irclog2html.py output iso-8859-15 encoded HTML files even though the OS uses UTF-8 by default, maybe with a new flag or something?
I have "
<meta http-equiv=
but still have to do this to make it display ok:
iconv -f utf-8 -t iso-8859-15 -c \#channel.
to get it working.
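For reference, the same kind of conversion can be done from Python. This is only a rough sketch of what the iconv workaround does, not something irclog2html provides: the function name `transcode` is made up for illustration, and `errors="ignore"` only approximates `iconv -c` by dropping characters the target charset cannot represent.

```python
def transcode(data, source="utf-8", target="iso-8859-15"):
    """Hypothetical helper: re-encode bytes from one charset to another."""
    # Decode the input bytes to text, then encode to the target charset.
    # errors="ignore" silently drops unrepresentable characters,
    # roughly like the -c flag of iconv.
    return data.decode(source).encode(target, errors="ignore")
```

Called on the bytes of a UTF-8 log page, it would return bytes suitable for writing to a file served as iso-8859-15.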
summary:
- encode log.html file with utf-8 encoding.
+ encode log.html file with specific encoding unrelated to default locale.
description: | updated |
summary:
- encode log.html file with specific encoding unrelated to default locale.
+ option to specify desired output file encoding
I feel I should mention that the easiest (and most correct) way to fix your issue would be to fix your Apache configuration (AddDefaultCharset UTF-8), because the charset specified in the Content-Type header generated by Apache always overrides whatever <meta charset> you have in the HTML body.
I'm willing to consider a patch to irclog2html that would let the user specify the desired output encoding. The backend already supports this (see the `charset` class attribute of the various style classes), and the code already knows how to handle characters that cannot be represented in the output charset (by encoding them as numeric character references). All that is missing is a command-line option like --output-charset=UTF-8 that would override formatter.charset if specified. The trickiest bit is probably handling the default charset: currently every style can have a different default, and my unreasonable attachment to backwards compatibility inclines me to keep it that way.
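To make the idea concrete, here is a minimal sketch of how such an override could work. It is not the actual irclog2html code: the `Style` and `OldStyle` classes are hypothetical stand-ins for the real style classes, the helper names are invented, and the use of argparse is an assumption. The only piece taken from the standard library as-is is the `xmlcharrefreplace` error handler, which turns unrepresentable characters into numeric character references.

```python
import argparse


class Style:
    """Hypothetical stand-in for a style class with a default charset."""
    charset = "UTF-8"


class OldStyle(Style):
    """Hypothetical older style whose default charset differs."""
    charset = "ISO-8859-1"


def encode_html(text, charset):
    # Characters the charset cannot represent become numeric character
    # references (e.g. &#8364;) instead of raising UnicodeEncodeError.
    return text.encode(charset, "xmlcharrefreplace")


def pick_charset(style, argv):
    parser = argparse.ArgumentParser()
    # Hypothetical option; None means "keep the style's own default",
    # so each style's existing default stays backwards compatible.
    parser.add_argument("--output-charset", default=None)
    args = parser.parse_args(argv)
    return args.output_charset or style.charset
```

Leaving the option's default as `None` rather than a fixed charset is what lets every style keep its own default unless the user explicitly asks for something else.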
Incidentally, you can get irclog2html to produce ISO-8859-1 HTML files today by choosing any of the older styles (simplett, tt, simpletable, table) with irclog2html --style stylename.