Visual tag to represent narrow non-breaking spaces

Reported by Nicolas Delvaux on 2010-07-22
This bug affects 2 people
Affects           Status   Importance  Assigned to      Milestone
Launchpad itself           Low         Nicolas Delvaux
diacritic         Invalid  Undecided   Unassigned

Bug Description

Bug #81281 implemented the [nbsp] representation for non-breaking spaces.

In French typography, narrow non-breaking spaces are needed before ";!?" chars (perhaps also before ":»" and after "«", this is not clear for these).
I found this is also used at least in Mongolian and in Persian.
This char (U+202F) is not supported everywhere yet (though there is no problem with good Unicode fonts, such as those found in the default Ubuntu install). The best would be an automatic fallback to nbsp (U+00A0) when U+202F is unsupported, but this is not a Launchpad issue.

So here we need a "[nnbsp]" tag which corresponds to \u202F.
"[nnbsp]" is a bit odd; a localised version may be better (e.g. "[fine]" in French), but I don't know if this fits Launchpad policy...


description: updated
Nicolas Delvaux (malizor) wrote :

So here is a patch.
I propose to use "[nbthin]" instead of "[nnbsp]", it seems a bit better to me.
The "problem" is that this way it doesn't match the "narrow non-break space" acronym, but [nnbsp] is really too close to [nbsp] IMHO.

If you prefer [nnbsp], the patch is really easy to adapt.

Otherwise, feedback is welcome! ;-)

Nicolas Delvaux (malizor) wrote :

Here is another patch, which also implements the '[thinsp]' tag (which corresponds to \u2009).
The 'Thin Space' char is used in some languages, and as a thousands separator for measurements in SI units.
(I still prefer the '[nbthin]' tag for nnbsp, even if nnbsp is not exactly a "no-break thin space"...)
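
To make the proposal concrete, here is a minimal Python sketch of the tag/character mapping discussed above. It is not the patch attached to this bug: the table `TAG_TO_CHAR` and both function names are hypothetical, and only the tag names and code points come from the comments.

```python
# Hypothetical tag table; tag names and code points follow the comments above.
TAG_TO_CHAR = {
    "[nbsp]": "\u00a0",    # NO-BREAK SPACE (the existing tag)
    "[nbthin]": "\u202f",  # NARROW NO-BREAK SPACE
    "[thinsp]": "\u2009",  # THIN SPACE
}


def tags_to_chars(text: str) -> str:
    """Replace each visual tag typed by a translator with its character."""
    for tag, char in TAG_TO_CHAR.items():
        text = text.replace(tag, char)
    return text


def chars_to_tags(text: str) -> str:
    """Replace each special space with its visual tag for display."""
    for tag, char in TAG_TO_CHAR.items():
        text = text.replace(char, tag)
    return text
```

For example, `tags_to_chars("Bonjour[nbthin]!")` yields the string with U+202F before the exclamation mark, and `chars_to_tags` reverses it.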

I wonder if this is also relevant to implement tags for all (or some other) types of spaces (see http://en.wikipedia.org/wiki/Space_%28punctuation%29#Table_of_spaces ).

verdy_p (verdy-p) wrote :

In fact the tag to display in a localized version of the interface could as well be [fine] when translating French. It will of course map to the Unicode NNBSP character.

And the fallback character used in font renderers is FIRST the THINSP character when it is mapped in a font (because fonts do not encode the breaking/non-breaking property, and because this property is implemented instead by the renderer or layout engine).

Otherwise, the renderer or layout engine will fall back to using one half of the advance width of the standard SPACE mapped in the font.

The fallback to NBSP (U+00A0) will be implemented by non-graphic renderers such as terminal/console emulators, or by programs that export to plain text in an encoding that has no thin space, because there it matters less to preserve the exact space width than to keep the non-breaking property.
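
The plain-text fallback described above could look roughly like this in Python. This is only a sketch under assumptions: the function name is hypothetical, and it handles just the NNBSP-to-NBSP substitution, not the glyph fallbacks done by a font renderer.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
NBSP = "\u00a0"   # NO-BREAK SPACE


def encode_with_space_fallback(text: str, encoding: str) -> bytes:
    """Encode text, degrading NNBSP to NBSP if the encoding lacks NNBSP.

    The non-breaking property is kept and only the narrow width is lost.
    If the target encoding has no NBSP either (plain ASCII, for instance),
    the final encode() still raises UnicodeEncodeError.
    """
    try:
        NNBSP.encode(encoding)
    except UnicodeEncodeError:
        text = text.replace(NNBSP, NBSP)
    return text.encode(encoding)
```

With `"latin-1"` as the target, the narrow space is replaced by byte 0xA0; with `"utf-8"` the text is encoded unchanged.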

For GetText, a patch could be added so that it will return strings after conversion to a target encoding other than Unicode (but I don't think this is the job of Gettext, because such encoding conversions are part of transcoding libraries).

It's not the job of GetText to change the encoding; it just needs to return the text as it was encoded in the source .po/.pot files before they were compiled to an efficient binary form. If GetText processes only source .po/.pot files, these files should be encoded only in an encoding that is compatible with the ISO 646 IRV (for detecting its syntax), so that it can correctly detect newlines, equals signs, backslashes, and hashes starting comment lines, and their directives encoded just after that with a limited subset of ASCII.

That's why GetText can be used transparently on .po/.pot resource bundle files encoded as Windows or Unix text files (CR+LF or LF), using only ASCII or any ISO 8859 part, or Windows ANSI or OEM/PC codepage, or any Chinese GB standard, or any Indian ISCII standard... With minor changes (including simple 1-to-1 bijective remappings), it also works with resource bundles encoded in any EBCDIC codepage.

What this means is that it's up to the application using those resources to know which encoding they were encoded with. Most of them however should be encoded with Unicode, and all applications today should be able to transcode from Unicode back to the "natural" encoding used by the application and initially used in their source bundles. And the same programs should also work without changes using UTF-8 at least, even if this means that they must support characters with variable lengths, or be able to work internally using UTF-16 with 16-bit "chars".

(Why the C/C++ standards did not honor this, and still do not define a standard type with a minimum size of 16 bits for working with UTF-16 code units, or a standard type with a minimum size of 21 bits for working with UTF-32 code units, is a complete mystery to me, when all other languages have adopted datatypes with standard sizes, and notably a standard 16-bit fixed size for what they call a "char", treated differently from other integers.)

Note that the separator for thousands is definitely not THINSP (because it is breakable), but rather NNBSP (alias the French "fine").

If ...


Nicolas Delvaux (malizor) wrote :

I created a branch which implements '[nbthin]' (it's my 'nbthin.patch' + some minor tweaks in 'doc/browser-helpers.txt').

I need some feedback before requesting merge (especially about '[thinsp]' support and/or whether I should convert this bug report to "Visual tags to represent all types of spaces").

Hi, thanks for working on this, and sorry for the late input — I've been busy with conferences.

The first question I have is: how do you insert this character otherwise? I.e. on Windows, Ubuntu, or Mac OS X? Is it mapped to a certain physical key? Are there problems with browsers "eating" this character as well? FWIW, that was the reason we allowed [nbsp] in the first place: it was essential for French, and web browsers would convert it to a regular space instead.

As for the gettext discussion, do note that there is a Content-Type PO header which also specifies the charset. And gettext does the conversion from that encoding to the encoding of the user's locale.
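
As an illustration of the Content-Type header mentioned above, here is a small Python sketch that pulls the charset out of a PO header. The function name `po_charset` and its `default` parameter are hypothetical, and the sketch assumes the header text has already been unescaped into plain lines; it is not how gettext itself is implemented.

```python
import re


def po_charset(header: str, default=None):
    """Return the charset named in a PO header's Content-Type line.

    `header` is assumed to be the already-unescaped header text,
    one "Name: value" field per line.
    """
    match = re.search(
        r"^Content-Type:[^\n]*?charset=([\w.:-]+)",
        header,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1) if match else default


header = (
    "Project-Id-Version: demo\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)
charset = po_charset(header)
```

A caller could then decode the catalog's byte strings with that charset and re-encode them for the user's locale.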

Changed in rosetta:
status: New → Incomplete

(Also, please change the status back to 'New' when you respond to my questions above so it is brought to my attention — thanks!)

Nicolas Delvaux (malizor) wrote :

@Данило Шеган: With the French default layout (at least on Ubuntu), you can insert this character via "Alt Gr + v".

After some googling, I found that you can also insert it in some Windows programs (e.g. MS Word) via "shift + ctrl + space", or via "ctrl + space" (in Writer, apparently only on Windows).

So this is supported quite widely.

On Ubuntu, "Alt Gr + v" works well in Firefox (and there is no eating problem).
This is also supposed to work with modern browsers (including IE 8), but some tests (again, thanks Google) say that it is not *displayed* correctly in, at least, IE 7, Safari 4 and Chrome 2 (I don't know if insertion works with these).

As verdy_p said, French typography fundamentally requires nnbsp, not nbsp.

So I think we need a tag for it in rosetta ([nbthin]?).
And, even if there seems to be no more "char eating" problem, we need it to easily spot the difference between
" " and " ".
It's much easier for reviewers ;-)

Changed in rosetta:
status: Incomplete → New

The discussion in bug #36977 seems to indicate that what you actually need is to replace [nbsp] implementation. What would [nbsp] be used for if we kept it?

Also, it seems it's not really a problem of input, so what we need to do is implement a "graphical" display like we have for spaces or newlines instead.

Nicolas Delvaux (malizor) wrote :

Yes, fundamentally we may replace all [nbsp] by nnbsp.

But currently nnbsp is not widely used or even known; it's not even in our translation guidelines (e.g. gnome-l10n-fr, lp-l10n-fr, ubuntu-l10n-fr...; even if the last two will surely change once this bug is fixed).

As said in the first post and by verdy_p, even if nnbsp support seems to be OK now, there might be some bugs with some fonts or rendering engines.
And as nnbsp support is quite new in modern browsers, we may need to fall back on [nbsp] for webapps (to support IE 6/7 or such horrible things).

So I think that [nbsp] is still needed for compatibility purposes.

About a "graphical" display, this is a good idea.
But, unlike spaces or newlines, not all keyboard layouts have a mapping for nnbsp.
So it may still be necessary to input nnbsp via [nbthin] in some cases.

And, even if the "char eating" bug was fixed in Firefox, I'm not sure it was in other browsers (I also tested with the latest chromium build and it works).

verdy_p (verdy-p) wrote :

> Message of 02/08/10 10:21
> From: "Nicolas Delvaux" <email address hidden>
> To: <email address hidden>
> Cc:
> Subject: [Bug 608631] Re: Visual tag to represent narrow non-breaking spaces
>
>
> Yes, fundamentally we may replace all [nbsp] by nnbsp.
>
> But currently nnbsp is not widely used or even known, it's not even on
> our translation guidelines (eg gnome-l10n-fr, lp-l10n-fr, ubuntu-l10n-
> fr...; even if these last 2 will surely change when this bug will be
> fixed).
>
> As said in the first post and by verdy_p, even if nnbsp support seems to be ok now, there might be some bugs with some fonts or rendering engines.
> And as nnbsp support is quite new in modern browsers, we may need to fallback on [nbsp] for webapps (to support IE 6/7 or such horrible things).
>
> So I think that [nbsp] is still needed for compatibility purpose.

Why? Applications that don't support NNBSP can replace it automatically when exporting from Launchpad. Translators won't have any issue, as the database will internally store NNBSP in its history. And there will be no difficulty entering it (there's already a way to input NBSP in the editor so that it is preserved in the input form, to avoid the browser bugs that replace it with standard spaces). So they will be clearly visible.

And applications that use the resources can also preprocess the returned texts before using them, if their renderer cannot support this character. Note also that browsers have made very significant progress during the last months and, because of security issues, almost all users are forced to update their browser as soon as possible. Who now uses a deprecated browser version? Browsers are integrating support for NNBSP even if the selected font does not support it (in fact this support is now part of their text renderer or text layout engine).

> About a "graphical" display, this is a good idea.
> But, unlike spaces or newlines, all keyboard layout do not have a mapping for nnbsp.
> So it may still be necessary to input nnbsp via [nbthin] in some cases.

Not needed. In the online editor of Launchpad you'll see [nbthin], exactly like you currently see [nbsp]. You don't need any special keyboard assignment for entering [nbsp], as this is part of the Javascript support integrated in the Launchpad edit form.

If some users can type NNBSP directly with their custom keyboard layout, no problem: the editor will still display the colored [nbthin] indicator. Same thing if they are copy-pasting texts into the editor.

And anyway in the exports, such characters would be represented in the .po file as \u202F.

Unfortunately, in the latest Microsoft fonts for Windows Seven, Microsoft decided to drop this character from its best UI font, "Segoe UI" (even though it has long been present in Times New Roman and Arial).

Why? Because the character is handled automatically in the OS's built-in text renderers, which automatically reuse the same glyph as "THINSP" and treat it as unbreakable for layout purposes, because this is part of the layout engine. Fonts DO NOT need to include such a duplicate mapping for the same glyph as they can't manage themselves the unb...


verdy_p (verdy-p) wrote :

> Yes, fundamentally we may replace all [nbsp] by nnbsp.
>
> But currently nnbsp is not widely used or even known, it's not even on
> our translation guidelines (eg gnome-l10n-fr, lp-l10n-fr, ubuntu-l10n-
> fr...; even if these last 2 will surely change when this bug will be
> fixed).
>
> As said in the first post and by verdy_p, even if nnbsp support seems to be ok now, there might be some bugs with some fonts or rendering engines.
> And as nnbsp support is quite new in modern browsers, we may need to fallback on [nbsp] for webapps (to support IE 6/7 or such horrible things).

The character is not so new. It was just still missing in Unicode 3.2 and was added to Unicode years ago. Unicode 6 is about to be released (there have already been Unicode 4.1, Unicode 5, Unicode 5.1, and the current Unicode 5.2... plenty of time, in fact, relative to the development of web browsers, to include updated tables for line-breaking properties, and for font renderers and text layout engines to add support for many new scripts and many new punctuation marks and symbols).

Note that browsers and font renderers or layout engines are updated much more frequently than fonts (it's much easier, because OpenType font development is a really complex task that requires very specific skills and expertise, and also because almost all high-quality fonts are generally heavily protected by their licence).

Nicolas Delvaux (malizor) wrote :

On Monday 02 August 2010 at 19:42 +0000, verdy_p wrote:
> Why? Applications that don't support NNBSP can replace it automatically when exporting from Launchpad. Translators won't have any issue, as the database will internally store NNBSP in its history. And there will be no difficulty entering it (there's already a way to input NBSP in the editor so that it is preserved in the input form, to avoid the browser bugs that replace it with standard spaces). So they will be clearly visible.
>
> And applications that use the resources can also preprocess the returned texts before using them, if their renderer cannot support this character. Note also that browsers have made very significant progress during the last months and, because of security issues, almost all users are forced to update their browser as soon as possible. Who now uses a deprecated browser version? Browsers are integrating support for NNBSP even if the selected font does not support it (in fact this support is now part of their text renderer or text layout engine).
This may require much more work than to simply keep [nbsp] for a
while...

We can start using [nbthin] everywhere and, if we find that it causes
trouble in some apps, we can use instead [nbsp] while the underlying bug
is being fixed.

This may be longer but IMHO this is easier.

> > About a "graphical" display, this is a good idea.
> > But, unlike spaces or newlines, all keyboard layout do not have a mapping for nnbsp.
> > So it may still be necessary to input nnbsp via [nbthin] in some cases.
>
> Not needed. In the online editor of Launchpad you'll see [nbthin], exactly like you currently see [nbsp]. You don't need any special keyboard assignment for entering [nbsp], as this is part of the Javascript support integrated in the Launchpad edit form.
>
> If some users can type NNBSP directly with their custom keyboard layout, no problem: the editor will still display the colored [nbthin] indicator. Same thing if they are copy-pasting texts into the editor.
>
> And anyway in the exports, such characters would be represented in the .po file as \u202F.
Yes, this is exactly what my patch implements. ;-)

But, as an addition, a "visual tag" to represent [nbsp] or [nbthin]
would be great too.
The ...


Nicolas Delvaux (malizor) wrote :

(about "visual tags")

In fact, all we need is an explanation of the meaning of the tag.
Like what we currently have for newlines or spaces (eg https://translations.launchpad.net/quickly/0.x/+pots/quickly/fr/10/+translate ) we should display something like this when the corresponding tag is used:

"[nbthin] represents a narrow non-break space. Learn more about its usage in your language guidelines."

What do you think?

There are a lot of uncertainties here for me. For instance, it seems it's already possible to use this without any special support from Launchpad. How widely used is it? Why is it not in any of the guidelines and, especially, is it being proposed upstream as well? I'd be happy to simplify this for French translators, but I wouldn't like to accept this before we get upstream and Ubuntu l10n team buy-in first. And as everybody agrees on this bug, it's not necessary to be able to use it.

Also, displaying an explanation is going to be hard because this character would only appear in translations (descriptions for newlines and spaces come from them being in the original English string). And how would we let people learn about it when they are translating an untranslated message?

So, for now, until we get a more widespread recommendation that this is desired (and especially, agreement regarding both nbsp and nnbsp: which is used and when), I wouldn't like to introduce more code for us to maintain even if it's such a small bit. When there's upstream and Ubuntu l10n buy-in, please let me know and we can discuss introducing the one true way to do these things (i.e. either nbsp or nnbsp, and both only if that's really a must).

For now though, closed with "Won't fix", at least until there is enough widespread support for using it (for instance, if you can give me a count of upstream GNOME/KDE translated strings that are using NNBSP as opposed to NBSP, and that's a reasonably big ratio, I'd very happily change my mind). If you solely want to work on a nice visual tag for displaying them (instead of inputting them), please re-open the bug as well :)

PS. As a side-note, we are doing test-driven development in Launchpad, so you'd have to provide appropriate tests for this feature as well if we were to accept it.
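
The NNBSP-versus-NBSP count asked for above could be gathered with something like the following Python sketch. The function name `space_usage` is hypothetical, and it assumes the translated strings have already been extracted from the upstream catalogs by some other means; the sample strings are made up for illustration and are not real measurements.

```python
from collections import Counter

NBSP = "\u00a0"   # NO-BREAK SPACE
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def space_usage(translations):
    """Count how many translated strings contain NBSP and how many NNBSP."""
    counts = Counter()
    for msgstr in translations:
        if NBSP in msgstr:
            counts["nbsp"] += 1
        if NNBSP in msgstr:
            counts["nnbsp"] += 1
    return counts


# Made-up sample strings, for illustration only.
sample = ["Bonjour\u202f!", "Quitter\u00a0?", "Ouvrir", "Oui\u00a0!"]
usage = space_usage(sample)
```

The ratio `usage["nnbsp"] / usage["nbsp"]` over real catalogs would give the kind of figure requested.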

Changed in rosetta:
status: New → Won't Fix
Nicolas Delvaux (malizor) wrote :

I'm admin of lp-l10n-fr.
Even if I (and others) really want to push NNBSP, I won't change our guidelines before this bug is fixed.

Why?

Because most of our work here is to review translations.
If you can easily spot the difference between
"Bonjour !" (space) and
"Bonjour !" (NNBSP)
when one is just above the other, it's not as easy in the common case.
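
Outside Launchpad, a reviewer can make the difference explicit with a few lines of Python. This is only an illustrative sketch (the function name is hypothetical); it lists every Unicode space separator in a string with its code point and official name.

```python
import unicodedata


def describe_spaces(text: str):
    """List (index, code point, Unicode name) for each space separator."""
    return [
        (index, "U+%04X" % ord(char), unicodedata.name(char))
        for index, char in enumerate(text)
        if unicodedata.category(char) == "Zs"  # Zs = space separator
    ]
```

Calling it on `"Bonjour !"` typed with a plain space reports `SPACE`, while the same string typed with U+202F reports `NARROW NO-BREAK SPACE`.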

So without a tag to represent NNBSP I won't recommend its usage.
And without a "reasonably big" use of NNBSP you won't implement a tag for it.

Damned, we are trapped! ;-)

I contacted the gnomefr team to know their feelings about this.
I will do the same with ubuntu-l10n-fr.

Otherwise, about a "graphical only" tag, the problem is that we may *need* to display an explanation for it (because it's abstract and it may represent any kind of things). Perhaps a tooltip may be sufficient and easier to implement?

Nicolas Delvaux (malizor) wrote :

Well, apparently Launchpad displays the same glyph for space and NNBSP.
Copy this and paste it into the "Add comment" field; there you will see the difference:

"Bonjour !" (space) and
"Bonjour !" (NNBSP)

Nicolas, the reason why it wouldn't make much sense to implement right now is relatively simple: this is a feature that would only be used by French translators, and it would involve adding some special hacks in a few places. Launchpad is very complex and adding anything without proven value is going to cause us maintenance nightmares in the long run. I.e. we are probably going to break it even without realizing.

As for the explanation for the "graphical" tag: its nature is entirely different from that of the ones we have today (which are for English messages; this one would be on the translation instead). So, that means that we'd have to implement it for translations as well, and that's a little bit more work because we would want it to work in the future with more things we might introduce :)

Also, bug comments don't display the narrow nbsp as being narrower because they are using fixed-width fonts for display. It will probably depend on the fonts a user is using anyway, so just a simple display tag might be a good first step.

To get us out of the "trapped" position, let's wait for responses from gnome-l10n-fr and ubuntu-l10n-fr. Even if they just agree that it's a good idea, we should help you implement it for Launchpad — as long as there is a consensus about it :)

verdy_p (verdy-p) wrote :

I don't think that the visual tag should display any graphic in the HTML form. My opinion is that the colored [nbsp] is perfect where it is within the editable text area; but it would be great if, above or below the editable text area, a warning was clearly displayed explaining what this tag means: this would stop unaware translators from replacing it with a standard space, and would also explain why it is displayed like this, with a link to a more complete page explaining the "issue" (notably why the form needs to display this tag, because of some browser bugs replacing the actual character by a SPACE, or simply because it allows easier identification of places where NBSP is used instead of SPACE).

So the solution found for the NBSP character (replaced by the visual tag [nbsp]) should also apply to any other character that has been replaced by a visual tag in the edit form. This will apply to NNBSP (replaced by the visual tag [nbthin]), or to the BiDi controls that are sometimes needed as well within resources for Arabic or Hebrew (they will be replaced by a visual tag as well, even if the form still displays this tag with the appropriate BiDi control directly within the <span> element that replaces the character).

For example the LEFT-TO-RIGHT MARK (LRM) character (U+2009) would also be replaced by this tag in the edit form:
  <span class="visualtag" x-char="&#x2009;">[lrm]&#x2009;</span>
where the CSS stylesheet used by the edit form defines:
  .visualtag {
    border: 1px solid #00F;
    background: #EEF;
    color: #00F;
    padding: 1px;
    line-height: 1.2;
    font-size: 85%;
    font-weight:bold;
  }

This would apply as well to RIGHT-TO-LEFT MARK (rlm) U+200F:
  <span class="visualtag" x-char="&#x200F;">[rlm]&#x200F;</span>

Just like the NON-BREAKING SPACE (NBSP) U+00A0:
  <span class="visualtag" x-char="&#x00A0;">[nbsp]</span>

And the NARROW NON-BREAKING SPACE (NNBSP) U+202F (alias "fine" in French):
  <span class="visualtag" x-char="&#x202F;">[nbthin]</span>

And the THIN SPACE (THINSP) U+2009 (alias "ultrafine" in French, different as it is breakable, unlike the French "fine"):
  <span class="visualtag" x-char="&#x2009;">[thin]</span>

And the ZERO-WIDTH JOINER (ZWJ) U+200D (needed in some orthographies):
  <span class="visualtag" x-char="&#x200D;">[zwj]</span>

And the ZERO-WIDTH NON-JOINER (ZWNJ) U+200C (needed in some orthographies):
  <span class="visualtag" x-char="&#x200C;">[zwnj]</span>

However, no tag should be used for the COMBINING GRAPHEME JOINER (CGJ), which is needed in extremely rare cases but will be used just before a non-combined diacritic (Unicode added it to block canonical reorderings), so that this diacritic would no longer display correctly if it was replaced by a visual tag, unless all the following diacritics are ALSO explicitly replaced with a visual tag (based on the dotted circle U+25CC ◌) like:
  <span class="visualtag" x-char="&#x0300;">&#x25CC;&#x0300;</span>

The Javascript in the HTML edit form should also be able to do the conversion between actual characters and visual tags in both directions (thanks to the x-char="..." attribute which hides the actual character that can ...
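
The character-to-tag direction of that conversion can be sketched in Python as follows (the comment above proposes doing it in Javascript in the edit form; Python is used here only for illustration). The table `VISUAL_TAGS` and the function name are hypothetical; the markup follows the `<span class="visualtag" x-char="...">` shape proposed above, with U+00A0 and U+202F as the code points for NBSP and NNBSP.

```python
import html

# Hypothetical table mapping special characters to their visual tag names.
VISUAL_TAGS = {
    "\u00a0": "nbsp",    # NO-BREAK SPACE
    "\u202f": "nbthin",  # NARROW NO-BREAK SPACE
    "\u200d": "zwj",     # ZERO WIDTH JOINER
    "\u200c": "zwnj",    # ZERO WIDTH NON-JOINER
}


def to_visual_markup(text: str) -> str:
    """Wrap each special character in a visual-tag <span>, escaping the rest."""
    parts = []
    for char in text:
        name = VISUAL_TAGS.get(char)
        if name is None:
            parts.append(html.escape(char))
        else:
            parts.append(
                '<span class="visualtag" x-char="&#x%04X;">[%s]</span>'
                % (ord(char), name)
            )
    return "".join(parts)
```

The `x-char` attribute keeps the real character available, so the reverse conversion only has to read it back when the form is submitted.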


verdy_p (verdy-p) wrote :

Данило Шеган:

The bug comment above **perfectly** displays the difference between SPACE and NNBSP, even when using a monospaced font, because my browsers (Google Chrome, or Safari, or Opera, or Firefox, or IE8, all in their latest versions: I did not test older browsers) will not find NNBSP in the default monospaced font selected in this page, and will CORRECTLY continue by looking for fallbacks in other, NON-monospaced fallback fonts where NNBSP will be found, or by performing a display-only substitution using THINSP, which will be found in any one of these fonts, or by using a final fallback of half the width specified in the first font for the SPACE character (all this is done in the text renderer or layout engine; the browser MUST NOT alter the DOM in the HTML page, as this is not needed and in fact would be a BOGUS non-conforming behavior).

If that's not the case, it's because your browser is bogus and performing incorrect substitutions (this is exactly why we also need to replace NBSP by a visual tag, because some browsers are not just displaying a SPACE when NBSP is not within one font, but are effectively modifying the DOM content of the document, so that when submitting the HTML edit form, this SPACE will be sent back to the server, dropping the NBSP).

Suggestion: upgrade your browser!

All translators should work with up-to-date browsers to make sure that the text will not be garbled. The translated resources need to be as accurate as possible (they should reflect the state of the art for text encoding). Then these resources edited in Launchpad will be exported to applications using them, and it's up to applications to adapt themselves so that they will process these strings correctly (they may have to use their own local substitutions, but this won't affect the resources themselves). Note that new translations are submitted to new applications that will only integrate these resources within their most recent distributions, so the applications themselves must be corrected (submit bug reports to the developers of these applications if they don't display these resources correctly, and ask them to provide their own fallback mechanisms).

In other words, we really need visual tags for various reasons:
- avoiding bugs in browsers
- helping users see the difference (due to lack of differentiation with some fonts)
- making resources as accurate as possible
- helping track bugs in applications that can't render texts correctly, or that don't provide a working fallback mechanism.

verdy_p (verdy-p) wrote :

#18: edit

Replace "2009" by "200E" in my comment #28 for the left-to-right mark (LRM).
Sorry about the incorrect Unicode code point (I forgot to correct copy-pasted paragraphs when editing the message).

verdy_p (verdy-p) wrote :

Данило Шеган on 2010-08-03 :

I absolutely don't like your unilateral (and poorly thought-out) decision to change this bug to "Won't fix". This is a very bad decision which ignores the same reasons that were used and really needed when adding visual tags for NBSP.
Please reopen this bug! Notably because there are other "problematic" characters that can't be distinguished by many translators.

I have suggested improvements to the Launchpad GUI to explain these visual tags when they are present. The HTML edit form should automatically display a warning to translators each time a visual tag is inserted in the edit form. This warning need not be long, just a small sentence with a link to the appropriate page on the Launchpad site explaining what those visual tags are, listing all those that are currently known and implemented, and explaining their usage, so that translators will be aware of what they mean in the GUI, and why they MUST be kept and MUST NOT be carelessly replaced with ASCII characters or deleted.

There are a large number of "problematic" characters. FWIW, even quotes might need explanation for some languages (i.e. Serbian uses two styles officially, yet translations usually go with only one). This is something that is explained in the translation style guide, linked from the top of translation pages. Just like sometimes you'd want to have an explanation of why you are using certain terminology. If we put it all on the translation page, it would be too crowded.

My browser is also fine, and it uses a monospace font which has a NNBSP and renders it as a full width space because it _is_ a _monospace_ font. Thanks for the suggestion, though.

The reasons why it's "won't fix" for now are explained above. I've also described how you can make some progress towards it not being "won't fix", and Nicolas seems to understand the reasons (and is doing something about it). Just posting numerous unrelated comments won't help. Also, the wider the scope, the less chances there are that it will be fixed. So, please keep this bug about NNBSP support.

Please try to be more civil in your communication. Thank you.

Changed in diacritic:
status: New → Invalid
Nicolas Delvaux (malizor) wrote :

> Suggestion: upgrade your browser !

Interesting, I tested with chromium and NNBSP is displayed properly here, whereas it's not the case with Firefox 3.6.8.
But I agree with Данило Шеган: if this is supposed to be a monospace font, the bug is in chromium (and FF displays NNBSP on other websites).

Anyway I don't understand why a monospace font is necessary here, but this is off topic, sorry ;-)

verdy_p (verdy-p) wrote :

You're wrong, a "monospace" font style will only apply to characters for which it is relevant:

It DOES NOT force any character to be monospaced, notably not those for which this is opposed to their semantics.

So Chromium or Chrome are PERFECTLY RIGHT (they absolutely don't have any bug here!) when they display the NNBSP as a narrow character, escaping the default rules applicable to other characters that don't have any explicit difference of semantics between the cases where they are styled in a variable-width or a monospaced font.

The "monospaced" font style in CSS is a hint that is meant only to select appropriate fonts, where it is relevant and multiple fonts are candidates. Some scripts (Arabic for example in some of its ligatures, or Japanese kana, which normally use half-width characters instead of the square characters used for kanji) will definitely be represented incorrectly if they are rendered with monospaced glyphs only (and that's why East Asians have their own set of fullwidth punctuation, separate from standard punctuation like dot and comma).

Characters have semantics, and if these semantics include width restrictions, they should be preserved as much as possible. If this is not possible, then use a reasonable "compatibility" fallback for the rendering, but DO NOT change the encoded abstract characters.

Hi all,

This is a very interesting discussion. I want to give my point of view as an admin of ubuntu-fr-l10n.

If we consider French typographic rules, Nicolas is right. And the use of a thin (or narrow) non-breaking space is more elegant.

For now, I think that it's a bad idea to change nbsp in Launchpad for the following reasons:
* all upstream teams (gnome-fr, kde-francophone, debian-fr, ...) are using nbsp and we need to be consistent with upstream policies
* we have to be sure that nnbsp (U+202F) is correctly rendered in all the user agents (i.e. web browsers, GTK or Qt GUI, TTY, help viewers, ...) whatever the font or the charset used

We need to have a discussion about that with upstream teams. If a consensus emerges to use nnbsp instead of nbsp, then we could reopen this bug.

Nicolas Delvaux (malizor) wrote :

Please note that this bug report is not about changing nbsp in Launchpad, it's just about *adding* the possibility to also use nnbsp in a convenient way.
There is no reason for [nbsp] to be dropped now or in the near future.

Personally I see this bug fix as the first step toward nnbsp adoption.
It's obvious that more testing has to be done before all upstream teams change their guidelines.
That's also why I can take the responsibility to change the lp-l10n-fr guidelines to give this a real try, so that we will have something concrete to discuss with other teams.

But it's still the same, I need this bug to be fixed for this test.

A patch was proposed but rejected for now because it may cause "maintenance nightmares" (well, that's what I understood ;-)).
In fact I don't really get it: I can't imagine how someone could break this by mistake (or that mistake would also break [nbsp] and [tab]), so I don't see how this patch adds maintenance.

If this test in lp-l10n-fr does not work, the [nbthin] patch can be dropped if you want, at least while bugs are being fixed.

Nicolas Delvaux (malizor) wrote :

An update about this...

I ran many tests last month and wrote a report about narrow no-break space support in the free software ecosystem.
It's in French so I'm not sure it's useful in this bug report, but you can find the PDF here: http://malaria.perso.sfr.fr/fines/
(of course we are discussing it on some l10n team mailing lists)

So, the important thing I noticed is that some toolkits/environments have rendering bugs but some just work.
All important bugs were reported (mainly Qt: http://bugreports.qt.nokia.com/browse/QTBUG-13280 ) and I fixed another (KBD/TTY: http://git.altlinux.org/people/legion/packages/kbd.git?p=kbd.git;a=commit;h=50f674d1775bc75f799c583b887c3329088ff620 ).
So, the situation is getting better.

But the interesting part (at least for this bug report) is mainly that all GTK apps support the narrow no-break space (on GNU/Linux, on Windows XP/Vista/Seven...).
So all GTK apps that host their translations on Launchpad could start using it now... if this bug gets a fix.

Do you really need that a project as huge as Gnome starts to use nnbsp widely to just implement a fix?
(anyway, I will argue in favour of its adoption in Gnome at the beginning of the next cycle)

Otherwise, I also noticed that there is in fact no easy way to input a nnbsp on Windows (it requires messing around with REGEDIT...) and also that there is no shortcut for it on GNU/Linux with a Canadian layout (I have to verify this last one).
So we really need a tag such as [nbthin] to work around those input problems.

The patch I submitted just adds 6 lines to the code (not counting documentation).
The code is the same as for the other tags, so I can't imagine how it could break something that would not have broken without it.
Could you please reconsider your opinion on this?

On Mon, 20 Sep 2010 at 18:10 +0000, Nicolas Delvaux wrote:
> Do you really need that a project as huge as Gnome starts to use nnbsp
> widely to just implement a fix?

It's not a fix: it's a feature request. Launchpad fully supports nnbsp.
You are just having trouble inputting it, and I haven't seen conclusive
evidence suggesting browsers don't support it.

> (anyway, I will argue in favour of its adoption in Gnome at the
> beginning of the next cycle)

Excellent. Please let us know how it goes.

> Otherwise, I also noticed that there is in fact no easy way to input a
> nnbsp with Windows (it requires to mess around with REGEDIT...) and
> also that there is no short-cut for it on GNU/Linux with a Canadian
> layout (I have to verify this last one).
> So we really need a tag such as [nbthin] to workaround those input
> problems.

I am simply not convinced: as someone who has worked on Serbian
localization from the ground-up for free software, I am familiar with
what it takes to get special characters supported in translations. For
instance, Serbian uses „these“ kind of quotes. No keyboard layout other
than the Serbian keyboard layouts in XFree and X.org support it (which I
helped develop). The way to fix a problem of the missing characters is
to enable their input: if the character is the right one to use, why
should you not use it as part of your daily input in French? (if you
don't like my example, think of many other Unicode characters that are
typographically more appropriate for text, but rarely used: em- and
en-dashes, minus, ellipsis, etc)

Is it not something that could be automatically produced instead? For
example, perhaps a simple post-processing step on French .po files
export would do the trick? Would such post-processing be possible (iow,
is it well defined)?
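
A minimal sketch of what such a post-processing step might look like, in Python. The rule is an assumption taken from the bug description (a space directly before ";", "!" or "?"), and the function name `narrow_spaces` is hypothetical; it is not an existing Launchpad or gettext feature.

```python
import re

NBSP = "\u00a0"   # NO-BREAK SPACE
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Assumed rule: one ordinary or no-break space directly before ; ! ?
_SPACE_BEFORE_PUNCT = re.compile("[ " + NBSP + "]([;!?])")


def narrow_spaces(msgstr: str) -> str:
    """Replace the space before ; ! ? with a narrow no-break space."""
    return _SPACE_BEFORE_PUNCT.sub(NNBSP + r"\1", msgstr)
```

A blind rule like this would also rewrite strings where the space is not typographic (a ternary expression such as "x ? y : z", for example), so whether it is well defined depends on the strings being translated.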

If something is supposed to be a better way of doing things, then we
should not limit that feature to Launchpad. The entire free desktop stack
should benefit from it. If it's missing in the Canadian layout, perhaps
that's where it needs to be added first.

> The patch I submitted just add 6 lines to the code (I don't take
> documentation into account).

The patch you submitted is not really complete. And it touches stuff it
shouldn't touch. But, I am sure we can get it in order for landing.
And then, if it ends up in Launchpad, what's to guarantee that:
 - it will receive widespread use
 - we'll know what to do with it in the future
 - we won't keep getting requests for characters for specific languages
simply because people don't want to solve the problem at the right level
(i.e. keyboard input)

> The code is the same that with other tags, so I can't imagine how it
> can break something that may have not break without it.

A reminder: [nbsp] was introduced because of browser bugs. [tab] is
there because TAB key switches you to the next input field on a web
form.

> Could you please re-consider your opinion on this?
>
As I stated earlier, we can perhaps consider having a "special way to
*display* nnbsp". But, what about other spacing characters? Why is
nnbsp more important than them? For it to really become more important,
someone has to use them first :)

My opinion ...


Also, let me repeat another thing: if [nnbsp] is the right thing to use *instead* of [nbsp], then we should get rid of the support for [nbsp]: there's no reason to make it easier to input a character that should not be used! (I know it'd be nice to keep it for compatibility reasons, but I am just pointing at what I believe is logical)

Nicolas Delvaux (malizor) wrote :

> You are just having trouble inputting it, and I haven't seen conclusive
> evidence suggesting browsers don't support it.

Oh, sorry, I did some testing with web browsers too (indeed, I forgot to
add this critical part in my previous comment, shame on me).
Here are some non-working browsers (screenshots are in French, so please
read "nnbsp" instead of "espace fine insécable" ;-)):

- Up to Internet Explorer 8 on Windows XP (and predecessors) → it
displays a square instead of a nnbsp
( http://malaria.perso.sfr.fr/fines/images/fines_IE8_xp.png )
(note that it works with Vista and Seven; here the bug is in the OS
built-in renderer, not in the browser itself)

- Because of a bug in Qt
( http://bugreports.qt.nokia.com/browse/QTBUG-13280 ) no Qt browser
currently supports nnbsp: nothing is displayed.
I tested with Konqueror
( http://malaria.perso.sfr.fr/fines/images/fines_konqueror.png ) and
Rekonq ( http://malaria.perso.sfr.fr/fines/images/fines_rekonq.png )

Same bug with Safari on Mac OS for example.

Basically, nnbsp is currently well supported in Firefox (all OS) and in
all GTK browsers (I did not test, but I was told that Chrome supports it
too).
You can try your browser with this web-page for example:
http://malaria.perso.sfr.fr/fines/test_nnbsp.html

So, we may reasonably think that a tag may be needed for both input and
display of the nnbsp char. ;-)

> I am simply not convinced: as someone who has worked on Serbian
> localization from the ground-up for free software, I am familiar with
> what it takes to get special characters supported in translations. For
> instance, Serbian uses „these“ kind of quotes. No keyboard layout other
> than the Serbian keyboard layouts in XFree and X.org support it (which I
> helped develop). The way to fix a problem of the missing characters is
> to enable their input: if the character is the right one to use, why
> should you not use it as part of your daily input in French? (if you
> don't like my example, think of many other Unicode characters that are
> typographically more appropriate for text, but rarely used: em- and
> en-dashes, minus, ellipsis, etc)

But nnbsp is a special case.
Here reviewers need to figure out if the space in "Test !" is a nnbsp or
a regular space.
It's indeed doable without a tag, but you need a compatible browser, no
fixed-width font and you also need trained eyes.

It's really not comparable with „these“ ;-)
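
To illustrate the reviewer's problem, here is a minimal Python sketch of how visual tags make invisible characters explicit and how they can be turned back into characters on input. [nbsp] and [tab] are the tags mentioned in this report and [nbthin] is the name proposed here; the table and function names are hypothetical, not Launchpad's actual code.

```python
# Hypothetical tag table: character -> visible tag.
TAGS = {
    "\u00a0": "[nbsp]",    # NO-BREAK SPACE
    "\u202f": "[nbthin]",  # NARROW NO-BREAK SPACE (proposed tag name)
    "\t": "[tab]",         # TAB
}


def show_tags(text: str) -> str:
    """Make the special characters visible for display and review."""
    for char, tag in TAGS.items():
        text = text.replace(char, tag)
    return text


def expand_tags(text: str) -> str:
    """Turn typed tags back into the characters they stand for."""
    for char, tag in TAGS.items():
        text = text.replace(tag, char)
    return text
```

With this, "Test" followed by a nnbsp and "!" is shown as "Test[nbthin]!", while the same string with a regular space is shown unchanged. The sketch ignores escaping, so a translation that literally contains the text "[nbsp]" would be converted too.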

> Is it not something that could be automatically produced instead? For
> example, perhaps a simple post-processing step on French .po files
> export would do the trick? Would such post-processing be possible (iow,
> is it well defined)?

Well, perhaps it's doable.
But it can't be automatic for now, at least because not all frameworks
(mainly Qt) support nnbsp.

And, personally, I prefer translating things the right way directly.
Such an automation should just warn the user that he might have forgotten
something at line X; otherwise the translator should be trusted.
(anyway, it's a bit off-topic)

> The patch you submitted is not really complete. And it touches stuff it
> shouldn't touch. But, I am sure we can get it in order for landing.
> And then, if it ends up...


On Wed, 22 Sep 2010 at 00:31 +0000, Nicolas Delvaux wrote:
>
> However, I'm really sorry to have forgotten browsers tests in my
> previous comment (so you might want to re-re-consider your
> opinion ;-))

Yeah, I actually do. Ok, so I won't have too many objections to adding
nbthin (though "non-breaking thin" is probably not a good moniker :),
and it'd be nice to pop into #launchpad-dev on FreeNode IRC to discuss
it.

I am "danilos" there. If I am not around or non-responsive, "henninge"
or "jtv" might be as much (if not more :) help. But, anyone else on
that channel can guide you in preparing proper tests and writing proper
documentation, which seems to be the most important bits missing in your
existing patch.

Changed in rosetta:
status: Won't Fix → Triaged
importance: Undecided → Low

Nicolas Delvaux (malizor) wrote :

Ok, I will try to spot one of you. However it's hard to schedule, my free time does not seem to be compatible with your availability...
I will find a way. ;-)

On Mon, 27 Sep 2010 at 19:35 +0000, Nicolas Delvaux wrote:
> Ok, I will try to spot one of you. However it's hard to schedule, my free time does not seem to be compatible with your availability...
> I will find a way. ;-)

Oh, then don't worry about finding us too much. #launchpad-dev should
be a useful place to get help almost 24h any weekday, and you're already
pretty close to a proper patch. You can probably find most information
you need on https://dev.launchpad.net/ but you are going to need less
time if you actually ask people around. :)

If that doesn't work for you either, you can just go and submit a merge
proposal (to merge your branch into lp:launchpad/devel), and we can work
through there (it's going to be async communication so it's again going
to take longer, but I've already made you wait long enough :).

Changed in rosetta:
assignee: nobody → Nicolas Delvaux (malaria)
status: Triaged → In Progress

Danilo: please don't accept this feature in Launchpad for now.
This will be very confusing for our contributors and it could lead to several problems.

I've already given my point of view in another comment, but let me try to explain it more clearly.

Using NNBSP instead of NBSP brings only a very small visual improvement.
NNBSP has never been used in French translations.
And most importantly, NNBSP is *not* supported by KDE/Qt applications.
In KDE/Qt, nnbsp is not displayed at all!
In TTYs, NNBSP is replaced by a small lozenge.

Before trying to use it in French translations, we have to be sure that upstream teams agree to change their policies about non-breaking space usage. We have to be sure that all applications support nnbsp correctly (or at least have a fallback mechanism to nbsp). We have to be sure that all the conversion processes between po files and specific formats treat nnbsp correctly.
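
As an illustration of the fallback mechanism mentioned above, a minimal Python sketch. How an application finds out whether its toolkit or font can render U+202F is not covered here; the `nnbsp_supported` flag and the function name are assumptions made for the example.

```python
NBSP = "\u00a0"   # NO-BREAK SPACE
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def with_nbsp_fallback(text: str, nnbsp_supported: bool) -> str:
    """Return text unchanged if nnbsp can be rendered, else use nbsp."""
    if nnbsp_supported:
        return text
    # Keeps the non-breaking behaviour, loses only the narrow width.
    return text.replace(NNBSP, NBSP)
```

The fallback only changes what is displayed at that point; the translated string stored in the po file keeps its nnbsp.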

Changed in rosetta:
milestone: none → 10.11
tags: added: qa-needstesting
Changed in rosetta:
status: In Progress → Fix Committed
verdy_p (verdy-p) wrote :

Reply to comment #35:

It does not matter if the character is incorrectly rendered. The decision to include it or not is part of a policy for each project.

And when Bruno Patri said that the character was rendered as a lozenge in TTYs, this makes no sense, because it is fully dependent on the character set conversion that occurs within the terminal emulator, either on the server side (user's current locale parameter, or system/application locale parameter), or in the client terminal (and with its configured fonts).

Most terminal emulators today support UTF-8 as the encoding of choice on the network, as do servers (at least Linux and most FreeBSD distributions, and for many years now, Sun/Oracle Solaris and IBM AIX as well).

The main problem was not the presence of NNBSP (which could already be entered), but the lack of accurate identification when it is rendered: we need a visual clue to tell distinct spaces apart.

Then it's up to each translation project to determine which characters they accept. Most browsers now have accurate rendering, so there's no reason to refuse it (correct display of whitespace is in fact a minimum demanded now by HTML5, which describes the minimum subset of Unicode character properties that browsers should support; this includes the recognition of the non-breaking property, as well as the whitespace property, both of them being STABLE, and they must be recognized as well for implementing IDNA safely).

For those environments that can't support Unicode-encoded output and that perform charset conversions, handling this is the job of the system libraries for charset conversions. Terminals should not have to worry about these characters. After all, the same terminals will also have problems rendering Chinese or Russian if they can't support Unicode or extended character sets, and the same projects that support translations to Chinese, Russian, Arabic, Hebrew, Indic scripts, or Thai should not have to prohibit NNBSP for correct display.

Free fonts and free rendering libraries are now available, and should be updated to support these characters, even if a font does not map NNBSP but only THINSP (this includes the recent fonts shipped with Windows 7, such as "Segoe UI", which only maps THINSP, whereas "Times New Roman" has mapped NNBSP for a long time now). Such a font mapping is no longer needed on Windows 7, because its built-in renderer can automatically map NNBSP to the same glyph as THINSP, and can otherwise emulate all whitespace characters present in Unicode 5.0 at least.

Qt is possibly late in its support, but I think this is because it does not really implement the text renderer itself but depends on an old version of a library which is dependent on the target system for which it was built. Qt on Windows works and renders NNBSP correctly, even with the "Segoe UI" font that does not map the character. Such mapping can also be performed on Linux/*nix by X11 font servers, even if the text renderer library does not implement it. Updating XFree86 (or similar) will work correctly to resolve the issue. It is also very simple to update the terminal emulator (XTerm, Telnet) because they are basic user applications which have ...


Nicolas Delvaux (malizor) wrote :

@verdy_p: please translate your comment into French and mail it to traduc_AT_traduc_DOT_org

We discuss the remaining problems with NNBSP support on this mailing list (it's a kind of coordination list for all French translation projects in FOSS).

This, however, is a bug report about Launchpad, and the corresponding bug is now fixed.
(The [nnbsp] tag is not yet available, even in edge, but it should land soon).

ps: as a reminder, I made this report about the current support of NNBSP: http://malaria.perso.sfr.fr/fines/fines.pdf
If you observe something different and/or want to improve it, please raise the topic on the aforementioned mailing-list.

I did some QA on qastaging. This should be deployed to the actual production server very soon.

tags: added: qa-ok
removed: qa-needstesting

I did that, but apparently it had no effect, because my message did not appear on the list (I did not receive it, unless the list does not send me back my own messages), and nobody replied to it.

> Message of 27/10/10 23:50
> From: "Nicolas Delvaux" <email address hidden>
> To: <email address hidden>
> Cc:
> Subject: [Bug 608631] Re: Visual tag to represent narrow non-breaking spaces
>
>
> @verdy_p: please translate your comment in French and mail it to
> traduc_AT_traduc_DOT_org
>
> We discuss about the remaining problems with NNBSP support on this
> mailing-list (it's a kind of coordination list for all French
> translation projects in FOSS).
>
>
> Here, it is a bug report about Launchpad. And the relative bug is now fixed.
> (The [nnbsp] tag is not yet available, even in edge, but it should land soon).
>
> ps: as a reminder, I made this report about the current support of NNBSP: http://malaria.perso.sfr.fr/fines/fines.pdf
> If you observe something different and/or want to improve it, please raise the topic on the aforementioned mailing-list.
>
> --
> Visual tag to represent narrow non-breaking spaces
> https://bugs.launchpad.net/bugs/608631

Changed in rosetta:
status: Fix Committed → Fix Released