Inline formatting within links is rendered in link text

Bug #1060784 reported by r
This bug affects 1 person

Affects: Nyctergatis Markup Engine
Status: Fix Released
Importance: Low
Assigned to: Yves Piguet

Bug Description

Consider this Creole:

[[**abc]]

[[__abc]]

[[//abc]]

which renders this HTML:

<p><a href="**abc"><b>abc</b></a><b></b></p>
<p><a href="__abc"><u>abc</u></a><u></u></p>
<p><a href="//abc"><i>abc</i></a><i></i></p>

As you can see, the inline formatting markers "**", "__" and "//" are interpreted as formatting in the link text instead of being output literally.

If I instead use

[[~*~*abc]]

it gets the link text right, but not the link target:

<p><a href="~*~*abc">**abc</a></p>

I am not sure which option is right in terms of the Creole standard, but the current implementation is clearly wrong.

Yves Piguet (yves-piguet) wrote:

The four cases are consistent (in a hyperlink where the target isn't specified explicitly, the literal content of the brackets is used as the target and the text is parsed), but the result doesn't make much sense. I don't like the empty <b></b> etc. after the link, either. At least the result is valid HTML.
Since syntax errors don't exist in Creole, I think the best solution is to discard markup in the link target.
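A minimal sketch of what discarding markup in the link target could look like, written in C like NME itself. The helper name strip_inline_markup and the assumption that "markup" means only the doubled markers "**", "__" and "//" are illustrative, not NME's code. Note that this naive version also turns "http://x" into "http:x", so a real implementation would need extra rules for URLs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of NME): copy a link target while
   dropping the Creole inline markers "**", "__" and "//".
   Caveat: this naive version also removes the "//" in "http://". */
void strip_inline_markup(const char *src, char *dst, size_t dst_size)
{
    size_t j = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);
    for (size_t i = 0; src[i] != '\0'; )
    {
        char c = src[i];
        /* src[i + 1] is at worst the terminating '\0', so this is safe */
        if ((c == '*' || c == '_' || c == '/') && src[i + 1] == c)
        {
            i += 2;             /* skip the two-character marker */
            continue;
        }
        if (j + 1 < dst_size)   /* keep room for the terminator */
            dst[j++] = c;
        i++;
    }
    dst[j] = '\0';
}
```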

Changed in nme:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Yves Piguet (yves-piguet)
r (ralfjunker) wrote:

Thanks for looking into this problem!

I stumbled upon this because I auto-generated Creole markup that links to a file named "__abc.txt". Hence I would like both the link target and the link text to show exactly "__abc.txt". I then realized that "__" starts underline formatting and that I should have escaped it as "~_~_abc.txt" to avoid the inline text formatting.

I think we have a few issues here:

* "__" starts underline formatting but is not properly nested because it is not terminated within the link. Consider this example:

{{{
[[__abc.txt]] def__ ghi
}}}

* Escapes are removed for the link text but not for the link target:

{{{
[[~_~_abc.txt]]
}}}

I think inline formatting can be allowed within [[...]] but should be implicitly terminated inside the link, just as happens with headings. Example:

{{{
== __test
more test
}}}

So this:

{{{
[[**abc]]

[[__abc]]

[[//abc]]
}}}

would ideally lead to this:

{{{
<p><a href="abc"><b>abc</b></a></p>
<p><a href="abc"><u>abc</u></a></p>
<p><a href="abc"><i>abc</i></a></p>
}}}

And these escapes:

{{{
[[~*~*abc]]

[[~_~_abc]]

[[~/~/abc]]
}}}

would render as

{{{
<p><a href="**abc"><b>**abc</b></a></p>
<p><a href="__abc"><u>__abc</u></a></p>
<p><a href="//abc"><i>//abc</i></a></p>
}}}

r (ralfjunker) wrote:

Note: Please disregard the curly braces. I thought they would switch to code mode. Unfortunately, Launchpad does not have a message preview. :-(

Yves Piguet (yves-piguet) wrote:

In your case, with the current NME code, you could generate something like

[[__foo|{{{__foo}}}]]
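For a generator like the one described above, a small helper could emit that form for any file name. This is a sketch: the name write_file_link is hypothetical, and it conservatively refuses names containing "|", "]]" or "}}}", which would otherwise end the link target, the link or the nowiki span early.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "[[name|{{{name}}}]]" into out, so that the
   link target is taken verbatim and the link text is shown literally
   through Creole's {{{...}}} nowiki markup.
   Returns 0 on success, -1 if name is unsafe or out is too small. */
int write_file_link(const char *name, char *out, size_t out_size)
{
    int n;

    assert(name != NULL && out != NULL && out_size > 0);
    /* Refuse names that would split the target or end the markup early. */
    if (strchr(name, '|') != NULL || strstr(name, "]]") != NULL
        || strstr(name, "}}}") != NULL)
        return -1;
    n = snprintf(out, out_size, "[[%s|{{{%s}}}]]", name, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For "__abc.txt" this produces [[__abc.txt|{{{__abc.txt}}}]].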

It isn't uncommon for URLs to contain tildes, so I think it's better not to interpret them in a special way. What about this:

- if the link target and the link text are separate, continue as in the current implementation where the target is used verbatim and the markup in the link text is interpreted the usual way;

- if there is no separate link text, use verbatim what's between the brackets.

So to summarize, if you want to display the link target, don't use a pipe; if you want anything special, the link target comes first and is followed by a pipe and any markup you want.
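The two rules can be sketched as a split on the first pipe inside the brackets. LinkParts and split_link are hypothetical names, not NME's API; the sketch only decides which parts are verbatim and which are parsed as markup, and leaves the actual rendering out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting the content between "[[" and "]]". */
typedef struct {
    const char *target; size_t target_len;  /* always used verbatim */
    const char *text;   size_t text_len;    /* link text */
    bool text_is_markup;  /* true: parse as Creole; false: output verbatim */
} LinkParts;

LinkParts split_link(const char *inner, size_t len)
{
    LinkParts p;
    const char *pipe;

    assert(inner != NULL);
    pipe = memchr(inner, '|', len);
    p.target = inner;
    if (pipe != NULL)
    {
        /* [[target|text]]: verbatim target, text parsed the usual way */
        p.target_len = (size_t)(pipe - inner);
        p.text = pipe + 1;
        p.text_len = len - p.target_len - 1;
        p.text_is_markup = true;
    }
    else
    {
        /* [[target]]: everything between the brackets is used verbatim */
        p.target_len = len;
        p.text = inner;
        p.text_len = len;
        p.text_is_markup = false;
    }
    return p;
}
```

With this split, [[abc|**def]] yields the verbatim target "abc" and the markup text "**def", while [[__abc]] yields "__abc" verbatim for both.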

Unterminated markup is handled like other inline markup, such as

**foo //bar **baz //foobar

NME fixes the markup nesting. Each Creole markup toggles the corresponding style (** = bold on/off, // = italic on/off, etc.). The style is reset at the end of each paragraph. Titles are paragraphs, unlike links. I prefer to keep the rules as simple as possible, especially for handling what might be considered input errors, so that it's easier to understand what happens and how to fix the input.
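A minimal sketch of these toggle rules for a single paragraph, assuming only the markers "**" (bold), "//" (italic) and "__" (underline) and no HTML escaping. render_paragraph is a hypothetical name and the exact output of NME may differ; the sketch shows the mechanism: a marker switches its style on or off, switching off a style that is not innermost closes and reopens the inner ones, and whatever is still open is closed at the end of the paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical style table: marker character -> HTML tag name. */
static const char *tag_for(char c)
{
    return c == '*' ? "b" : c == '/' ? "i" : c == '_' ? "u" : NULL;
}

typedef struct { char *buf; size_t size, len; } Out;

static void put(Out *o, const char *s)
{
    size_t n = strlen(s);
    if (o->len + n < o->size)   /* drop pieces that do not fit */
    {
        memcpy(o->buf + o->len, s, n + 1);
        o->len += n;
    }
}

static void tag(Out *o, char c, int closing)
{
    put(o, closing ? "</" : "<");
    put(o, tag_for(c));
    put(o, ">");
}

void render_paragraph(const char *src, char *buf, size_t size)
{
    Out o = { buf, size, 0 };
    char stack[3];   /* open style markers, innermost last */
    int depth = 0;

    assert(src != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; src[i] != '\0'; i++)
    {
        char c = src[i];
        if (tag_for(c) != NULL && src[i + 1] == c)
        {
            int k = depth - 1;
            while (k >= 0 && stack[k] != c)
                k--;
            if (k < 0)
            {
                stack[depth++] = c;   /* style was off: switch it on */
                tag(&o, c, 0);
            }
            else
            {
                for (int m = depth - 1; m > k; m--)   /* close inner styles */
                    tag(&o, stack[m], 1);
                tag(&o, c, 1);                        /* switch style off */
                for (int m = k + 1; m < depth; m++)   /* reopen inner styles */
                {
                    stack[m - 1] = stack[m];
                    tag(&o, stack[m], 0);
                }
                depth--;
            }
            i++;   /* skip the second marker character */
        }
        else
        {
            char s[2] = { c, '\0' };
            put(&o, s);
        }
    }
    while (depth > 0)   /* end of paragraph resets every style */
        tag(&o, stack[--depth], 1);
}
```

With this sketch, **foo //bar **baz //foobar becomes <b>foo <i>bar </i></b><i>baz </i>foobar, and //abc**def// becomes <i>abc<b>def</b></i><b></b>, the same kind of empty trailing span discussed later in this thread.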

Yves Piguet (yves-piguet) wrote:

I think the following code, to be inserted in NME.c at line 2083, should implement what I've described above.

 else
 {
  // no separate link text or image alt text: write link verbatim
  for (k = 0; k < context->linkLength; )
  {
   if (outputFormat->charHookFun)
    CheckError(outputFormat->charHookFun(context->linkOffset + k,
      context,
      outputFormat->charHookData));
   if (outputFormat->encodeCharFun)
    CheckError(outputFormat->encodeCharFun(context->src + context->linkOffset,
      context->linkLength, &k,
      context,
      outputFormat->encodeCharData));
   else
   {
    context->dest[context->destLen++] = context->src[context->linkOffset + k];
    if (isFirstUTF8Byte(context->src[context->linkOffset + k]))
     context->destLenUCS16++;
    k++;
    context->col++;
   }
   CheckError(checkWordwrap(context, outputFormat));
  }
  // skip to end of link, before the end markup
  context->srcIndex = j;
 }
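The core of the fragment above is a byte-for-byte copy that also counts characters by their UTF-8 lead bytes. A self-contained sketch of just that part: is_first_utf8_byte is an assumed equivalent of NME's isFirstUTF8Byte, and the output-format hooks (charHookFun, encodeCharFun) and word wrapping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed equivalent of NME's isFirstUTF8Byte: true for ASCII bytes and
   UTF-8 lead bytes, false for continuation bytes (10xxxxxx). */
static int is_first_utf8_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Copy len bytes of src verbatim into dst (no markup interpretation) and
   return the number of characters copied, counted once per lead byte.
   dst must have room for len + 1 bytes. */
size_t copy_verbatim(const char *src, size_t len, char *dst)
{
    size_t chars = 0;

    assert(src != NULL && dst != NULL);
    for (size_t k = 0; k < len; k++)
    {
        dst[k] = src[k];
        if (is_first_utf8_byte((unsigned char)src[k]))
            chars++;
    }
    dst[len] = '\0';
    return chars;
}
```

Counting one unit per lead byte equals the UTF-16 length only for characters in the Basic Multilingual Plane; characters encoded with four UTF-8 bytes need two UTF-16 units.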

Changed in nme:
status: Confirmed → Fix Committed
r (ralfjunker) wrote:

Thanks for the quick fix, Yves! I have tested revision #69 and found it working as described. It passes all my tests, no regressions!

Your rationale is reasonable, especially with regard to "~" being frequently used in links.

I only want to mention that

[[abc|**def]]

still outputs an extra pair of opening and closing inline tags:

<a href="abc"><b>def</b></a><b></b>

Other than that, thanks again for your excellent work!

Yves Piguet (yves-piguet) wrote:

You're welcome, Ralf!

The superfluous empty HTML markup has the same cause as other nesting mismatches, such as

//abc**def//

The end tag ]] or // forces NME to end the span of bold text to have properly nested HTML, and a start bold tag is output again because there might be more bold text (all Creole/NME style tags are on/off switches, and the style is reset at the end of paragraphs). If there is no more text before the end of the paragraph, the span is empty.

Optimizing that would require a second pass or lookahead. I think the current implementation is an acceptable compromise between robustness with respect to any input, output correctness and implementation simplicity.
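For illustration, the second-pass option could be as simple as deleting empty style spans from the generated HTML. This is a hypothetical sketch, not something NME does; it works on the output text and assumes any literal "<" in the content has already been escaped, so only real tags can match.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical post-processing pass: delete empty style spans such as
   "<b></b>" in place, repeating until nothing changes so that nested
   empty spans like "<b><i></i></b>" disappear too. */
void remove_empty_spans(char *html)
{
    static const char *const empty[] = { "<b></b>", "<i></i>", "<u></u>" };
    bool changed = true;

    assert(html != NULL);
    while (changed)
    {
        changed = false;
        for (size_t t = 0; t < sizeof empty / sizeof empty[0]; t++)
        {
            size_t n = strlen(empty[t]);
            char *p;
            while ((p = strstr(html, empty[t])) != NULL)
            {
                memmove(p, p + n, strlen(p + n) + 1);  /* shift tail left */
                changed = true;
            }
        }
    }
}
```

Applied to <a href="abc"><b>def</b></a><b></b>, this leaves <a href="abc"><b>def</b></a>, at the cost of an extra scan over the output.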

Changed in nme:
status: Fix Committed → Fix Released