Zim

Indented verbatim blocks are not parsed correctly

Bug #570615 reported by dotancohen
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Zim | Fix Released | High | Unassigned |

Bug Description

When Verbatim formatted text is indented, the formatting is lost and the text gets wrapped in three apostrophe characters. How to reproduce:
1) Type this text line by line, pressing Enter where <enter> is shown:
<enter>
foo<enter>
bar<enter>
<enter>
2) Highlight the text and format as Verbatim
3) Click anywhere to lose highlighting
4) Highlight the text, including lines before and after, and press tab to indent
5) Close Zim (including tray icon), restart and open notebook

What happens:
Turns from this:
    foo
    bar
to:
    '''
    foo
    bar
   '''
without any formatting.

Note that this text was heavily edited after testing, so if you read the bug in an email the description has changed completely.

Tags: wiki-source
dotancohen (dotancohen) wrote :

This occurs on Zim 0.46, Kubuntu 9.10

description: updated
Changed in zim:
status: New → Confirmed
importance: Undecided → High
Fabian Stanke (fmos) wrote :

It seems that indenting of verbatim is generally inconsistent. If I understood correctly, verbatim areas are parsed as a single paragraph independent of line breaks and empty lines within the area. Indenting a line within the area, however, breaks the area apart, which IMHO is incorrect behavior.

To give an example, starting from the following page:

__________snip__________
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2010-05-21T20:09:26.334377

====== TestPage ======
Created Freitag 21 Mai 2010

'''
foo
bar
baz
'''
__________snap__________

Indenting the "bar" line then turns it into:

__________snip__________
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2010-05-21T20:09:26.334377

====== TestPage ======
Created Freitag 21 Mai 2010

'''
foo
'''

 '''
 bar
 '''

'''
baz
'''

__________snap__________

I believe it would be preferable to not use the indent tag in verbatim areas, but instead insert the "raw" tab (or spaces).
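
The splitting shown in the example above can be sketched as follows. This is only an illustration of the observed behaviour, not Zim's actual dump code: the function name `dump_verbatim` and the `(indent_level, text)` input shape are hypothetical, and one tab per indent level is an assumption.

```python
from itertools import groupby

MARKER = "'''"


def dump_verbatim(lines):
    """Serialize one verbatim area given as (indent_level, text) pairs.

    Every run of equally indented lines becomes its own block with its
    own pair of markers, which is the splitting seen in the example.
    """
    out = []
    for level, run in groupby(lines, key=lambda pair: pair[0]):
        prefix = "\t" * level  # assumption: one tab per indent level
        out.append(prefix + MARKER)
        out.extend(prefix + text for _, text in run)
        out.append(prefix + MARKER)
        out.append("")  # blank line between blocks
    return "\n".join(out)


if __name__ == "__main__":
    print(dump_verbatim([(0, "foo"), (1, "bar"), (0, "baz")]))
```

Indenting only "bar" produces three blocks, the middle one with indented markers that the parser of that time could not read back.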

Changed in zim:
status: Confirmed → In Progress
dotancohen (dotancohen) wrote :

The problem is when having Verbatim lines in lists, which are indented. The Verbatim text should be at the same indent level as the list item to which it belongs, thus indenting Verbatim text is an important feature.

Fabian Stanke (fmos) wrote :

I understand. Would it make sense to indent the entire Verbatim paragraph when the user presses tab inside the Verbatim area? In that case, indented ''' markers could be valid syntax for indented Verbatim paragraphs, while indentation of the Verbatim lines beyond that of the ''' markers could be rendered "raw". Of course, the user will not be able to enter Verbatim ("raw") tabs, but spaces could suffice for indenting e.g. in (Verbatim) code snippets.

dotancohen (dotancohen) wrote :

> Would it make sense to indent the entire Verbatim paragraph when
> the user presses tab inside the Verbatim area?

Although that would be an acceptable workaround for my particular use case, I do not think that it is general enough to be a real solution.

Fabian Stanke (fmos) wrote :

I could not try out a lot of wikis, but to my knowledge, there is no support for indented preformatted blocks in DokuWiki or MediaWiki.

Now we are discussing two things:
1) What happens when the user presses tab inside a verbatim block?
2) How could an indented verbatim block be represented in wiki format?

> I do not think that it is general enough to be a real solution.
Supposing this relates to 1), I agree that breaking the verbatim apart into multiple, differently indented verbatims is probably more general. The development in bug #297932, where the empty space around verbatims might become omittable in the future, backs this up.

So the remaining question really is whether the encoding of indented verbatim blocks that I suggested is valid and desired. If that is the case, I will try to put a patch together for review.

tags: added: wiki-source
dotancohen (dotancohen) wrote :

> 1) What happens when the user presses tab
> inside a verbatim block?

Indent it, like with lists. I am a big keyboard user, so I would be the first to complain if this were unexpected behaviour!

> 2) How could an indented verbatim block
> be represented in wiki format?

Tabs in the source code, with the ''' markers untabbed. This would allow verbatim code to have different levels of indentation, which is important in code.

> So the remaining question really is, if the encoding
> of indented verbatim block I suggested is valid
> and desired.

As I understand it, yes.

Thanks!

Fabian Stanke (fmos) wrote :

> Tabs in the source code, with the ''' markers untabbed. This would allow verbatim code to have different levels of indentation, which is important in code.

There is the problem. This contradicts the purpose of preformatted blocks. The parser (and this is expected behaviour) does not actually parse the content of those blocks since they are ... preformatted. Indenting with tabs inside a preformatted block therefore is inherently different from text or list indenting done by the parser. You would not be happy with the result.

The only clean way that I can see to match the two is indenting the actual block (outside) e.g. by indenting the ''' markers and adding their indent to a possible indent inside the block (used in code). Do you understand what I mean? Maybe I'm making things more complicated than they are. If that is the case, please help me to resolve my disorientation.

To rephrase the remaining question:
Is my suggestion of tabbed ''' markers an acceptable solution for (outside) indented pre blocks, is there a better way, or do we not want (outside) indented pre blocks at all (and confine ourselves to inside, "raw", indents that do not match list indents, like most other wiki engines)?

dotancohen (dotancohen) wrote :

> This contradicts the purpose of preformatted blocks.
>

In a technical way? What is "purpose" in this sense?

> The parser (and this is expected behaviour) does not actually parse the
> content of those blocks since they are ... preformatted. Indenting with tabs
> inside a preformatted block therefore is inherently different from text or list
> indenting done by the parser. You would not be happy with the result.

I'm lost. What would be the result?

> The only clean way that I can see to match the two is indenting the actual
> block (outside) e.g. by indenting the ''' markers and adding their indent to a
> possible indent inside the block (used in code).

Then how would you represent, say, an if statement with multiple levels of indentation?
if (this==that) {
    do.something()
}

Like this?:
'''
if (this==that) {
'''
    '''
    do.something()
    '''
'''
}
'''

It seems unwieldy.

Fabian Stanke (fmos) wrote :

> In a technical way? What is "purpose" in this sense?

In the sense that by employing a verbatim block, the user explicitly asks the parser to keep out of that area. That way the user can be sure that anything he types in will be printed in that exact way and not auto-magically converted to italic text. E.g. if you enter //foo// and close and reopen zim, the slashes will be gone and the word printed in italics. If you do the same in a verbatim block, no such thing will happen.
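
The difference can be illustrated with a toy renderer. This is a sketch under simplifying assumptions, not Zim's parser: the `render` function and the HTML-style `<i>` output are hypothetical, and only unindented ''' markers and //italic// markup are handled.

```python
import re

ITALIC_RE = re.compile(r"//(.+?)//")


def render(text):
    """Convert //x// to <i>x</i>, except inside ''' verbatim blocks."""
    out = []
    in_verbatim = False
    for line in text.split("\n"):
        if line.strip() == "'''":
            # A marker line toggles verbatim mode and is not emitted.
            in_verbatim = not in_verbatim
            continue
        if in_verbatim:
            out.append(line)  # the parser keeps out of this area
        else:
            out.append(ITALIC_RE.sub(r"<i>\1</i>", line))
    return "\n".join(out)


if __name__ == "__main__":
    print(render("//foo//\n'''\n//foo//\n'''"))
```

The first //foo// is converted, while the one inside the verbatim block comes out unchanged.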

> What would be the result?

I'm not a GTK expert, but as far as I understood, the result depends on two variables: the tab widths and the indent widths. The result will look as expected only if those two are the same, which in general they are not. In my case (zim head), the tab width is slightly more than twice the indent distance of one indent level.

> It seems unwieldy.

It is indeed. But it is more "general" than my first suggestion. And it is what happens now! The only problem is that it breaks the parser: Zim writes the indented ''' markers but cannot read them back (the original bug report).

Following my first suggestion, your example would look like this in case it is unindented:
'''
if (this==that) {
    do.something()
}
'''

And like this, if it is indented:
    '''
    if (this==that) {
        do.something()
    }
    '''
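
A minimal sketch of how a reader for this encoding could work, assuming tab characters for the indentation (the examples above are shown with spaces). The function name `parse_verbatim` and the returned `(indent_level, content)` pairs are hypothetical and are not taken from the actual patch.

```python
import re

# A marker line: optional leading tabs, then three apostrophes.
MARKER_RE = re.compile(r"^(\t*)'''\s*$")


def parse_verbatim(text):
    """Return (indent_level, content) for each verbatim block in text."""
    blocks = []
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        match = MARKER_RE.match(lines[i])
        if not match:
            i += 1
            continue
        prefix = match.group(1)
        closing = prefix + "'''"
        body = []
        i += 1
        while i < len(lines) and lines[i].rstrip() != closing:
            line = lines[i]
            # Strip the block's own indent; anything beyond it stays raw.
            body.append(line[len(prefix):] if line.startswith(prefix) else line)
            i += 1
        i += 1  # skip the closing marker
        blocks.append((len(prefix), "\n".join(body)))
    return blocks


if __name__ == "__main__":
    source = "\t'''\n\tif (this==that) {\n\t\tdo.something()\n\t}\n\t'''\n"
    print(parse_verbatim(source))
```

The indent of the markers becomes the indent level of the whole block, and the extra tab in front of `do.something()` survives as raw content.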

But as we discussed above, we would have to agree on what happens, if the user presses tab inside such a block. Moving the entire block is not the right way to go as you pointed out. Inserting a tab character in the text and not changing the indentation would be better. But then, how does one change the indentation of the entire block?

As my feeling is now, the unwieldy solution is the way to go until we have some genius alternative not yet mentioned here. Do you agree?

dotancohen (dotancohen) wrote :

> E.g. if you enter //foo// and close and reopen zim, the slashes
> will be gone and the word printed in italics
>

That should not happen at all. I would expect those slashes to be escaped; changing "//foo//" to italic "foo" is unexpected and might even be data loss. If someone wants to edit the wiki syntax himself, Zim is _designed_ to make it easy to get to the storage files for that purpose. That is why the data is not in dotfiles, and why the files remain plaintext.

> I'm not a GTK expert, but as far as I understood, the result
> depends on two variables: the tab widths and the indent widths.
>

Tab width I understand, but what is "indent width"? If the user indents with tabs, why should the indent width be different than the tab width?

And what of the use case of indentation with spaces?

> we would have to agree on what happens, if the user presses
> tab inside such a block.

So long as there is no selection, expected behaviour would be to indent the single line that the cursor is on. If there is a selection, then indent all the lines that have at least partial selection. This is how text editors work, and I just confirmed in Gedit.
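
That rule can be sketched on a plain string as follows. It only illustrates the expected editor behaviour, not Zim's key handler: the function name `indent_selection` and the character-offset interface are assumptions.

```python
def indent_selection(text, start, end=None):
    """Insert a tab before each line touched by the selection [start, end).

    With no selection (end is None or equal to start), only the line that
    contains the cursor position `start` is indented.
    """
    if end is None or end <= start:
        end = start
    out = []
    pos = 0  # offset of the first character of the current line
    for line in text.split("\n"):
        line_end = pos + len(line)
        if end == start:
            touched = pos <= start <= line_end
        else:
            # At least part of the line lies inside the selection.
            touched = start <= line_end and end > pos
        out.append("\t" + line if touched else line)
        pos = line_end + 1  # skip the newline
    return "\n".join(out)


if __name__ == "__main__":
    print(indent_selection("foo\nbar\nbaz", 5))     # cursor inside "bar"
    print(indent_selection("foo\nbar\nbaz", 1, 5))  # partial "foo" and "bar"
```

A selection that ends exactly at the start of a line does not indent that line.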

> But then, how does one change the indentation of the
> entire block?
>

He selects the entire block, then presses tab.

> As my feeling is now, the unwieldy solution is the way to
> go until we have some genius alternative not yet
> mentioned here. Do you agree?

Sure, if it produces the expected behaviour then that's what is important. But the escaping must be fixed, i.e. //foo// should _not_ be italicized. Should I file a new issue on that?

Thanks, Fabian!

Fabian Stanke (fmos) wrote :

> Tab width I understand, but what is "indent width"?

Zim realises the indent with the left_margin tag, which is comparable to the margin-left CSS tag. The following example should clarify the issue.

<html>
<body>
<pre>No indent, no tab</pre>
<pre style="margin-left:25px;">Indented, no tab</pre>
<pre>&#0009;No indent, one tab</pre>
<pre style="margin-left:25px;">&#0009;Indented, one tab</pre>
</body>
</html>

(Note that HTML renders tabs only in a verbatim block, while Zim allows tabs in formatted text as well.)
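
The four cases in the HTML example can also be expressed as simple arithmetic. The 25 px margin comes from the example; the 56 px tab width is a hypothetical value chosen only to match the earlier observation that a tab is slightly more than twice one indent level. The helper name `first_glyph_offset` is likewise an assumption.

```python
def first_glyph_offset(left_margin_px, leading_tabs, tab_width_px):
    """Horizontal offset of the first visible character on a line."""
    return left_margin_px + leading_tabs * tab_width_px


MARGIN_PX = 25     # margin-left from the HTML example
TAB_WIDTH_PX = 56  # hypothetical tab width

CASES = {
    "No indent, no tab": (0, 0),
    "Indented, no tab": (MARGIN_PX, 0),
    "No indent, one tab": (0, 1),
    "Indented, one tab": (MARGIN_PX, 1),
}

if __name__ == "__main__":
    for label, (margin, tabs) in CASES.items():
        print(label, first_glyph_offset(margin, tabs, TAB_WIDTH_PX))
```

An indented line and a tabbed line only line up when the two widths happen to be equal.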

> And what of the use case of indentation with spaces?

That should work independently on top of tabs and indents without problems.

> Sure, if it produces the expected behaviour then that's what is important.

OK, then I'll put together a patch implementing that behaviour as soon as I can allocate some time for it.

> Should I file a new issue on that?

I think, that would be appropriate.

dotancohen (dotancohen) wrote :

> Zim realises the indent with the left_margin tag, which is
> comparable to the margin-left CSS tag.
>

AGH! _That_ is why my indentations are ruined! Please, use tabs, not CSS, to represent tabs! Otherwise we get messes like this. Even copying [] lists from Zim and pasting into a text editor was losing the indentation, and I did not understand why.

Please, use tabs, not CSS, to represent tabs! Zim-wiki is supposed to be semantic!

> I'll put together a patch implementing that behaviour as soon as I
> can allocate some time for it.

Thank you very much! Your attention to Zim and to this issue are amazing.

>> Should I file a new issue on that?
> I think, that would be appropriate.

https://bugs.launchpad.net/zim/+bug/585300

Fabian Stanke (fmos) wrote :

> Please, use tabs, not CSS, to represent tabs!

That would be much more messy, since we couldn't align line-broken list items properly. The first character in the second line of a list item would be at the same horizontal offset as the bullet in the first line.

> Even copying [] lists from zim and pasting into a text editor was losing the indentation

If you need to do that, you should probably open the txt file in that same editor and do your copying and pasting from there.

> Zim-wiki is supposed to be semantic!

Well, Zim-wiki is also WYSIWYG and that probably has priority in conflicting cases like this one.

> Your attention to Zim and to this issue are amazing.

I'm completely new to Zim and haven't (yet) contributed anything, but thanks for the feedback.

> https://bugs.launchpad.net/zim/+bug/585300

I will comment on that over there.

dotancohen (dotancohen) wrote :

> That would be much more messy, since we couldn't align line-broken
> list items properly.
>

I think that you mean "word wrap", not "broken" :)

> The first character in the second line of a list item
> would be at the same horizontal offset as the bullet in the first line.
>

Well, in that case _do_ use CSS, but in addition to the tab. You could do it by putting the tabs in there, then styling with a negative value on the CSS text-indent property. This would not only give correct positioning, it would also solve a bug with RTL lists that I have not filed yet.

> If you need to do that, you should probably open the
> txt file even in that same editor and do your copy and
> pasting from there.
>

If I need to do that, then why use Zim at all? :)

> Well, Zim-wiki is also WYSIWYG and that probably has priority
> in conflicting cases like this one.

What I'm seeing is not what I'm getting, but we will discuss that in the other bug, which is more appropriate.

Thanks.

Jaap Karssenberg (jaap.karssenberg) wrote :

Not sure I understand the whole discussion going on here.

1) Zim does use real indenting in the editor, not tabs (I think this is what you refer to by CSS !? keep in mind zim does not use HTML internally) -- this behavior will _not_ be changed as it is the only way to properly wrap text in an indented section like a list item
2) For paragraphs and verbatim blocks the indenting currently applies to the whole block, single lines with extra indenting still use tabs. (This might need fixing for consistency.)

I see two distinct issues in this bug report. Current description is about indenting of the whole verbatim block, which apparently is broken. The second is about mixing indented verbatim lines in a list, which is in the original description and is referred to in the title.

Please make sure description and title are in sync and descriptive of the issue under discussion. Please untangle the discussion for issues with indenting lines inside a block and indenting the whole block.

dotancohen (dotancohen) wrote :

> I see two distinct issues in this bug report. Current description is
> about indenting of the whole verbatim block, which apparently is
> broken. The second is about mixing indented verbatim lines in a
> list, which is in the original description and is referred to in the title.
>

I'll let you take it from here, if you want to make a separate bug report on the second issue. As a user, I care that my verbatim blocks don't get mangled, that I can place them in lists and indent them, and that copying the text to / from Zim includes proper indentation tabs and spaces. As you prefer to separate bug reports on technical solutions as opposed to user error, I'll let you decide how to file them. Thanks.

Fabian Stanke (fmos)
summary: - Verbatim formatting lost in [] lists
+ Indented verbatim blocks are not parsed correctly
description: updated
Fabian Stanke (fmos) wrote :

I have changed the title to be in sync with the description.

> ... single lines with extra indenting still use tabs. (This might
> need fixing for consistency.)

Mixing indented and unindented lines within a block/paragraph is now the subject of bug report:
https://bugs.launchpad.net/zim/+bug/586296

Fabian Stanke (fmos) wrote :

I have written a patch that implements support for indented verbatim blocks.
It also includes updates to all output formats and to the corresponding unit tests.

Jaap Karssenberg (jaap.karssenberg) wrote : Re: [Bug 570615] Re: Indented verbatim blocks are not parsed correctly

On Thu, May 27, 2010 at 4:20 PM, Fabian Moser
<email address hidden> wrote:
> I have written a patch that implements support for indented verbatim blocks.
> It also includes updates to all output formats and to the corresponding unit tests.

Thanks for the patch, looks good to me. Only thing is that after
merging with current trunk test suite fails for the pageview test -
could you take a look at that? (Another failure for plain text is due
to new changes in the trunk.)

Regards,

Jaap

Fabian Stanke (fmos) wrote : Re: [Bug 570615] Indented verbatim blocks are not parsed correctly

On 30.05.2010 10:54, Jaap Karssenberg wrote:
> Thanks for the patch, looks good to me. Only thing is that after
> merging with current trunk test suite fails for the pageview test -
> could you take a look at that? (Another failure for plain text is due
> to new changes in the trunk.)

Unfortunately, the test suite fails for the pageview test already for a
clean checkout (without any changes from me). I didn't have the
opportunity to look into that yet to file a proper report. I cannot
effectively investigate the effect of my patch unless the test succeeds
for a clean trunk.

FYI, the test suite fails with the following messages (sorry for the
last one being German, it says "seg fault"):

Test serialization of the page view textbuffer ... ok
runTest (tests.pageview.TestTextView) ...
!! Two GtkWarnings expected here for gdk display !!
/home/fabian/Entwicklung/Zim/zim/zim/gui/pageview.py:2158: GtkWarning:
gdk_drawable_get_screen: assertion `GDK_IS_DRAWABLE (drawable)' failed
  elif not gtk.TextView.do_key_press_event(self, event):
/home/fabian/Entwicklung/Zim/zim/zim/gui/pageview.py:2158: GtkWarning:
gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
  elif not gtk.TextView.do_key_press_event(self, event):
/home/fabian/Entwicklung/Zim/zim/zim/gui/pageview.py:2158: GtkWarning:
gdkdrawable-x11.c:874 drawable is not a pixmap or window
  elif not gtk.TextView.do_key_press_event(self, event):
Speicherzugriffsfehler (Speicherabzug geschrieben)

Jaap Karssenberg (jaap.karssenberg) wrote :

On Mon, May 31, 2010 at 10:26 AM, Fabian Moser
<email address hidden> wrote:
> Unfortunately, the test suite fails for the pageview test already for a
> clean checkout (without any changes from me). I didn't have the
> opportunity to look into that yet to file a proper report. I cannot
> effectively investigate the effect of my patch unless the test succeeds
> for a clean trunk.

This is reported already as bug #539313. However this is not the error which
I see with your patch.

The problem I see is that indenting runs off in the parsing roundtrip
test for pageview.

If you comment out the "TestTextView" class in tests/pageview.py you
will be able to run this test and see the error I meant.

Regards,

Jaap

Fabian Stanke (fmos) wrote :

Thank you for the hint.

I changed the patch such that the result passes all unit tests except for the one mentioned above (the segfault). This new patch acts on the current trunk (rev 261).

Cheers,
Fabian

Jaap Karssenberg (jaap.karssenberg) wrote :

Patch committed in rev265 - thanks !

Changed in zim:
status: In Progress → Fix Committed
Jaap Karssenberg (jaap.karssenberg) wrote :

Fix released in zim 0.47

Changed in zim:
status: Fix Committed → Fix Released