Zim

Do not parse input as wikicode

Bug #585300 reported by dotancohen on 2010-05-25
This bug affects 1 person
Affects: Zim
Importance: Undecided
Assigned to: Unassigned

Bug Description

Please do not parse input as wikicode.

Currently, if the user enters //foo// then reopens Zim, the slashes will be gone and the word printed in italics. That should not happen at all. I would expect those slashes to be escaped; changing "//foo//" to italic "foo" is unexpected and might even be dataloss. If someone wants to edit the wiki syntax himself, Zim is _designed_ to make the storage files easy to get to for that purpose. That is why data is not in dotfiles, and why they remain plaintext files.

Real life example, on my own data:
I stored this code in Zim:
$ du -b --max-depth 1 | sort -nr | perl -pe 's{([0-9]+)}{sprintf "%.1f%s", $1>=2**30? ($1/2**30, "G"): $1>=2**20? ($1/2**20, "M"): $1>=2**10? ($1/2**10, "K"): ($1, "")}e'

However, Zim parsed it and stores this:
$ du -b --max-depth 1 | sort -nr | perl -pe 's{([0-9]+)}{sprintf "%.1f%s", $1>=230? ($1/230, "G"): $1>=220? ($1/220, "M"): $1>=210? ($1/210, "K"): ($1, "")}e'

The "**" was parsed into Bold. That is dataloss, making this a serious issue.
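
The effect can be reproduced in a few lines. This is a minimal Python sketch, not Zim's actual parser: the function name strip_bold_markers and the regular expression are assumptions made for illustration. It treats every "**...**" pair as bold and keeps only the enclosed text, which is what the rendered page shows.

```python
import re

# The command as it was entered (copied from the report above).
ENTERED = r"""$ du -b --max-depth 1 | sort -nr | perl -pe 's{([0-9]+)}{sprintf "%.1f%s", $1>=2**30? ($1/2**30, "G"): $1>=2**20? ($1/2**20, "M"): $1>=2**10? ($1/2**10, "K"): ($1, "")}e'"""

def strip_bold_markers(text):
    """Hypothetical renderer: treat each **...** pair as bold, keep only the inner text."""
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)

if __name__ == "__main__":
    # The printed line no longer contains any "**".
    print(strip_bold_markers(ENTERED))
```

Each "2**30" loses its operator because the two stars pair up with the next two stars further along the line.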

The problem here is that there is no way to store this in wiki text cleanly unless it is escaped. This is what the "verbatim" style is for. This is a design limitation of the wiki syntax.

Changed in zim:
status: New → Invalid
Fabian Stanke (fmos) wrote :

I agree with you, but would like to add a different aspect:

In my opinion, the confusion (I'm not calling it a problem) here is that Zim is both WYSIWYG and Wiki at the same time. If it weren't WYSIWYG, the user would immediately recognize the problem and avoid it by using a verbatim block. Letting the user enter something like //foo// and not showing him the result immediately is not WYSIWYG any more. I believe that in such a case Zim should turn the text into italics right away.

One more "user-friendly", less "black magic" alternative would be to pop up a dialogue asking the user if he wants to put that in italics or wrap it into a verbatim block. But this is of course a lot of "sugar" and might be too much for some...

Fabian Stanke (fmos) wrote :

@dotancohen:
For your use-case you should definitely use a verbatim block.
The "dataloss" mention may be incorrect, because you should be able to "recover" your data from the txt files. The stars are most probably still in there if you didn't remove the bold formatting.

dotancohen (dotancohen) wrote :

> The problem here is that there is no way to store this in wiki text
> cleanly unless it is escaped.
>

Then escape it. Escaping _all_ user data before it goes into SQL is standard practice; why should it be any different if the wiki/text file suffers from "wiki-formatting injection" attacks as well? Escaping data before storage is _not_ a problem.

I agree that it might be nice to have an _option_ to let the user enter wiki-formatted text for formatting, but in that case the text should be converted immediately.

> This is what the "verbatim" style is for. This
> is a design limitation of the wiki syntax.
>

There is no way for the user to know that certain strings cannot be represented in certain types of formatting. Nor should there be. Is this really a design limitation, or an oversight? It can easily be fixed by escaping input.
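
A minimal sketch of what escaping on input could look like, in Python. It assumes a hypothetical backslash escape; nothing in this report says Zim's wiki format supports one, and the function names escape_markup and unescape_markup are invented for illustration. Only the two tokens mentioned in this report ("**" and "//") are handled.

```python
import re

def escape_markup(text):
    """Prefix each backslash, "**" and "//" with a backslash (hypothetical scheme)."""
    return re.sub(r"(\\|\*\*|//)", r"\\\1", text)

def unescape_markup(text):
    """Reverse escape_markup: drop the backslash in front of an escaped token."""
    return re.sub(r"\\(\\|\*\*|//)", r"\1", text)

if __name__ == "__main__":
    typed = "$1>=2**30? ($1/2**30) and //foo//"
    stored = escape_markup(typed)
    # The round trip returns exactly what the user typed.
    print(unescape_markup(stored) == typed)
```

The backslash itself is escaped too, so that text which already contains backslashes survives the round trip unchanged.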

> Letting the user enter something like //foo// and not showing
> him the result immediately is not WYSIWYG any more.

That is part of the problem, yes. The other part is that there happen to be very valid strings of data that Zim cannot represent.

> For your use-case you should definitely use a verbatim block.

How can I then cut and paste lines to Zim? Verbatim works only on single lines, indentation is broken (I know that you are working on that) and has many other limitations. The only advantage is that it is a workaround to this issue. Instead, this issue should be fixed with escaping.

> The "dataloss" mention may be incorrect, because you should be
> able to "recover" your data from the txt files.

No, I've already manually removed the links as they were polluting my tree with pages that did not exist. The data is long gone.

On Tue, May 25, 2010 at 5:00 PM, dotancohen
<email address hidden> wrote:
> There is no way for the user to know that certain strings cannot be
> represented in certain types of formatting. Nor should there be. Is this
> really a design limitation, or an oversight? It can easily be fixed by
> escaping input.

Since the design allows directly inserting wiki syntax it is a
limitation of the design that when this was unintended this can have
side effects. The wiki syntax tries to avoid common character
sequences, so the only real conflict is with code snippets using these
same character sequences. This is why we added verbatim formatting as
a way to escape blocks of input.

> How can I then cut and paste lines to Zim? Verbatim works only on single
> lines,

Verbatim definitively works on multiple lines, just try it. You can
paste the lines, select them and format them as verbatim.
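
For anyone editing the text files directly, the same result can be sketched in Python. This assumes the verbatim block delimiter in the stored file is a line of three single quotes; that detail is an assumption and is not stated in this report, so the delimiter is left as a parameter. The function name wrap_verbatim is hypothetical.

```python
def wrap_verbatim(text, delimiter="'''"):
    """Surround text with delimiter lines; the default delimiter is an assumption."""
    body = text.rstrip("\n")
    return f"{delimiter}\n{body}\n{delimiter}\n"

if __name__ == "__main__":
    snippet = "x = 2**30\ny = a // b\n"
    print(wrap_verbatim(snippet), end="")
```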

> indentation is broken (I know that you are working on that) and
> has many other limitations.

Indenting indeed needs to be fixed, but I fail to see the other
limitations. Verbatim is especially intended for inserting code
snippets into pages.

>> The "dataloss" mention may be incorrect, because you should be
>> able to "recover" your data from the txt files.
>
> No, I've already manually removed the links as they were polluting my
> tree with pages that did not exist. The data is long gone.

You might still be able to recover that if you use version control by
rolling back individual pages or inspecting changes to see what lines
need fixing.

-- Jaap

dotancohen (dotancohen) wrote :

> Since the design allows directly inserting wiki syntax it is a
> limitation of the design that when this was unintended this can have
> side effects.

How about making the direct inserting of wiki syntax optional? Please.

> Verbatim definitively works on multiple lines, just try it. You can
> paste the lines, select them and format them as verbatim.

Then I go to edit the code, and after a newline Zim is not in verbatim.

> You might still be able to recover that if you use version control

I don't use version control, but that's my own fault for trusting critical data to a 0.46 release software! I do have full system backups, but they are encrypted and a pain to decompress. I'm not sore at Zim for "causing" this, but I do expect it to be fixed. Whether it is a programming error, a bad design decision, or an oversight, it is still unexpected behaviour and likely to cause dataloss.

HansBKK (hansbkk) wrote :

I would personally like to see an option to disable all rendering, perhaps on a per-notebook basis, but ideally per-file or per-filesystem-branch. But probably not part of Jaap's intended use-cases for Zim.

dotancohen (dotancohen) wrote :

Jaap, can this issue be reopened as an RFE for an option to disable parsing input as wikicode?

Christoph Zwerschke (cito) wrote :

See also https://bugs.launchpad.net/zim/+bug/946229 which would solve this issue.

@dotan: wouldn't that be a duplicate of bug #946229? If not, maybe it would be better to open a new report with a new description instead of keeping the full discussion here.

Btw. I stopped reading the mail thread about this because it got too long for my attention span. Please make sure the resulting bug reports are concise.

dotancohen (dotancohen) wrote :

@Jaap: Yes, it does look like bug #946229 is a viable replacement for reopening. I will do my best to keep the reasoning concise.

Thanks.
