Export of single-page-with-subpages as MHTML causes Bug dialog
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Zim |
New
|
Undecided
|
Unassigned |
Bug Description
Zim version 0.65
Running in Ubuntu 16.04 LTS on Intel i5-4670K
Experimenting with exporting Zim pages as MHTML, trying to find a way to package a Zim notebook so that I could upload it to Google Drive and view it on my cell phone.
In the File>Export dialog, specified
-- Single page
-- Include subpages
Format: MHTML
Template: Default
entered an output file location, clicked OK, and then got the "Looks like you found a bug" dialog. Copied text and pasted it here:
This is zim 0.65
Platform: posix
Locale: en_US UTF-8
FS encoding: UTF-8
Python: (2, 7, 12, 'final', 0)
Gtk: (2, 24, 30)
Pygtk: (2, 24, 0)
Zim revision is:
branch: zim-trunk
revision: 805 <email address hidden>
date: 2015-11-01 15:42:45 +0100
======= Traceback =======
File "/usr/lib/
destroy = self.do_
File "/usr/lib/
for p in exporter.
File "/usr/lib/
for p in exporter.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
return call_default(self, signal, args)
File "/usr/lib/
return method(*args)
File "/usr/lib/
processor.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
value = expr(context)
File "/usr/lib/
value = getattr(value, p)
File "/usr/lib/
return self.heading or self.basename
File "/usr/lib/
head, body = self._split_head()
File "/usr/lib/
tree = self._tree.copy()
File "/usr/lib/
return ParseTree(
File "/usr/lib/
parser.
ParseError: not well-formed (invalid token): line 7, column 0
=====
So then I canceled the export.
I ran Zim -D and after several iterations isolated the page that contained the invalid token, which was a single character, ASCII hex 1A, or Ctrl-z.
I often copy a snippet of text from a Web page and paste it into a Zim page, and apparently the source text contained the Ctrl-z.
So, this brought up some questions for me:
-- When Zim pastes text into a notebook page, is there some kind of text filter in the pipeline that can remove 'invalid tokens' before inserting the text? I guess I'm thinking about a paste-as-text scenario.
-- Has someone developed a plug-in for cleaning text in Zim pages? Is that something that one could script for oneself in Python?
-- Might be nice if the export code gave more information on what caused it to halt, something like "Invalid character on page XXX, line NNN. Export halted."
Anyway, I deleted the offending character, which permitted the export to complete, resulting in a file having a .mht extension. Unfortunately, when I tried to open the file in Chrome (Version 55.0.2883.87 (64-bit)), it displayed only a blank page. Similarly, Firefox (52.0.2, 64-bit) would present an Open dialog requesting whether to open the MHT file, then display only a blank page, and then present the Open dialog again, resulting in another blank page, etc.
Sort of ignorant here--is MHTML a disused method for sort of encapsulating a group of HTML pages?
What I want to do is create a documentation tree of HTML pages that contain crosslinks to other pages in the tree, and put this up in the cloud (such as at Google Drive), and have the links still work relative to the location of the root page of the tree, on the device (such as a
smartphone) containing an offline copy of the tree.
Forgive me, this is probably off-topic in the context of the bug report, just spent a couple of hours experimenting with this in Zim, thought it wouldn't hurt to ask.
Regards,
Al Bell