Export of single-page-with-subpages as MHTML causes Bug dialog

Bug #1725834 reported by Alex Bell on 2017-10-21

Bug Description

Zim version 0.65
Running in Ubuntu 16.04 LTS on Intel i5-4670K

Experimenting with exporting Zim pages as MHTML, trying to find a way to package a Zim notebook so that I could upload it to Google Drive and view it on my cell phone.

In the File>Export dialog, specified
-- Single page
-- Include subpages
Format: MHTML
Template: Default
entered an output file location, clicked OK, and then got the "Looks like you found a bug" dialog. Copied text and pasted it here:

This is zim 0.65
Platform: posix
Locale: en_US UTF-8
FS encoding: UTF-8
Python: (2, 7, 12, 'final', 0)
Gtk: (2, 24, 30)
Pygtk: (2, 24, 0)
Zim revision is:
  branch: zim-trunk
  revision: 805 <email address hidden>
  date: 2015-11-01 15:42:45 +0100

======= Traceback =======
  File "/usr/lib/python2.7/dist-packages/zim/gui/widgets.py", line 3106, in do_response
    destroy = self.do_response_ok()
  File "/usr/lib/python2.7/dist-packages/zim/gui/exportdialog.py", line 62, in do_response_ok
    for p in exporter.export_iter(selection):
  File "/usr/lib/python2.7/dist-packages/zim/export/exporters/mhtml.py", line 49, in export_iter
    for p in exporter.export_iter(pages):
  File "/usr/lib/python2.7/dist-packages/zim/export/exporters/files.py", line 179, in export_iter
    self.template.process(lines, context)
  File "/usr/lib/python2.7/dist-packages/zim/templates/__init__.py", line 174, in process
    self.emit('process', output, context)
  File "/usr/lib/python2.7/dist-packages/zim/signals.py", line 376, in emit
    return call_default(self, signal, args)
  File "/usr/lib/python2.7/dist-packages/zim/signals.py", line 224, in call_default
    return method(*args)
  File "/usr/lib/python2.7/dist-packages/zim/templates/__init__.py", line 178, in do_process
    processor.process(output, context)
  File "/usr/lib/python2.7/dist-packages/zim/templates/processor.py", line 81, in process
    self.__call__(output, self.main, context)
  File "/usr/lib/python2.7/dist-packages/zim/templates/processor.py", line 130, in __call__
    self._loop(output, element, context)
  File "/usr/lib/python2.7/dist-packages/zim/templates/processor.py", line 170, in _loop
    self.__call__(output, element, context) # recurs
  File "/usr/lib/python2.7/dist-packages/zim/templates/processor.py", line 111, in __call__
    value = expr(context)
  File "/usr/lib/python2.7/dist-packages/zim/templates/expression.py", line 124, in __call__
    value = getattr(value, p)
  File "/usr/lib/python2.7/dist-packages/zim/export/template.py", line 475, in title
    return self.heading or self.basename
  File "/usr/lib/python2.7/dist-packages/zim/export/template.py", line 418, in heading
    head, body = self._split_head()
  File "/usr/lib/python2.7/dist-packages/zim/export/template.py", line 449, in _split_head
    tree = self._tree.copy()
  File "/usr/lib/python2.7/dist-packages/zim/formats/__init__.py", line 318, in copy
    return ParseTree().fromstring(xml)
  File "/usr/lib/python2.7/dist-packages/zim/formats/__init__.py", line 295, in fromstring
ParseError: not well-formed (invalid token): line 7, column 0

So then I canceled the export.

I ran Zim -D and after several iterations isolated the page that contained the invalid token, which was a single character, ASCII hex 1A, or Ctrl-z.
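
For anyone wanting to confirm the cause, here is a minimal sketch of the failure outside Zim. It assumes the page tree is round-tripped through an expat-based XML parser, which is what the ParseError in the traceback suggests; the helper name `parses_ok` is made up for this illustration and is not part of Zim. It is written for Python 3, not the Python 2.7 that Zim 0.65 runs on.

```python
import xml.etree.ElementTree as ET


def parses_ok(xml_text):
    """Return True if the standard-library XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for input that is not well-formed, including control
        # characters such as 0x1A (Ctrl-Z) that XML 1.0 does not allow.
        return False
    return True


if __name__ == "__main__":
    print(parses_ok("<p>clean text</p>"))
    print(parses_ok("<p>text with \x1a in it</p>"))
```

XML 1.0 permits only tab, line feed and carriage return below 0x20, so a stray Ctrl-Z in the page text is enough to make the whole tree unparseable.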

I often copy a snippet of text from a Web page and paste it into a Zim page, and apparently the source text contained the Ctrl-z.

So, this brought up some questions for me:
-- When Zim pastes text into a notebook page, is there some kind of text filter in the pipeline that can remove 'invalid tokens' before inserting the text? I guess I'm thinking about a paste-as-text scenario.
-- Has someone developed a plug-in for cleaning text in Zim pages? Is that something that one could script for oneself in Python?
-- Might be nice if the export code gave more information on what caused it to halt, something like "Invalid character on page XXX, line NNN. Export halted."
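
On the scripting question, something along these lines might do as a stopgap. It is only a sketch, not anything that ships with Zim: it assumes the notebook is a folder of UTF-8 `.txt` page files, the function names are invented for this example, and it targets Python 3. It reports the file, line and column of every character that XML 1.0 forbids, and can strip them from a string.

```python
import re
from pathlib import Path

# Characters XML 1.0 does not allow: control characters other than
# tab (0x09), line feed (0x0A) and carriage return (0x0D), plus
# U+FFFE and U+FFFF.
_INVALID_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_invalid_chars(text):
    """Return a list of (line, column, codepoint), 1-based, for each bad char."""
    hits = []
    # split("\n") rather than splitlines(), because splitlines() also
    # breaks on some of the control characters we are looking for.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in _INVALID_XML.finditer(line):
            hits.append((lineno, match.start() + 1, ord(match.group())))
    return hits


def strip_invalid_chars(text):
    """Return text with the XML-invalid characters removed."""
    return _INVALID_XML.sub("", text)


def scan_notebook(root):
    """Yield (path, line, column, codepoint) for every page file under root."""
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, col, codepoint in find_invalid_chars(text):
            yield path, lineno, col, codepoint


if __name__ == "__main__":
    import sys

    for path, lineno, col, codepoint in scan_notebook(sys.argv[1]):
        print(f"{path}: line {lineno}, column {col}: U+{codepoint:04X}")
```

Run it with the notebook folder as the only argument. It only reports by default; rewriting the files with `strip_invalid_chars` is left as a deliberate manual step so that nothing is changed without a backup.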

Anyway, I deleted the offending character, which permitted the export to complete, resulting in a file with a .mht extension. Unfortunately, when I tried to open the file in Chrome (Version 55.0.2883.87 (64-bit)), it displayed only a blank page. Similarly, Firefox (52.0.2, 64-bit) presented an Open dialog asking whether to open the MHT file, then displayed only a blank page, then presented the Open dialog again, resulting in another blank page, and so on.

Sort of ignorant here--is MHTML a disused method for bundling a group of HTML pages into a single file?
What I want to do is create a documentation tree of HTML pages that contain crosslinks to other pages in the tree, and put this up in the cloud (such as at Google Drive), and have the links still work relative to the location of the root page of the tree, on the device (such as a smartphone) containing an offline copy of the tree.

Forgive me, this is probably off-topic in the context of the bug report, just spent a couple of hours experimenting with this in Zim, thought it wouldn't hurt to ask.
Al Bell
