gtkdoc-fixxref broken by compressed documentation

Bug #77138 reported by Yeti
Affects              Status        Importance  Assigned to  Milestone
debhelper (Debian)   Fix Released  Unknown
debhelper (Ubuntu)   Fix Released  Undecided   Unassigned
gtk-doc (Ubuntu)     Invalid       Undecided   Unassigned

Bug Description

Binary package hint: gtk-doc-tools

One of the primary functions of gtkdoc-fixxref is to point references to standard GLib, Gtk+, etc. symbols (types, functions, macros, etc.) in generated documentation to the system documentation, if present. In other words, if another library's documentation references GtkWidget, it is changed to hyperlink to the GtkWidget description in the Gtk+ documentation.

This does not work at all on Ubuntu: all cross-library hyperlinks are missing if some documentation is rebuilt there.

The reason is that gtkdoc-fixxref looks for index.sgml (line 121), which of course is not there, because Debian tends to compress random files in /usr/share/doc: only index.sgml.gz is present, which, first, is not looked for and, second, would have to be decompressed (the cross-linking information is retrieved from it).
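The missing fallback is easy to sketch. Below is a minimal illustration in Python (not gtkdoc-fixxref's actual Perl code; the function name `read_index` is made up for this example) of looking for index.sgml and falling back to the compressed copy:

```python
import gzip
import os

def read_index(path):
    """Return the text of an index.sgml at `path`, trying `.gz` as a fallback.

    Hypothetical sketch of the behavior the report says is missing:
    gtkdoc-fixxref only checks the uncompressed name.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    # Debian/Ubuntu may have compressed the file to index.sgml.gz
    if os.path.exists(path + ".gz"):
        with gzip.open(path + ".gz", "rt", encoding="utf-8") as fh:
            return fh.read()
    return None  # no cross-linking information available
```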

But instead of tweaking gtk-doc-tools, I would dare to suggest simply no longer compressing index.sgml in the affected packages: libglib2.0-doc, libatk1.0-doc, libpango1.0-doc, libgtk2.0-doc, libgtkglext1-doc, etc.

Revision history for this message
Stefan Sauer (ensonic) wrote :

As one of the gtk-doc contributors, I second that request. For consistency reasons it would also be great if you didn't symlink the docs around. Please leave the glib/gtk+ docs under $prefix/share/gtk-doc/html/.

Revision history for this message
Loïc Minier (lool) wrote :

I fully agree that we should keep the documentation below /usr/share/gtk-doc and I would like to gradually convert back -doc packages to ship files in their natural location.

Concerning index.sgml, I suppose one could avoid zipping the file if possible, but would you be willing to parse zipped index.sgml files? After all, these are not that hard to support from Perl and do save some space, and there will probably always be some package that forgot to blacklist index.sgml from compression.

Revision history for this message
Yeti (yeti) wrote :

> After all, these are not that hard to support from Perl

This way of thinking is the source of all the problems. Sure, it is not hard to add compression support in one particular case. But then there is another case, then a couple of them, yet another, a few more, and it never stops. The complexity of every simple private or one-shot script that reads these files is considerably increased.

Does grep support searching in compressed files? Why aren't HTML files compressed? Extending every HTML browser in the world to handle compressed files surely isn't so hard. The eight largest compressed files in /usr/share/doc on my Ubuntu desktop are PDF files -- and do xpdf or evince support compressed PDF? Nope.

If the effort Debian developers invested in compressing files in /usr/share/doc, extending every program that needs to read them (failing for every program that is not in Debian, and some of those that are), and fixing the related bugs had instead been invested into compression at the file-system level, we would have working compression in several file systems by now, and Debian would save more disk space.

So, please just stop compressing the files at least in this case. Adding decompression is a problem. Not because it is hard, but because it encourages compression and compression creates a burden for other people. I don't want to force everyone who needs to read index.sgml (which *is* supposed to be machine-readable) to implement decompression too.

Revision history for this message
Loïc Minier (lool) wrote :

> The complexity of every simple private or one-shot script that reads these files is considerably increased.

This is why we use wrappers, libraries, or whatever else to handle input formats, instead of writing a parser for each program. If one needs to access gtk-doc files, then one is supposed to use a gtk-doc file data access layer.

Unix programs are supposed to divide work into small specialized commands, so if one were able to pass index.sgml on stdin, it would be easy to pipe it through gunzip to handle compressed files, as we do for -- say -- man pages; I'm sure you wouldn't want all your man pages uncompressed.

I do get your argument that the parsers have to handle two (simple) cases instead of one, but I would argue that this might be the consequence of other design issues, as other examples show it is trivially possible.

> Does grep support searching in compressed files?

The trivial zgrep wrapper shows this isn't an issue; besides, why would one want to grep instead of using beagle, tracker, or an ad hoc command to access a specialized format?
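The zgrep wrapper mentioned above can be demonstrated in a self-contained way (the sample path /tmp/index.sgml and its contents are made up for this demo):

```shell
# Create a tiny compressed "index" and search it with zgrep, which
# transparently decompresses before grepping.
printf '<ANCHOR id="GtkWidget" href="gtk/GtkWidget.html">\n' > /tmp/index.sgml
gzip -f /tmp/index.sgml
zgrep -c 'GtkWidget' /tmp/index.sgml.gz   # prints 1
```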

> Why aren't HTML files compressed?

Why are most PNG files compressed?

> do xpdf or evince support compressed PDF? Nope.

I can't speak for xpdf (which I don't have around), but evince certainly does, as well as compressed DVI and PS.

Back to the point: the situation is that some index.sgml files *are* compressed -- or you wouldn't be complaining -- so the easiest short-term solution that instantaneously fixes the real-life problem would have been to add compression support to gtk-doc.

I also argued that it would act as a safety net in case someone forgets to exclude an index.sgml from compression.

Finally, this has a size benefit for distributors (this is the general use case of compressing the files).

It seems your final point of view is that compressed index.sgml should not be supported, in order to force people to use non-compressed files and thus a single possible format; so my conclusion so far is that we should aim at excluding index.sgml files from compression, and that such a transition will happen slowly over the next years.

Revision history for this message
Yeti (yeti) wrote :

>> The complexity of every simple private or one-shot script that reads these files
>> is considerably increased.

> This is why we use wrappers, libraries, or whatever to handle input formats,
> instead of writing a parser for each program.

Many of these files are not in `formats'. They are general text files: structured, unstructured, and often ad-hoc or partially structured. No wrapper will do unless it's a fopen() (leaving aside open(), mmap(), etc.) wrapper.

> The trivial zgrep wrapper shows this isn't an issue;

No, the necessity to learn different commands (if someone wrote them) because none of the standard Unix text tools is usable (on supposedly text files) shows that it is a serious issue.

> besides, why would one want
> to grep instead of using beagle, tracker, or an ad hoc command to access
> a specialized format?

Beagle is not useful for searching because it is incapable of imposing conditions on the file name. And of course it does not replace any other text tool besides grep. Regarding specialized formats: if you have ever tried to find anything in the gnuplot help, you probably noticed the fastest method is to just grep the gih file.

I don't understand what is good about making these files complicated to read. This does not depend on how specialized the format is.

>> Why aren't HTML files compressed?

So, why aren't they?

> Why are most PNG files compressed?

Ah, now we are getting somewhere. PNG files are not compressed. Show me a single png.gz file.

And please don't try to argue that PNG files are compressed internally, because that is completely irrelevant. The PNG specification says how the information is stored; it can be compressed, or frobnicated, or whatever. But PNG files are stored exactly according to their specification; no mangling is performed. Unlike other files, which are mangled.

> evince certainly does, as well as compressed DVI and PS.

evince /usr/share/doc/tetex-doc/programs/dvips.pdf.gz

Result: http://hydra.physics.muni.cz/~yeti/tmp/evince-pdf-gz.png

The same for DVI (it can handle ps.gz though).

---------------------
This discussion is pointless. After all the years of experience with compressed documentation in Debian no one can convince me it makes sense. And conversely, Debian people seem to be incapable of admitting any fundamental problems. Will I ever grow wise enough to let it be? Sigh.

---------------------
> my conclusion so far is that we should aim at excluding index.sgml files from
> compression and that such a transition will happen slowly over the next years.

Thank you.

Revision history for this message
Loïc Minier (lool) wrote :

> No, the necessity to learn different commands (if someone wrote them) because none of the standard Unix text tools is usable (on supposedly text files) shows that it is a serious issue.[...]
> Beagle is not useful for searching[...]
> I don't understand what's good on complicating of reading the files.[...]
> And please don't try to argue PNG files are compressed internally[...]

This is twisting the issue to your liking and ignoring the arguments I'm giving: the fact is that Unix commands are meant to be combined, and *no*, you can't expect "grep -r" to search all possible formats, even formats transporting text such as PDF or MS Word; heck, you can't even grep HTML files for anything other than small ASCII words.

Stop thinking that the only constraint on the files of a system is to be able to "grep -r" them. I explained why the space savings are an advantage in some cases and that we try to adapt the tools to handle them. By your logic, we wouldn't use tar files, because they hide individual files from standard commands such as cp and wget.

The reason I mentioned Beagle or PNG is that we are not living in a text-file world; even index.sgml files aren't plain text: they are SGML; try searching for "Loïc" when it's spelled as Lo&iuml;c or whatever. We're living in a format-in-format-in-format world (make that recursive).

>> Why aren't HTML files compressed?
> So, why they aren't?

I see you insist on getting an answer as if I hadn't replied already: some HTML files are compressed, as dpkg -S html.gz will show you; there is no need to compress in the http:// case, which is why you don't see them compressed on the web; I already answered that HTTP is already zipping the contents.

> evince /usr/share/doc/tetex-doc/programs/dvips.pdf.gz

I'm sorry it doesn't work for you under Ubuntu; Debian fixed the list of MIME types in the evince.desktop file, and it works fine under Debian. I'm sure this will reach Ubuntu soon.

> After all the years of experience with compressed documentation in Debian no one can convince me it makes sense.
> And conversely, Debian people seem to be incapable of admitting any fundamental problems.

This is because you don't want to read the actual arguments people are making: saving space is still an issue, for example on live CDs, on embedded systems (where one still has to ship some files, such as copyrights), or simply when emailing the files; zipping still provides a speed advantage when the CPU is mostly idle while the disk is a bottleneck. I already wrote this, but you simply ignore the existence of these advantages.

Now, as I already said, I can live with the fact that the inventors of gtk-doc index.sgml files do not want to impose support for zipped files on tools, and we can diverge from the default of zipping everything to explicitly exclude these files (even if I think it's not the best solution in the interest of Debian and our users), but please do not argue that "Debian can not convince it makes sense" or that "Debian is incapable of admitting any fundamental problem". I think I proved above that there are valid use cases for compression, it's omnipresent (in protocols, file types, or in file systems as you n...


Revision history for this message
Yeti (yeti) wrote :

> This is because you don't want to read the actual arguments people are making:
> saving space is still an issue for example in live CDs, embedded systems
> (where one still has to ship some files such as copyrights), or simply to email them;
> zipping still provides a speed advantage when the CPU is mostly idle while the
> disk is a bottleneck. I already wrote this, but you simply ignore the existence
> of the advantages.

Sorry, you don't want to read my arguments. By compressing at the file-system level, considerably more space could be saved, *and* it would be completely transparent to any program reading (or even writing) the files. What you argue for is just a complex kludge with great breakage and annoyance potential, not a real solution.

Revision history for this message
Yeti (yeti) wrote :

What the...

Please forget this bug report was ever submitted. Thank you for your time.

Revision history for this message
Pedro Villavicencio (pedro) wrote :

Ok, thanks for letting us know.

Changed in gtk-doc:
status: New → Invalid
Revision history for this message
Mathias Hasselmann (hasselmm) wrote :

I guess live-CD support is quite important, but I wonder if the -doc packages -- which provide the index files -- really are shipped on live CDs. So far I have considered the -doc packages an option developers choose for convenience.

<offtopic>Since we talk about convenience: may I have gnome-doc and gnome-dbg meta packages, please? Finding and choosing all of them by hand is quite boring.</offtopic>

Revision history for this message
Loïc Minier (lool) wrote :

@Mathias: Indeed, probably most index.sgml.gz files are not on the live CDs; it's just that the tool by default will compress most files, and we need to specifically exclude index.sgml by hand. The live CD example was one illustration of why we need to save space and compress files below /usr/share/doc by default.

Concerning gnome-dbg and gnome-doc: first, please file a separate bug; second, there's a gnome-dbg and by pure coincidence we also discussed a gnome-doc meta package recently on #gnome-debian (where the Debian packaging of GNOME packages is often discussed). I think this makes sense, and we might add such a meta package soon.

Revision history for this message
David Schleef (dschleef) wrote :

Could someone explain why, in this case, the policy of "compressing doc files" overrides the policy of "cooperating with upstream"? Or, more importantly, why the policy of "compressing doc files" overrides "not breaking stuff"?

Revision history for this message
Loïc Minier (lool) wrote :

There's no such override.

Short summary of where we stand today:
1) I personally still think packages should move back to /usr/share/gtk-doc; this doesn't trigger the bug, as only gtk-doc below /usr/share/doc has this issue; this can only happen as per-package fixes; I did this in some packages I maintain
2) I still think it wouldn't be too hard to handle compressed indexes transparently, but yes: this is only a wishlist feature request; it would instantaneously fix this bug too
3) these files were blacklisted in debhelper's dh_compress in 7.0.10; this bug should slowly disappear as this debhelper version moves into Ubuntu and packages get rebuilt against it
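For packages still built against a debhelper older than 7.0.10, the same effect could be achieved by hand in debian/rules; a sketch, assuming the dh(1) sequencer (newer debhelper already excludes these files by default, so this override becomes unnecessary):

```makefile
# debian/rules fragment (sketch): keep the gtk-doc cross-reference
# index out of dh_compress so gtkdoc-fixxref can still find it.
override_dh_compress:
	dh_compress -X index.sgml
```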

Changed in debhelper:
status: Unknown → Fix Released
Revision history for this message
Stefan Sauer (ensonic) wrote :

I don't get any hits from this anymore:

 find /usr/share/doc -name "index.sgml.gz"

shall we close the bug?

Revision history for this message
Yeti (yeti) wrote :

I cannot verify it, as I do not have any Ubuntu machine anymore, so close freely if it seems fixed.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Fixed in 7.0.10 and higher.

Changed in debhelper (Ubuntu):
status: New → Fix Released