Comment 5 for bug 77138

Yeti (yeti) wrote:

>> The complexity of every simple private or one-shot script that reads these files
>> is considerably increased.

> This is why we use wrappers, libraries, or whatever to handle input formats,
> instead of writing a parser for each program.

Many of these files are not in any `format'. They are general text files: structured, unstructured, and often ad hoc or only partially structured. No wrapper will do unless it wraps fopen() itself (leaving aside open(), mmap(), etc.).
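For illustration, here is a minimal Python sketch of the plumbing that even a one-shot script has to carry once a text file may or may not be gzipped. The helper names open_maybe_gz and grep, and the choice to sniff the two gzip magic bytes rather than trust the file name, are assumptions made for this example, not anything Debian ships:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gz(path, encoding="utf-8"):
    """Open `path` as text, decompressing transparently if it is gzipped.

    Hypothetical helper: the name and the detection policy are
    assumptions for this illustration.
    """
    # Look at the first two bytes instead of trusting a ".gz" suffix.
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding=encoding, errors="replace")
    return open(path, "r", encoding=encoding, errors="replace")


def grep(pattern, path):
    """Return the lines of `path` that contain the literal string `pattern`."""
    with open_maybe_gz(path) as fh:
        return [line.rstrip("\n") for line in fh if pattern in line]
```

That is a dozen lines before the script does any actual work, it covers gzip only, and it has to be repeated (or installed as a library) for every language the script might be written in.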

> The trivial zgrep wrapper shows this isn't an issue;

No. Having to learn a separate set of commands (assuming someone has written them), because none of the standard Unix text tools works on what are supposedly text files, shows that it is a serious issue.

> besides, why would one want
> to grep instead of using beagle, tracker, or an ad hoc command to access
> a specialized format?

Beagle is not useful for searching because it cannot impose conditions on the file name. And of course it does not replace any text tool besides grep. Regarding specialized formats, if you have ever tried to find anything in the gnuplot help, you have probably noticed that the fastest method is to just grep the gih file.

I don't understand what is good about making these files harder to read. This does not depend on how specialized the format is.

>> Why HTML files aren't compressed?

So, why aren't they?

> Why are most PNG files compressed?

Ah, now we are getting somewhere. PNG files are not compressed. Show me a single png.gz file.

And please don't try to argue that PNG files are compressed internally, because this is completely irrelevant. The PNG specification says how the information is stored; it can be compressed, frobnicated, or whatever. But PNG files are stored exactly according to their specification, and no mangling is performed, unlike other files, which are mangled.

> evince certainly does, as well as compressed DVI and PS.

evince /usr/share/doc/tetex-doc/programs/dvips.pdf.gz

Result: http://hydra.physics.muni.cz/~yeti/tmp/evince-pdf-gz.png

The same goes for DVI (it can handle ps.gz, though).

---------------------
This discussion is pointless. After all these years of experience with compressed documentation in Debian, no one can convince me that it makes sense. And conversely, Debian people seem to be incapable of admitting any fundamental problems. Will I ever grow wise enough to let it be? Sigh.

---------------------
> my conclusion so far is that we should aim at excluding index.sgml files from
> compression and that such a transition will happen slowly over the next years.

Thank you.