lynx -dump fails if filename is not *.html

Bug #1112568 reported by David Biesack on 2013-02-01
This bug affects 1 person
Affects: lynx (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

lynx -dump url is supposed to strip out the HTML markup and format the text,
but it is not working; it is just echoing the output.

The input is simple HTML:

$ cat >| curl.out <<EOF
<html>
<body><p>Some text</p></body>
</html>
EOF

but when I do

    $ lynx -d curl.out

lynx just echoes the HTML markup.

It works if I do:

  $ mv curl.out test.html
  $ lynx -d test.html

I.e. it appears the command requires the file name to end in .html

I can use

  $ lynx -dump -stdin < curl.out

as a workaround but the default should work.

(This is in the context of another script which fetches a web resource
via curl, then looks at the file content and does different things based
on the contents; it is not always HTML, which is why my file name is curl.out)
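For a script like the one described above, one option is to decide from the content whether to treat the file as HTML, and only then hand it to lynx. The following is a minimal sketch, not the reporter's actual script: the function names `looks_like_html` and `render` are hypothetical, and the tag check is a rough heuristic rather than a real parser. It relies on the `-force_html` option that the lynx man page documents.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a file is HTML by looking at its
# content instead of its name. Rough heuristic: an <html> tag or an
# HTML doctype somewhere in the first 20 lines.
looks_like_html() {
    head -n 20 "$1" | grep -Eiq '<html|<!doctype html'
}

# Hypothetical dispatcher: render HTML through lynx, pass anything else
# through unchanged.
render() {
    if looks_like_html "$1"; then
        # -force_html makes lynx interpret the file as HTML even when
        # the name does not end in .html.
        lynx -dump -force_html "$1"
    else
        cat "$1"
    fi
}
```

With these definitions, `render curl.out` would format the file through lynx when it looks like HTML and print it unchanged otherwise.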

I'm on Ubuntu using:

    $ lynx --version
    Lynx Version 2.8.8dev.9 (12 Jun 2011)
    libwww-FM 2.14, SSL-MM 1.4.1, GNUTLS 2.10.5, ncurses 5.9.20110404(wide)
    Built on linux-gnu Nov 19 2012 15:52:46

On Fri, Feb 01, 2013 at 02:59:16PM -0000, David Biesack wrote:
> Public bug reported:
>
> lynx -dump url is supposed to strip out the HTML markup and format the text,
> but it is not working; it is just echoing the output.
>
> The input is simple HTML:
>
> $ cat >| curl.out <<EOF
> <html>
> <body><p>Some text</p></body>
> </html>
> EOF
>
> but when I do
>
> $ lynx -d curl.out
>
> lynx just echoes the HTML markup.

man lynx:

       -force_html
              forces the first document to be interpreted as HTML.

--
Thomas E. Dickey <email address hidden>
http://invisible-island.net
ftp://invisible-island.net
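Applied to the example from the report, the `-force_html` option quoted from the man page might be used as follows. This is a sketch that recreates the test file with `cat` and guards the lynx call so it does nothing where lynx is not installed; the guard is an addition for illustration, not part of the thread.

```shell
# Recreate the test file from the report. cat reads the here-document
# and writes it to curl.out.
cat > curl.out <<'EOF'
<html>
<body><p>Some text</p></body>
</html>
EOF

# Ask lynx to interpret the file as HTML even though its name does not
# end in .html. Skipped when lynx is not available.
if command -v lynx >/dev/null 2>&1; then
    lynx -dump -force_html curl.out
fi
```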

Thanks for the tip, Thomas.

I still find the behavior confusing; i.e. that the name is important rather than the content.
  "-dump dumps the formatted output of the default document"
implies the document is HTML, does not mention other possible interpretations or file formats,
and does not mention -force_html or the *.html convention or why there is
a need for -force_html.

If you do not wish to change the behavior, please consider updating the man page
to document the conventions/assumptions that lynx -dump makes.

On Mon, Feb 04, 2013 at 02:41:15PM -0000, David Biesack wrote:
> Thanks for the tip, Thomas.
>
> I still find the behavior confusing; i.e. that the name is important rather than the content.
> "-dump dumps the formatted output of the default document"
> implies the document is HTML, does not mention other possible interpretations or file formats,
> and does not mention -force_html or the *.html convention or why there is
> a need for -force_html.
>
> If you do not wish to change the behavior, please consider updating the man page
> to document the conventions/assumptions that lynx -dump makes.

yes - documentation is a good thing to improve :-)

--
Thomas E. Dickey <email address hidden>
http://invisible-island.net
ftp://invisible-island.net

Thomas Dickey (dickey-his) wrote:

This was addressed in Debian #254603.

Thomas Dickey (dickey-his) wrote:

fwiw, that was 4 years ago, and was in the previous (2.8.8) release.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254603

Changed in lynx (Ubuntu):
status: New → Fix Released