lynx -dump fails if filename is not *.html

Bug #1112568 reported by David Biesack on 2013-02-01
This bug affects 1 person
Affects: lynx (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

lynx -dump url is supposed to strip out the HTML markup and format the text,
but it is not working; it is just echoing the output.

The input is simple HTML:

$ cat >| curl.out <<EOF
<html>
<body><p>Some text</p></body>
</html>
EOF

but when I do

    $ lynx -d curl.out

lynx just echoes the HTML markup.

It works if I do:

  $ mv curl.out test.html
  $ lynx -d test.html

I.e. it appears the command requires the file name to end in .html

I can use

  $ lynx -dump -stdin < curl.out

as a workaround but the default should work.

(This is in the context of another script which fetches a web resource
via curl, then looks at the file content and does different things based
on the contents; it is not always html, so that is why my file name is curl.out)
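
A minimal sketch of what such a content-based script might look like, using
only the -stdin workaround shown above. The function names looks_like_html and
render are hypothetical, and the HTML check is a crude assumption (it only
looks for an opening html, body, or doctype tag in the first 20 lines):

```shell
#!/bin/sh

# Hypothetical helper: guess whether a file holds HTML by inspecting
# its first 20 lines for a doctype, <html, or <body tag (case-insensitive).
looks_like_html() {
    head -n 20 "$1" | grep -qiE '<(!doctype html|html|body)'
}

# Hypothetical dispatcher: format HTML with lynx via stdin (so the file
# name does not matter); print anything else unchanged.
render() {
    if looks_like_html "$1"; then
        lynx -dump -stdin < "$1"
    else
        cat "$1"
    fi
}
```

With these definitions, render curl.out would hand the file to lynx only when
it appears to contain HTML.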

I'm on Ubuntu using:

    $ lynx --version
    Lynx Version 2.8.8dev.9 (12 Jun 2011)
    libwww-FM 2.14, SSL-MM 1.4.1, GNUTLS 2.10.5, ncurses 5.9.20110404(wide)
    Built on linux-gnu Nov 19 2012 15:52:46

On Fri, Feb 01, 2013 at 02:59:16PM -0000, David Biesack wrote:
> Public bug reported:
>
> lynx -dump url is supposed to strip out the HTML markup and format the text,
> but it is not working; it is just echoing the output.
>
> The input is simple HTML:
>
> $ cat >| curl.out <<EOF
> <html>
> <body><p>Some text</p></body>
> </html>
> EOF
>
> but when I do
>
> $ lynx -d curl.out
>
> lynx just echoes the HTML markup.

man lynx:

       -force_html
              forces the first document to be interpreted as HTML.
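
As a rough sketch of how that option might be applied to the file from the
report (the sample file is recreated here with cat, since echo does not read a
here-document; the command -v guard is only an assumption so the snippet is
harmless where lynx is not installed):

```shell
# Recreate the sample file from the report.
cat > curl.out <<'EOF'
<html>
<body><p>Some text</p></body>
</html>
EOF

# Force HTML interpretation regardless of the file name.
if command -v lynx >/dev/null 2>&1; then
    lynx -dump -force_html curl.out
fi
```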

--
Thomas E. Dickey <email address hidden>
http://invisible-island.net
ftp://invisible-island.net

Thanks for the tip, Thomas.

I still find the behavior confusing; i.e. that the name is important rather than the content.
  "-dump dumps the formatted output of the default document"
implies the document is HTML, does not mention other possible interpretations or file formats,
and does not mention -force_html or the *.html convention or why there is
a need for -force_html.

If you do not wish to change the behavior, please consider updating the man page
to document the conventions/assumptions that lynx -dump makes.

On Mon, Feb 04, 2013 at 02:41:15PM -0000, David Biesack wrote:
> Thanks for the tip, Thomas.
>
> I still find the behavior confusing; i.e. that the name is important rather than the content.
> "-dump dumps the formatted output of the default document"
> implies the document is HTML, does not mention other possible interpretations or file formats,
> and does not mention -force_html or the *.html convention or why there is
> a need for -force_html.
>
> If you do not wish to change the behavior, please consider updating the man page
> to document the conventions/assumptions that lynx -dump makes.

yes - documentation is a good thing to improve :-)

--
Thomas E. Dickey <email address hidden>
http://invisible-island.net
ftp://invisible-island.net

Thomas Dickey (dickey-his) wrote:

This was addressed in Debian #254603
