lynx -dump fails if filename is not *.html
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lynx (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
lynx -dump url is supposed to strip out the HTML markup and format the text,
but it is not working; it is just echoing the output.
The input is simple HTML:
$ cat >| curl.out <<EOF
<html>
<body><p>Some text</p></body>
</html>
EOF
but when I do
$ lynx -d curl.out
lynx just echoes the HTML markup.
It works if I do:
$ mv curl.out test.html
$ lynx -d test.html
I.e. it appears the command requires the file name to end in .html
I can use
$ lynx -dump -stdin < curl.out
as a workaround but the default should work.
(This is in the context of another script which fetches a web resource
via curl, then looks at the file content and does different things based
on the contents; it is not always HTML, which is why my file name is curl.out.)
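A minimal sketch of that kind of dispatch, assuming the fetched file can be classified by a crude content check rather than by its name. The function names looks_like_html and render are hypothetical and not part of the reporter's script; the lynx invocation is the -stdin workaround shown above.

```shell
# Hypothetical helper: guess whether a fetched file is HTML from its
# content, not its name. Crude heuristic: an "<html" tag appears somewhere.
looks_like_html() {
    grep -qi '<html' "$1"
}

# Render HTML through lynx via stdin, so the file extension is irrelevant;
# pass any other content through unchanged.
render() {
    if looks_like_html "$1"; then
        lynx -dump -stdin < "$1"
    else
        cat "$1"
    fi
}
```

With this in place, `render curl.out` would format the file when it contains HTML and print it as-is otherwise. A stricter check, such as one based on the Content-Type header, may be preferable in a real script.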
I'm on Ubuntu using:
$ lynx --version
Lynx Version 2.8.8dev.9 (12 Jun 2011)
libwww-FM 2.14, SSL-MM 1.4.1, GNUTLS 2.10.5, ncurses 5.9.20110404(wide)
Built on linux-gnu Nov 19 2012 15:52:46
On Fri, Feb 01, 2013 at 02:59:16PM -0000, David Biesack wrote:
> Public bug reported:
>
> lynx -dump url is supposed to strip out the HTML markup and format the text,
> but it is not working; it is just echoing the output.
>
> The input is simple HTML:
>
> $ echo >| curl.out <<EOF
> <html>
> <body><p>Some text</p></body>
> </html>
> EOF
>
> but when I do
>
> $ lynx -d curl.out
>
> lynx just echos the HTML markup.
man lynx:
-force_html
forces the first document to be interpreted as HTML.
--
Thomas E. Dickey <email address hidden>
http://invisible-island.net
ftp://invisible-island.net
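Applied to the reporter's file, the -force_html option quoted from the man page could be used as sketched below. The wrapper name dump_as_html is hypothetical; the only lynx options used are -dump and -force_html, both named in this thread.

```shell
# Hypothetical wrapper: dump a local file as formatted text, telling lynx
# to interpret it as HTML regardless of its file extension.
dump_as_html() {
    lynx -dump -force_html "$1"
}
```

For the example in the report, that would be `dump_as_html curl.out`, with no need to rename the file to test.html first.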