Manpages only recognise HTTP links but not HTTPS links

Bug #1841930 reported by Dan
This bug affects 1 person
Affects                    Status     Importance  Assigned to  Milestone
Ubuntu Manpage Repository  Confirmed  Medium      Unassigned
w3m                        New        Unknown
w3m (Ubuntu)               Confirmed  Undecided   Unassigned

Bug Description

If you compare the last 3 sections between these 2 manual pages:

- https://manpages.ubuntu.com/manpages/bionic/man1/numfmt.1.html
- https://manpages.ubuntu.com/manpages/disco/man1/numfmt.1.html

The first one has HTTP links for the license, the documentation, and for reporting bugs, while in the second one the links use HTTPS.

In the first, the system detects them as links and converts them into HTML <a> elements. In the second, the links are not detected at all.

Joshua Powers (powersj)
Changed in ubuntu-manpage-repository:
status: New → Confirmed
importance: Undecided → Medium
Christian Ehrhardt  (paelzer) wrote :

Hey, thanks for the report.
It seems it has become a hobby of mine to come by and fix small issues on the manpages until we can find more time to tackle the longer backlog.

Your report is absolutely correct, and the rendered HTML files show it. I compared trusty to noble.

Trusty:
Report numfmt bugs to <a href="mailto:<email address hidden>"><email address hidden></a>

Noble:
GNU coreutils online help: &lt;https://www.gnu.org/software/coreutils/&gt;

Sadly this conversion seems to be done in Perl, and I'm not the biggest Perl magician.
Looking through the code, it comes down to this: it extracts the deb file to get the man page,
and then feeds it to w3mman2html like:

COLUMNS=100 /usr/lib/w3m/cgi-bin/w3mman2html.cgi "local=/tmp/testdir/tmpdir/usr/share/man/man1/numfmt.1.gz"

On Xenial this produces the expected non-HTTPS content wrapped in an <a href>:

"""
<b>REPORTING</b> <b>BUGS</b>
       GNU coreutils online help: &lt;<a href="http://www.gnu.org/software/coreutils/">http://www.gnu.org/software/coreutils/</a>&gt;
"""

And the same with the coreutils content of noble, which uses https, gives:

"""
<b>REPORTING</b> <b>BUGS</b>
       GNU coreutils online help: &lt;https://www.gnu.org/software/coreutils/&gt;
"""

So this is at least identifying where it falls apart.
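
The two steps described above (unpack the package, then render the man page) can be sketched as follows. This is only an illustration under assumptions: the use of dpkg-deb for unpacking and the helper names extract_cmd, render_cmd and render are hypothetical, as the site's real conversion code is not shown in this report. Only the CGI path, the "local=" argument and COLUMNS=100 are taken from the command quoted above.

```python
import os
import subprocess

# Path of the converter as quoted in the comment above.
W3MMAN2HTML = "/usr/lib/w3m/cgi-bin/w3mman2html.cgi"


def extract_cmd(deb, dest):
    """Command to unpack a .deb into dest.

    Assumption: dpkg-deb is used; the report only says the deb is extracted.
    """
    return ["dpkg-deb", "-x", deb, dest]


def render_cmd(manpage):
    """Command that renders one man page file to HTML, as in the report."""
    return [W3MMAN2HTML, "local=" + manpage]


def render(manpage, columns=100):
    """Run the converter with COLUMNS set and return the generated HTML."""
    env = dict(os.environ, COLUMNS=str(columns))
    result = subprocess.run(
        render_cmd(manpage), env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling render("/tmp/testdir/tmpdir/usr/share/man/man1/numfmt.1.gz") would then correspond to the command line shown above, provided w3m is installed at that path.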

Christian Ehrhardt  (paelzer) wrote :

Simple reproducer

$ cat test
.TH TEST "1"
.SH NAME
Test
.SH "REPORTING BUGS"
Test http URL: <http://www.gnu.org>
.br
Test https URL: <https://www.gnu.org>

$ COLUMNS=100 /usr/lib/w3m/cgi-bin/w3mman2html.cgi "local=/tmp/testdir/test"
...
       Test http URL: &lt;<a href="http://www.gnu.org">http://www.gnu.org</a>&gt;
       Test https URL: &lt;https://www.gnu.org&gt;
...

TL;DR: We need to find a way to make w3mman2html do this right.
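
To check rendered output for this symptom without reading it by eye, a small scan for URLs that are not inside an <a> element may help. This is a rough sketch, not part of w3m or the manpage site: the function name unlinked_urls is made up, and the URL pattern is simply borrowed from the kind of text in the reproducer output above.

```python
import re

# Matches a complete anchor element, including its link text.
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Illustrative URL pattern covering http, https and ftp.
URL = re.compile(r"(?:https?|ftp)://[\w.\-/~]+[\w/]")


def unlinked_urls(html):
    """Return URLs that appear in html outside of any <a>...</a> element."""
    without_anchors = ANCHOR.sub("", html)
    return URL.findall(without_anchors)


if __name__ == "__main__":
    sample = (
        'Test http URL: &lt;<a href="http://www.gnu.org">http://www.gnu.org</a>&gt;\n'
        "Test https URL: &lt;https://www.gnu.org&gt;\n"
    )
    print(unlinked_urls(sample))
```

On the two output lines from the reproducer, only the https URL is reported, since the http one is already wrapped in an anchor.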

Christian Ehrhardt  (paelzer) wrote :

I thought I might simply need a newer w3m, but even 0.5.3+git20230121-2 (a git version, as there was no release) in noble behaves the same.

Again, I'm not a Perl magician, but I see this in there:
  s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;

So the fix looks trivial, doesn't it?
This gets it working for my test case:

root@n:~# diff -Naur /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig /usr/lib/w3m/cgi-bin/w3mman2html.cgi
--- /usr/lib/w3m/cgi-bin/w3mman2html.cgi.orig 2024-01-30 08:08:50.278360949 +0000
+++ /usr/lib/w3m/cgi-bin/w3mman2html.cgi 2024-01-30 08:09:10.542507674 +0000
@@ -162,7 +162,7 @@
     next;
   }

- s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
+ s@(https|http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
   s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
   s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;
   s@(include(<\/?[bu]\>|\s)*\&lt;)([\w.\-/]+)@$1 . &include_ref($3)@ge;
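
The effect of the one-line change can also be tried outside of w3m. The following is a Python port of the two URL substitutions, written only for illustration: the names ORIGINAL, PATCHED and linkify are hypothetical, and Python's \w is Unicode-aware by default, so this is not a byte-for-byte equivalent of the Perl code.

```python
import re

# URL pattern before the change: the scheme alternatives are http and ftp only.
ORIGINAL = re.compile(r"(http|ftp)://[\w.\-/~]+[\w/]")
# URL pattern after the change: https is added as an alternative.
PATCHED = re.compile(r"(https|http|ftp)://[\w.\-/~]+[\w/]")


def linkify(pattern, line):
    """Wrap every match of pattern in an <a href> element, like the Perl s@...@...@g."""
    return pattern.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), line
    )


if __name__ == "__main__":
    line = "Test https URL: &lt;https://www.gnu.org&gt;"
    print(linkify(ORIGINAL, line))  # left unchanged: "http" is followed by "s", not "://"
    print(linkify(PATCHED, line))   # the URL is wrapped in an anchor
```

With the original pattern the https line passes through untouched, which matches the reproducer. The alternation could equally be written as (https?|ftp), which accepts the same three schemes.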

This seems too easy; there must be another trap I'm missing ...
Looking through the Ubuntu and Debian bugs against w3m ...
It makes me suspect I'm overlooking something that this isn't already filed at https://bugs.launchpad.net/ubuntu/+source/w3m, https://bugs.debian.org/cgi-bin/pkgreport.cgi?repeatmerged=yes&src=w3m, or https://github.com/tats/w3m/issues, which you'd expect given how common https is.

Christian Ehrhardt  (paelzer) wrote :

Reported: https://github.com/tats/w3m/issues/292
PR: https://github.com/tats/w3m/pull/293

Let us see whether this is the trivial fix it looks like or something more complex ...

Changed in w3m (Ubuntu):
status: New → Confirmed
Changed in w3m:
status: Unknown → New