`tidy -utf` treated as `tidy -u -t -f` instead of as `tidy -utf8`, with no error

Bug #1914865 reported by Stelios Parnassidis
This bug affects 1 person
Affects              Status     Importance  Assigned to  Milestone
tidy-html5 mirror    New        Unknown
tidy-html5 (Ubuntu)  Confirmed  Low         Unassigned

Bug Description

The entry
 uppercase-tags: no
in the user's config file ~/.tidyrc has no effect when running `tidy -utf`:
tag names still come out in uppercase.

== Test Case ==
$ cat .tidyrc
uppercase-tags: no
$ cat test.html
<h1>heading
<H2>subheading</h3>
<P>text</p>
$ tidy -utf test.html
...
<BODY>
<H1>heading</H1>
<H2>subheading</H2>
<P>text</P>
</BODY>
...

$ lsb_release -a
Description: Ubuntu 20.10
Release: 20.10

$ apt-cache policy tidy
tidy:
  Installed: 2:5.6.0-11
  Candidate: 2:5.6.0-11
  Version table:
 *** 2:5.6.0-11 500
        500 http://ftp.ntua.gr/ubuntu groovy/universe amd64 Packages
        100 /var/lib/dpkg/status

Stelios Parnassidis (stelix) wrote :

Using the '-utf' option turns 'uppercase-tags' to 'yes'...

Stelios Parnassidis (stelix) wrote :

Sorry, I reported this wrongly:

I used '-utf' instead of '-utf8', which caused the effect above.
Still, I think some warning about the nonexistent option -utf should
be given...

Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce) wrote :

This is really confusing behavior in tidy-html5's handling of command-line arguments, but it probably has to be this way for historical reasons. From the output you can see what's going on:

$ echo "" | tidy > /tmp/test.html
$ tidy -utf /tmp/test.html > /dev/null
HTML Tidy: unknown option: t
HTML Tidy: unknown option: f
...
$ echo $?
0

It's interpreting "-utf" as "-u -t -f": -t doesn't exist, so it gets a warning, but -u is a synonym for -upper:

$ tidy -h
 ...
 -upper, -u force tags to upper case

The -f option does exist, so it's odd that it also triggers an 'unknown option' warning; but -f requires a writable filename, so the usage is wrong here anyway.
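To make the splitting concrete, here is a minimal POSIX shell sketch of clustered short-option expansion. It is not tidy's actual parser: the function name `expand_cluster` is hypothetical, and it only mimics the behavior observed above, where each letter after the dash becomes its own single-letter option.

```shell
# Hypothetical helper (not part of tidy): split a clustered short option
# such as "-utf" into one single-letter option per character, mimicking
# the splitting seen in the tidy output above.
expand_cluster() (
  rest=${1#-}                      # drop the leading dash: "utf"
  out=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}          # first remaining character
    out="$out -$c"
    rest=${rest#?}                 # drop that character
  done
  printf '%s\n' "${out# }"         # strip the leading space
)

expand_cluster -utf    # prints: -u -t -f
```

A real parser presumably tries the whole argument against its table of named options first (which is why `-utf8` works) and only falls back to per-letter splitting when that lookup fails; the sketch shows just the fallback.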

So, it does indeed give warnings on the invalid '-utf' option but they're hard to spot in the output, and notably it does not exit with a non-zero error code, so the error wouldn't be caught within a toolchain or script. This is a legit problem but kind of a confusing corner case so I'm setting priority to Low. This would be best reported and addressed upstream, since there might be users depending on the current behavior and changing it might have unintended consequences.
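Because the warnings are easy to miss and the exit status stays 0, a script can guard against this itself. Below is a minimal POSIX shell sketch, assuming the warnings go to stderr as `HTML Tidy: unknown option: ...` lines, as in the output above. The wrapper name `tidy_strict` and its exit status of 2 are choices made for this sketch, not anything tidy provides.

```shell
# Hypothetical wrapper (not part of tidy): run tidy with the given
# arguments, pass its stderr through, and fail if tidy reported an
# "unknown option" warning, even when tidy itself exits with 0.
tidy_strict() {
  tidy_errfile=$(mktemp) || return 2
  tidy "$@" 2>"$tidy_errfile"
  tidy_rc=$?
  cat "$tidy_errfile" >&2          # keep tidy's messages visible
  if grep -q 'unknown option' "$tidy_errfile"; then
    rm -f "$tidy_errfile"
    echo "tidy_strict: tidy reported an unknown option" >&2
    return 2                       # exit status chosen for this sketch
  fi
  rm -f "$tidy_errfile"
  return "$tidy_rc"                # otherwise keep tidy's own status
}
```

In a build script, `tidy_strict -utf8 page.html > out.html || exit 1` would then stop on a mistyped option such as `-utf` instead of silently producing uppercase tags.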

summary: - uppercase-tags: no in ~/.tidyrc conf has no effect
+ `tidy -utf` treated as `tidy -u -t -f` instead of as `tidy -utf8`, with
+ no error
Changed in tidy-html5 (Ubuntu):
importance: Undecided → Low
Bryce Harrington (bryce) wrote :

Stelios, I've forwarded this issue to upstream here:

  https://github.com/htacg/tidy-html5/issues/921

Changed in tidy-html5 (Ubuntu):
status: New → Triaged
Changed in tidy-html5-mirror:
status: Unknown → New
Bryce Harrington (bryce) wrote :

Upstream confirms that the existing odd behavior is required for historical reasons and would be difficult to change at this point.

A bash alias might make sense as a workaround if this is bothersome. For Ubuntu, I don't think this is crucial enough to carry a delta versus upstream; such a delta could cause inconsistencies for users familiar with tidy's behavior on other distros.
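As one possible shape for such a workaround: a plain alias cannot inspect its arguments, so the sketch below uses a shell function instead. It assumes the only mistake worth catching is the literal `-utf`; the function shadows `tidy` in your own shell and uses `command tidy` to reach the real binary. This is a hypothetical convenience for a personal shell profile, not an upstream or Ubuntu change.

```shell
# Hypothetical guard for ~/.bashrc or similar (not part of tidy):
# refuse the literal "-utf", which tidy reads as -u -t -f,
# and otherwise run the real tidy unchanged.
tidy() {
  for arg in "$@"; do
    case $arg in
      -utf)
        echo "tidy: '-utf' is read as '-u -t -f'; did you mean '-utf8'?" >&2
        return 2
        ;;
    esac
  done
  command tidy "$@"
}
```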

Changed in tidy-html5 (Ubuntu):
status: Triaged → Won't Fix
Bryce Harrington (bryce) wrote :

Reopening since upstream has introduced a patch to address the issue.
We should evaluate including that at least in devel Ubuntu (I'm doubtful it qualifies for SRU).

Changed in tidy-html5 (Ubuntu):
status: Won't Fix → Confirmed