wget using "if-modified-since" is not idempotent and corrupts downloaded copy of website on second use

Bug #1618288 reported by Carl Hauser
This bug affects 1 person
Affects         Status   Importance   Assigned to   Milestone
wget (Ubuntu)   New      Undecided    Unassigned

Bug Description

I use wget to copy a web site from one server to another, adjusting file suffixes and paths.

Since updating from Ubuntu 14.04 to 16.04 LTS, the command that I used previously has begun corrupting the destination site on the second and subsequent invocations.

The options relevant to the problem seem to be -N (use timestamping), -k (convert links), and -E (adjust extensions). The problem arises with linked files whose names do not end in .html. On the first invocation everything is good: file foo.txt is downloaded and linked as foo.txt. On the second invocation, the wget log (option -v) suggests that it has examined foo.txt on the server, but it then reports "File '<copylocation>/foo.txt.html' not modified on server. Omitting download." It then changes the link in the referring file to foo.txt.html, so the link no longer points to the downloaded foo.txt.
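
One way to confirm the non-idempotent behaviour is to run the same command twice and compare the destination directory after each run. The following is only a rough sketch: check_idempotent is a hypothetical helper name, not part of wget, and it assumes a POSIX shell with mktemp, cp, and diff available.

```shell
# check_idempotent DIR CMD [ARGS...]
# Hypothetical helper: runs CMD twice and compares DIR after each run.
# Returns 0 if the second run left DIR unchanged, 1 if anything differs,
# 2 if the temporary snapshot directory could not be created.
check_idempotent() {
  dir=$1
  shift
  snap=$(mktemp -d) || return 2

  # First run, then snapshot the result. The command's exit status is
  # ignored on purpose: wget can exit non-zero (for example on a single
  # missing page) and still produce a usable copy.
  "$@" || true
  cp -Rp "$dir/." "$snap/"

  # Second run of the same command, then compare against the snapshot.
  "$@" || true
  if diff -r "$snap" "$dir" >/dev/null; then
    rc=0
  else
    rc=1
  fi

  rm -rf "$snap"
  return "$rc"
}
```

With the destination and source supplied as shell variables of your own (DEST and SRC_URL here are placeholders), a call might look like: check_idempotent "$DEST" wget -nH -r -E -k -N -x -l inf -P "$DEST" "$SRC_URL". A return status of 1 would match the behaviour described above, where the second run rewrites links in files that the first run had left correct.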

I think this is a bug. Do others have an opinion?

Workaround: include the option "--no-if-modified-since", which seems to restore the old, correct behavior.
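
For illustration, here is the same option set with the workaround flag added. This is a sketch rather than a tested fix: DEST and SRC_URL are placeholder variables with made-up defaults, and the script only prints the command unless RUN=1 is set.

```shell
# Placeholder values (assumptions for this sketch, not from the report).
DEST="${DEST:-./site-copy}"
SRC_URL="${SRC_URL:-http://example.invalid/}"

# Same options as the misbehaving command, plus --no-if-modified-since,
# which makes wget -N fall back to its older timestamp check instead of
# sending an If-Modified-Since request header.
set -- -nH -r -E -k -N --no-if-modified-since -x -l inf -P "$DEST" "$SRC_URL"

cmd_preview="wget $*"

if [ "${RUN:-0}" = "1" ]; then
  wget "$@"            # actually perform the mirror
else
  echo "$cmd_preview"  # dry run: show what would be executed
fi
```

Printing first makes it easy to check the option list before touching an existing copy of the site.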

Thanks.

P.S. The full command that misbehaves is:

    wget -nH -r -E -k -N -x -l inf -P <destination for copy> "http://<source web site>"

affects: ubuntu → wget (Ubuntu)