wget2 --mirror leaves the specified host

Bug #2058082 reported by Seth Arnold
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
wget2 (Ubuntu)  New     Undecided   Unassigned

Bug Description

Hello, I wanted a mirror of the IRC logs hosted on https://irclogs.ubuntu.com/ and started the project with:

wget --mirror https://irclogs.ubuntu.com/

This worked okay but was very slow, as there are probably hundreds of thousands of links to traverse.

I switched to wget2 for its multiple simultaneous connections, and ran:

wget2 --mirror https://irclogs.ubuntu.com

I assumed that wget2 would try to accomplish the same thing: mirror that site *and only that site*.
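
To make that expectation concrete, here is a minimal Python sketch of the host check I had in mind. It is only an illustration of the behavior I expected, not a description of how wget or wget2 decide which links to follow. The function name same_host, the sample path /2024/03/15/, and the https scheme on the ubuntu.com link are assumptions made up for the example.

```python
from urllib.parse import urljoin, urlsplit


def same_host(start_url: str, link: str) -> bool:
    """Return True if `link`, resolved against `start_url`, stays on the start host."""
    start = urlsplit(start_url)
    # Resolve relative links against the start URL before comparing hosts.
    target = urlsplit(urljoin(start_url, link))
    # urlsplit(...).hostname is already lowercased; guard against None.
    return (target.hostname or "") == (start.hostname or "")


START = "https://irclogs.ubuntu.com/"

# Hypothetical links: two on the mirrored host, one pointing at ubuntu.com.
links = [
    "/2024/03/15/",
    "https://irclogs.ubuntu.com/2024/",
    "https://ubuntu.com/security?q=&package=openssh&offset=40",
]

for link in links:
    verdict = "follow" if same_host(START, link) else "skip"
    print(f"{verdict}: {link}")
```

Under this check the parent domain ubuntu.com counts as a different host from irclogs.ubuntu.com, so a mirror of the latter would never descend into it.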

What actually happened was that it followed a link on that site to ubuntu.com and downloaded two and a half million files like this:

$ find ubuntu.com/ -ls | head -20
 11190888 995121 drwxr-xr-x 48 sarnold sarnold 2417914 Mar 16 01:47 ubuntu.com/
  6717440 37 -rw-r--r-- 1 sarnold sarnold 73591 Mar 16 01:23 ubuntu.com/security?q=&package=epiphany&offset=40
  6717456 29 -rw-r--r-- 1 sarnold sarnold 73469 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=40
  6717468 37 -rw-r--r-- 1 sarnold sarnold 73687 Mar 16 01:23 ubuntu.com/security?q=&package=webkitgtk&offset=0
  6717648 29 -rw-r--r-- 1 sarnold sarnold 73527 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=0
  6717662 29 -rw-r--r-- 1 sarnold sarnold 73555 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=60
  6717758 37 -rw-r--r-- 1 sarnold sarnold 73625 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=80
  6717786 37 -rw-r--r-- 1 sarnold sarnold 73693 Mar 16 01:23 ubuntu.com/security?q=&package=php8.0&offset=0
  6717790 37 -rw-r--r-- 1 sarnold sarnold 73591 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=80
  6717980 29 -rw-r--r-- 1 sarnold sarnold 73435 Mar 16 01:23 ubuntu.com/security?q=&package=epiphany&offset=80
  6717984 37 -rw-r--r-- 1 sarnold sarnold 73589 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=20
  6717986 37 -rw-r--r-- 1 sarnold sarnold 73649 Mar 16 01:23 ubuntu.com/security?q=&package=awstats&offset=40
  6718000 29 -rw-r--r-- 1 sarnold sarnold 73495 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=60
  6718034 37 -rw-r--r-- 1 sarnold sarnold 73649 Mar 16 01:23 ubuntu.com/security?q=&package=mozjs60&offset=60
  6718176 29 -rw-r--r-- 1 sarnold sarnold 73555 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=0
  6718210 37 -rw-r--r-- 1 sarnold sarnold 73629 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=20
  6718248 37 -rw-r--r-- 1 sarnold sarnold 73617 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=60
  6718266 37 -rw-r--r-- 1 sarnold sarnold 73673 Mar 16 01:23 ubuntu.com/security?q=&package=mozjs60&offset=60
  6718292 37 -rw-r--r-- 1 sarnold sarnold 73593 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=40
  6718354 29 -rw-r--r-- 1 sarnold sarnold 73545 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=60

This is unexpected and unpleasant.

Thanks

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: wget2 1.99.1-2.1
ProcVersionSignature: Ubuntu 5.4.0-166.183-generic 5.4.252
Uname: Linux 5.4.0-166-generic x86_64
NonfreeKernelModules: lkp_Ubuntu_5_4_0_166_183_generic_101 zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
CasperMD5CheckResult: skip
Date: Sat Mar 16 01:50:30 2024
SourcePackage: wget2
UpgradeStatus: Upgraded to focal on 2020-01-24 (1512 days ago)
