Hello, I wanted a mirror of the IRC logs hosted on https://irclogs.ubuntu.com/ and started the project with:
wget --mirror https://irclogs.ubuntu.com/
This worked okay but was very slow, as there are probably hundreds of thousands of links to traverse.
I switched to wget2 to get multiple simultaneous connections, and ran it with:
wget2 --mirror https://irclogs.ubuntu.com
I assumed that wget2 would try to accomplish the same thing: mirror that site *and only that site*.
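To make that expectation concrete: wget does not span hosts unless asked to (--span-hosts), so a recursive mirror should only follow URLs whose host matches the start URL's host exactly. irclogs.ubuntu.com and ubuntu.com are different hosts, even though one is a subdomain of the other. The check I expected looks roughly like the following shell sketch. host_of and same_host are hypothetical helper names, this is not how wget2 implements it, and it ignores user@ prefixes and IPv6 literals.

```shell
# Hypothetical helper: print the host part of a URL.
# Ignores user@ prefixes and IPv6 literals.
host_of() {
  u=${1#*://}     # drop the scheme ("https://")
  u=${u%%/*}      # drop the path
  u=${u%%\?*}     # drop a query string that directly follows the host
  u=${u%%#*}      # drop a fragment that directly follows the host
  printf '%s\n' "${u%%:*}"   # drop a port (":443")
}

# Hypothetical helper: succeed only if both URLs are on exactly the same host.
same_host() {
  [ "$(host_of "$1")" = "$(host_of "$2")" ]
}
```

Under this rule, every ubuntu.com/security?... URL in the listing below fails the check against https://irclogs.ubuntu.com/ and should not have been fetched.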
What actually happened was that it followed a link on that site to ubuntu.com and downloaded about two and a half million files from there, like these:
$ find ubuntu.com/ -ls | head -20
11190888 995121 drwxr-xr-x 48 sarnold sarnold 2417914 Mar 16 01:47 ubuntu.com/
6717440 37 -rw-r--r-- 1 sarnold sarnold 73591 Mar 16 01:23 ubuntu.com/security?q=&package=epiphany&offset=40
6717456 29 -rw-r--r-- 1 sarnold sarnold 73469 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=40
6717468 37 -rw-r--r-- 1 sarnold sarnold 73687 Mar 16 01:23 ubuntu.com/security?q=&package=webkitgtk&offset=0
6717648 29 -rw-r--r-- 1 sarnold sarnold 73527 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=0
6717662 29 -rw-r--r-- 1 sarnold sarnold 73555 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=60
6717758 37 -rw-r--r-- 1 sarnold sarnold 73625 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=80
6717786 37 -rw-r--r-- 1 sarnold sarnold 73693 Mar 16 01:23 ubuntu.com/security?q=&package=php8.0&offset=0
6717790 37 -rw-r--r-- 1 sarnold sarnold 73591 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=80
6717980 29 -rw-r--r-- 1 sarnold sarnold 73435 Mar 16 01:23 ubuntu.com/security?q=&package=epiphany&offset=80
6717984 37 -rw-r--r-- 1 sarnold sarnold 73589 Mar 16 01:23 ubuntu.com/security?q=&package=openssh&offset=20
6717986 37 -rw-r--r-- 1 sarnold sarnold 73649 Mar 16 01:23 ubuntu.com/security?q=&package=awstats&offset=40
6718000 29 -rw-r--r-- 1 sarnold sarnold 73495 Mar 16 01:23 ubuntu.com/security?q=&package=grub2-unsigned&offset=60
6718034 37 -rw-r--r-- 1 sarnold sarnold 73649 Mar 16 01:23 ubuntu.com/security?q=&package=mozjs60&offset=60
6718176 29 -rw-r--r-- 1 sarnold sarnold 73555 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=0
6718210 37 -rw-r--r-- 1 sarnold sarnold 73629 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=20
6718248 37 -rw-r--r-- 1 sarnold sarnold 73617 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=60
6718266 37 -rw-r--r-- 1 sarnold sarnold 73673 Mar 16 01:23 ubuntu.com/security?q=&package=mozjs60&offset=60
6718292 37 -rw-r--r-- 1 sarnold sarnold 73593 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=40
6718354 29 -rw-r--r-- 1 sarnold sarnold 73545 Mar 16 01:23 ubuntu.com/security?q=&package=vlc&offset=60
This is unexpected and unpleasant.
Thanks
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: wget2 1.99.1-2.1
ProcVersionSignature: Ubuntu 5.4.0-166.183-generic 5.4.252
Uname: Linux 5.4.0-166-generic x86_64
NonfreeKernelModules: lkp_Ubuntu_5_4_0_166_183_generic_101 zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
CasperMD5CheckResult: skip
Date: Sat Mar 16 01:50:30 2024
SourcePackage: wget2
UpgradeStatus: Upgraded to focal on 2020-01-24 (1512 days ago)