Comment 10 for bug 2029930

halfgaar (wiebe-halfgaar) wrote:

I have just been able to reproduce it by running this on the same server where the website example.org is hosted:

cd /tmp
while true; do rm -rf www.example.org/ && wget --mirror --page-requisites https://www.example.org 2>&1 | grep -F GB/s; done

The 'grep GB/s' is there to surface the improbable speeds. It's not unusual to see several dozen to several hundred GB/s there, like this:

2023-11-19 19:33:57 (116 GB/s) - ‘www.example.org/wp-json/oembed/1.0/embed?url=https:%2F%2Fwww.example.org%2F2023%2F06%2F05%2Fredacted’ saved [2362/2362]

That's roughly 2 kB, reportedly downloaded at 116 GB/s. Obviously a timer issue. This illustrates that it's not hard to occasionally hit some extremely high TB/s value, which would trigger the crash.
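To put rough numbers on it (my own back-of-the-envelope illustration, not wget's actual code): 2362 bytes at 116 GB/s implies an elapsed time of about 20 nanoseconds, which is well below what any realistic transfer takes, so the timer must be reporting a near-zero interval. A tiny interval reading then inflates even a small download into TB/s territory:

```python
# Back-of-the-envelope check of the log line above.
size_bytes = 2362          # "saved [2362/2362]" from the wget output
reported_rate = 116e9      # 116 GB/s from the same line

# The elapsed time the timer must have reported for that rate to appear:
implied_elapsed = size_bytes / reported_rate
print(f"implied elapsed time: {implied_elapsed * 1e9:.1f} ns")

# With a coarse or buggy timer, an interval that reads as (near) zero
# turns a ~2 kB file into an absurd TB/s rate:
tiny_interval = 1e-9       # hypothetical 1 ns reading
rate = size_bytes / tiny_interval
print(f"rate with a 1 ns reading: {rate / 1e12:.2f} TB/s")
```

The exact interval value is hypothetical; the point is only that any near-zero timer reading produces rates like the ones in the log.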

With the PPA, running the same loop but grepping for TB/s, it took about 30 minutes, but eventually:

2023-11-19 19:34:16 (4.00 TB/s) - ‘www.example.org/wp-json/oembed/1.0/embed?url=https:%2F%2Fwww.example.org%2F2023%2F08%2F19%2Fredacted%2F’ saved [2201/2201]

This is all from crawling a WordPress site. The issue seems more likely to trigger when crawling a dynamic site than static files, possibly because wget doesn't start counting bytes until it has seen the first one, and by that point a dynamic site tends to be serving entirely from memory. That's probably also why it tends to happen on files of around 2 kB: they fit in one or two TCP packets.

Probably a simple PHP script that prints a few kB is enough to reproduce it.
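As a stand-in for such a PHP script (this is my own sketch, untested against the actual bug; the port and response size are arbitrary choices), a minimal in-memory HTTP server that returns a couple of kB per request could look like this:

```python
# Tiny HTTP server that serves ~2 kB from memory per request, as a
# stand-in for "a simple PHP script that prints a few kB".
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

BODY = b"x" * 2048  # roughly the size of the responses in the wget log


class TinyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        # Keep the console quiet so grep output stays readable.
        pass


def start_server(port=8080):
    """Start serving on 127.0.0.1 in a background thread."""
    server = HTTPServer(("127.0.0.1", port), TinyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    start_server()
    threading.Event().wait()  # keep the main thread alive
```

You could then hammer it with the same kind of loop as above, e.g. `while true; do wget -O /dev/null http://127.0.0.1:8080/ 2>&1 | grep -F TB/s; done`, without involving a real website.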