one --accept-regex expression negates another

Bug #1937874 reported by Bill Yikes
This bug affects 1 person
Affects        Status  Importance  Assigned to  Milestone
wget (Ubuntu)  New     Undecided   Unassigned

Bug Description

In theory, this command should fetch every PDF linked from the page:

$ wget -v -d -r --level 1 --adjust-extension --no-clobber --no-directories\
       --accept-regex 'administrative-orders/.*/administrative-order-matter-'\
       --accept-regex 'administrative-orders.*.pdf'\
       --accept-regex 'administrative-orders.page[^&]*$'\
       --directory-prefix=/tmp\
       'https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'

But it fetches none of them, and the debug output shows:

---
Deciding whether to enqueue "https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf".
https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf is excluded/not-included through regex.
Decided NOT to load it.
---
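To see which of the three patterns the rejected URL actually satisfies, each one can be tested on its own. This is a minimal sketch that assumes `grep -E` (POSIX extended regular expressions) behaves like wget's default `--regex-type=posix` for these simple patterns. The `matches` helper is hypothetical and not part of wget.

```shell
# Test each --accept-regex pattern from the report against the rejected PDF URL.
# Assumption: grep -E approximates wget's default POSIX regex matching here.
url='https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf'

# Hypothetical helper: succeeds if string $1 contains a match for extended regex $2
# (unanchored search, as when matching a complete URL).
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

for re in \
  'administrative-orders/.*/administrative-order-matter-' \
  'administrative-orders.*.pdf' \
  'administrative-orders.page[^&]*$'
do
  if matches "$url" "$re"; then
    echo "match:    $re"
  else
    echo "no match: $re"
  fi
done
```

Run with `sh`, it reports a match only for `administrative-orders.*.pdf`. The last pattern on the command line, `administrative-orders.page[^&]*$`, does not match the PDF URL, which would fit the logged decision if wget were consulting only that last `--accept-regex`.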

That's bogus: the URL matches the second pattern, 'administrative-orders.*.pdf'. The workaround is to remove this option:

--accept-regex 'administrative-orders.page[^&]*$'

But that should not be necessary. Adding an --accept-* clause should never invalidate another --accept-* clause, and it should never shrink the set of fetched files.
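If wget keeps only one --accept-regex value (each repeat appearing to replace the previous one rather than add to it), one possible way to keep all three patterns is to join them into a single extended regular expression with `|` alternation. The sketch below builds that combined pattern and checks it against the two URLs from this report, again assuming `grep -E` is a fair stand-in for wget's POSIX matcher. The variable names and the `accepts` helper are illustrative only.

```shell
# Sketch: union of the three patterns from the report as one POSIX extended regex.
# Assumption: wget honours a single --accept-regex, so the union must be one pattern.
combined='administrative-orders/.*/administrative-order-matter-|administrative-orders.*.pdf|administrative-orders.page[^&]*$'

pdf='https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf'
page='https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'

# Hypothetical helper: succeeds if URL $1 matches the combined pattern.
accepts() {
  printf '%s\n' "$1" | grep -Eq -- "$combined"
}

for u in "$pdf" "$page"; do
  if accepts "$u"; then
    echo "accepted: $u"
  else
    echo "rejected: $u"
  fi
done

# Corresponding wget invocation (not run here; it needs network access):
# wget -v -d -r --level 1 --adjust-extension --no-clobber --no-directories \
#      --accept-regex "$combined" \
#      --directory-prefix=/tmp \
#      'https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'
```

Both URLs should be accepted by the combined pattern, since each alternative is tried independently and the `$` anchor applies only to the third one.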
