Ubuntu
urlscan package

Activity log for bug #1930437

Date	Who	What changed	Old value	New value	Message
2021-06-01 15:48:20	Bill Yikes	bug			added bug
2021-06-01 16:43:12	Bill Yikes	description	This yields no output: curl -s 'https://www.veridiancu.org' | sed -ne '/<form/,/<\/form/p' | urlscan -n Without the sed filter, urlscan works. But then urlscan dumps all URLs in the whole document. It seems urlscan was only designed to work on whole documents. So perhaps this is not a "bug" but rather a feature request. The workaround would normally be to use urlview instead, but urlview has the limitation of only working interactively. Perhaps the fix here is for urlscan to add a --fuzzyhtml option, and use the guts of urlview to do the processing.	This yields no output: curl -s 'https://www.veridiancu.org' | sed -ne '/<form/,/<\/form/p' | urlscan -n Without the sed filter, urlscan works. But then urlscan dumps all URLs in the whole document. It seems urlscan was only designed to work on whole documents. So perhaps this is not a "bug" but rather a feature request. The workaround would normally be to use urlview instead, but urlview has the limitation of only working interactively. Perhaps the fix here is for urlscan to add a --fuzzyhtml option, and use the guts of urlview to do the processing. (edit) This workaround works for urlscan: curl -s 'https://www.veridiancu.org' | python -c 'from bs4 import BeautifulSoup; import sys; print(BeautifulSoup(sys.stdin.read()).form)' | urlscan -n which might give a clue about what the problem is.
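
For anyone who wants to check which URLs the form fragment actually contains without going through urlscan, here is a minimal sketch. It uses only Python's standard-library html.parser, which is a swap-in for the BeautifulSoup call in the workaround above, and it does not reproduce urlscan's own extraction logic. The names FormURLCollector, form_urls and URL_ATTRS are hypothetical, and the choice of the action, href and src attributes is an assumption about which attributes are of interest.

```python
from html.parser import HTMLParser

# Assumption: these are the attributes whose values count as URLs here.
URL_ATTRS = {"action", "href", "src"}


class FormURLCollector(HTMLParser):
    """Collect URL-bearing attribute values found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0  # > 0 while the parser is inside a <form>
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_depth += 1
        if self.form_depth > 0:
            for name, value in attrs:
                if name in URL_ATTRS and value:
                    self.urls.append(value)

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth > 0:
            self.form_depth -= 1


def form_urls(html):
    """Return URL attribute values inside forms, in document order.

    The input may be a fragment (for example the output of the sed
    filter), since HTMLParser does not require a complete document.
    """
    collector = FormURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

Calling form_urls() on the text produced by the sed filter would list the form's action and any links or images inside it, which may help narrow down whether the fragment itself or urlscan's handling of it is the cause.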