A web crawler oriented to infosec.
webcrawler's Introduction
- Hi, I'm @verovaleros
- I'm interested in cybersecurity, threat research, and how to make things better and more efficient.
- I'm currently learning how to be better at everything I do: datasets, dev, automation, all the things o/
webcrawler's Issues
Currently, the crawler only follows absolute (full-path) links and ignores relative links.
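One possible way to handle relative links is to resolve every `href` against the URL of the page it was found on. The sketch below uses only the Python standard library; the function name `resolve_link` is hypothetical and not taken from the crawler's code.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_link(page_url, href):
    """Return an absolute http(s) URL for an href found on page_url, or None.

    Hypothetical helper: the name and behavior are assumptions, not the
    crawler's actual implementation.
    """
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # ("page2.html", "../img/a.png", "/about") against the page URL.
    absolute = urljoin(page_url, href.strip())
    # Drop the "#fragment" so the same page is not queued more than once.
    absolute, _fragment = urldefrag(absolute)
    # Skip links that are not web pages, such as mailto: or javascript:.
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

For example, `resolve_link("https://example.com/docs/index.html", "../img/logo.png")` yields `https://example.com/img/logo.png`, while a `mailto:` link yields `None`.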
If the connection is lost, the HTTP requests will fail. There are a few ways to handle this:
- Identify the issue and re-add the link to the queue
- Add a retry option to go over the links that failed or threw an error
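Both options can be combined in one loop: a failed link goes back to the end of the queue until it runs out of attempts, and whatever still fails is collected for a later retry pass. This is a minimal sketch, not the crawler's actual code. `crawl_with_retries`, `fetch`, and `max_retries` are hypothetical names, and `fetch` is assumed to raise an `OSError` subclass (for example `ConnectionError` or `urllib.error.URLError`) when the connection is lost.

```python
from collections import deque


def crawl_with_retries(urls, fetch, max_retries=3):
    """Fetch every URL, re-queueing the ones that fail.

    fetch is any callable that takes a URL and returns its content,
    raising OSError (or a subclass) on connection problems.
    Returns (results, failed): a dict of url -> content and a list of
    URLs that still failed after max_retries attempts.
    """
    queue = deque((url, 0) for url in urls)
    results, failed = {}, []
    while queue:
        url, attempts = queue.popleft()
        try:
            results[url] = fetch(url)
        except OSError:
            if attempts + 1 < max_retries:
                # Re-add the link at the end of the queue so other links
                # get a turn before this one is tried again.
                queue.append((url, attempts + 1))
            else:
                # Out of attempts: remember it for a separate retry pass.
                failed.append(url)
    return results, failed
```

The sketch does not wait between attempts. If the connection tends to drop for several seconds at a time, a short delay before re-queueing would probably help.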
The format of the date is now
Using a comma for the seconds instead of '.'
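If the timestamps come from Python's `logging` module (an assumption; the issue does not say so), the comma is that module's default millisecond separator. A sketch of one way to get a '.' instead is to set an explicit `datefmt` and append the milliseconds in the format string. The helper name `make_formatter` is hypothetical.

```python
import logging


def make_formatter():
    """Build a formatter that prints 2024-01-31 12:00:00.123 style timestamps.

    Hypothetical helper; assumes the crawler logs through the logging module.
    """
    return logging.Formatter(
        # With an explicit datefmt, asctime has no milliseconds, so they
        # are appended here with '.' as the separator.
        fmt="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
```

The formatter can be attached to any handler with `handler.setFormatter(make_formatter())`.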
The idea is that the crawler will automatically download files of selected types, for instance all PDFs, JPGs, etc.
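A simple way to decide which links to download is to compare the file extension in the URL path against a configurable set. This is only a sketch: `DOWNLOAD_EXTENSIONS` and `should_download` are hypothetical names, and matching on the extension alone is an assumption (a server can serve a PDF from a URL with no extension, which would need a check on the `Content-Type` header instead).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical default set; in practice this would be a crawler option.
DOWNLOAD_EXTENSIONS = frozenset({".pdf", ".jpg", ".jpeg"})


def should_download(url, extensions=DOWNLOAD_EXTENSIONS):
    """Return True if the URL path ends in one of the wanted extensions."""
    # Only the path is inspected, so a query string such as "?download=1"
    # does not hide the extension.
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower() in extensions
```

With the default set, `should_download("https://example.com/files/Report.PDF?download=1")` is true, while an ordinary page such as `https://example.com/index.html` is skipped.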