This is a project template for web scrapers, built on:
It also requires a connection to the following backends (presumably on the local machine):
The project can optionally use a database (PostgreSQL recommended); database support can also be turned off entirely.
The template includes a `RateLimitedClient`, which inherits from `httpx.AsyncClient`. It ensures that requests are made politely and can be configured with a global rate limit or with rate limits specific to individual top-level domains.
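The README does not show the client's internals. As a rough sketch of the kind of policy such a client could enforce, the limiter below spaces requests a minimum number of seconds apart, both globally and per host. `DomainRateLimiter`, `acquire`, `global_interval`, and `domain_intervals` are hypothetical names, not the template's API, and this sketch keys its limits on the full hostname rather than on the top-level domain. It uses only the standard library.

```python
from __future__ import annotations

import asyncio
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Hypothetical helper: keeps consecutive requests at least `global_interval`
    seconds apart overall, and at least `domain_intervals[host]` seconds apart
    for the same host."""

    def __init__(self, global_interval: float = 0.0,
                 domain_intervals: dict[str, float] | None = None) -> None:
        self.global_interval = global_interval
        self.domain_intervals = dict(domain_intervals or {})
        self._global_lock = asyncio.Lock()
        self._global_last: float | None = None
        self._domain_locks: dict[str, asyncio.Lock] = {}
        self._domain_last: dict[str, float] = {}

    @staticmethod
    async def _sleep_until(last: float | None, interval: float) -> None:
        # Sleep just long enough that `interval` seconds have passed since `last`.
        if last is not None and interval > 0:
            delay = last + interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)

    async def acquire(self, url: str) -> None:
        """Wait until a request to `url` is allowed under both limits."""
        host = urlsplit(url).hostname or ""
        domain_lock = self._domain_locks.setdefault(host, asyncio.Lock())
        # Locks are always taken in the order domain -> global, so tasks cannot deadlock.
        async with domain_lock:
            await self._sleep_until(self._domain_last.get(host),
                                    self.domain_intervals.get(host, 0.0))
            async with self._global_lock:
                await self._sleep_until(self._global_last, self.global_interval)
                now = time.monotonic()
                self._global_last = now
            # Both limits record the same timestamp: the moment the request is released.
            self._domain_last[host] = now
```

One way to wire something like this into an `httpx.AsyncClient` subclass is to override `send()`, `await limiter.acquire(str(request.url))`, and then return `await super().send(request, **kwargs)`; the template's own `RateLimitedClient` may hook in differently. Creating the limiter inside a running event loop (for example, inside the coroutine passed to `asyncio.run`) keeps it compatible with older Python versions, where `asyncio.Lock` binds to a loop at construction.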
The implementation of `RateLimitedClient` is inspired by (and partially copied from) this discussion.
Global configuration for the project is found in `config.py`.
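The contents of `config.py` are not described here. Purely as a hypothetical illustration, a settings module for a template like this might gather the rate limits and the optional-database switch in one place, with environment-variable overrides. Every field name, default, and variable name below (`Settings`, `load_settings`, the `SCRAPER_*` variables) is an assumption, not the template's actual configuration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag such as "1", "true", "yes" or "on" from the environment."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Minimum seconds between any two requests (hypothetical default).
    global_rate_limit: float = 1.0
    # Per-domain overrides, e.g. {"example.com": 5.0} (hypothetical).
    domain_rate_limits: dict[str, float] = field(default_factory=dict)
    # The optional database can be switched off entirely.
    use_database: bool = True
    database_url: str = "postgresql://localhost:5432/scraper"


def load_settings() -> Settings:
    """Build settings from defaults, overridden by environment variables if set."""
    return Settings(
        global_rate_limit=float(os.environ.get("SCRAPER_GLOBAL_RATE_LIMIT", "1.0")),
        use_database=_env_bool("SCRAPER_USE_DATABASE", True),
        database_url=os.environ.get("SCRAPER_DATABASE_URL", Settings.database_url),
    )
```

A frozen dataclass keeps the settings read-only after startup, and the environment overrides allow the database to be switched off without editing code.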
Use `pytest` to write and run tests.
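For instance, a test module might look like the following. The file name and the `extract_links` helper are hypothetical and included only so the example is self-contained; in the real project, tests would import the code under test from the template instead.

```python
# tests/test_parsing.py (hypothetical file and helper names)
from __future__ import annotations

from html.parser import HTMLParser


def extract_links(html: str) -> list[str]:
    """Hypothetical scraper helper: return the href values of <a> tags, in order."""

    class _LinkParser(HTMLParser):
        def __init__(self) -> None:
            super().__init__()
            self.links: list[str] = []

        def handle_starttag(self, tag, attrs):
            # HTMLParser lowercases tag and attribute names before calling this.
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def test_extract_links_returns_hrefs_in_order():
    html = '<p><a href="/a">A</a> <a href="https://example.com/b">B</a></p>'
    assert extract_links(html) == ["/a", "https://example.com/b"]


def test_extract_links_ignores_anchors_without_href():
    assert extract_links('<a name="top">Top</a>') == []
```

Running `pytest` from the project root discovers files named `test_*.py` (or `*_test.py`), runs the functions prefixed with `test`, and reports a failure whenever an `assert` does not hold.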